ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-01-19 11:45:10 +08:00

MkDev11 678a4f959c Fix: skip internal bookmark references in DOCX parsing (#12604) (#12611)

### What problem does this PR solve?

Fixes #12604 - DOCX files containing hyperlinks to internal bookmarks
(e.g., `#_文档目录`) cause a `KeyError` during parsing:

```
KeyError: "There is no item named 'word/#_文档目录' in the archive"
```

This happens because python-docx incorrectly tries to read internal
bookmark references as files from the ZIP archive. Internal bookmarks
are relationship targets starting with `#` and are not actual files.
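The error message itself comes from Python's standard `zipfile` module, which raises `KeyError` when asked for a member that is not in the archive. The following is a minimal sketch of that behavior only, assuming a tiny in-memory archive; the helper names `build_archive` and `read_member` are hypothetical and are not part of python-docx or ragflow.

```python
import io
import zipfile


def build_archive() -> bytes:
    """Build a tiny in-memory ZIP that stands in for a DOCX package (illustrative only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


def read_member(archive_bytes: bytes, name: str) -> bytes:
    """Read one member by name; zipfile raises KeyError if it does not exist."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        return zf.read(name)


archive = build_archive()

# A bookmark reference such as "#_文档目录" is not a file in the package,
# so resolving it to a member name and reading it fails.
try:
    read_member(archive, "word/#_文档目录")
    error_message = None
except KeyError as exc:
    error_message = exc.args[0]

print(error_message)
```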

This PR extends the existing `load_from_xml_v2` workaround (which
already handles `NULL` targets) to also skip relationship targets
starting with `#`.
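The skip rule described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the actual `load_from_xml_v2` code: the names `should_skip_target` and `loadable_targets` are hypothetical, the sample `.rels` content is made up, and the exact form of the `NULL` check in ragflow may differ.

```python
import xml.etree.ElementTree as ET

# Namespace used by OPC relationship (.rels) parts.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Made-up sample: one real part, one internal bookmark, one NULL target.
SAMPLE_RELS = f"""
<Relationships xmlns="{RELS_NS}">
  <Relationship Id="rId1" Type="styles" Target="styles.xml"/>
  <Relationship Id="rId2" Type="hyperlink" Target="#_文档目录"/>
  <Relationship Id="rId3" Type="image" Target="NULL"/>
</Relationships>
"""


def should_skip_target(target: str) -> bool:
    """Return True for targets that do not name a file in the archive.

    Assumption: NULL targets are matched literally as "NULL"; internal
    bookmark references are recognized by their leading "#".
    """
    return target == "NULL" or target.startswith("#")


def loadable_targets(rels_xml: str) -> list[str]:
    """Return the relationship targets that should still be loaded."""
    root = ET.fromstring(rels_xml)
    targets = []
    for rel in root.findall(f"{{{RELS_NS}}}Relationship"):
        target = rel.get("Target", "")
        if should_skip_target(target):
            continue  # skip instead of trying to read it from the ZIP
        targets.append(target)
    return targets


print(loadable_targets(SAMPLE_RELS))
```

Filtering at relationship-load time keeps the rest of the parser unchanged: the bookmark link never becomes a part lookup, so no archive read is attempted for it.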

Related upstream issue:
https://github.com/python-openxml/python-docx/issues/902

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---
Contribution by Gittensor, see my contribution statistics at
https://gittensor.io/miners/details?githubId=94194147

2026-01-14 19:08:46 +08:00

advanced_rag

Feat: support tree structured deep-research policy. (#12559)

2026-01-13 09:41:35 +08:00

app

Fix: skip internal bookmark references in DOCX parsing (#12604) (#12611)

2026-01-14 19:08:46 +08:00

flow

refactor: remove debug print statements (#12598)

2026-01-14 10:05:34 +08:00

llm

Fix enable_thinking parameter for Qwen3 models (#12603)