Commit Graph

8 Commits

Author SHA1 Message Date
387b0b27c4 feat(parser): support external Docling server via DOCLING_SERVER_URL (#13527)
### What problem does this PR solve?

This PR adds support for parsing PDFs through an external Docling
server, so RAGFlow can connect to remote `docling serve` deployments
instead of relying only on local in-process Docling.

It addresses the feature request in
[#13426](https://github.com/infiniflow/ragflow/issues/13426) and aligns
with the external-server usage pattern already used by MinerU.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

### What is changed?

- Add external Docling server support in `DoclingParser`:
  - Use `DOCLING_SERVER_URL` to enable remote parsing mode.
- Try `POST /v1/convert/source` first, and fallback to
`/v1alpha/convert/source`.
- Keep existing local Docling behavior when `DOCLING_SERVER_URL` is not
set.
- Wire Docling env settings into parser invocation paths:
  - `rag/app/naive.py`
  - `rag/flow/parser/parser.py`
- Add Docling env hints in constants and update docs:
  - `docs/guides/dataset/select_pdf_parser.md`
  - `docs/guides/agent/agent_component_reference/parser.md`
  - `docs/faq.mdx`

### Why this approach?

This keeps the change focused on one issue and one capability (external
Docling connectivity), without introducing unrelated provider-model
plumbing.

### Validation

- Static checks:
  - `python -m py_compile` on changed Python files
  - `python -m ruff check` on changed Python files
- Functional checks:
  - Remote v1 endpoint path works
  - v1alpha fallback works
  - Local Docling path remains available when server URL is unset

### Related links

- Feature request: [Support external Docling server (issue
#13426)](https://github.com/infiniflow/ragflow/issues/13426)
- Compare view for this branch:
[main...feat/docling-server](https://github.com/infiniflow/ragflow/compare/main...spider-yamet:ragflow:feat/docling-server?expand=1)

##### Fixes [#13426](https://github.com/infiniflow/ragflow/issues/13426)
2026-03-12 17:09:03 +08:00
9577753c10 Refactor: improve the logic about docling parser extract box (#13215)
### What problem does this PR solve?
 improve the logic about docling parser extract box

### Type of change
- [x] Refactoring
2026-02-28 10:05:24 +08:00
4e48aba5c4 fix: update DoclingParser return type hint (#13243)
### What problem does this PR solve?

The _transfer_to_sections method was throwing a type hint violation
because it occasionally returns 3-item tuples instead of 2. Adjusted to
list[tuple[str, ...]] to prevent runtime crashes.

Error: 

20:53:21 Page(1~10): [ERROR]Internal server error while chunking:
Method
deepdoc.parser.docling_parser.DoclingParser._transfer_to_sections()
return [(1. JIRA Nasıl Kullanılır?, text,
@@1\t70.8\t194.9\t70.9\t85.5##), (1.1. Proje O...##)] violates type
hint list[tuple[str, str]], as list index
15 item tuple tuple (Gelen ekran
üzerinden alanları isterlerine göre doldurduğunuz taktirde Create
düğmesi i...##) length 3 != 2.
20:53:21 [ERROR][Exception]: Method
deepdoc.parser.docling_parser.DoclingParser._transfer_to_sections()
return [('1. JIRA Nasıl Kullanılır?', 'text',
'@@1\t70.8\t194.9\t70.9\t85.5##'), ('1.1. Proje O...##')] violates
type hint list[tuple[str, str]], as list index
15 item tuple tuple ('Gelen ekran
üzerinden alanları isterlerine göre doldurduğunuz taktirde Create
düğmesi i...##') length 3 != 2.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Enes Delibalta <enes.delibalta@pentanom.com>
2026-02-27 20:13:50 +08:00
0b5d1ebefa refactor: docling parser will close bytes io (#12280)
### What problem does this PR solve?

docling parser will close bytes io

### Type of change

- [x] Refactoring
2025-12-29 13:33:27 +08:00
d3d2ccc76c Feat: add more chunking method (#11413)
### What problem does this PR solve?

Feat: add more chunking method #11311

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-11-20 19:07:17 +08:00
fea157ba08 Fix: manual parser with mineru (#11336)
### What problem does this PR solve?

Fix: manual parser with mineru #11320
Fix: missing parameter in mineru #11334
Fix: add outlines parameter for pdf parsers

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-18 15:22:52 +08:00
8ef2f79d0a Fix:reset the agent component’s output (#11222)
### What problem does this PR solve?

change:
“After each dialogue turn, the agent component’s output is not reset.”

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2025-11-13 09:49:12 +08:00
0ff2042fc1 Feat: add Docling parser (#10759)
### What problem does this PR solve?
issue:
#3945
change:
add Docling parser

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-10-23 19:44:25 +08:00