ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-05-21 00:36:43 +08:00

Author	SHA1	Message	Date
writinwaters	d4147efc66	Docs: (#14492 ) ### What problem does this PR solve? Added v0.25.1 release notes ### Type of change - [x] Documentation Update	2026-04-29 20:29:58 +08:00
writinwaters	9280c64518	Docs: Updated Title chunker references (#14483 ) ### What problem does this PR solve? Updated Title chunker references ### Type of change - [x] Documentation Update	2026-04-29 19:37:24 +08:00
Wang Qi	b684c89950	Add backward compat APIs (#14427 ) ### What problem does this PR solve? Add backward compat APIs: ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-29 15:15:49 +08:00
writinwaters	0cf105da8d	Doc: Added a database schema and migration guide. (#14404 ) ### What problem does this PR solve? Added a database schema and migration guide. ### Type of change - [x] Documentation Update	2026-04-28 09:54:33 +08:00
buua436	0b46ab07c5	Refa: restore openai-compatible chat completions api (#14380 ) ### What problem does this PR solve? restore openai-compatible chat completions api ### Type of change - [x] Refactoring	2026-04-27 14:02:19 +08:00
buua436	a9e5724b46	Refa: unify document create flows under REST documents API (#14345 ) ### What problem does this PR solve? unify document create flows under REST documents API ### Type of change - [x] Refactoring	2026-04-27 10:18:16 +08:00
wdeveloper16	78188ce9e9	Feat: add OpenDataLoader PDF parser backend (#14058 ) (#14097 ) ### What problem does this PR solve? Closes #14058. RAGFlow supports multiple PDF parsing backends (DeepDOC, MinerU, Docling, TCADP, PaddleOCR). This PR adds OpenDataLoader ([opendataloader-project/opendataloader-pdf](https://github.com/opendataloader-project/opendataloader-pdf)) as a new optional backend, giving users a deterministic, local-first alternative with competitive table extraction accuracy. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Documentation Update --- ### Changes #### Backend - `deepdoc/parser/opendataloader_parser.py` — new `OpenDataLoaderParser` class inheriting `RAGFlowPdfParser`. Implements `check_installation()` (guards Python package + Java 11+ runtime), `parse_pdf()` with JSON-first extraction (heading/paragraph/table/list/image/formula) and Markdown fallback, position-tag generation compatible with the shared `@@page\tx0\tx1\ty0\ty1##` format, and temp-dir lifecycle with cleanup. - `rag/app/naive.py` — new `by_opendataloader()` wrapper, registered in `PARSERS` dict, added to `chunk_token_num=0` override list. - `rag/flow/parser/parser.py` — `"opendataloader"` branch in the pipeline PDF handler + check validation list. #### Infrastructure - `docker/entrypoint.sh` — `ensure_opendataloader()` function: opt-in via `USE_OPENDATALOADER=true`, skips gracefully if Java is not on PATH. #### Frontend - `web/src/components/layout-recognize-form-field.tsx` — `OpenDataLoader` added to `ParseDocumentType` enum and parser dropdown. Cascades automatically to the pipeline editor's Parser component. #### Docs - `docs/guides/dataset/select_pdf_parser.md` — added OpenDataLoader entry and full env-var reference. --- ### Environment variables \| Variable \| Default \| Description \| \|---\|---\|---\| \| `USE_OPENDATALOADER` \| `false` \| Set `true` to install `opendataloader-pdf` on container startup \| \| `OPENDATALOADER_VERSION` \| latest \| Pin the PyPI release (e.g. `==2.2.1`) \| \| `OPENDATALOADER_HYBRID` \| _(unset)_ \| Enable hybrid AI mode (e.g. `docling-fast`) \| \| `OPENDATALOADER_IMAGE_OUTPUT` \| _(unset)_ \| `off` / `embedded` / `external` \| \| `OPENDATALOADER_OUTPUT_DIR` \| _(tmp)_ \| Persistent output dir; temp dir used + cleaned if unset \| \| `OPENDATALOADER_DELETE_OUTPUT` \| `1` \| `0` to retain intermediate files for debugging \| \| `OPENDATALOADER_SANITIZE` \| _(unset)_ \| `1` to filter prompt-injection patterns from output \| --- ### Dependencies - Runtime: `opendataloader-pdf` (PyPI, Apache 2.0) — opt-in, not added to `pyproject.toml` core deps. Installed by `ensure_opendataloader()` at container startup when `USE_OPENDATALOADER=true`. - System: Java 11+ on PATH (JVM is the underlying engine). The installer skips with a warning if `java` is not found. --- ### How to test Standalone parser: ```bash source .venv/bin/activate uv pip install opendataloader-pdf python3 -c " import sys; sys.path.insert(0, '.') from deepdoc.parser.opendataloader_parser import OpenDataLoaderParser p = OpenDataLoaderParser() print('available:', p.check_installation()) s, t = p.parse_pdf('path/to/test.pdf', parse_method='pipeline') print(f'sections={len(s)} tables={len(t)}') " ``` ### Benchmark vs Docling ``` file parser secs sections tables ---------------------------------------------------------------------- text-heavy.pdf docling 45.29 148 10 text-heavy.pdf opendataloader 3.14 559 0 table-heavy.pdf docling 7.05 76 3 table-heavy.pdf opendataloader 3.71 90 0 complex.pdf docling 42.67 114 8 complex.pdf opendataloader 3.51 180 0 ```	2026-04-25 00:33:02 +08:00
writinwaters	e5cfe7fb8f	Doc: Updated a 0.25-specific faq (#14365 ) ### What problem does this PR solve? Updated a 0.25 faq. ### Type of change - [x] Documentation Update	2026-04-24 20:57:32 +08:00
Wang Qi	7fb6a12067	Update API document (#14364 ) ### What problem does this PR solve? Update API document ### Type of change - [ ] Documentation Update	2026-04-24 20:36:47 +08:00
Mukunda Rao Katta	8a2f63e77d	docs: fix API key guide typo (#14352 ) Fixes a small typo in the RAGFlow API key guide: `This documents provides` -> `This document provides`.	2026-04-24 16:59:25 +08:00
Magicbook1108	c74aece63c	Feat: Agent api (#14157 ) ### What problem does this PR solve? 1. List agents Prev API: - `/v1/canvas/list GET` - `/api/v1/agents GET` Current API: `/api/v2/agents GET` 2. Get canvas template Prev API: `/v1/canvas/templates GET` Current API: `/api/v2/agents/templates GET` 3. Delete an agent Prev API: - `/v1/canvas/rm POST` - `/api/v1/agents/<agent_id> DELETE` Current API: `/api/v2/agents/<agent_id> DELETE` 4. Update an agent Prev API: - `/api/v1/agents/<agent_id> PUT` - `/v1/canvas/setting POST ` Current API: `/api/v2/agents/<agent_id> PATCH` 5. Create an agent Prev API: - `/v1/canvas/set POST` - `/api/v1/agents POST` Current API: `/api/v2/agents POST` 6. Get an agent Prev API: - `/v1/canvas/get/<canvas_id> GET ` Current API: `/api/v2/agents/<agent_id> GET` 7. Reset an agent Prev API: - `/v1/canvas/reset POST` Current API: `/api/v2/agents/<agent_id>/reset POST` 8. Upload a file to an agent Prev API: - `/v1/canvas/upload/<canvas_id> POST` Current API: `/api/v2/agents/<agent_id>/upload POST` 9. Input form Prev API: - `/v1/canvas/input_form GET` Current API: `/api/v2/agents/<agent_id>/components/<component_id>/input-form GET` 10. Debug an agent Prev API: - `/v1/canvas/debug POST` Current API: `/api/v2/agents/<agent_id>/components/<component_id>/debug POST` 11. Trace an agent Prev API: - `/v1/canvas/trace GET` Current API: `/api/v2/agents/<agent_id>/logs/<message_id> GET` 12. Get an agent version list Prev API: - `/v1/canvas/getlistversion/<canvas_id>` Current API: `/api/v2/agents/<agent_id>/versions GET` 13. Get a version of agent Prev API: - `/v1/canvas/getversion/<version_id>` Current API: `/api/v2/agents/<agent_id>/versions/<version_id> GET` 14. Test db connection Prev API: - `/v1/canvas/test_db_connect POST` Current API: `/api/v2/agents/test_db_connection` 15. Rerun the agent Prev API: - `/v1/canvas/rerun POST` Current API: `/api/v2/agents/rerun POST` 16. Get prompts Prev API: - `/v1/canvas/prompts GET` Current API: `/api/v2/agents/prompts GET` ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: chanx <1243304602@qq.com>	2026-04-24 10:02:22 +08:00
buua436	7817b0d779	Refa: migrate chunk APIs to RESTful routes (#14291 ) ### What problem does this PR solve? migrate chunk APIs to RESTful routes ### Type of change - [x] Refactoring	2026-04-23 14:17:23 +08:00
Wang Qi	01753b8f31	Refactor: API connectors (#14228 ) ### What problem does this PR solve? Refactor /api/v1/connectors to be more RESTful. ### Type of change - [x] Refactoring	2026-04-22 20:42:41 +08:00
writinwaters	1434f8ade8	Doc: two PDF parser optimizers are supported as of v0.25.0. (#14261 ) ### What problem does this PR solve? Multi-column layout detection is supported in v0.25.0 ### Type of change - [x] Documentation Update	2026-04-22 20:00:06 +08:00
ucloudnb666	f853a39b40	feat: Add Astraflow provider support (global + China endpoints) (#14270 ) ## Add Astraflow Provider Support This PR integrates [Astraflow](https://astraflow.ucloud.cn/) (by UCloud / 优刻得) as a new AI model provider in RAGFlow, with support for both global and China endpoints. ### About Astraflow Astraflow is an OpenAI-compatible AI model aggregation platform supporting 200+ models from major providers including DeepSeek, Qwen, GPT, Claude, Gemini, Llama, Mistral, and more. \| Variant \| Factory Name \| Endpoint \| Env Var \| \|---------\|-------------\|----------\|---------\| \| Global \| `Astraflow` \| `https://api-us-ca.umodelverse.ai/v1` \| `ASTRAFLOW_API_KEY` \| \| China \| `Astraflow-CN` \| `https://api.modelverse.cn/v1` \| `ASTRAFLOW_CN_API_KEY` \| - API key signup: https://astraflow.ucloud.cn/ --- ### Files Changed \| File \| Change \| \|------\|--------\| \| `rag/llm/__init__.py` \| Register `Astraflow` and `Astraflow-CN` in `SupportedLiteLLMProvider` enum, `FACTORY_DEFAULT_BASE_URL`, and `LITELLM_PROVIDER_PREFIX` \| \| `rag/llm/chat_model.py` \| Add `AstraflowChat` and `AstraflowCNChat` (OpenAI-compatible `Base` subclass) \| \| `rag/llm/embedding_model.py` \| Add `AstraflowEmbed` and `AstraflowCNEmbed` (subclasses of `OpenAIEmbed`) \| \| `rag/llm/rerank_model.py` \| Add `AstraflowRerank` and `AstraflowCNRerank` (subclasses of `OpenAI_APIRerank`) \| \| `rag/llm/cv_model.py` \| Add `AstraflowCV` and `AstraflowCNCV` (subclasses of `GptV4`) \| \| `rag/llm/tts_model.py` \| Add `AstraflowTTS` and `AstraflowCNTTS` (subclasses of `OpenAITTS`) \| \| `rag/llm/sequence2txt_model.py` \| Add `AstraflowSeq2txt` and `AstraflowCNSeq2txt` (subclasses of `GPTSeq2txt`) \| \| `conf/llm_factories.json` \| Register `Astraflow` and `Astraflow-CN` factories with a curated list of popular models \| --- ### Supported Model Types - ✅ Chat / LLM — DeepSeek-V3/R1, Qwen3, GPT-4o/4.1, Claude 3.5/3.7, Gemini 2.0/2.5 Flash, Llama 3.3/4, Mistral, and 200+ more - ✅ Text Embedding — text-embedding-3-small/large - ✅ Image / Vision (IMAGE2TEXT) — GPT-4o, GPT-4.1, Claude, Gemini, Llama-4, etc. - ✅ Text Re-Rank - ✅ TTS — tts-1 - ✅ Speech-to-Text (SPEECH2TEXT) — whisper-1 ### Implementation Notes - Uses the `openai/` LiteLLM prefix — consistent with other OpenAI-compatible aggregation platforms (SILICONFLOW, DeerAPI, CometAPI, OpenRouter, n1n, Avian, etc.) - `Astraflow` (global, rank 250) and `Astraflow-CN` (China, rank 249) are separate factory entries, allowing users to choose the optimal endpoint based on their region. - All model classes cleanly subclass existing base classes (`Base`, `OpenAIEmbed`, `OpenAI_APIRerank`, `GptV4`, `OpenAITTS`, `GPTSeq2txt`) with no custom logic needed — the provider is fully OpenAI-compatible. --------- Co-authored-by: user <user@xzaaaMacBook-Air.local>	2026-04-22 15:38:34 +08:00
Lynn	3ce1e44b2d	Fix: document and sdk support of searching message with user_id (#14283 ) ### What problem does this PR solve? Add document of search message with user_id, add sdk support. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Documentation Update	2026-04-22 14:43:38 +08:00
writinwaters	69d8aed792	Doc: v0.25.0 release notes. (#14284 ) ### What problem does this PR solve? Added v0.25.0 release notes ### Type of change - [x] Documentation Update	2026-04-22 11:48:28 +08:00
buua436	6baf74afc1	Refa: align chat and search restful APIs (#14229 ) ### What problem does this PR solve? Refactor /api/v1/chats to be more RESTful. ### Type of change - [x] Refactoring --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-04-22 10:49:11 +08:00
Jin Hai	bfac0195df	Update release note (#14275 ) ### What problem does this PR solve? As title. ### Type of change - [x] Documentation Update Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-04-22 10:47:43 +08:00
writinwaters	779deadf76	Docs: User-level memory is supported in v0.25.0 (#14259 ) ### What problem does this PR solve? v0.25.0 supports linking User ID with conversations. ### Type of change - [x] Documentation Update	2026-04-21 18:59:00 +08:00
hyl64	b439f8a74d	docs: add DeepWiki developer guide page (#14244 ) Closes #14165 Add a short documentation page under Developer Guides introducing DeepWiki as a resource for developers doing secondary development or exploring RAGFlow's codebase internals. --------- Co-authored-by: hyl64 <hyl64@users.noreply.github.com>	2026-04-21 18:57:20 +08:00
Liu An	a33d0737cd	Docs: Update version references to v0.25.0 in READMEs and docs (#14257 ) ### What problem does this PR solve? - Update version tags in README files (including translations) from v0.24.0 to v0.25.0 - Modify Docker image references and documentation to reflect new version - Update version badges and image descriptions - Maintain consistency across all language variants of README files ### Type of change - [x] Documentation Update	2026-04-21 17:26:50 +08:00
writinwaters	0db2d544a9	Docs: 0.25.0 agent apps can be published. (#14252 ) ### What problem does this PR solve? Agent apps can be published. ### Type of change - [x] Documentation Update	2026-04-21 16:56:11 +08:00
writinwaters	8a874c7a09	Doc: Added Ingetrating Notion connector (#14163 ) ### What problem does this PR solve? Added How to integrate Notion to RAGFlow. ### Type of change - [x] Documentation Update	2026-04-16 20:06:02 +08:00
writinwaters	2520065c5a	Doc: Added Integrate Confluence (#14131 ) ### What problem does this PR solve? Added a guide on integrating Confluence as connector. ### Type of change - [x] Documentation Update	2026-04-15 18:38:36 +08:00
Jin Hai	a0a4029f01	Fix document (#14118 ) ### What problem does this PR solve? As title ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-04-15 11:35:16 +08:00
writinwaters	1c0c1f27ef	Doc: Updated FAQ (#14108 ) ### What problem does this PR solve? Updated frequently asked questions. ### Type of change - [x] Documentation Update	2026-04-14 18:42:16 +08:00
Magicbook1108	1376c004a9	Fix: update docs generator (#14070 ) ### What problem does this PR solve? Refactor: update docs generator ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) 1. Support multiple document generator components and correctly display messages in the message component. The document generator will not overwrite other messages. <img width="700" alt="Screenshot from 2026-04-13 13-56-17" src="https://github.com/user-attachments/assets/3f3e06e8-33ce-4df1-8b05-510c86af70a4" /> 2. Support Chinese content and ensure correct Markdown rendering in PDF and DOCX <img width="700" alt="image" src="https://github.com/user-attachments/assets/69bf1f7b-261d-48e5-a9f3-8e94462b90ed" /> 3. Simplify configuration page and support more output format <img height="700" alt="image" src="https://github.com/user-attachments/assets/8647374c-c055-4daa-ad71-cd9052eb138e" /> 4. Hide download from other components except for message <img width="700" alt="image" src="https://github.com/user-attachments/assets/a723dfcb-b60d-4eb5-b2f6-d41ca5955eb4" /> <img width="700" alt="image" src="https://github.com/user-attachments/assets/a8762ac4-807b-4f0b-9287-65f82f7c9c98" /> 5. Sanitize filename <img width="700" alt="image" src="https://github.com/user-attachments/assets/df49509f-37c0-40f9-b03d-bd6ce7fdefa8" /> 6. And more changes on usability	2026-04-14 15:24:43 +08:00
writinwaters	ef07faea80	Doc: Updated frequently asked questions and answers. (#14085 ) ### What problem does this PR solve? Updated frequently asked questions. ### Type of change - [x] Documentation Update	2026-04-13 20:26:16 +08:00
writinwaters	52442c8eb5	Docs: Added a guide on adding Github repo as data source (#14048 ) ### What problem does this PR solve? Added a guide on adding Github repo as data source ### Type of change - [x] Documentation Update	2026-04-10 21:32:26 +08:00
Jin Hai	fa75aee3b9	Refactor system API (#13958 ) ### What problem does this PR solve? - ping - token - log level ### Type of change - [x] Refactoring <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * Refactor * System endpoints consolidated under /api/v1/system: ping, health check, and token management moved to the centralized API surface. * Token management unified at /api/v1/system/tokens with list/create/delete behavior. * Documentation * API reference updated to reflect the new /api/v1/system paths. * Tests * Client fixtures and test utilities updated to use /api/v1/system/tokens; one unit test for health/oceanbase status removed. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-04-08 15:26:18 +08:00
dataCenter430	62a1333cf2	Feat: expose parent-child chunking configuration via HTTP API and Python SDK (#13940 ) … ### What problem does this PR solve? Closes #13857 Parent-child chunking was introduced in v0.23.0 but is only configurable through the web UI. Users managing datasets programmatically cannot enable it via the HTTP API or Python SDK because `ParserConfig` uses `extra="forbid"`, rejecting the `children_delimiter` field at validation. ### What does this PR change? Adds a `parent_child` nested config to `ParserConfig`, following the same pattern as `raptor` and `graphrag`: ```json "parser_config": { "parent_child": { "use_parent_child": true, "children_delimiter": "\n" } } ``` - api/utils/validation_utils.py — new ParentChildConfig model, added to ParserConfig - api/utils/api_utils.py — naive defaults + flatten to children_delimiter for the execution layer - api/apps/services/dataset_api_service.py — flatten on the update path - test/testcases/configs.py — updated DEFAULT_PARSER_CONFIG - test/testcases/test_http_api/test_dataset_management/test_create_dataset.py — 4 valid + 2 invalid test cases No changes to the execution layer (rag/app/naive.py, rag/nlp/search.py). Existing UI flow via ext is unaffected. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * New Features * Added parent-child chunking configuration for dataset creation and updates with new `use_parent_child` toggle and customizable `children_delimiter` setting to specify how parent chunks are split into child chunks. * Documentation * Updated HTTP and Python API references with parent-child chunking configuration details and examples. <!-- end of auto-generated comment: release notes by coderabbit.ai -->	2026-04-08 11:36:57 +08:00
auyua9	fa08fa2a17	docs: fix broken internal links in guides (#13935 ) ### What problem does this PR solve? This fixes two broken internal documentation links in the guides: - `docs/develop/mcp/launch_mcp_server.md` linked `./acquire_ragflow_api_key.md`, but the target page lives one level up as `../acquire_ragflow_api_key.md`. - `docs/guides/dataset/run_retrieval_test.md` linked `./construct_knowledge_graph.md`, but the actual page lives under `./advanced/construct_knowledge_graph.md`. These broken links make it harder to follow the MCP and retrieval-test docs from the local docs tree. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [x] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe):	2026-04-07 14:01:12 +08:00
writinwaters	6263857c1e	Agent templates regrouped and renamed (#13873 ) ### What problem does this PR solve? Regrouped and renamed agent templates to increase user engagement. ### Type of change - [x] Refactoring	2026-04-03 13:43:25 +08:00
Yongteng Lei	b7daf6285b	Refa: Chat conversations /convsersation API to RESTFul (#13893 ) ### What problem does this PR solve? Chat conversations /convsersation API to RESTFul. ### Type of change - [x] Refactoring	2026-04-02 20:49:23 +08:00
writinwaters	3b96cedece	Docs: Updated chat-specific APIs (#13888 ) ### What problem does this PR solve? Chat-specific API descriptions updated. ### Type of change - [x] Documentation Update	2026-04-02 14:15:09 +08:00
Yongteng Lei	b622c47ed6	Refa: Chats /chat API to RESTFul (#13881 ) ### What problem does this PR solve? Refactor Chats /chat API to RESTFul. ### Type of change - [x] Refactoring	2026-04-01 20:10:37 +08:00
Liu An	b1d28b5898	Revert "Refa: Chats /chat API to RESTFul (#13871 )" (#13877 ) ### What problem does this PR solve? This reverts commit `1a608ac411`. ### Type of change - [x] Other (please describe):	2026-04-01 11:05:29 +08:00
Yongteng Lei	1a608ac411	Refa: Chats /chat API to RESTFul (#13871 ) ### What problem does this PR solve? Chats /chat API to RESTFul. ### Type of change - [x] Refactoring	2026-04-01 10:50:22 +08:00
writinwaters	db5ab7bbe8	Docs: Image2text is supported by GPUStack. (#13856 ) ### What problem does this PR solve? Image2text is supported by GPUStack. #9515 ### Type of change - [x] Documentation Update	2026-03-30 20:39:02 +08:00
Heyang Wang	641b319647	feat: support reading tags via API (#12891 ) (#13732 ) ### What problem does this PR solve? Enable reading Tag Set tags via API (expose tag_kwd field). The result of the queried list chunks is as shown below: <img width="1422" height="818" alt="image" src="https://github.com/user-attachments/assets/abd1960a-fe34-489e-9d72-525f8e574938" /> ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: heyang.why <heyang.why@alibaba-inc.com>	2026-03-29 20:17:01 +08:00
Yongteng Lei	d19ca71b43	Refa: Searches /search API to RESTFul (#13770 ) ### What problem does this PR solve? Searches /search API to RESTFul ### Type of change - [x] Documentation Update - [x] Refactoring Co-authored-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-03-26 01:07:41 +08:00
Yongteng Lei	3d10e2075c	Refa: files /file API to RESTFul style (#13741 ) ### What problem does this PR solve? Files /file API to RESTFul style. ### Type of change - [x] Documentation Update - [x] Refactoring --------- Co-authored-by: writinwaters <cai.keith@gmail.com> Co-authored-by: Liu An <asiro@qq.com>	2026-03-24 19:24:41 +08:00
tmimmanuel	13d0df1562	feat: add Perplexity contextualized embeddings API as a new model provider (#13709 ) ### What problem does this PR solve? Adds Perplexity contextualized embeddings API as a new model provider, as requested in #13610. - `PerplexityEmbed` provider in `rag/llm/embedding_model.py` supporting both standard (`/v1/embeddings`) and contextualized (`/v1/contextualizedembeddings`) endpoints - All 4 Perplexity embedding models registered in `conf/llm_factories.json`: `pplx-embed-v1-0.6b`, `pplx-embed-v1-4b`, `pplx-embed-context-v1-0.6b`, `pplx-embed-context-v1-4b` - Frontend entries (enum, icon mapping, API key URL) in `web/src/constants/llm.ts` - Updated `docs/guides/models/supported_models.mdx` - 22 unit tests in `test/unit_test/rag/llm/test_perplexity_embed.py` Perplexity's API returns `base64_int8` encoded embeddings (not OpenAI-compatible), so this uses a custom `requests`-based implementation. Contextualized vs standard model is auto-detected from the model name. Closes #13610 ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Documentation Update	2026-03-20 10:47:48 +08:00
writinwaters	bbd0cd80e4	Docs: Updated Add Google Drive as data source (#13684 ) ### What problem does this PR solve? Gave an editorial pass to the Add Google Drive document. ### Type of change - [x] Documentation Update	2026-03-18 21:05:25 +08:00
Yongteng Lei	ca6c3218c3	Refa: follow-up expose agent structured outputs in non-stream completions (#13524 ) ### What problem does this PR solve? Follow-up expose agent structured outputs in non-stream completions #13389. ### Type of change - [x] Documentation Update - [x] Refactoring --------- Co-authored-by: writinwaters <cai.keith@gmail.com>	2026-03-17 17:11:27 +08:00
Yongteng Lei	af7e24ba8c	Feat: add_chunk supports add image (#13629 ) ### What problem does this PR solve? Add_chunk supports add image. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-03-16 20:15:36 +08:00
Yingfeng	b686a60713	Switch from demo.ragflow.io to cloud.ragflow.io (#13624 ) ### What problem does this PR solve? Switch from demo.ragflow.io to cloud.ragflow.io ### Type of change - [x] Documentation Update	2026-03-16 14:44:39 +08:00
Ray Zhang	375f62a6c3	docs(migration): add project name (-p) usage to backup & migration guide (#13565 ) ## Summary - Add documentation for the `-p project_name` flag in the migration script, covering all steps (stop, backup, restore, start) - Add a note explaining how Docker volume name prefixes relate to the Compose project name - Update `docker-compose` to `docker compose` (Compose V2 syntax) for consistency - Fix `sh` to `bash` to match the script's shebang line This is the documentation follow-up to #12187 which added `-p` project name support to `docker/migration.sh`. ## Test plan - [ ] Verify the documentation renders correctly on the docs site - [ ] Confirm all example commands are accurate against the current `migration.sh`	2026-03-12 19:01:25 +08:00
NeedmeFordev	387b0b27c4	feat(parser): support external Docling server via DOCLING_SERVER_URL (#13527 ) ### What problem does this PR solve? This PR adds support for parsing PDFs through an external Docling server, so RAGFlow can connect to remote `docling serve` deployments instead of relying only on local in-process Docling. It addresses the feature request in [#13426](https://github.com/infiniflow/ragflow/issues/13426) and aligns with the external-server usage pattern already used by MinerU. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [x] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): ### What is changed? - Add external Docling server support in `DoclingParser`: - Use `DOCLING_SERVER_URL` to enable remote parsing mode. - Try `POST /v1/convert/source` first, and fallback to `/v1alpha/convert/source`. - Keep existing local Docling behavior when `DOCLING_SERVER_URL` is not set. - Wire Docling env settings into parser invocation paths: - `rag/app/naive.py` - `rag/flow/parser/parser.py` - Add Docling env hints in constants and update docs: - `docs/guides/dataset/select_pdf_parser.md` - `docs/guides/agent/agent_component_reference/parser.md` - `docs/faq.mdx` ### Why this approach? This keeps the change focused on one issue and one capability (external Docling connectivity), without introducing unrelated provider-model plumbing. ### Validation - Static checks: - `python -m py_compile` on changed Python files - `python -m ruff check` on changed Python files - Functional checks: - Remote v1 endpoint path works - v1alpha fallback works - Local Docling path remains available when server URL is unset ### Related links - Feature request: [Support external Docling server (issue #13426)](https://github.com/infiniflow/ragflow/issues/13426) - Compare view for this branch: [main...feat/docling-server](https://github.com/infiniflow/ragflow/compare/main...spider-yamet:ragflow:feat/docling-server?expand=1) ##### Fixes [#13426](https://github.com/infiniflow/ragflow/issues/13426)	2026-03-12 17:09:03 +08:00

1 2 3 4 5 ...

666 Commits