ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-05-27 03:05:59 +08:00

Author	SHA1	Message	Date
Yongteng Lei	3c80a0ae09	Fix: support vLLM's new reasoning field (#13493 ) ### What problem does this PR solve? Support vLLM's new reasoning field ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-10 21:13:14 +08:00
yzy	07c9cf6cbe	Fix: return structured JSON output for non-streaming agent API (#13389 ) ### What problem does this PR solve? Previously, when an Agent component was configured with structured output, the non-streaming /agents/{agent_id}/completions API never returned the structured field in its response. The root cause: the non-streaming code path only collected message events to build full_content, then returned the workflow_finished payload — which only contains the output of the last component in the execution path (typically a Message component). Any structured output set by upstream components (e.g., Agent or LLM) was silently discarded. This PR fixes the non-streaming handler to iterate node_finished events and collect structured output from intermediate components. If any component produced a non-empty structured value, it is included in the final response under data.structured. The streaming path is unaffected, as it already exposes node_finished events to the caller. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-10 19:22:04 +08:00
Heyang Wang	08f83ff331	Feat: Support get aggregated parsing status to dataset via the API (#13481 ) ### What problem does this PR solve? Support getting aggregated parsing status to dataset via the API Issue: #12810 ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: heyang.why <heyang.why@alibaba-inc.com>	2026-03-10 18:05:45 +08:00
Liu An	68a623154a	Fix: bin directory cannot be copied to docker image introduced by #13444 (#13502 ) ### What problem does this PR solve? bin directory cannot be copied to docker image introduced by ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-10 17:31:20 +08:00
chanx	f14b53c764	feat(admin): Implemented default administrator initialization and login functionality. (#13504 ) ### What problem does this PR solve? feat(admin): Implemented default administrator initialization and login functionality. Added support for default administrator configuration, including super user nickname, email, and password. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-10 17:30:21 +08:00
balibabu	81461b4505	Fix: The number of deleted session prompts is displayed incorrectly. #13499 (#13500 ) ### What problem does this PR solve? Fix: The number of deleted session prompts is displayed incorrectly. #13499 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-10 16:01:31 +08:00
Magicbook1108	675810e0cf	Refact: optimize confluence performance (#13497 ) ### What problem does this PR solve? Refact: optimize confluence performance #13494 ### Type of change - [x] Refactoring	2026-03-10 15:02:24 +08:00
Alexander Vostres	9ba43ae4ee	Fix "Coordinate lower is less than upper" error with MinerU (#13483 ) ### What problem does this PR solve? Fixes #6004 #7142 #11959 Unlike #9207 we actually normalize the coordinates here ### Type of change - [X] Bug Fix (non-breaking change which fixes an issue)	2026-03-10 15:02:01 +08:00
balibabu	aaf900cf16	Feat: Display release status in agent version history. (#13479 ) ### What problem does this PR solve? Feat: Display release status in agent version history. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: balibabu <assassin_cike@163.com>	2026-03-10 14:25:27 +08:00
Idriss Sbaaoui	249b78561b	Fix mismatched docnm_kwd in raptor chunks (#13451 ) ### What problem does this PR solve? issue #13393 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-10 14:24:33 +08:00
qinling0210	185ab0d4ef	Fix delete_document_metadata (#13496 ) ### What problem does this PR solve? Avoid getting doc in function delete_document_metadata as the doc might have been removed. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-10 13:44:24 +08:00
Magicbook1108	7143954b48	Fix: chats_openai in non-streaming mode (#13495 ) ### What problem does this PR solve? Fix: chats_openai in non-streaming mode #13453 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-10 13:44:17 +08:00
qinling0210	7c92f51133	Fix retrieval function when metadata_condition is specified in retrieval API (#13473 ) ### What problem does this PR solve? Fix https://github.com/infiniflow/ragflow/issues/13388 The following command returns an empty result even though a document with the matching metadata exists ``` curl --request POST \ --url http://localhost:9222/api/v1/retrieval \ --header 'Content-Type: application/json' \ --header 'Authorization: Bearer ragflow-fO3mPFePfLgUYg8-9gjBVVXbvHqrvMPLGaW0P86PvAk' \ --data '{ "question": "any question", "dataset_ids": ["9bb4f0591b8811f18a4a84ba59049aa3"], "metadata_condition": { "logic": "and", "conditions": [ { "name": "character", "comparison_operator": "is", "value": "刘备" } ] } }' ``` When metadata_condition is specified in the retrieval API, it is converted to doc_ids, and doc_ids is passed to the retrieval function. In the retrieval function, when doc_ids is explicitly provided, we should bypass the threshold. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-10 11:57:32 +08:00
tunsuy	292a1a8566	fix: detect and fallback garbled PDF text to OCR (#13366 ) (#13404 ) ## Problem When PDF fonts lack ToUnicode/CMap mappings, pdfplumber (pdfminer) cannot map CIDs to correct Unicode characters, outputting PUA characters (U+E000~U+F8FF) or `(cid:xxx)` placeholders. The original code fully trusted pdfplumber text without any garbled detection, causing garbled output in the final parsed result. Relates to #13366 ## Solution ### 1. Garbled text detection functions - `_is_garbled_char(ch)`: Detects PUA characters (BMP/Plane 15/16), replacement character U+FFFD, control characters, and unassigned/surrogate codepoints - `_is_garbled_text(text, threshold)`: Calculates garbled ratio and detects `(cid:xxx)` patterns ### 2. Box-level fallback (in `__ocr()`) When a text box has ≥50% garbled characters, discard pdfplumber text and fallback to OCR recognition. ### 3. Page-level detection (in `__images__()`) Sample characters from each page; if garbled rate ≥30%, clear all pdfplumber characters for that page, forcing full OCR. ### 4. Layout recognizer CID filtering Filter out `(cid:xxx)` patterns in `layout_recognizer.py` text processing to prevent them from polluting layout analysis. ## Testing - 29 unit tests covering: normal CJK/English text, PUA characters, CID patterns, mixed text, boundary thresholds, edge cases - All 85 existing project unit tests pass without regression	2026-03-10 11:20:31 +08:00
Jin Hai	7f6a9e8ee9	Update ext field type of heartbeat message (#13490 ) ### What problem does this PR solve? As title ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-03-10 10:49:39 +08:00
chanx	02108772d8	refactor: Moves the LLM factory initialization logic to the `dao` package. (#13476 ) ### What problem does this PR solve? refactor: Moves the LLM factory initialization logic to the `dao` package. Removes the `init_data` package and integrates the LLM factory initialization functionality into the `dao` package. Adds a `utility` package to provide general utility functions. Updates `server_main.go` to use the new initialization path. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-03-10 10:35:55 +08:00
atian8179	88a40b95a2	fix: include missing modules in ragflow-cli PyPI package (#13457 ) ## Problem The `ragflow-cli` PyPI package (v0.24.0) is missing `http_client.py`, `ragflow_client.py`, and `user.py`, causing import errors when installed from PyPI. ## Root Cause `pyproject.toml` only lists `ragflow_cli` and `parser` in `[tool.setuptools] py-modules`. ## Fix Add the three missing modules to `py-modules`. Fixes #13456 Co-authored-by: atian8179 <atian8179@users.noreply.github.com>	2026-03-10 10:02:21 +08:00
Jin Hai	4fe706876c	Service list and minio status (#13480 ) ### What problem does this PR solve? 1. Resolve standard user can access admin service 2. Get RAGFlow service status 3. Fix minio status fetching ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-03-10 09:56:43 +08:00
writinwaters	4f507c0058	Docs: Updated Switch chunk availability (#13482 ) ### What problem does this PR solve? A quick editorial pass. ### Type of change - [x] Documentation Update	2026-03-09 21:14:45 +08:00
Yongteng Lei	7484298c82	Refa: convert download_img to async (#13477 ) ### What problem does this PR solve? Convert download_img to async. ### Type of change - [x] Refactoring - [x] Performance Improvement	2026-03-09 19:00:17 +08:00
Jin Hai	52bcd98d29	Add scheduled tasks (#13470 ) ### What problem does this PR solve? 1. The RAGFlow server will send a heartbeat periodically. 2. This PR includes: - Scheduled task - API server message sending - Admin server API to receive the message. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-03-09 17:48:29 +08:00
Jin Hai	c732a1c8e0	Refactor the go_binding to binding (#13469 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-03-09 15:52:31 +08:00
chanx	25ace613b0	feat: Added LLM factory initialization functionality and knowledge base related API interfaces (#13472 ) ### What problem does this PR solve? feat: Added LLM factory initialization functionality and knowledge base related API interfaces refactor(dao): Refactored the TenantLLMDAO query method feat(handler): Implemented knowledge base related API endpoints feat(service): Added LLM API key setting functionality feat(model): Extended the knowledge base model definition feat(config): Added default user LLM configuration ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-09 15:52:14 +08:00
Stephen Hu	d0465ba909	refactor: improve paddle ocr logic (#13467 ) ### What problem does this PR solve? improve paddle ocr logic ### Type of change - [x] Refactoring	2026-03-09 14:16:57 +08:00
天海蒼灆	3ce236c4e3	Feat: add switch_chunks endpoint to manage chunk availability (#13435 ) ### What problem does this commit solve? This commit introduces a new API endpoint `/datasets/<dataset_id>/documents/<document_id>/chunks/switch` that allows users to switch the availability status of specified chunks in a document, in the same way as chunk_app.py. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-09 12:36:45 +08:00
guptas6est	32d31284cc	Fix: upgrade pypdf to 6.7.5 and migrate from deprecated pypdf2 to fix CVE-2026-28804 and CVE-2023-36464 (#13454 ) ### What problem does this PR solve? This PR addresses security vulnerabilities in PDF processing dependencies identified by Trivy security scan: 1. CVE-2026-28804 (MEDIUM): pypdf 6.7.4 vulnerable to inefficient decoding of ASCIIHexDecode streams 2. CVE-2023-36464 (MEDIUM): pypdf2 3.0.1 susceptible to infinite loop when parsing malformed comments Since pypdf2 is deprecated with no available fixes, this PR migrates all pypdf2 usage to the actively maintained pypdf library (version 6.7.5), which resolves both vulnerabilities. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-09 12:06:00 +08:00
JiangNan	2634cfc06f	Fix: undefined variable and wrong method name in agent components (#13462 ) ## Summary This PR fixes two runtime bugs in agent components: Bug 1: `agent/component/invoke.py` — `NameError` in POST + `clean_html` path The POST method's `clean_html` branch uses the variable `sections` without ever defining it. Both the GET and PUT branches correctly call `sections = HtmlParser()(None, response.content)` before referencing `sections`, but this line was missing from the POST branch (copy-paste omission). This causes a `NameError` whenever a user configures an Invoke component with `method="post"` and `clean_html=True`. Bug 2: `agent/component/data_operations.py` — `AttributeError` in `_recursive_eval` The `_recursive_eval` method recursively calls `self.recursive_eval()` (without the leading underscore) instead of `self._recursive_eval()`. Since the method is defined as `_recursive_eval`, this causes an `AttributeError` at runtime when the `literal_eval` operation processes nested dicts or lists. ## Test plan - [ ] Configure an Invoke node with `method=post` and `clean_html=True`, verify HTML is parsed correctly without `NameError` - [ ] Configure a DataOperations node with `operations=literal_eval` on nested data, verify no `AttributeError` --------- Signed-off-by: JiangNan <1394485448@qq.com>	2026-03-09 11:09:47 +08:00
Jin Hai	610c1b507d	Add more API of admin server of go (#13403 ) ### What problem does this PR solve? Add APIs to admin server. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-03-09 10:44:53 +08:00
Eden	ab6ca75245	fix(agent): ensure database connections are properly closed in ExeSQL tool (#13427 ) ## Summary Fix a database connection and cursor resource leak in the ExeSQL agent tool. When SQL execution raises an exception (for example syntax error or missing table), the existing code path skips `cursor.close()` and `db.close()`, causing database connections to accumulate over time. This can eventually lead to connection exhaustion in long-running agent workflows. ## Root Cause The cleanup logic for database cursors and connections is placed after the SQL execution loop without `try/finally` protection. If an exception occurs during `cursor.execute()`, `fetchmany()`, or result processing, the cleanup code is not reached and the connection remains open. The same issue also exists in the IBM DB2 execution path where `ibm_db.close(conn)` may be skipped when exceptions occur. ## Fix - Wrap SQL execution logic in `try/finally` blocks to guarantee resource cleanup. - Ensure `cursor.close()` and `db.close()` are always executed. - Add explicit `db.close()` when `db.cursor()` creation fails. - Remove redundant close calls in early-return branches since `finally` now handles cleanup. ## Impact - No change to normal execution behavior. - Ensures database resources are always released when errors occur. - Prevents connection leaks in long-running workflows. - Only affects `agent/tools/exesql.py`. ## Testing Manual test scenarios: 1. Valid SQL execution 2. SQL syntax error 3. Query against a non-existing table 4. Execution cancellation during query In all scenarios the database cursor and connection are properly closed. Code quality checks: - `ruff check` passed - No new warnings introduced	2026-03-09 10:36:02 +08:00
Liu An	89e495e1bc	Chore: update release workflow configuration (#13466 ) ### What problem does this PR solve? update release workflow configuration ### Type of change - [x] Update CI	2026-03-09 10:32:51 +08:00
Heyang Wang	c217b8f3d8	Feat: add DingTalk AI Table connector and integration for data synch… (#13413 ) ### What problem does this PR solve? Add DingTalk AI Table connector and integration for data synchronization Issue #13400 ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: wangheyang <wangheyang@corp.netease.com>	2026-03-06 21:13:23 +08:00
Jimmy Ben Klieve	094eae3cf5	refactor(ui): adjust dataset page styles (#13452 ) ### What problem does this PR solve? - Adjust UI styles in Dataset pages. - Adjust several shared components styles - Modify files and directory structure in `src/layouts` ### Type of change - [x] Refactoring	2026-03-06 21:13:14 +08:00
Liu An	7166a7e50e	Test: adjust test priority markers for API tests (#13450 ) ### What problem does this PR solve? Changed test priority markers from p1/p2 to p3 in three test files: - test_table_parser_dataset_chat.py: Adjusted priority for table parser dataset chat test - test_delete_chunks.py: Updated priority for chunk deletion test with invalid IDs - test_retrieval_chunks.py: Modified priority for chunks retrieval pagination test These changes demote the priority of specific test cases to p3, indicating they are lower priority tests that can run later in the test suite execution. ### Type of change - [x] Test update	2026-03-06 20:17:39 +08:00
chanx	ae4645e01b	Fix: Add folder upload #9743 (#13448 ) ### What problem does this PR solve? Fix: Add folder upload #9743 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-06 20:17:29 +08:00
balibabu	82a616589b	Feat: Add PublishConfirmDialog (#13447 ) ### What problem does this PR solve? Feat: Add PublishConfirmDialog ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-06 20:17:21 +08:00
Achieve3318	45cf24cd2f	feat(memory): implement get_highlight for OceanBase memory (#13449 ) ### What problem does this PR solve? ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-06 20:17:11 +08:00
Jin Hai	01a100bb29	Fix data models (#13444 ) ### What problem does this PR solve? Since the database model was updated in the Python version, the Go server also needs to be updated. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-03-06 20:05:10 +08:00
OliverW	3ed91345aa	fix(auth): return HTTP 401 for token-auth failures (#13420 ) Follow-up to #12488 #13386 ### What problem does this PR solve? Previously, token authentication failures returned HTTP 200 with an error code in the response body. This PR updates `token_required` to raise `Unauthorized` and relies on the global error handler to return a structured JSON response with HTTP 401 status. The response body structure (`code`, `message`, `data`) remains unchanged to preserve compatibility with the official SDK. Frontend logic has been updated to handle HTTP 401 responses in addition to checking `data.code`. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-06 18:18:14 +08:00
Yongteng Lei	51be1f1442	Refa: empty ids means no-op operation (#13439 ) ### What problem does this PR solve? Empty ids means no-op operation. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Documentation Update - [x] Refactoring --------- Co-authored-by: writinwaters <cai.keith@gmail.com>	2026-03-06 18:16:42 +08:00
Zhichang Yu	7781c51a21	Revert aliyun registry to registry.cn-hangzhou.aliyuncs.com (#13445 ) ## Summary - Revert aliyun registry from `infiniflow-registry.cn-shanghai.cr.aliyuncs.com` back to `registry.cn-hangzhou.aliyuncs.com` ## Test plan - [ ] Verify the docker/.env file contains the correct registry URL 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 18:03:35 +08:00
Magicbook1108	826af383b4	Fix: paddle ocr missing outlines (#13441 ) ### What problem does this PR solve? Fix: paddle ocr missing outlines #13422 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-06 17:19:51 +08:00
Jin Hai	2504c3adde	Fix docker file (#13438 ) ### What problem does this PR solve? To copy infinity/resource into docker images ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-03-06 16:56:12 +08:00
chanx	81fd1811b8	Feat: Using Go to implement user registration logic (#13431 ) ### What problem does this PR solve? Feat: Using Go to implement user registration logic ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-06 16:42:49 +08:00
Achieve3318	37eb533fea	Feat(memory): implement get_aggregation for OceanBase memory (#13428 ) ### What problem does this PR solve? - Add aggregation_utils.aggregate_by_field for pure aggregation logic - Wire OBConnection.get_aggregation to use it (unwrap tuple, pass messages) - Add unit tests for aggregate_by_field (no DB/heavy deps) ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-06 12:51:22 +08:00
BitToby	383986dc5f	fix: re-chunk documents when data source content is updated (#12918 ) Closes: #12889 ### What problem does this PR solve? When syncing external data sources (e.g., Jira, Confluence, Google Drive), updated documents were not being re-chunked. The raw content was correctly updated in blob storage, but the vector database retained stale chunks, causing search results to return outdated information. Root cause: The task digest used for chunk reuse optimization was calculated only from parser configuration fields (`parser_id`, `parser_config`, `kb_id`, etc.), without any content-dependent fields. When a document's content changed but the parser configuration remained the same, the system incorrectly reused old chunks instead of regenerating new ones. Example scenario: 1. User syncs a Jira issue: "Meeting scheduled for Monday" 2. User updates the Jira issue to: "Meeting rescheduled to Friday" 3. User triggers sync again 4. Raw content panel shows updated text ✓ 5. Chunk panel still shows old text "Monday" ✗ Solution: 1. Include `update_time` and `size` in the chunking config, so the task digest changes when document content is updated 2. Track updated documents separately in `upload_document()` and return them for processing 3. Process updated documents through the re-parsing pipeline to regenerate chunks [1.webm](https://github.com/user-attachments/assets/d21d4dcd-e189-4d39-8700-053bae0ca5a0) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-06 12:48:47 +08:00
Lynn	0214257886	Fix: init func (#13430 ) ### What problem does this PR solve? Fix update_cnt add error in init_data. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-06 11:42:31 +08:00
balibabu	6849d35bf5	Feat: Optimize the style of the chat page. (#13429 ) ### What problem does this PR solve? Feat: Optimize the style of the chat page. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-06 11:42:25 +08:00
Jonah Hartmann	6023eb27ac	feat: add Ragcon provider (#13425 ) ### What problem does this PR solve? This PR aims to extend the list of possible providers. Adds a new provider, "RAGcon", within the Ollama modal. It provides all model types except OCR via OpenAI-compatible endpoints. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Jakob <16180662+hauberj@users.noreply.github.com>	2026-03-06 09:37:27 +08:00
guptas6est	c35b210c3a	fix(security): upgrade requests to 2.32.5 in agent/sandbox to fix CVE-2024-47081 (#13424 ) ### What problem does this PR solve? This PR remediates CVE-2024-47081 (MEDIUM severity) in the agent/sandbox component by upgrading the requests library from version 2.32.3 to 2.32.5. The vulnerability allows .netrc credentials to leak via malicious URLs. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-06 09:29:18 +08:00
guptas6est	aa57bcf92a	fix: upgrade urllib3 to 2.6.3 to resolve CVE-2025-66418, CVE-2025-66471, CVE-2026-21441 (#13423 ) ### What problem does this PR solve? This PR remediates three HIGH severity vulnerabilities in urllib3 affecting the admin client and Python SDK: - CVE-2025-66418: Unbounded decompression chain leads to resource exhaustion - CVE-2025-66471: Streaming API improperly handles highly compressed data - CVE-2026-21441: Decompression-bomb safeguard bypass when following HTTP redirects Trivy security scan identified urllib3 v2.5.0 as vulnerable in both `admin/client/uv.lock` and `sdk/python/uv.lock`. This PR updates urllib3 to v2.6.3 to eliminate these security risks. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-06 09:29:10 +08:00
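
Note on 292a1a8566 (garbled PDF text fallback): the detection described in that commit message can be sketched roughly as below. This is a minimal illustration and not the code from the PR. The names `is_garbled_char`, `is_garbled_text` and `CID_PATTERN`, the exact regex, and the choice to ignore whitespace when computing the ratio are all assumptions; only the 50% box-level threshold and the categories of characters treated as garbled come from the message.

```python
import re
import unicodedata

# Assumed pattern for the "(cid:xxx)" placeholders that pdfminer emits
# when a font has no usable ToUnicode/CMap mapping.
CID_PATTERN = re.compile(r"\(cid:\d+\)")


def is_garbled_char(ch: str) -> bool:
    """Return True for characters that usually indicate a failed CID mapping."""
    cp = ord(ch)
    # Private Use Area in the BMP, plus the Plane 15/16 private use ranges.
    if 0xE000 <= cp <= 0xF8FF:
        return True
    if 0xF0000 <= cp <= 0xFFFFD or 0x100000 <= cp <= 0x10FFFD:
        return True
    # U+FFFD REPLACEMENT CHARACTER.
    if cp == 0xFFFD:
        return True
    category = unicodedata.category(ch)
    # Cn = unassigned, Cs = surrogate.
    if category in ("Cn", "Cs"):
        return True
    # Cc = control characters; ordinary tab/newline/carriage return are allowed.
    if category == "Cc" and ch not in "\t\n\r":
        return True
    return False


def is_garbled_text(text: str, threshold: float = 0.5) -> bool:
    """Return True if text contains a CID placeholder or too many garbled chars."""
    if not text:
        return False
    if CID_PATTERN.search(text):
        return True
    # Assumption: whitespace is excluded from the ratio.
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return False
    garbled = sum(1 for c in chars if is_garbled_char(c))
    return garbled / len(chars) >= threshold
```

A caller would presumably discard the extracted text and fall back to OCR for any text box where `is_garbled_text(...)` returns True, which is the box-level behaviour the commit message describes.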

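Note on ab6ca75245 (ExeSQL connection leak): the `try/finally` cleanup pattern that commit message describes looks roughly like the sketch below. It uses the standard-library `sqlite3` module purely for illustration, and `run_queries` and its parameters are hypothetical names; the actual tool in `agent/tools/exesql.py` targets other database drivers and is not reproduced here.

```python
import sqlite3


def run_queries(db_path: str, statements: list[str], max_rows: int = 100) -> list[list[tuple]]:
    """Run each statement and return up to max_rows rows per statement."""
    db = sqlite3.connect(db_path)
    try:
        cursor = db.cursor()
    except Exception:
        # Cursor creation failed: close the connection explicitly before re-raising.
        db.close()
        raise
    try:
        results = []
        for sql in statements:
            cursor.execute(sql)
            results.append(cursor.fetchmany(max_rows))
        return results
    finally:
        # Runs on success and on any exception raised by execute() or fetchmany(),
        # so the cursor and connection are always released.
        cursor.close()
        db.close()
```

Because the cleanup lives in `finally`, early-return branches do not need their own close calls, which matches the "remove redundant close calls" point in the commit message.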

5471 Commits