ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-04-29 06:47:47 +08:00

Author	SHA1	Message	Date
chanx	25ace613b0	feat: Added LLM factory initialization functionality and knowledge base related API interfaces (#13472 ) ### What problem does this PR solve? feat: Added LLM factory initialization functionality and knowledge base related API interfaces refactor(dao): Refactored the TenantLLMDAO query method feat(handler): Implemented knowledge base related API endpoints feat(service): Added LLM API key setting functionality feat(model): Extended the knowledge base model definition feat(config): Added default user LLM configuration ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-09 15:52:14 +08:00
Stephen Hu	d0465ba909	refactor: improve paddle ocr logic (#13467 ) ### What problem does this PR solve? improve paddle ocr logic ### Type of change - [x] Refactoring	2026-03-09 14:16:57 +08:00
天海蒼灆	3ce236c4e3	Feat: add switch_chunks endpoint to manage chunk availability (#13435 ) ### What problem does this commit solve? This commit introduces a new API endpoint `/datasets/<dataset_id>/documents/<document_id>/chunks/switch` that allows users to switch the availability status of specified chunks in a document, in the same way as chunk_app.py. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-09 12:36:45 +08:00
guptas6est	32d31284cc	Fix: upgrade pypdf to 6.7.5 and migrate from deprecated pypdf2 to fix CVE-2026-28804 and CVE-2023-36464 (#13454 ) ### What problem does this PR solve? This PR addresses security vulnerabilities in PDF processing dependencies identified by Trivy security scan: 1. CVE-2026-28804 (MEDIUM): pypdf 6.7.4 vulnerable to inefficient decoding of ASCIIHexDecode streams 2. CVE-2023-36464 (MEDIUM): pypdf2 3.0.1 susceptible to infinite loop when parsing malformed comments Since pypdf2 is deprecated with no available fixes, this PR migrates all pypdf2 usage to the actively maintained pypdf library (version 6.7.5), which resolves both vulnerabilities. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-09 12:06:00 +08:00
JiangNan	2634cfc06f	Fix: undefined variable and wrong method name in agent components (#13462 ) ## Summary This PR fixes two runtime bugs in agent components: Bug 1: `agent/component/invoke.py` — `NameError` in POST + `clean_html` path The POST method's `clean_html` branch uses the variable `sections` without ever defining it. Both the GET and PUT branches correctly call `sections = HtmlParser()(None, response.content)` before referencing `sections`, but this line was missing from the POST branch (copy-paste omission). This causes a `NameError` whenever a user configures an Invoke component with `method="post"` and `clean_html=True`. Bug 2: `agent/component/data_operations.py` — `AttributeError` in `_recursive_eval` The `_recursive_eval` method recursively calls `self.recursive_eval()` (without the leading underscore) instead of `self._recursive_eval()`. Since the method is defined as `_recursive_eval`, this causes an `AttributeError` at runtime when the `literal_eval` operation processes nested dicts or lists. ## Test plan - [ ] Configure an Invoke node with `method=post` and `clean_html=True`, verify HTML is parsed correctly without `NameError` - [ ] Configure a DataOperations node with `operations=literal_eval` on nested data, verify no `AttributeError` --------- Signed-off-by: JiangNan <1394485448@qq.com>	2026-03-09 11:09:47 +08:00
Jin Hai	610c1b507d	Add more API of admin server of go (#13403 ) ### What problem does this PR solve? Add APIs to admin server. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-03-09 10:44:53 +08:00
Eden	ab6ca75245	fix(agent): ensure database connections are properly closed in ExeSQL tool (#13427 ) ## Summary Fix a database connection and cursor resource leak in the ExeSQL agent tool. When SQL execution raises an exception (for example syntax error or missing table), the existing code path skips `cursor.close()` and `db.close()`, causing database connections to accumulate over time. This can eventually lead to connection exhaustion in long-running agent workflows. ## Root Cause The cleanup logic for database cursors and connections is placed after the SQL execution loop without `try/finally` protection. If an exception occurs during `cursor.execute()`, `fetchmany()`, or result processing, the cleanup code is not reached and the connection remains open. The same issue also exists in the IBM DB2 execution path where `ibm_db.close(conn)` may be skipped when exceptions occur. ## Fix - Wrap SQL execution logic in `try/finally` blocks to guarantee resource cleanup. - Ensure `cursor.close()` and `db.close()` are always executed. - Add explicit `db.close()` when `db.cursor()` creation fails. - Remove redundant close calls in early-return branches since `finally` now handles cleanup. ## Impact - No change to normal execution behavior. - Ensures database resources are always released when errors occur. - Prevents connection leaks in long-running workflows. - Only affects `agent/tools/exesql.py`. ## Testing Manual test scenarios: 1. Valid SQL execution 2. SQL syntax error 3. Query against a non-existing table 4. Execution cancellation during query In all scenarios the database cursor and connection are properly closed. Code quality checks: - `ruff check` passed - No new warnings introduced	2026-03-09 10:36:02 +08:00
Liu An	89e495e1bc	Chore: update release workflow configuration (#13466 ) ### What problem does this PR solve? update release workflow configuration ### Type of change - [x] Update CI	2026-03-09 10:32:51 +08:00
Heyang Wang	c217b8f3d8	Feat: add DingTalk AI Table connector and integration for data synch… (#13413 ) ### What problem does this PR solve? Add DingTalk AI Table connector and integration for data synchronization Issue #13400 ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: wangheyang <wangheyang@corp.netease.com>	2026-03-06 21:13:23 +08:00
Jimmy Ben Klieve	094eae3cf5	refactor(ui): adjust dataset page styles (#13452 ) ### What problem does this PR solve? - Adjust UI styles in Dataset pages. - Adjust several shared components styles - Modify files and directory structure in `src/layouts` ### Type of change - [x] Refactoring	2026-03-06 21:13:14 +08:00
Liu An	7166a7e50e	Test: adjust test priority markers for API tests (#13450 ) ### What problem does this PR solve? Changed test priority markers from p1/p2 to p3 in three test files: - test_table_parser_dataset_chat.py: Adjusted priority for table parser dataset chat test - test_delete_chunks.py: Updated priority for chunk deletion test with invalid IDs - test_retrieval_chunks.py: Modified priority for chunks retrieval pagination test These changes demote the priority of specific test cases to p3, indicating they are lower priority tests that can run later in the test suite execution. ### Type of change - [x] Test update	2026-03-06 20:17:39 +08:00
chanx	ae4645e01b	Fix: Add folder upload #9743 (#13448 ) ### What problem does this PR solve? Fix: Add folder upload #9743 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-06 20:17:29 +08:00
balibabu	82a616589b	Feat: Add PublishConfirmDialog (#13447 ) ### What problem does this PR solve? Feat: Add PublishConfirmDialog ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-06 20:17:21 +08:00
Achieve3318	45cf24cd2f	feat(memory): implement get_highlight for OceanBase memory (#13449 ) ### What problem does this PR solve? ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-06 20:17:11 +08:00
Jin Hai	01a100bb29	Fix data models (#13444 ) ### What problem does this PR solve? Since the database model was updated in the Python version, the Go server also needs to be updated. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-03-06 20:05:10 +08:00
OliverW	3ed91345aa	fix(auth): return HTTP 401 for token-auth failures (#13420 ) Follow-up to #12488 #13386 ### What problem does this PR solve? Previously, token authentication failures returned HTTP 200 with an error code in the response body. This PR updates `token_required` to raise `Unauthorized` and relies on the global error handler to return a structured JSON response with HTTP 401 status. The response body structure (`code`, `message`, `data`) remains unchanged to preserve compatibility with the official SDK. Frontend logic has been updated to handle HTTP 401 responses in addition to checking `data.code`. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-06 18:18:14 +08:00
Yongteng Lei	51be1f1442	Refa: empty ids means no-op operation (#13439 ) ### What problem does this PR solve? Empty ids means no-op operation. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Documentation Update - [x] Refactoring --------- Co-authored-by: writinwaters <cai.keith@gmail.com>	2026-03-06 18:16:42 +08:00
Zhichang Yu	7781c51a21	Revert aliyun registry to registry.cn-hangzhou.aliyuncs.com (#13445 ) ## Summary - Revert aliyun registry from `infiniflow-registry.cn-shanghai.cr.aliyuncs.com` back to `registry.cn-hangzhou.aliyuncs.com` ## Test plan - [ ] Verify the docker/.env file contains the correct registry URL 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-06 18:03:35 +08:00
Magicbook1108	826af383b4	Fix: paddle ocr missing outlines (#13441 ) ### What problem does this PR solve? Fix: paddle ocr missing outlines #13422 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-06 17:19:51 +08:00
Jin Hai	2504c3adde	Fix docker file (#13438 ) ### What problem does this PR solve? To copy infinity/resource into docker images ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-03-06 16:56:12 +08:00
chanx	81fd1811b8	Feat：Using Go to implement user registration logic (#13431 ) ### What problem does this PR solve? Feat：Using Go to implement user registration logic ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-06 16:42:49 +08:00
Achieve3318	37eb533fea	Feat(memory): implement get_aggregation for OceanBase memory (#13428 ) ### What problem does this PR solve? - Add aggregation_utils.aggregate_by_field for pure aggregation logic - Wire OBConnection.get_aggregation to use it (unwrap tuple, pass messages) - Add unit tests for aggregate_by_field (no DB/heavy deps) ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-06 12:51:22 +08:00
BitToby	383986dc5f	fix: re-chunk documents when data source content is updated (#12918 ) Closes: #12889 ### What problem does this PR solve? When syncing external data sources (e.g., Jira, Confluence, Google Drive), updated documents were not being re-chunked. The raw content was correctly updated in blob storage, but the vector database retained stale chunks, causing search results to return outdated information. Root cause: The task digest used for chunk reuse optimization was calculated only from parser configuration fields (`parser_id`, `parser_config`, `kb_id`, etc.), without any content-dependent fields. When a document's content changed but the parser configuration remained the same, the system incorrectly reused old chunks instead of regenerating new ones. Example scenario: 1. User syncs a Jira issue: "Meeting scheduled for Monday" 2. User updates the Jira issue to: "Meeting rescheduled to Friday" 3. User triggers sync again 4. Raw content panel shows updated text ✓ 5. Chunk panel still shows old text "Monday" ✗ Solution: 1. Include `update_time` and `size` in the chunking config, so the task digest changes when document content is updated 2. Track updated documents separately in `upload_document()` and return them for processing 3. Process updated documents through the re-parsing pipeline to regenerate chunks [1.webm](https://github.com/user-attachments/assets/d21d4dcd-e189-4d39-8700-053bae0ca5a0) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-06 12:48:47 +08:00
Lynn	0214257886	Fix: init func (#13430 ) ### What problem does this PR solve? Fix update_cnt add error in init_data. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-06 11:42:31 +08:00
balibabu	6849d35bf5	Feat: Optimize the style of the chat page. (#13429 ) ### What problem does this PR solve? Feat: Optimize the style of the chat page. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-06 11:42:25 +08:00
Jonah Hartmann	6023eb27ac	feat: add Ragcon provider (#13425 ) ### What problem does this PR solve? This PR aims to extend the list of possible providers. Adds new Provider "RAGcon" within the Ollama Modal. It provides all model types except OCR via Openai-compatible endpoints. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Jakob <16180662+hauberj@users.noreply.github.com>	2026-03-06 09:37:27 +08:00
guptas6est	c35b210c3a	fix(security): upgrade requests to 2.32.5 in agent/sandbox to fix CVE-2024-47081 (#13424 ) ### What problem does this PR solve? This PR remediates CVE-2024-47081 (MEDIUM severity) in the agent/sandbox component by upgrading the requests library from version 2.32.3 to 2.32.5. The vulnerability allows .netrc credentials to leak via malicious URLs. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-06 09:29:18 +08:00
guptas6est	aa57bcf92a	fix: upgrade urllib3 to 2.6.3 to resolve CVE-2025-66418, CVE-2025-66471, CVE-2026-21441 (#13423 ) ### What problem does this PR solve? This PR remediates three HIGH severity vulnerabilities in urllib3 affecting the admin client and Python SDK: - CVE-2025-66418: Unbounded decompression chain leads to resource exhaustion - CVE-2025-66471: Streaming API improperly handles highly compressed data - CVE-2026-21441: Decompression-bomb safeguard bypass when following HTTP redirects Trivy security scan identified urllib3 v2.5.0 as vulnerable in both `admin/client/uv.lock` and `sdk/python/uv.lock`. This PR updates urllib3 to v2.6.3 to eliminate these security risks. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-06 09:29:10 +08:00
Jimmy Ben Klieve	ef4cbe72a3	refactor(ui): adjust global navigation bar style (#13419 ) ### What problem does this PR solve? Renovate global navigation bar, align styles to the design. (May cause minor layout issues in sub-pages; will check and fix soon.) ### Type of change - [x] Refactoring	2026-03-05 20:47:29 +08:00
leonardlin	9e0e128ce5	Add checksum/values annotation to ragflow.yaml (#13409 ) Add checksum annotation for values in ragflow.yaml ### What problem does this PR solve? This PR is about this ticket: #13408 Ragflow helm charts do not include the Values.yaml in the list of watched changes. If you update the Values.yaml for an existing deployment, helm will not detect it and not update the deployment. This PR fixes that. ### Type of change - [X] Bug Fix (non-breaking change which fixes an issue)	2026-03-05 20:27:38 +08:00
writinwaters	963e31e9b5	Refact: Updated the doc structure. (#13414 ) ### What problem does this PR solve? Updated the doc structure. ### Type of change - [x] Documentation Update	2026-03-05 19:04:56 +08:00
Idriss Sbaaoui	d90d6026af	Playwright : new chat multi model test (#13402 ) ### What problem does this PR solve? new test for chat multiple model and other chat parameters under playwright ### Type of change - [x] Other (please describe): new test/ data-testid	2026-03-05 18:51:57 +08:00
Yongteng Lei	d9785ea2ce	Fix: Alibaba cloud OSS config issue (#13406 ) ### What problem does this PR solve? Alibaba Cloud OSS config issue #13390. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-05 18:13:45 +08:00
chanx	8b534c895e	Fix: UI Placeholder and Hint Optimization (#13416 ) ### What problem does this PR solve? Fix: UI Placeholder and Hint Optimization ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-05 18:13:19 +08:00
chanx	35fc5edc93	feat: Adds the tenant model ID field to the interface definition. (#13274 ) ### What problem does this PR solve? feat: Adds the tenant model ID field to the interface definition ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-05 17:27:34 +08:00
Lynn	62cb292635	Feat/tenant model (#13072 ) ### What problem does this PR solve? Add id for table tenant_llm and apply in LLMBundle. ### Type of change - [x] Refactoring --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com> Co-authored-by: Liu An <asiro@qq.com>	2026-03-05 17:27:17 +08:00
Magicbook1108	47540a4147	Feat: published agent version control (#13410 ) ### What problem does this PR solve? Feat: published agent version control ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-05 17:26:39 +08:00
guptas6est	8c9b080499	fix: update axios to 1.13.5+ to remediate CVE-2026-25639 DoS vulnerability (#13380 ) ### What problem does this PR solve? This PR remediates CVE-2026-25639, a HIGH severity Denial of Service vulnerability in axios caused by __proto__ pollution in the mergeConfig function. The vulnerability affects both the web frontend and the sandbox nodejs environment. Trivy security scan identified axios versions below 1.13.5 as vulnerable. This PR updates axios to secure versions (1.13.6 in web, 1.13.5 in sandbox) to eliminate the security risk. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-05 17:26:04 +08:00
Yongteng Lei	f13a1fb007	Refa: improve model verification ux (#13392 ) ### What problem does this PR solve? Improve model verification UX. #13395 ### Type of change - [x] Refactoring --------- Co-authored-by: Liu An <asiro@qq.com>	2026-03-05 17:23:47 +08:00
Liu An	3124fa955e	chore: add bin and internal dirs to .gitignore for Go server build output (#13407 ) ### What problem does this PR solve? add bin and internal dirs to .gitignore for Go server build output	2026-03-05 15:52:01 +08:00
tunsuy	e1f1184b01	test: add unit tests for graphrag/utils.py (87 test cases) (#13328 ) Add comprehensive unit tests for `graphrag/utils.py`, covering 15 functions/classes with 87 test cases. Tested functions: - clean_str, dict_has_keys_with_types, perform_variable_replacements - get_from_to, compute_args_hash, is_float_regex - GraphChange dataclass - handle_single_entity_extraction, handle_single_relationship_extraction - graph_merge, tidy_graph - split_string_by_multi_markers, pack_user_ass_to_openai_messages - is_continuous_subsequence, merge_tuples, flat_uniq_list All 327 existing + new tests pass with no regressions.	2026-03-05 15:30:43 +08:00
Jin Hai	3e3b665b89	RAGFlow admin server go version (#13394 ) ### What problem does this PR solve? 1. init go admin server 2. refactor api server router 3. add benchmark CI to 450s time limit 4. remove docker builder container after building ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-03-05 15:18:40 +08:00
Liu An	6f5bd4d2e9	feat: add bin and internal dirs to .gitignore for Go server build output (#13391 ) ### What problem does this PR solve? add bin and internal dirs to .gitignore for Go server build output	2026-03-05 14:26:40 +08:00
天海蒼灆	118f737b3a	Feat:Enhance chunk management by adding support for 'available', 'tag_kwd' and 'tag_feas' (#13383 ) ### What problem does this PR solve? Enhance chunk management by adding support for 'available', 'tag_kwd' and 'tag_feas' fields in list, add, and update chunk functions, just like chunk_app.py. This improves data handling and flexibility in chunk processing. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-05 13:45:39 +08:00
orbcom-pedroferreira	61209ff3bf	Feat: File uploads for future conversations on SDK API (#13378 ) ### What problem does this PR solve? This PR aims to: 1. Enable file uploads for the public API, similarly to what /document/upload_info accomplishes for the frontend; 2. Enable files sent to the /chat/:chat_id/completions endpoint to be used within the conversation. We classify the first item as a new feature, while classifying the second one as a bug fix. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) The work related to this PR was co-authored by [Bruno Ferreira](https://github.com/brunopferreira): Custom Solutions Manager @ [Orbcom](https://orbcom.pt/) [Pedro Ferreira](https://github.com/sirj0k3r): Lead Software Developer @ [Orbcom](https://orbcom.pt/) [Pedro Cardoso](https://github.com/pedromiguel4560): Associate Software Developer @ [Orbcom](https://orbcom.pt/) This PR replaces #13248 --------- Co-authored-by: Pedro Cardoso <pedrocardoso@orbcom.pt> Co-authored-by: Pedro Ferreira <pedroferreira@orbcom.pt>	2026-03-04 22:26:58 +08:00
tunsuy	020068dd16	Fix: preserve field boundaries in chunked documents from MySQL… (#13369 ) ### What problem does this PR solve? When multiple columns are used as content columns in RDBMS connector, the generated document text gets chunked by TxtParser which strips newline delimiters during merge. This causes field names and values from different columns to be concatenated without any separator, making the content unreadable. Changes: - txt_parser.py: restore newline separator when merging adjacent text segments within a chunk, so that split sections are not directly concatenated - rdbms_connector.py: use double newline between fields and place field value on a new line after the field name bracket, giving TxtParser clearer boundaries to work with Closes #13001 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: tunsuytang <tunsuytang@tencent.com>	2026-03-04 21:42:02 +08:00
writinwaters	9deb3a6249	Refact: Fine tweaks to the doc structure. (#13379 ) ### What problem does this PR solve? Fine tweaks to the doc structure. ### Type of change - [x] Documentation Update	2026-03-04 21:30:28 +08:00
balibabu	be231faec0	Feat: Write the row and column numbers into the element's data attribute for easy code location. (#13368 ) ### What problem does this PR solve? Feat: Write the row and column numbers into the element's data attribute for easy code location. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: Liu An <asiro@qq.com>	2026-03-04 20:50:58 +08:00
Idriss Sbaaoui	b3a7332c08	playwright : add data-testids for new test (#13364 ) ### What problem does this PR solve? add data-testids for new test ### Type of change - [x] Other (please describe): add data-testids for new test	2026-03-04 19:28:36 +08:00
Yao Wei	c99b53064d	fix: remove company info from resume_summary to prevent over-retrieval (#13358 ) ### What problem does this PR solve? Problem: When searching for a specific company name like (Daofeng Technology), the search would incorrectly return unrelated resumes containing generic terms like (Technology) in their company names. Root Cause: The `corporation_name_tks` field was included in the identity fields that are redundantly written to every chunk. This caused common words like "科技" to match across all chunks, leading to over-retrieval of irrelevant resumes. Solution: Remove `corporation_name_tks` from the `_IDENTITY_FIELDS` list. Company information is still preserved in the "Work Overview" chunk where it belongs, allowing proper company-based searches while preventing false positives from generic terms. --------- Co-authored-by: Aron.Yao <yaowei@192.168.1.68> Co-authored-by: Aron.Yao <yaowei@yaoweideMacBook-Pro.local> Co-authored-by: Liu An <asiro@qq.com>	2026-03-04 19:24:49 +08:00
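
The `try/finally` cleanup pattern described in commit ab6ca75245 (#13427) can be sketched roughly as follows. This is not the code from `agent/tools/exesql.py`: `run_queries`, `connect`, and `max_rows` are hypothetical names chosen for illustration, and the standard-library `sqlite3` module stands in for whichever database driver the tool actually uses.

```python
import sqlite3


def run_queries(connect, statements, max_rows=100):
    """Run each statement and always release the cursor and connection.

    `connect` is a hypothetical zero-argument factory returning a
    DB-API connection; it is an assumption, not part of RAGFlow.
    """
    db = connect()
    try:
        cursor = db.cursor()
    except Exception:
        # Cursor creation failed: close the connection explicitly.
        db.close()
        raise

    results = []
    try:
        for sql in statements:
            cursor.execute(sql)  # may raise on bad SQL or a missing table
            results.append(cursor.fetchmany(max_rows))
        return results
    finally:
        # Runs on success and on error, so nothing is leaked.
        cursor.close()
        db.close()


if __name__ == "__main__":
    rows = run_queries(lambda: sqlite3.connect(":memory:"), ["SELECT 1"])
    print(rows)
```

Because the cleanup sits in `finally`, early-return branches need no close calls of their own, which matches the commit's note about removing redundant ones.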


5449 Commits