ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-05-03 08:47:48 +08:00

Author	SHA1	Message	Date
zagnaan	59f4c51222	fix(entrypoint): Preserve $ in passwords during template expansion (#12509 ) ### What problem does this PR solve? Fix shell variable expansion to preserve $ in password defaults when env vars are unset. Fixes Azure RDS auto-rotated passwords (that contain $) being truncated during template processing. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-15 19:30:33 +08:00
chanx	8c1fbfb130	Fix: Some bugs (#12648 ) ### What problem does this PR solve? Fix: Modified and optimized the metadata condition card component. Fix: Use startOfDay and endOfDay to ensure the date range includes a full day. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-15 19:28:22 +08:00
Kevin Hu	cec06bfb5d	Fix: empty chunk issue. (#12638 ) #12570 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-15 17:46:21 +08:00
writinwaters	2167e3a3c0	Docs: Added share memory (#12647 ) ### Type of change - [x] Documentation Update	2026-01-15 17:21:36 +08:00
liuxiaoyusky	2ea8dddef6	fix(infinity): Use comma separator for important_kwd to preserve mult… (#12618 ) ## Problem The `important_kwd` field in Infinity connector was using mismatched separators: - Storage: `list2str(v)` uses space as default separator - Reading: `v.split()` splits by all whitespace This causes multi-word keywords like `"Senior Fund Manager"` to be incorrectly split into `["Senior", "Fund", "Manager"]`. ## Solution Use comma `,` as separator for both storing and reading, consistent with: 1. The LLM output format in `keyword_prompt.md` ("delimited by ENGLISH COMMA") 2. The `cached.split(",")` in `task_executor.py` ## Changes - `insert()`: `list2str(v)` → `list2str(v, ",")` - `update()`: `list2str(v)` → `list2str(v, ",")` - `get_fields()`: `v.split()` → `v.split(",") if v else []` ## Impact This bug affects: - Python-level reranking weight calculation (`important_kwd * 5`) - API response keyword display - Search precision due to fragmented keywords	2026-01-15 15:32:40 +08:00
longbingljw	18867daba7	chore: bump pyobvector from 0.2.18 to 0.2.22 (#12640 ) ### What problem does this PR solve? Update ob client ### Type of change - [x] Other (please describe):dependency upgrade	2026-01-15 15:21:34 +08:00
longbingljw	d68176326d	feat: add oceanbase mount to gitignore (#12642 ) ### What problem does this PR solve? feat: add oceanbase mount to .gitignore ### Type of change - [x] Refactoring	2026-01-15 15:20:40 +08:00
balibabu	d531bd4f1a	Fix: Editing the agent greeting causes the greeting to be continuously added to the message list. #12635 (#12636 ) ### What problem does this PR solve? Fix: Editing the agent greeting causes the greeting to be continuously added to the message list. #12635 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-15 14:55:19 +08:00
Vedant Madane	ac936005e6	fix: ensure deleted chunks are not returned in retrieval (#12520 ) (#12546 ) ## Summary Fixes #12520 - Deleted chunks should not appear in retrieval/reference results. ## Changes ### Core Fix - api/apps/chunk_app.py: Include `doc_id` in delete condition to properly scope the delete operation ### Improved Error Handling - api/db/services/document_service.py: Better separation of concerns with individual try-catch blocks and proper logging for each cleanup operation ### Doc Store Updates - rag/utils/es_conn.py: Updated delete query construction to support compound conditions - rag/utils/opensearch_conn.py: Same updates for OpenSearch compatibility ### Tests - test/testcases/.../test_retrieval_chunks.py: Added `TestDeletedChunksNotRetrievable` class with regression tests - test/unit/test_delete_query_construction.py: Unit tests for delete query construction ## Testing - Added regression tests that verify deleted chunks are not returned by retrieval API - Tests cover single chunk deletion and batch deletion scenarios	2026-01-15 14:45:55 +08:00
Pegasus	d8192f8f17	Fix: validate regex pattern in split_with_pattern to prevent crash (#12633 ) ### What problem does this PR solve? Fix regex pattern validation in split_with_pattern (#12605) - Add try-except block to validate user-provided regex patterns before use - Gracefully fallback to single chunk when invalid regex is provided - Prevent server crash during DOCX parsing with malformed delimiters ## Problem Parsing DOCX files with custom regex delimiters crashes with `re.error: nothing to repeat at position 9` when users provide invalid regex patterns. Closes #12605 ## Solution Validate and compile regex pattern before use. On invalid pattern, log warning and return content as single chunk instead of crashing. ## Changes - `rag/nlp/__init__.py`: Add regex validation in `split_with_pattern()` function ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Contribution by Gittensor, see my contribution statistics at https://gittensor.io/miners/details?githubId=42954461	2026-01-15 14:24:51 +08:00
Kevin Hu	eb35e2b89f	Fix: async invocation issue. (#12634 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-15 14:22:16 +08:00
MkDev11	97b983fd0b	fix: add fallback parser list for empty parser_ids (#12632 ) ### What problem does this PR solve? Fixes #12570 - The slicing method dropdown was empty when deploying RAGFlow v0.23.1 from source code. The issue occurred because `parser_ids` from the tenant info was empty or undefined, causing `useSelectParserList` to return an empty array. This PR adds a fallback to a default parser list when `parser_ids` is empty, ensuring the dropdown always has options. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --- Contribution by Gittensor, see my contribution statistics at https://gittensor.io/miners/details?githubId=94194147	2026-01-15 14:05:25 +08:00
Magicbook1108	b40a7b2e7d	Feat: Hash doc id to avoid duplicate name. (#12573 ) ### What problem does this PR solve? Feat: Hash doc id to avoid duplicate name. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-15 14:02:15 +08:00
Kevin Hu	9a10558f80	Refa: async retrieval process. (#12629 ) ### Type of change - [x] Refactoring - [x] Performance Improvement	2026-01-15 12:28:49 +08:00
SID	f82628c40c	Fix: langfuse connection error handling #12621 (#12626 ) ## Description Fixes connection error handling when langfuse service is unavailable. The application now gracefully handles connection failures instead of crashing. ## Changes - Wrapped `langfuse.auth_check()` calls in try-except blocks in: - `api/db/services/dialog_service.py` - `api/db/services/tenant_llm_service.py` ## Problem When langfuse service is unavailable or connection is refused, `langfuse.auth_check()` throws `httpx.ConnectError: [Errno 111] Connection refused`, causing the application to crash during document parsing or dialog operations. ## Solution Added try-except blocks around `langfuse.auth_check()` calls to catch connection errors and gracefully skip langfuse tracing instead of crashing. The application continues functioning normally even when langfuse is unavailable. ## Related Issue Fixes #12621 --- Contribution by Gittensor, see my contribution statistics at https://gittensor.io/miners/details?githubId=158349177	2026-01-15 11:23:15 +08:00
chanx	7af98328f5	Fix: the styles of the multi-select component and the filter pop-up. (#12628 ) ### What problem does this PR solve? Fix: Fix the styles of the multi-select component and the filter pop-up. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-15 10:53:18 +08:00
MkDev11	678a4f959c	Fix: skip internal bookmark references in DOCX parsing (#12604 ) (#12611 ) ### What problem does this PR solve? Fixes #12604 - DOCX files containing hyperlinks to internal bookmarks (e.g., `#_文档目录`) cause a `KeyError` during parsing: ``` KeyError: "There is no item named 'word/#_文档目录' in the archive" ``` This happens because python-docx incorrectly tries to read internal bookmark references as files from the ZIP archive. Internal bookmarks are relationship targets starting with `#` and are not actual files. This PR extends the existing `load_from_xml_v2` workaround (which already handles `NULL` targets) to also skip relationship targets starting with `#`. Related upstream issue: https://github.com/python-openxml/python-docx/issues/902 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --- Contribution by Gittensor, see my contribution statistics at https://gittensor.io/miners/details?githubId=94194147	2026-01-14 19:08:46 +08:00
Kevin Hu	15a8bb2e9c	Fix: chunk list async issue. (#12615 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-14 17:32:07 +08:00
Pegasus	b091ff2730	Fix enable_thinking parameter for Qwen3 models (#12603 ) ### Issue When using Qwen3 models (`qwen3-32b`, `qwen3-max`) through the Tongyi-Qianwen provider for non-streaming calls (e.g., knowledge graph generation), the API fails with: ``` parameter.enable_thinking must be set to false for non-streaming calls ``` Closes #12424 ### Root Cause In `LiteLLMBase.async_chat()`, the `extra_body={"enable_thinking": False}` was set in `kwargs` but never forwarded to `_construct_completion_args()`. ### What problem does this PR solve? Pass merged kwargs to `_construct_completion_args()` using `{**gen_conf, **kwargs}` to safely handle potential duplicate parameters. ### Changes - `rag/llm/chat_model.py`: Forward kwargs containing `extra_body` to `_construct_completion_args()` in `async_chat()` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Contribution by Gittensor, see my contribution statistics at https://gittensor.io/miners/details?githubId=42954461	2026-01-14 16:35:46 +08:00
6ba3i	5b22f94502	Feat: Benchmark CLI additions and documentation (#12536 ) ### What problem does this PR solve? This PR adds a dedicated HTTP benchmark CLI for RAGFlow chat and retrieval endpoints so we can measure latency/QPS. ### Type of change - [x] Documentation Update - [x] Other (please describe): Adds a CLI benchmarking tool for chat/retrieval latency/QPS --------- Co-authored-by: Liu An <asiro@qq.com>	2026-01-14 13:49:16 +08:00
Yongteng Lei	a7671583b3	Feat: add CN regions for AWS (#12610 ) ### What problem does this PR solve? Add CN regions for AWS. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-14 12:34:55 +08:00
balibabu	d32fa02d97	Fix: Unable to copy category node. #12607 (#12609 ) ### What problem does this PR solve? Fix: Unable to copy category node. #12607 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-14 11:45:31 +08:00
lys1313013	f72a35188d	refactor: remove debug print statements (#12598 ) ### What problem does this PR solve? This PR eliminates unnecessary debug print statements that were left in hot paths of the codebase. ### Type of change - [x] Refactoring	2026-01-14 10:05:34 +08:00
6ba3i	ea619dba3b	Added to the HTTP API test suite (#12556 ) ### What problem does this PR solve? This PR adds missing HTTP API test coverage for dataset graph/GraphRAG/RAPTOR tasks, metadata summary, chat completions, agent sessions/completions, and related questions. It also introduces minimal HTTP test helpers to exercise these endpoints consistently with the existing suite. ### Type of change - [x] Other (please describe): Test coverage (HTTP API tests) --------- Co-authored-by: Liu An <asiro@qq.com>	2026-01-14 10:02:30 +08:00
writinwaters	36b0835740	Docs: Use memory (#12599 ) ### What problem does this PR solve? ### Type of change - [x] Documentation Update	2026-01-14 09:40:31 +08:00
6ba3i	0795616b34	Align p3 HTTP/SDK tests with current backend behavior (#12563 ) ### What problem does this PR solve? Updates pre-existing HTTP API and SDK tests to align with current backend behavior (validation errors, 404s, and schema defaults). This ensures p3 regression coverage is accurate without changing production code. ### Type of change - [x] Other (please describe): align p3 HTTP/SDK tests with current backend behavior --------- Co-authored-by: Liu An <asiro@qq.com>	2026-01-13 19:22:47 +08:00
Yongteng Lei	941651a16f	Fix: wrong input trace in Category component (#12590 ) ### What problem does this PR solve? Wrong input trace in Category component ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-13 17:54:57 +08:00
He Wang	360114ed42	fix(ob_conn): avoid reusing SQLAlchemy Column objects in DDL (#12588 ) ### What problem does this PR solve? When there are multiple users, parsing a document for a new user can trigger the reuse of column objects, leading to the error `sqlalchemy.exc.ArgumentError: Column object 'id' already assigned to Table xxx`. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-13 17:39:20 +08:00
chanx	ffedb2c6d3	Feat: The MetadataFilterConditions component supports adding values via search. (#12585 ) ### What problem does this PR solve? Feat: The MetadataFilterConditions component supports adding values via search. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-13 17:03:25 +08:00
LIRUI YU	947e63ca14	Fixed typos and added pptx preview for frontend (#12577 ) ### What problem does this PR solve? Previously, we added support for previewing PPT and PPTX files in the backend. Now, we are adding it to the frontend, so when the slides in the chat interface are referenced, they will no longer be blank. ### Type of change - Bug Fix (non-breaking change which fixes an issue)	2026-01-13 17:02:36 +08:00
He Wang	34d74d9928	fix: add uv-aarch64-unknown-linux-gnu.tar.gz to deps image (#12516 ) ### What problem does this PR solve? Add uv-aarch64-unknown-linux-gnu.tar.gz to support building ARM64 Docker images. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: Liu An <asiro@qq.com>	2026-01-13 15:37:32 +08:00
balibabu	accae95126	Feat: Exported Agent JSON Should Include Conversation Variables Configuration #11796 (#12579 ) ### What problem does this PR solve? Feat: Exported Agent JSON Should Include Conversation Variables Configuration #11796 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-13 15:35:45 +08:00
Yongteng Lei	68e5c86e9c	Fix: image not displaying thumbnails when using pipeline (#12574 ) ### What problem does this PR solve? Fix image not displaying thumbnails when using pipeline. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-13 12:54:13 +08:00
Yongteng Lei	64c75d558e	Fix: zip extraction vulnerabilities in MinerU and TCADP (#12527 ) ### What problem does this PR solve? Fix zip extraction vulnerabilities: - Block symlink entries in zip files. - Reject encrypted zip entries. - Prevent absolute path attacks (including Windows paths). - Block path traversal attempts (../). - Stop zip slip exploits (directory escape). - Use streaming for memory-safe file handling. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-13 12:24:50 +08:00
LIRUI YU	41c84fd78f	Add MIME types for PPT and PPTX files (#12562 ) Otherwise, slide files cannot be opened in the Chat module. ### What problem does this PR solve? Backend cause (API): in the backend file api/utils/web_utils.py, the CONTENT_TYPE_MAP dictionary is missing the MIME type mappings for ppt and pptx. As a result, when the frontend requests a PPTX file, the backend cannot tell the browser that it is a PPTX file, so the file type is misidentified and the file is displayed incorrectly. ### Type of change - Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2026-01-13 12:17:49 +08:00
LGRY	d76912ab15	Fix: Use uv pip install for Docling installation (#12567 ) Fixes #12440 ### What problem does this PR solve? The current implementation uses `python3 -m pip` which can fail in certain environments. This change leverages `uv pip install` instead, which aligns with the project's existing tooling. ### Type of change - Removed the ensurepip line (not needed since uv manages pip) - Changed python3 to "$PY" for consistency with the rest of the script - Changed python3 -m pip install to uv pip install Co-authored-by: Gongzi <gongzi@192.168.0.100>	2026-01-13 11:48:42 +08:00
Lin Manhui	4fe3c24198	feat: PaddleOCR PDF parser supports thumbnails and positions (#12565 ) ### What problem does this PR solve? 1. PaddleOCR PDF parser supports thumbnails and positions. 2. Add FAQ documentation for PaddleOCR PDF parser. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-13 09:51:08 +08:00
Kevin Hu	44bada64c9	Feat: support tree structured deep-research policy. (#12559 ) ### What problem does this PR solve? #12558 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-13 09:41:35 +08:00
Jimmy Ben Klieve	867ec94258	revert white-space changes in docs (#12557 ) ### What problem does this PR solve? Trailing whitespace in commit `6814ace1aa` was automatically trimmed by a code editor, which may break documentation typesetting, mostly where double trailing spaces are used for soft line breaks. ### Type of change - [x] Documentation Update	2026-01-13 09:41:02 +08:00
chanx	fd0a1fde6b	Feat: Enhanced metadata functionality (#12560 ) ### What problem does this PR solve? Feat: Enhanced metadata functionality - Metadata filtering supports searching. - Values can be directly modified. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-12 19:05:33 +08:00
Lynn	653001b14f	Doc: python sdk document (#12554 ) ### What problem does this PR solve? Add python sdk document for memory api. ### Type of change - [x] Documentation Update	2026-01-12 15:31:02 +08:00
chanx	d4f8c724ed	Fix: Automatically enable metadata and optimize parser dialog logic (#12553 ) ### What problem does this PR solve? Fix: Automatically enable metadata and optimize parser dialog logic ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-12 15:29:50 +08:00
Jin Hai	a7dd3b7e9e	Add time cost when start servers (#12552 ) ### What problem does this PR solve? - API server - Ingestion server - Data sync server - Admin server ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-01-12 12:48:23 +08:00
Stephen Hu	638c510468	refactor: introduce common normalize method in rerank base class (#12550 ) ### What problem does this PR solve? introduce common normalize method in rerank base class ### Type of change - [x] Refactoring	2026-01-12 11:07:11 +08:00
Zhizhou Li	ff11e3171e	Feat: SandBox docker CLI error in ARM CPU #12433 (#12434 ) ### What problem does this PR solve? Add multi-architecture support for Sandbox. Updated the Dockerfile to support multiple architectures for Docker Sandbox installation. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-12 11:06:33 +08:00
Zhichang Yu	030d6ba004	CI collect ragflow log (#12543 ) ### What problem does this PR solve? As title ### Type of change - [x] Other (please describe): CI	2026-01-10 09:52:32 +08:00
lys1313013	b226e06e2d	refactor: remove debug print statements (#12534 ) ### What problem does this PR solve? refactor: remove debug print statements ### Type of change - [x] Refactoring	2026-01-09 19:23:50 +08:00
Lin Manhui	2e09db02f3	feat: add paddleocr parser (#12513 ) ### What problem does this PR solve? Add PaddleOCR as a new PDF parser. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-09 17:48:45 +08:00
Haiyang.Pu	6abf55c048	Feat: support openapi (#12521 ) ### What problem does this PR solve? Support OpenAPI interface description. The issue of not supporting the Swagger interface after upgrading the system framework from Flask to Quart has been resolved. Resolved https://github.com/infiniflow/ragflow/issues/5264 ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: puhaiyang <“761396462@qq.com”>	2026-01-09 17:48:20 +08:00
Lynn	f9d4179bf2	Feat: memory sdk (#12538 ) ### What problem does this PR solve? Move memory and message APIs to /api, and add SDK support. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-01-09 17:45:58 +08:00
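
The separator mismatch described in commit 2ea8dddef6 can be reproduced with a minimal Python sketch. The `list2str` helper below is a hypothetical stand-in written for illustration (the log does not show RAGFlow's real implementation), and the sample keywords are made up; only the join/split behaviour matters.

```python
def list2str(values, sep=" "):
    """Hypothetical stand-in: join keywords with a separator (space by default)."""
    return sep.join(values)


keywords = ["Senior Fund Manager", "RAG"]

# Before the fix: stored with spaces, read back with a whitespace split,
# so multi-word keywords are fragmented.
stored_old = list2str(keywords)
read_old = stored_old.split()

# After the fix: a comma is used for both storing and reading,
# with an empty string mapped to an empty list.
stored_new = list2str(keywords, ",")
read_new = stored_new.split(",") if stored_new else []

print(read_old)
print(read_new)
```

With a comma on both sides the keywords round-trip unchanged, provided no keyword itself contains a comma.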


5060 Commits
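
The approach described in commit d8192f8f17 (compile a user-supplied regex first and fall back to a single chunk if it is invalid) might look roughly like the sketch below. The function name `split_with_validated_pattern` and its signature are assumptions made for illustration; this is not the code in `rag/nlp/__init__.py`.

```python
import logging
import re


def split_with_validated_pattern(text, pattern):
    """Split text on a user-supplied regex; return one chunk if the regex is invalid."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        # An invalid delimiter should not crash parsing: log it and keep the text whole.
        logging.warning("Invalid delimiter pattern %r: %s", pattern, exc)
        return [text]
    # Drop empty pieces produced by adjacent or trailing delimiters.
    return [piece for piece in compiled.split(text) if piece]


print(split_with_validated_pattern("a;b;c", ";"))
print(split_with_validated_pattern("a;b;c", "*;"))
```

The second call passes a malformed pattern (a leading `*` has nothing to repeat), so the input comes back as a single chunk instead of raising `re.error`.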