ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-03-17 04:47:56 +08:00

Author	SHA1	Message	Date
Josh	a353c7bdd7	Fix: avoid empty doc filter in knowledge retrieval (#13484) ## Summary Fix knowledge-base chat retrieval when no individual document IDs are selected. ## Root Cause `async_chat()` initialized `doc_ids` as an empty list when the request did not explicitly select documents. That empty list was then forwarded into retrieval as an active `doc_id` filter, effectively becoming `doc_id IN []` and suppressing all chunk matches. ## Changes - treat missing selected document IDs as `None` instead of `[]` - keep explicit document filtering when IDs are actually provided - add regression coverage for the shared chat retrieval path ## Validation - `python3 -m py_compile api/db/services/dialog_service.py test/unit_test/api/db/services/test_dialog_service_use_sql_source_columns.py` - `.venv/bin/python -m pytest test/unit_test/api/db/services/test_dialog_service_use_sql_source_columns.py` - manually verified that chat completions again inject retrieved knowledge into the prompt --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-03-12 16:03:30 +08:00
Josh	2d2d3cdbcf	Fix document metadata loading for paged listings (#13515) ## Summary - scope normal document-list metadata lookups to the current page's document IDs - keep the `return_empty_metadata=True` path dataset-wide because it needs full knowledge of docs that already have metadata - add unit tests for both paged listing paths and the unchanged empty-metadata behavior ## Why `DocumentService.get_list()` and the normal `get_by_kb_id()` path were calling `DocMetadataService.get_metadata_for_documents(None, kb_id)`, which loads metadata for the entire dataset on every page request. That becomes especially problematic on large datasets. The metadata scan path paginates through the full metadata index without an explicit sort, while the ES helper only switches to `search_after` beyond `10000` results when a sort is present. In practice this can lead to unnecessary full-dataset metadata work, slower document-list loading, and unreliable `meta_fields` in list responses for large KBs. This change keeps the existing empty-metadata filter behavior intact, but scopes normal list responses to metadata for the current page only.	2026-03-11 13:42:16 +08:00
Heyang Wang	08f83ff331	Feat: Support get aggregated parsing status to dataset via the API (#13481) ### What problem does this PR solve? Support getting the aggregated parsing status of a dataset via the API. Issue: #12810 ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: heyang.why <heyang.why@alibaba-inc.com>	2026-03-10 18:05:45 +08:00
Idriss Sbaaoui	2508c46c8f	Playwright : add new test for configuration tab in datasets (#13365) ### What problem does this PR solve? This PR adds new tests for the full configuration tab in datasets. ### Type of change - [x] Other (please describe): new tests	2026-03-04 19:10:06 +08:00
Liu An	7715bad04e	refactor: reorganize unit test files into appropriate directories (#13343) ### What problem does this PR solve? Move test files from utils/ to their corresponding functional directories: - api/db/ for database related tests - api/utils/ for API utility tests - rag/utils/ for RAG utility tests ### Type of change - [x] Refactoring	2026-03-04 11:02:56 +08:00
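
The root cause described in a353c7bdd7 (an empty list acting as an active `doc_id IN []` filter) can be illustrated with a minimal sketch. It does not reproduce RAGFlow's code: `CHUNKS`, `normalize_doc_ids`, and `retrieve` are hypothetical names chosen for illustration, and the in-memory list stands in for the real retrieval backend.

```python
from typing import Optional, Sequence

# Hypothetical in-memory chunk store; not RAGFlow's data model.
CHUNKS = [
    {"doc_id": "d1", "text": "alpha"},
    {"doc_id": "d2", "text": "beta"},
]


def normalize_doc_ids(doc_ids: Optional[Sequence[str]]) -> Optional[list]:
    """Map a missing or empty selection to None, meaning 'no filter'."""
    return list(doc_ids) if doc_ids else None


def retrieve(doc_ids: Optional[Sequence[str]]) -> list:
    """Return chunks, applying a doc_id filter only when doc_ids is not None."""
    if doc_ids is None:
        return list(CHUNKS)
    # An empty list here behaves like `doc_id IN []` and matches nothing.
    return [c for c in CHUNKS if c["doc_id"] in doc_ids]
```

Under these assumptions, passing `[]` straight to `retrieve` returns no chunks, while passing it through `normalize_doc_ids` first returns every chunk; an explicit selection such as `["d1"]` is still honored.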

5 Commits
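
The scoping change described in 2d2d3cdbcf (loading metadata only for the current page's document IDs instead of the whole dataset) could look roughly like the sketch below. All names here (`DOC_IDS`, `METADATA`, `load_metadata`, `list_page`) are hypothetical stand-ins, not the signatures of `DocumentService` or `DocMetadataService`.

```python
from typing import Dict, List, Optional

# Hypothetical dataset: document IDs in listing order and a metadata index.
DOC_IDS = [f"doc{i}" for i in range(10)]
METADATA = {doc_id: {"rank": i} for i, doc_id in enumerate(DOC_IDS)}


def load_metadata(doc_ids: Optional[List[str]]) -> Dict[str, dict]:
    """Load metadata for the given IDs, or for the whole dataset when None."""
    if doc_ids is None:
        return dict(METADATA)  # dataset-wide scan, costly on large datasets
    return {d: METADATA[d] for d in doc_ids if d in METADATA}


def list_page(page: int, page_size: int) -> List[dict]:
    """Return one page of documents with metadata scoped to that page."""
    start = (page - 1) * page_size
    page_ids = DOC_IDS[start:start + page_size]
    meta = load_metadata(page_ids)  # only the current page's IDs
    return [{"id": d, "meta_fields": meta.get(d, {})} for d in page_ids]
```

In this sketch, `list_page(2, 3)` touches metadata for three documents only, whereas `load_metadata(None)` corresponds to the dataset-wide path that the commit keeps for the `return_empty_metadata=True` case.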