ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-03-28 01:50:38 +08:00

Author	SHA1	Message	Date
NeedmeFordev	c3f79dbcb0	fix(jira): prevent missed incremental updates after issue edits (#13674 ) ### What problem does this PR solve? Fixes [#13505](https://github.com/infiniflow/ragflow/issues/13505): Jira incremental sync could miss updated issues after initial sync, especially near time boundaries. Root cause: - Jira JQL uses minute-level precision for `updated` filters. - Incremental windows had no overlap buffer, so boundary updates could be skipped. - Sync log cursor tracking used a backward-facing update for `poll_range_start`. - Existing-doc updates in `upload_document` lacked a KB ownership guard for doc-id collisions. What changed: - Added Jira incremental overlap buffer (`time_buffer_seconds`, defaulting to `JIRA_SYNC_TIME_BUFFER_SECONDS`) when building JQL lower-bound time. - Preserved second-level post-filtering to avoid duplicate reprocessing while still catching boundary updates. - Improved Jira sync logging to include start/end window and overlap configuration. - Updated sync cursor tracking in `increase_docs` to keep `poll_range_start` moving forward with max update time. - Added KB ID safety check before updating existing document records in `upload_document`. Verification performed: - Python syntax compile checks passed for modified files. - Manual verification flow: 1. Run full Jira sync. 2. Edit an already-indexed Jira issue. 3. Run next incremental sync. 4. Confirm updated content is re-ingested into KB. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-03-18 23:31:05 +08:00
Josh	a353c7bdd7	Fix: avoid empty doc filter in knowledge retrieval (#13484 ) ## Summary Fix knowledge-base chat retrieval when no individual document IDs are selected. ## Root Cause `async_chat()` initialized `doc_ids` as an empty list when the request did not explicitly select documents. That empty list was then forwarded into retrieval as an active `doc_id` filter, effectively becoming `doc_id IN []` and suppressing all chunk matches. ## Changes - treat missing selected document IDs as `None` instead of `[]` - keep explicit document filtering when IDs are actually provided - add regression coverage for the shared chat retrieval path ## Validation - `python3 -m py_compile api/db/services/dialog_service.py test/unit_test/api/db/services/test_dialog_service_use_sql_source_columns.py` - `.venv/bin/python -m pytest test/unit_test/api/db/services/test_dialog_service_use_sql_source_columns.py` - manually verified that chat completions again inject retrieved knowledge into the prompt --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-03-12 16:03:30 +08:00
Josh	2d2d3cdbcf	Fix document metadata loading for paged listings (#13515 ) ## Summary - scope normal document-list metadata lookups to the current page's document IDs - keep the `return_empty_metadata=True` path dataset-wide because it needs full knowledge of docs that already have metadata - add unit tests for both paged listing paths and the unchanged empty-metadata behavior ## Why `DocumentService.get_list()` and the normal `get_by_kb_id()` path were calling `DocMetadataService.get_metadata_for_documents(None, kb_id)`, which loads metadata for the entire dataset on every page request. That becomes especially problematic on large datasets. The metadata scan path paginates through the full metadata index without an explicit sort, while the ES helper only switches to `search_after` beyond `10000` results when a sort is present. In practice this can lead to unnecessary full-dataset metadata work, slower document-list loading, and unreliable `meta_fields` in list responses for large KBs. This change keeps the existing empty-metadata filter behavior intact, but scopes normal list responses to metadata for the current page only.	2026-03-11 13:42:16 +08:00
Heyang Wang	08f83ff331	Feat: Support getting aggregated parsing status for a dataset via the API (#13481) ### What problem does this PR solve? Adds support for getting a dataset's aggregated parsing status via the API. Issue: #12810 ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: heyang.why <heyang.why@alibaba-inc.com>	2026-03-10 18:05:45 +08:00
Idriss Sbaaoui	2508c46c8f	Playwright: add new test for configuration tab in datasets (#13365) ### What problem does this PR solve? This PR adds new tests for the full configuration tab in datasets. ### Type of change - [x] Other (please describe): new tests	2026-03-04 19:10:06 +08:00
Liu An	7715bad04e	refactor: reorganize unit test files into appropriate directories (#13343 ) ### What problem does this PR solve? Move test files from utils/ to their corresponding functional directories: - api/db/ for database related tests - api/utils/ for API utility tests - rag/utils/ for RAG utility tests ### Type of change - [x] Refactoring	2026-03-04 11:02:56 +08:00
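
Commit c3f79dbcb0 describes three cooperating pieces: an overlap buffer applied to the JQL lower bound, second-level post-filtering, and a cursor that only moves forward. The sketch below shows one plausible shape for that pattern. It is not the code from the PR: apart from the `time_buffer_seconds` parameter named in the commit message, every function name and the issue dictionary layout are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_jql_lower_bound(poll_start, time_buffer_seconds):
    """Build a JQL clause whose lower bound is moved back by the overlap buffer.

    Hypothetical helper. Only the time_buffer_seconds name comes from the
    commit message.
    """
    buffered = poll_start - timedelta(seconds=time_buffer_seconds)
    # The JQL date literal used here has minute precision, so seconds are dropped.
    stamp = buffered.strftime("%Y-%m-%d %H:%M")
    return f'updated >= "{stamp}"'


def post_filter(issues, poll_start):
    """Keep issues whose second-precision timestamp falls inside the real window.

    This discards issues that were fetched only because of the overlap buffer.
    """
    return [issue for issue in issues if issue["updated"] >= poll_start]


def advance_cursor(current, issues):
    """Return the next poll start: the latest update seen, never earlier than current."""
    return max([current] + [issue["updated"] for issue in issues])


if __name__ == "__main__":
    start = datetime(2026, 3, 18, 12, 0, 30, tzinfo=timezone.utc)
    fetched = [
        {"key": "A-1", "updated": start - timedelta(seconds=10)},
        {"key": "A-2", "updated": start + timedelta(seconds=5)},
    ]
    print(build_jql_lower_bound(start, 120))
    print([issue["key"] for issue in post_filter(fetched, start)])
    print(advance_cursor(start, fetched))
```

The buffer widens what Jira returns, the post-filter narrows it back to the intended window, and taking the maximum keeps the cursor from sliding backward when a sync returns nothing new.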

6 Commits
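
The root cause described in commit a353c7bdd7 (an empty `doc_ids` list acting as a `doc_id IN []` filter) can be illustrated with a small, self-contained sketch. The functions below are hypothetical stand-ins, not the actual `async_chat()` or retrieval code. They only show why `None` and `[]` must be treated differently.

```python
def normalize_doc_ids(selected):
    """Map a missing or empty selection to None, meaning 'apply no doc_id filter'.

    Hypothetical helper illustrating the fix described in the commit message.
    """
    return list(selected) if selected else None


def retrieve(chunks, doc_ids=None):
    """Toy retrieval: filter chunks by doc_id only when a filter is provided."""
    if doc_ids is None:
        return list(chunks)
    allowed = set(doc_ids)
    # An empty list is still an active filter, so it matches nothing.
    return [chunk for chunk in chunks if chunk["doc_id"] in allowed]


if __name__ == "__main__":
    chunks = [
        {"doc_id": "d1", "text": "alpha"},
        {"doc_id": "d2", "text": "beta"},
    ]
    # Forwarding [] directly suppresses every match.
    print(len(retrieve(chunks, [])))
    # Normalizing first restores unfiltered retrieval.
    print(len(retrieve(chunks, normalize_doc_ids([]))))
    # Explicit selections still filter as before.
    print(len(retrieve(chunks, normalize_doc_ids(["d2"]))))
```

Normalizing at the call site keeps the retrieval function's contract simple: `None` means no filter, and any list, including an empty one, is applied literally.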