ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-05-21 00:36:43 +08:00

Josh 2d2d3cdbcf Fix document metadata loading for paged listings (#13515)

## Summary
- scope normal document-list metadata lookups to the current page's
document IDs
- keep the `return_empty_metadata=True` path dataset-wide because it
needs full knowledge of docs that already have metadata
- add unit tests for both paged listing paths and the unchanged
empty-metadata behavior

## Why
`DocumentService.get_list()` and the normal `get_by_kb_id()` path were
calling `DocMetadataService.get_metadata_for_documents(None, kb_id)`,
which loads metadata for the entire dataset on every page request.

That becomes especially problematic on large datasets. The metadata scan
path paginates through the full metadata index without an explicit sort,
while the ES helper only switches to `search_after` beyond `10000`
results when a sort is present. In practice this can lead to unnecessary
full-dataset metadata work, slower document-list loading, and unreliable
`meta_fields` in list responses for large KBs.
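
To illustrate the pagination concern, here is a minimal sketch of how a scan helper might choose between offset paging and `search_after`. It is not RAGFlow's ES helper: `build_page_query`, its parameters, and the `id` sort field are hypothetical, and raising an error on an unsorted deep page is an assumption made only to show the failure mode. The `10000` limit corresponds to Elasticsearch's default `index.max_result_window`.

```python
from __future__ import annotations

from typing import Any, Optional

# Elasticsearch's default index.max_result_window.
MAX_RESULT_WINDOW = 10_000


def build_page_query(
    offset: int,
    size: int,
    sort: Optional[list[dict[str, str]]] = None,
    last_sort_values: Optional[list[Any]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build the request body for one page of a full scan."""
    body: dict[str, Any] = {"query": {"match_all": {}}, "size": size}
    if sort:
        body["sort"] = sort

    # Inside the result window, plain offset paging works with or without a sort.
    if offset + size <= MAX_RESULT_WINDOW:
        body["from"] = offset
        return body

    # Past the window, search_after needs a sort plus the sort values of the
    # last hit on the previous page.
    if sort and last_sort_values is not None:
        body["search_after"] = last_sort_values
        return body

    # Assumption for this sketch: an unsorted deep page cannot be served.
    raise ValueError("deep page needs a sort and the previous page's sort values")
```

Under these assumptions, an unsorted scan has no safe way to continue past the first `10000` results, which is consistent with metadata going missing for large KBs.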

This change keeps the existing empty-metadata filter behavior intact,
but scopes normal list responses to metadata for the current page only.
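
The scoping rule can be sketched as follows. This is an illustration under stated assumptions, not the actual patch: the in-memory `_METADATA_INDEX`, the stand-in `get_metadata_for_documents` (which only mirrors the `(doc_ids, kb_id)` argument order of the call quoted above), and the `load_page_metadata` wrapper are all hypothetical.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical stand-in for the metadata index: doc_id -> (kb_id, metadata).
_METADATA_INDEX = {
    "doc-1": ("kb-a", {"author": "alice"}),
    "doc-2": ("kb-a", {"author": "bob"}),
    "doc-3": ("kb-a", {"author": "carol"}),
    "doc-4": ("kb-b", {"author": "dave"}),
}


def get_metadata_for_documents(doc_ids: Optional[list[str]], kb_id: str) -> dict[str, dict]:
    """Stand-in lookup: doc_ids=None means 'every document in the dataset'."""
    wanted = None if doc_ids is None else set(doc_ids)
    return {
        doc_id: meta
        for doc_id, (kb, meta) in _METADATA_INDEX.items()
        if kb == kb_id and (wanted is None or doc_id in wanted)
    }


def load_page_metadata(
    kb_id: str,
    page_doc_ids: list[str],
    return_empty_metadata: bool = False,
) -> dict[str, dict]:
    """Pick the lookup scope for one page of a document listing."""
    if return_empty_metadata:
        # The empty-metadata filter needs to know every doc that already has
        # metadata, so this path stays dataset-wide.
        return get_metadata_for_documents(None, kb_id)
    # Normal listing: only fetch metadata for the documents on this page.
    return get_metadata_for_documents(page_doc_ids, kb_id)
```

With this shape, the cost of a normal list request depends on the page size rather than on the size of the dataset.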

2026-03-11 13:42:16 +08:00

utils

refactor: reorganize unit test files into appropriate directories (#13343)

2026-03-04 11:02:56 +08:00