ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-05-03 08:47:48 +08:00

Author	SHA1	Message	Date
Attili-sys	24af0875e5	Feat/configurable metadata display (#13464 ) ### What problem does this PR solve? Currently, RAGFlow's Search and Chat interfaces display only raw vectorized text chunks during retrieval, without contextual information about their source documents. Users cannot see document titles, page numbers, upload dates, or custom metadata fields that would help them understand and trust the retrieved results. This PR introduces an optional metadata display feature that enriches retrieved chunks with document-level metadata in both the Search tab and Chatbot interface. Key improvements: - Search results: Display document metadata as styled badges beneath chunk snippets - Chat citations: Show metadata in citation popovers and reference lists for better source context - LLM context: Metadata is injected into the LLM prompt to enable more accurate, citation-aware responses - External API support: Applications using RAGFlow's SDK retrieval endpoints (`/v1/retrieval`, `/v1/searchbots/retrieval_test`) can opt-in via request parameters - User control: Multi-select dropdown UI allows users to choose which metadata fields to display Implementation approach: - ✅ Reuses existing `DocMetadataService` infrastructure (no new database tables or indices) - ✅ Settings stored in existing JSON configuration fields (`search_config.reference_metadata`, `prompt_config.reference_metadata`) - ✅ No database migrations required - ✅ Disabled by default (fully opt-in and backward-compatible) - ✅ Dynamic metadata field selection populated from actual document metadata keys - ✅ Fixed critical bug where Python's builtin `set()` was shadowed by a route handler function Modified endpoints (all backward-compatible): - `POST /v1/retrieval` (Public SDK) - `POST /v1/searchbots/retrieval_test` (Searchbots) - `POST /v1/chunk/retrieval_test` (UI/Internal) - Chat completions endpoints (via `extra_body.reference_metadata` or `prompt_config`) ### Type of change - [x] New Feature (non-breaking 
change which adds functionality) --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Attili-sys <Attili-sys@users.noreply.github.com> Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>	2026-04-30 23:13:27 +08:00
euvre	4dcc42e0e1	feat(api): add unified index API and dataset management endpoints (#14222 ) ### What problem does this PR solve? ## Summary Refactor the dataset API layer into a clean service/REST separation pattern, add a unified `/index` API for graph/raptor/mindmap operations, and introduce several new dataset management endpoints with full test coverage. ## Changes ### Service Layer (`dataset_api_service.py`) - Added `trace_index(dataset_id, tenant_id, index_type)` — unified trace function for all index types - Added `run_index`, `delete_index` service functions - Added `get_dataset`, `get_ingestion_summary`, `list_ingestion_logs`, `get_ingestion_log` - Added `run_embedding`, `list_tags`, `aggregate_tags`, `delete_tags`, `rename_tag` - Added `get_flattened_metadata`, `get_auto_metadata`, `update_auto_metadata` ### REST API Layer (`dataset_api.py`) New unified routes: \| Method \| Route \| Description \| \|--------\|-------\|-------------\| \| POST \| `/datasets/<id>/index?type=graph\\|raptor\\|mindmap` \| Run index task \| \| GET \| `/datasets/<id>/index?type=graph\\|raptor\\|mindmap` \| Trace index task \| \| DELETE \| `/datasets/<id>/<index_type>` \| Delete index \| \| GET \| `/datasets/<id>` \| Get dataset details \| \| GET \| `/datasets/<id>/ingestions/summary` \| Ingestion summary \| \| GET \| `/datasets/<id>/ingestions` \| List ingestion logs \| \| GET \| `/datasets/<id>/ingestions/<log_id>` \| Get single ingestion log \| \| POST \| `/datasets/<id>/embedding` \| Run embedding \| \| GET \| `/datasets/<id>/tags` \| List tags \| \| GET \| `/datasets/tags/aggregation` \| Aggregate tags across datasets \| \| DELETE \| `/datasets/<id>/tags` \| Delete tags \| \| PUT \| `/datasets/<id>/tags` \| Rename tag \| \| GET \| `/datasets/metadata/flattened` \| Get flattened metadata \| \| GET/PUT \| `/datasets/<id>/metadata/config` \| New metadata config path \| Removed routes (replaced by unified `/index`): - `POST /datasets/<id>/mindmap` - `GET /datasets/<id>/mindmap` 
Preserved legacy routes (backward compatibility): - `/run_graphrag`, `/trace_graphrag`, `/run_raptor`, `/trace_raptor` - `/auto_metadata` GET/PUT ### Test Suite - Updated `common.py` helpers: added `trace_index`, removed `run_mindmap`/`trace_mindmap` - Added 7 new test files with 39 test cases total: \| Test File \| Cases \| \|-----------\|-------\| \| `test_get_dataset.py` \| 4 \| \| `test_ingestion_summary.py` \| 2 \| \| `test_ingestion_logs.py` \| 5 \| \| `test_index_api.py` \| 14 \| \| `test_embedding.py` \| 2 \| \| `test_tags.py` \| 8 \| \| `test_flattened_metadata.py` \| 4 \| - Deleted `test_mindmap_tasks.py` (covered by unified index tests) ## Design Decisions 1. Unified `/index?type=...` — single endpoint replaces 3 separate route pairs for graph/raptor/mindmap 2. Backward compatibility — old routes (`/run_graphrag`, `/run_raptor`, `/auto_metadata`) preserved alongside new paths 3. `_VALID_INDEX_TYPES = {"graph", "raptor", "mindmap"}` — input validation via constant set 4. `_INDEX_TYPE_TO_TASK_ID_FIELD` — maps index type to KB model task ID field for clean dispatch ## Files Changed - `api/apps/restful_apis/dataset_api.py` - `api/apps/services/dataset_api_service.py` - `sdk/python/ragflow_sdk/modules/dataset.py` - `test/testcases/test_http_api/common.py` - `test/testcases/test_http_api/test_dataset_management/` (7 new files) ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring --------- Signed-off-by: noob <yixiao121314@outlook.com>	2026-04-27 09:38:01 +08:00
akie	a98b64326c	Add warning log when metadata query hits 10000 result limit (#14109 ) ## What problem does this PR solve? Add a warning log when `get_flatted_meta_by_kbs` returns 10,000 results, which indicates the query limit has been reached and metadata may be silently truncated. ## Type of change - [x] Improvement (non-breaking change which improves observability)	2026-04-14 20:04:32 +08:00
qinling0210	1be07a0a34	Fix "Result window is too large" during meta data search (#13521 ) ### What problem does this PR solve? Fix https://github.com/infiniflow/ragflow/issues/13210#issuecomment-3982878498 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-12 18:59:56 +08:00
qinling0210	1815f5950b	Call get_flatted_meta_by_kbs in dify retrieval (#13509 ) ### What problem does this PR solve? Fix https://github.com/infiniflow/ragflow/issues/13388 Call get_flatted_meta_by_kbs in dify retrieval. Remove get_meta_by_kbs. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-11 13:42:24 +08:00
qinling0210	185ab0d4ef	Fix delete_document_metadata (#13496 ) ### What problem does this PR solve? Avoid getting doc in function delete_document_metadata as the doc might have been removed. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-10 13:44:24 +08:00
qinling0210	8b6d363a98	Use pagination in _search_metadata (#13238 ) ### What problem does this PR solve? Fix [#13210](https://github.com/infiniflow/ragflow/issues/13210) Remove limit in _search_metadata, use pagination in _search_metadata. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-27 11:24:49 +08:00
He Wang	394ff16b66	fix: OceanBase metadata not returned in document list API (#13209 ) ### What problem does this PR solve? Fix #13144. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-25 15:29:17 +08:00
Levi	6d6c54db19	fix(metadata): handle unhashable list values in metadata split (#13116 ) ### What problem does this PR solve? This PR fixes missing metadata on documents synced from the Moodle connector, especially for Book modules. Background: - Moodle Book metadata includes fields like `chapters`, which is a `list[dict]`. - During metadata normalization in `DocMetadataService._split_combined_values`, list deduplication used `dict.fromkeys(...)`. - `dict.fromkeys(...)` fails for unhashable values (like `dict`), causing metadata update to fail. - Result: documents were imported, but metadata was not saved for affected module types (notably Books). What this PR changes: - Replaces hash-based list deduplication with `dedupe_list(...)`, which safely handles unhashable list items while preserving order. - This allows Book metadata (and other complex list metadata) to be persisted correctly. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): Contribution during my time at RAGcon GmbH.	2026-02-12 19:48:51 +08:00
qinling0210	205ae769bb	Fix "metadata table not exists" (#12949 ) ### What problem does this PR solve? Fix the "metadata table not exists" error raised when updating metadata. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-03 17:28:10 +08:00
qinling0210	212d6f3660	Fix metadata in get_list() (#12906 ) ### What problem does this PR solve? test_update_document.py failed because metadata was not included in the response of get_list(). This PR fixes that. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-30 14:06:49 +08:00
qinling0210	9a5208976c	Put document metadata in ES/Infinity (#12826 ) ### What problem does this PR solve? Put document metadata in ES/Infinity. The metadata index is named ragflow_doc_meta_{tenant_id}. ### Type of change - [x] Refactoring	2026-01-28 13:29:34 +08:00
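
Commit 6d6c54db19 above describes replacing `dict.fromkeys(...)` with an order-preserving deduplication that tolerates unhashable items such as the `list[dict]` found in Moodle Book `chapters`. The following is a minimal sketch of that idea, not RAGFlow's actual `dedupe_list`; the helper name `dedupe_preserving_order` and the sample data are hypothetical.

```python
from typing import Any, List


def dedupe_preserving_order(items: List[Any]) -> List[Any]:
    """Drop duplicate items (compared by equality), keeping first occurrences.

    Hypothetical helper for illustration; not RAGFlow's dedupe_list.
    Hashable items are tracked in a set; unhashable ones (dict, list, ...)
    fall back to a linear equality scan.
    """
    seen_hashable = set()
    seen_unhashable: List[Any] = []
    result: List[Any] = []
    for item in items:
        try:
            # Membership tests on a set raise TypeError for unhashable items.
            if item in seen_hashable:
                continue
            seen_hashable.add(item)
        except TypeError:
            if item in seen_unhashable:
                continue
            seen_unhashable.append(item)
        result.append(item)
    return result


# Sample metadata value shaped like the list[dict] described in the commit.
chapters = [{"title": "Intro"}, {"title": "Intro"}, {"title": "Setup"}, "a", "a"]
deduped = dedupe_preserving_order(chapters)
```

The fallback scan is quadratic in the number of unhashable items, which is usually acceptable for short metadata lists; `dict.fromkeys(chapters)` on the same input raises `TypeError`, which is the failure the commit message describes.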

12 Commits