ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-05-22 17:08:23 +08:00

Author	SHA1	Message	Date
Yingfeng	4ee0702aed	Feat: add skills space to context engine (#13908 ) ### What problem does this PR solve? issue #13714 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-04-30 12:36:03 +08:00
buua436	47129fdd08	Fix: optimize file batch delete (#14473 ) ### What problem does this PR solve? optimize file batch delete ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-30 11:00:39 +08:00
euvre	6dd38eca6a	fix: file logs not displayed in dataset ingestion page (#14479 ) ### What problem does this PR solve? ## Summary Fixed a bug where the File Logs tab in the dataset ingestion page always showed "No logs" even after files were parsed successfully. ## Root Cause Both the File Logs and Dataset Logs tabs on the frontend called the same backend endpoint `/datasets/{dataset_id}/ingestions`. However, the backend only queried `get_dataset_logs_by_kb_id`, which hard-filtered records by `document_id == GRAPH_RAPTOR_FAKE_DOC_ID` (dataset-level logs). As a result, real file-level logs were never returned, causing the table to appear empty. ## Changes ### Backend - `api/apps/restful_apis/dataset_api.py` - Added two new query parameters to `list_ingestion_logs`: - `log_type` — `"file"` or `"dataset"` (default: `"dataset"`) - `keywords` — search keyword for filtering by document / task name - `api/apps/services/dataset_api_service.py` - Updated `list_ingestion_logs` signature to accept `log_type` and `keywords`. - Added conditional routing: - When `log_type == "file"`, call `PipelineOperationLogService.get_file_logs_by_kb_id` - Otherwise, call `PipelineOperationLogService.get_dataset_logs_by_kb_id` - `api/db/services/pipeline_operation_log_service.py` - Extended `get_dataset_logs_by_kb_id` with an optional `keywords` parameter so dataset logs can also be searched. ### Frontend - `web/src/pages/dataset/dataset-overview/hook.ts` - Removed the separate API function switching (`listPipelineDatasetLogs` vs `listDataPipelineLogDocument`). - Unified both tabs to call `listDataPipelineLogDocument` with the new `log_type` query parameter (`"file"` or `"dataset"`). - Ensured `keywords` and filter values are passed through correctly. 
## Behavior After Fix \| Tab \| `log_type` \| Returned Records \| Searchable Field \| \|---\|---\|---\|---\| \| File Logs \| `file` \| Real document-level logs \| `document_name` (file name) \| \| Dataset Logs \| `dataset` \| GraphRAG / RAPTOR / MindMap logs \| `document_name` (task type) \| ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: noob <yixiao121314@outlook.com> Co-authored-by: Wang Qi <wangq8@outlook.com> Co-authored-by: Yingfeng Zhang <yingfeng.zhang@gmail.com>	2026-04-29 22:10:24 +08:00
Wang Qi	5018459112	Fix metadata config (#14480 ) ### What problem does this PR solve? Fix metadata config ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-29 21:09:54 +08:00
balibabu	ce933357c6	Fix: Dataset: When configuring the "general chunk method," options such as chunk size and parent-child slicing are unavailable. (#14459 ) ### What problem does this PR solve? Fix: Dataset: When configuring the "general chunk method," options such as chunk size and parent-child slicing are unavailable. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: balibabu <assassin_cike@163.com>	2026-04-29 14:37:48 +08:00
euvre	35f6d81b73	Refactor: migrate chunk retrieval_test and knowledge_graph to REST API endpoints (#14402 ) ### What problem does this PR solve? ## Summary Migrate two web API endpoints to REST-style HTTP API endpoints, following the pattern established in #14222: \| Old Endpoint \| New Endpoint \| \|---\|---\| \| `POST /v1/chunk/retrieval_test` \| `POST /api/v1/datasets/<dataset_id>/search` \| \| `GET /v1/chunk/knowledge_graph` \| `GET /api/v1/datasets/<dataset_id>/graph` \|	2026-04-28 20:00:26 +08:00
Jack	c81081f8ef	Refactor: Doc change parser (#14327 ) ### What problem does this PR solve? Before migration: Web API `POST /v1/document/change_parser`; HTTP API `PATCH /api/v1/datasets/<dataset_id>/documents`. After consolidation: RESTful API `PATCH /api/v1/datasets/<dataset_id>/documents`. ### Type of change - [x] Refactoring	2026-04-27 23:42:57 +08:00
euvre	4dcc42e0e1	feat(api): add unified index API and dataset management endpoints (#14222 ) ### What problem does this PR solve? ## Summary Refactor the dataset API layer into a clean service/REST separation pattern, add a unified `/index` API for graph/raptor/mindmap operations, and introduce several new dataset management endpoints with full test coverage. ## Changes ### Service Layer (`dataset_api_service.py`) - Added `trace_index(dataset_id, tenant_id, index_type)` — unified trace function for all index types - Added `run_index`, `delete_index` service functions - Added `get_dataset`, `get_ingestion_summary`, `list_ingestion_logs`, `get_ingestion_log` - Added `run_embedding`, `list_tags`, `aggregate_tags`, `delete_tags`, `rename_tag` - Added `get_flattened_metadata`, `get_auto_metadata`, `update_auto_metadata` ### REST API Layer (`dataset_api.py`) New unified routes: \| Method \| Route \| Description \| \|--------\|-------\|-------------\| \| POST \| `/datasets/<id>/index?type=graph\\|raptor\\|mindmap` \| Run index task \| \| GET \| `/datasets/<id>/index?type=graph\\|raptor\\|mindmap` \| Trace index task \| \| DELETE \| `/datasets/<id>/<index_type>` \| Delete index \| \| GET \| `/datasets/<id>` \| Get dataset details \| \| GET \| `/datasets/<id>/ingestions/summary` \| Ingestion summary \| \| GET \| `/datasets/<id>/ingestions` \| List ingestion logs \| \| GET \| `/datasets/<id>/ingestions/<log_id>` \| Get single ingestion log \| \| POST \| `/datasets/<id>/embedding` \| Run embedding \| \| GET \| `/datasets/<id>/tags` \| List tags \| \| GET \| `/datasets/tags/aggregation` \| Aggregate tags across datasets \| \| DELETE \| `/datasets/<id>/tags` \| Delete tags \| \| PUT \| `/datasets/<id>/tags` \| Rename tag \| \| GET \| `/datasets/metadata/flattened` \| Get flattened metadata \| \| GET/PUT \| `/datasets/<id>/metadata/config` \| New metadata config path \| Removed routes (replaced by unified `/index`): - `POST /datasets/<id>/mindmap` - `GET /datasets/<id>/mindmap` 
Preserved legacy routes (backward compatibility): - `/run_graphrag`, `/trace_graphrag`, `/run_raptor`, `/trace_raptor` - `/auto_metadata` GET/PUT ### Test Suite - Updated `common.py` helpers: added `trace_index`, removed `run_mindmap`/`trace_mindmap` - Added 7 new test files with 39 test cases total: \| Test File \| Cases \| \|-----------\|-------\| \| `test_get_dataset.py` \| 4 \| \| `test_ingestion_summary.py` \| 2 \| \| `test_ingestion_logs.py` \| 5 \| \| `test_index_api.py` \| 14 \| \| `test_embedding.py` \| 2 \| \| `test_tags.py` \| 8 \| \| `test_flattened_metadata.py` \| 4 \| - Deleted `test_mindmap_tasks.py` (covered by unified index tests) ## Design Decisions 1. Unified `/index?type=...` — single endpoint replaces 3 separate route pairs for graph/raptor/mindmap 2. Backward compatibility — old routes (`/run_graphrag`, `/run_raptor`, `/auto_metadata`) preserved alongside new paths 3. `_VALID_INDEX_TYPES = {"graph", "raptor", "mindmap"}` — input validation via constant set 4. `_INDEX_TYPE_TO_TASK_ID_FIELD` — maps index type to KB model task ID field for clean dispatch ## Files Changed - `api/apps/restful_apis/dataset_api.py` - `api/apps/services/dataset_api_service.py` - `sdk/python/ragflow_sdk/modules/dataset.py` - `test/testcases/test_http_api/common.py` - `test/testcases/test_http_api/test_dataset_management/` (7 new files) ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring --------- Signed-off-by: noob <yixiao121314@outlook.com>	2026-04-27 09:38:01 +08:00
Wang Qi	61d756e1b5	Fix #14213 create folder does not accept FOLDER (#14276 ) ### What problem does this PR solve? As description. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-22 11:55:10 +08:00
Jack	939933649a	Refactor: Consolidation WEB API & HTTP API for document list_docs (#14176 ) ### What problem does this PR solve? Before consolidation: Web API `POST /v1/document/list`; HTTP API `GET /api/v1/datasets/<dataset_id>/documents`. After consolidation: RESTful API `GET /api/v1/datasets/<dataset_id>/documents`. ### Type of change - [x] Refactoring	2026-04-20 14:54:40 +08:00
Lynn	c3387cd5b8	Fix: parent child config (#14199 ) ### What problem does this PR solve? Correctly set and display the parent-child config in `parser_config`, and allow passing `tenant_id` in PATCH `/api/v1/chats`. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-17 23:02:42 +08:00
Jack	bc5f78996b	Consolidation of document upload API (#14106 ) ### What problem does this PR solve? Consolidate the Web API and HTTP API for document upload. Before consolidation: Web API `POST /v1/document/upload`; HTTP API `POST /api/v1/datasets/<dataset_id>/documents`. After consolidation: RESTful API `POST /api/v1/datasets/<dataset_id>/documents`. ### Type of change - [x] Refactoring	2026-04-15 11:27:43 +08:00
Qi Wang	57aec2e65d	Fix bug: run Knowledge graph or RAPTOR, it will update an existing task (#14102 ) ### What problem does this PR solve? Fixes the bug reported in https://github.com/infiniflow/ragflow/issues/14101. When Knowledge graph or RAPTOR is run, the running status of the last document is wrongly set, as shown in the image below. These tasks should never touch an existing document's result. ![Image](https://github.com/user-attachments/assets/14fe1f9e-0541-4093-8111-ed0bd25b87ba) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-04-14 16:37:41 +08:00
Jack	577c96bf2a	Refactor: Merge document update API (#13962 ) ### What problem does this PR solve? Refactor: merge document.rename into document.update_document ### Type of change - [x] Refactoring <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * New Features * Added a unified document update API (PUT) supporting name, metadata, parser/chunk settings, and status changes. * Breaking Changes * Legacy single-parameter rename endpoint removed; renames now require dataset + document identifiers. * `/list` now reads dataset id from a different query parameter. * Validation / Bug Fixes * Stricter meta_fields and parser-config validation; unauthenticated requests return 401. * Frontend * UI now sends dataset id when saving document names. * Tests * Numerous unit and HTTP tests adjusted or removed to match new API and validations. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: MkDev11 <94194147+MkDev11@users.noreply.github.com> Co-authored-by: mkdev11 <YOUR_GITHUB_ID+MkDev11@users.noreply.github.com> Co-authored-by: mkdev11 <MkDev11@users.noreply.github.com> Co-authored-by: Qi Wang <wangq8@outlook.com> Co-authored-by: dataCenter430 <161712630+dataCenter430@users.noreply.github.com> Co-authored-by: balibabu <cike8899@users.noreply.github.com>	2026-04-09 11:17:38 +08:00
dataCenter430	62a1333cf2	Feat: expose parent-child chunking configuration via HTTP API and Python SDK (#13940 ) … ### What problem does this PR solve? Closes #13857 Parent-child chunking was introduced in v0.23.0 but is only configurable through the web UI. Users managing datasets programmatically cannot enable it via the HTTP API or Python SDK because `ParserConfig` uses `extra="forbid"`, rejecting the `children_delimiter` field at validation. ### What does this PR change? Adds a `parent_child` nested config to `ParserConfig`, following the same pattern as `raptor` and `graphrag`: ```json "parser_config": { "parent_child": { "use_parent_child": true, "children_delimiter": "\n" } } ``` - api/utils/validation_utils.py — new ParentChildConfig model, added to ParserConfig - api/utils/api_utils.py — naive defaults + flatten to children_delimiter for the execution layer - api/apps/services/dataset_api_service.py — flatten on the update path - test/testcases/configs.py — updated DEFAULT_PARSER_CONFIG - test/testcases/test_http_api/test_dataset_management/test_create_dataset.py — 4 valid + 2 invalid test cases No changes to the execution layer (rag/app/naive.py, rag/nlp/search.py). Existing UI flow via ext is unaffected. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * New Features * Added parent-child chunking configuration for dataset creation and updates with new `use_parent_child` toggle and customizable `children_delimiter` setting to specify how parent chunks are split into child chunks. * Documentation * Updated HTTP and Python API references with parent-child chunking configuration details and examples. 
<!-- end of auto-generated comment: release notes by coderabbit.ai -->	2026-04-08 11:36:57 +08:00
Magicbook1108	69264b3a70	Feat: Refact pipeline (#13826 ) ### What problem does this PR solve? ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring --------- Co-authored-by: Zhichang Yu <yuzhichang@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-03 19:26:45 +08:00
Lynn	8d4a3d0dfe	Fix: create dataset with chunk_method or pipeline (#13814 ) ### What problem does this PR solve? Allow creating datasets with either parse_type == 1/None and chunk_method, or parse_type == 2 and pipeline_id. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-26 20:43:53 +08:00
Yongteng Lei	3d10e2075c	Refa: files /file API to RESTFul style (#13741 ) ### What problem does this PR solve? Files /file API to RESTFul style. ### Type of change - [x] Documentation Update - [x] Refactoring --------- Co-authored-by: writinwaters <cai.keith@gmail.com> Co-authored-by: Liu An <asiro@qq.com>	2026-03-24 19:24:41 +08:00
Lynn	4bb1acaa5b	Refactor: dataset / kb API to RESTFul style (#13690 ) ### What problem does this PR solve? 1. Split the dataset API into gateway and service layers, and modify the web UI to use the RESTful HTTP API. 2. Old KB-related APIs are commented out. ### Type of change - [x] Refactoring --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-03-19 14:41:36 +08:00
Jin Hai	986dcf1cc8	Revert "Refactor: dataset / kb API to RESTFul style" (#13646 ) Reverts infiniflow/ragflow#13619	2026-03-17 12:09:48 +08:00
Lynn	1db5409d82	Refactor: dataset / kb API to RESTFul style (#13619 ) ### What problem does this PR solve? 1. Split the dataset API into gateway and service layers, and modify the web UI to use the RESTful HTTP API. 2. Old KB-related APIs are commented out. ### Type of change - [x] Refactoring	2026-03-16 22:51:34 +08:00
Jin Hai	a2d72202cf	Revert "Refactor dataset / kb API to RESTFul style" (#13614 ) Reverts infiniflow/ragflow#13263	2026-03-16 10:44:38 +08:00
Lynn	7c32e206be	Refactor dataset / kb API to RESTFul style (#13263 ) ### What problem does this PR solve? 1. Split the dataset API into gateway and service layers, and modify the web UI to use the RESTful HTTP API. 2. Old KB-related APIs are commented out. ### Type of change - [x] Refactoring	2026-03-13 20:02:35 +08:00
Lynn	02070bab2a	Feat: record user_id in memory (#13585 ) ### What problem does this PR solve? Get user_id from canvas and record it. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-13 15:38:35 +08:00
Lynn	62cb292635	Feat/tenant model (#13072 ) ### What problem does this PR solve? Add id for table tenant_llm and apply in LLMBundle. ### Type of change - [x] Refactoring --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com> Co-authored-by: Liu An <asiro@qq.com>	2026-03-05 17:27:17 +08:00
Magicbook1108	1027916bfe	Fix: inconsistent state handling for multi-user single-canvas access (#13267 ) ### What problem does this PR solve? <img width="700" alt="image" src="https://github.com/user-attachments/assets/1db7412e-4554-44bc-84ba-16421949aacc" /> ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-02-28 15:09:21 +08:00
Lynn	6e7bcf58bc	Refactor: split message apis to gateway and service (#13126 ) ### What problem does this PR solve? Split message apis to gateway and service ### Type of change - [x] Refactoring	2026-02-12 14:43:52 +08:00
Lynn	30d5fc1a07	Refactor: split memory API into gateway and service layers (#13111 ) ### What problem does this PR solve? Decouple the memory API into a gateway layer (for routing and parameter parsing) and a service layer (for business logic). ### Type of change - [x] Refactoring	2026-02-12 10:11:50 +08:00
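
The conditional routing described in commit 6dd38eca6a (`log_type == "file"` dispatches to `get_file_logs_by_kb_id`, anything else to `get_dataset_logs_by_kb_id`) can be sketched roughly as follows. This is a minimal illustration, not the project's code: `InMemoryLogService`, `LogRecord`, and the value given to `GRAPH_RAPTOR_FAKE_DOC_ID` are hypothetical stand-ins, and only the function and parameter names quoted in the commit message are taken from the log above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical placeholder value; the real constant is defined in the project.
GRAPH_RAPTOR_FAKE_DOC_ID = "graph_raptor_placeholder"


@dataclass
class LogRecord:
    kb_id: str
    document_id: str
    document_name: str


class InMemoryLogService:
    """Hypothetical stand-in for PipelineOperationLogService, backed by a list."""

    def __init__(self, records: List[LogRecord]):
        self._records = list(records)

    def _search(self, kb_id: str, keywords: Optional[str], dataset_level: bool):
        needle = (keywords or "").lower()
        return [
            r for r in self._records
            if r.kb_id == kb_id
            # Dataset-level logs are the ones stored under the fake document id.
            and (r.document_id == GRAPH_RAPTOR_FAKE_DOC_ID) == dataset_level
            # An empty keyword matches every name.
            and needle in r.document_name.lower()
        ]

    def get_file_logs_by_kb_id(self, kb_id, keywords=None):
        return self._search(kb_id, keywords, dataset_level=False)

    def get_dataset_logs_by_kb_id(self, kb_id, keywords=None):
        return self._search(kb_id, keywords, dataset_level=True)


def list_ingestion_logs(service, dataset_id, log_type="dataset", keywords=None):
    """Route to file-level or dataset-level logs; "dataset" is the default."""
    if log_type == "file":
        return service.get_file_logs_by_kb_id(dataset_id, keywords)
    return service.get_dataset_logs_by_kb_id(dataset_id, keywords)
```

Keeping `"dataset"` as the default mirrors the default stated in the commit message, so callers that do not send `log_type` would keep receiving dataset-level logs.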

28 Commits
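
Design decisions 3 and 4 in commit 4dcc42e0e1 (validation against `_VALID_INDEX_TYPES` and dispatch through `_INDEX_TYPE_TO_TASK_ID_FIELD`) might look something like the sketch below. The two constant names and the three index types come from the commit message; the field names in the mapping, the helper `task_id_field_for`, and the dictionary-based `trace_index` are assumptions made purely for illustration.

```python
_VALID_INDEX_TYPES = {"graph", "raptor", "mindmap"}

# Hypothetical field names; the commit message does not list the real ones.
_INDEX_TYPE_TO_TASK_ID_FIELD = {
    "graph": "graphrag_task_id",
    "raptor": "raptor_task_id",
    "mindmap": "mindmap_task_id",
}


def task_id_field_for(index_type: str) -> str:
    """Validate the index type, then return the matching task-id field name."""
    if index_type not in _VALID_INDEX_TYPES:
        raise ValueError(
            f"Invalid index type {index_type!r}; "
            f"expected one of {sorted(_VALID_INDEX_TYPES)}"
        )
    return _INDEX_TYPE_TO_TASK_ID_FIELD[index_type]


def trace_index(dataset: dict, index_type: str):
    """Look up the task id for one index type on a dataset record (a dict here)."""
    return dataset.get(task_id_field_for(index_type))
```

With a single lookup table, supporting another index type would only require one new entry in each constant rather than a new pair of routes.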