Commit Graph

28 Commits

Author SHA1 Message Date
4ee0702aed Feat: add skills space to context engine (#13908)
### What problem does this PR solve?

issue #13714

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-04-30 12:36:03 +08:00
47129fdd08 Fix: optimize file batch delete (#14473)
### What problem does this PR solve?

optimize file batch delete

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-30 11:00:39 +08:00
6dd38eca6a fix: file logs not displayed in dataset ingestion page (#14479)
### What problem does this PR solve?

## Summary

Fixed a bug where the **File Logs** tab in the dataset ingestion page
always showed "No logs" even after files were parsed successfully.

## Root Cause

Both the **File Logs** and **Dataset Logs** tabs on the frontend called
the same backend endpoint `/datasets/{dataset_id}/ingestions`. However,
the backend only queried `get_dataset_logs_by_kb_id`, which
hard-filtered records by `document_id == GRAPH_RAPTOR_FAKE_DOC_ID`
(dataset-level logs). As a result, real file-level logs were never
returned, causing the table to appear empty.

## Changes

### Backend

- **`api/apps/restful_apis/dataset_api.py`**
  - Added two new query parameters to `list_ingestion_logs`:
    - `log_type` — `"file"` or `"dataset"` (default: `"dataset"`)
    - `keywords` — search keyword for filtering by document / task name

- **`api/apps/services/dataset_api_service.py`**
- Updated `list_ingestion_logs` signature to accept `log_type` and
`keywords`.
  - Added conditional routing:
- When `log_type == "file"`, call
`PipelineOperationLogService.get_file_logs_by_kb_id`
- Otherwise, call
`PipelineOperationLogService.get_dataset_logs_by_kb_id`

- **`api/db/services/pipeline_operation_log_service.py`**
- Extended `get_dataset_logs_by_kb_id` with an optional `keywords`
parameter so dataset logs can also be searched.

### Frontend

- **`web/src/pages/dataset/dataset-overview/hook.ts`**
- Removed the separate API function switching (`listPipelineDatasetLogs`
vs `listDataPipelineLogDocument`).
- Unified both tabs to call `listDataPipelineLogDocument` with the new
`log_type` query parameter (`"file"` or `"dataset"`).
  - Ensured `keywords` and filter values are passed through correctly.

## Behavior After Fix

| Tab | `log_type` | Returned Records | Searchable Field |
|---|---|---|---|
| File Logs | `file` | Real document-level logs | `document_name` (file
name) |
| Dataset Logs | `dataset` | GraphRAG / RAPTOR / MindMap logs |
`document_name` (task type) |
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: noob <yixiao121314@outlook.com>
Co-authored-by: Wang Qi <wangq8@outlook.com>
Co-authored-by: Yingfeng Zhang <yingfeng.zhang@gmail.com>
2026-04-29 22:10:24 +08:00
5018459112 Fix metadata config (#14480)
### What problem does this PR solve?

Fix metadata config

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-29 21:09:54 +08:00
ce933357c6 Fix: Dataset: When configuring the "general chunk method," options such as chunk size and parent-child slicing are unavailable. (#14459)
### What problem does this PR solve?

Fix: Dataset: When configuring the "general chunk method," options such
as chunk size and parent-child slicing are unavailable.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: balibabu <assassin_cike@163.com>
2026-04-29 14:37:48 +08:00
35f6d81b73 Refactor: migrate chunk retrieval_test and knowledge_graph to REST API endpoints (#14402)
### What problem does this PR solve?

## Summary

Migrate two web API endpoints to REST-style HTTP API endpoints,
following the pattern established in #14222:

| Old Endpoint | New Endpoint |
|---|---|
| `POST /v1/chunk/retrieval_test` | `POST
/api/v1/datasets/<dataset_id>/search` |
| `GET /v1/chunk/knowledge_graph` | `GET
/api/v1/datasets/<dataset_id>/graph` |
2026-04-28 20:00:26 +08:00
c81081f8ef Refactor: Doc change parser (#14327)
### What problem does this PR solve?

Before migration
Web API: POST /v1/document/change_parser
HTTP API: PATCH /api/v1/datasets/<dataset_id>/documents

After consolidation, Restful API
PATCH /api/v1/datasets/<dataset_id>/documents

### Type of change

- [x] Refactoring
2026-04-27 23:42:57 +08:00
4dcc42e0e1 feat(api): add unified index API and dataset management endpoints (#14222)
### What problem does this PR solve?

## Summary

Refactor the dataset API layer into a clean service/REST separation
pattern, add a unified `/index` API for graph/raptor/mindmap operations,
and introduce several new dataset management endpoints with full test
coverage.

## Changes

### Service Layer (`dataset_api_service.py`)

- Added `trace_index(dataset_id, tenant_id, index_type)` — unified trace
function for all index types
- Added `run_index`, `delete_index` service functions
- Added `get_dataset`, `get_ingestion_summary`, `list_ingestion_logs`,
`get_ingestion_log`
- Added `run_embedding`, `list_tags`, `aggregate_tags`, `delete_tags`,
`rename_tag`
- Added `get_flattened_metadata`, `get_auto_metadata`,
`update_auto_metadata`

### REST API Layer (`dataset_api.py`)

**New unified routes:**

| Method | Route | Description |
|--------|-------|-------------|
| POST | `/datasets/<id>/index?type=graph\|raptor\|mindmap` | Run index
task |
| GET | `/datasets/<id>/index?type=graph\|raptor\|mindmap` | Trace index
task |
| DELETE | `/datasets/<id>/<index_type>` | Delete index |
| GET | `/datasets/<id>` | Get dataset details |
| GET | `/datasets/<id>/ingestions/summary` | Ingestion summary |
| GET | `/datasets/<id>/ingestions` | List ingestion logs |
| GET | `/datasets/<id>/ingestions/<log_id>` | Get single ingestion log
|
| POST | `/datasets/<id>/embedding` | Run embedding |
| GET | `/datasets/<id>/tags` | List tags |
| GET | `/datasets/tags/aggregation` | Aggregate tags across datasets |
| DELETE | `/datasets/<id>/tags` | Delete tags |
| PUT | `/datasets/<id>/tags` | Rename tag |
| GET | `/datasets/metadata/flattened` | Get flattened metadata |
| GET/PUT | `/datasets/<id>/metadata/config` | New metadata config path
|

**Removed routes (replaced by unified `/index`):**

- `POST /datasets/<id>/mindmap`
- `GET /datasets/<id>/mindmap`

**Preserved legacy routes (backward compatibility):**

- `/run_graphrag`, `/trace_graphrag`, `/run_raptor`, `/trace_raptor`
- `/auto_metadata` GET/PUT

### Test Suite

- Updated `common.py` helpers: added `trace_index`, removed
`run_mindmap`/`trace_mindmap`
- Added 7 new test files with 39 test cases total:

| Test File | Cases |
|-----------|-------|
| `test_get_dataset.py` | 4 |
| `test_ingestion_summary.py` | 2 |
| `test_ingestion_logs.py` | 5 |
| `test_index_api.py` | 14 |
| `test_embedding.py` | 2 |
| `test_tags.py` | 8 |
| `test_flattened_metadata.py` | 4 |

- Deleted `test_mindmap_tasks.py` (covered by unified index tests)

## Design Decisions

1. **Unified `/index?type=...`** — single endpoint replaces 3 separate
route pairs for graph/raptor/mindmap
2. **Backward compatibility** — old routes (`/run_graphrag`,
`/run_raptor`, `/auto_metadata`) preserved alongside new paths
3. **`_VALID_INDEX_TYPES = {"graph", "raptor", "mindmap"}`** — input
validation via constant set
4. **`_INDEX_TYPE_TO_TASK_ID_FIELD`** — maps index type to KB model task
ID field for clean dispatch

## Files Changed

- `api/apps/restful_apis/dataset_api.py`
- `api/apps/services/dataset_api_service.py`
- `sdk/python/ragflow_sdk/modules/dataset.py`
- `test/testcases/test_http_api/common.py`
- `test/testcases/test_http_api/test_dataset_management/` (7 new files)
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring

---------

Signed-off-by: noob <yixiao121314@outlook.com>
2026-04-27 09:38:01 +08:00
61d756e1b5 Fix #14213 create folder does not accept FOLDER (#14276)
### What problem does this PR solve?

As description.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-22 11:55:10 +08:00
939933649a Refactor: Consolidation WEB API & HTTP API for document list_docs (#14176)
### What problem does this PR solve?

Before consolidation
Web API: POST /v1/document/list
Http API - GET /api/v1/datasets/<dataset_id>/documents

After consolidation, Restful API -- GET
/api/v1/datasets/<dataset_id>/documents

### Type of change

- [x] Refactoring
2026-04-20 14:54:40 +08:00
c3387cd5b8 Fix: parent child config (#14199)
### What problem does this PR solve?

Correctly set and display parent-child config in parser_config, and
allow to pass `tenant_id` in PATCH `/api/v1/chats`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-17 23:02:42 +08:00
bc5f78996b Consolidateion of document upload API (#14106)
### What problem does this PR solve?

Consolidation WEB API & HTTP API for document upload

Before consolidation
Web API: POST /v1/document/upload
Http API - POST /api/v1/datasets/<dataset_id>/documents

After consolidation, Restful API -- POST
/api/v1/datasets/<dataset_id>/documents

### Type of change

- [x] Refactoring
2026-04-15 11:27:43 +08:00
57aec2e65d Fix bug: run Knowledge graph or RAPTOR, it will update an existing task (#14102)
### What problem does this PR solve?

It fixed the bug: https://github.com/infiniflow/ragflow/issues/14101
When run Knowledge graph or RAPTOR, the last document running status
will be wrongly set, see below:
It should never touch existing document result.

![Image](https://github.com/user-attachments/assets/14fe1f9e-0541-4093-8111-ed0bd25b87ba)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-14 16:37:41 +08:00
577c96bf2a Refactor: Merge document update API (#13962)
### What problem does this PR solve?

Refactor: merge document.rename into document.update_document

### Type of change

- [x] Refactoring


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added a unified document update API (PUT) supporting name, metadata,
parser/chunk settings, and status changes.

* **Breaking Changes**
* Legacy single-parameter rename endpoint removed; renames now require
dataset + document identifiers.
  * `/list` now reads dataset id from a different query parameter.

* **Validation / Bug Fixes**
* Stricter meta_fields and parser-config validation; unauthenticated
requests return 401.

* **Frontend**
  * UI now sends dataset id when saving document names.

* **Tests**
* Numerous unit and HTTP tests adjusted or removed to match new API and
validations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: MkDev11 <94194147+MkDev11@users.noreply.github.com>
Co-authored-by: mkdev11 <YOUR_GITHUB_ID+MkDev11@users.noreply.github.com>
Co-authored-by: mkdev11 <MkDev11@users.noreply.github.com>
Co-authored-by: Qi Wang <wangq8@outlook.com>
Co-authored-by: dataCenter430 <161712630+dataCenter430@users.noreply.github.com>
Co-authored-by: balibabu <cike8899@users.noreply.github.com>
2026-04-09 11:17:38 +08:00
62a1333cf2 Feat: expose parent-child chunking configuration via HTTP API and Python SDK (#13940)
…
### What problem does this PR solve?

Closes #13857

Parent-child chunking was introduced in v0.23.0 but is only configurable
through the web UI. Users managing datasets programmatically cannot
enable it via the HTTP API or Python SDK because `ParserConfig` uses
`extra="forbid"`, rejecting the `children_delimiter` field at
validation.

### What does this PR change?

Adds a `parent_child` nested config to `ParserConfig`, following the
same pattern as `raptor` and `graphrag`:

```json
"parser_config": {
  "parent_child": {
    "use_parent_child": true,
    "children_delimiter": "\n"
  }
}
```

- api/utils/validation_utils.py — new ParentChildConfig model, added to
ParserConfig
- api/utils/api_utils.py — naive defaults + flatten to
children_delimiter for the execution layer
- api/apps/services/dataset_api_service.py — flatten on the update path
- test/testcases/configs.py — updated DEFAULT_PARSER_CONFIG
-
test/testcases/test_http_api/test_dataset_management/test_create_dataset.py
— 4 valid + 2 invalid test cases

No changes to the execution layer (rag/app/naive.py, rag/nlp/search.py).
Existing UI flow via ext is unaffected.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added parent-child chunking configuration for dataset creation and
updates with new `use_parent_child` toggle and customizable
`children_delimiter` setting to specify how parent chunks are split into
child chunks.

* **Documentation**
* Updated HTTP and Python API references with parent-child chunking
configuration details and examples.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-08 11:36:57 +08:00
69264b3a70 Feat: Refact pipeline (#13826)
### What problem does this PR solve?

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring

---------

Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-03 19:26:45 +08:00
8d4a3d0dfe Fix: create dataset with chunk_method or pipeline (#13814)
### What problem does this PR solve?

Allow create datasets with parse_type == 1/None and chunk_method, or
parse_type == 2 and pipeline_id.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-26 20:43:53 +08:00
3d10e2075c Refa: files /file API to RESTFul style (#13741)
### What problem does this PR solve?

Files /file API to RESTFul style.

### Type of change

- [x] Documentation Update
- [x] Refactoring

---------

Co-authored-by: writinwaters <cai.keith@gmail.com>
Co-authored-by: Liu An <asiro@qq.com>
2026-03-24 19:24:41 +08:00
4bb1acaa5b Refactor: dataset / kb API to RESTFul style (#13690)
### What problem does this PR solve?

1. Split dataset api to gateway and service, and modify web UI to use
restful http api.
2. Old KB releated APIs are commented.

### Type of change

- [x] Refactoring

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-19 14:41:36 +08:00
986dcf1cc8 Revert "Refactor: dataset / kb API to RESTFul style" (#13646)
Reverts infiniflow/ragflow#13619
2026-03-17 12:09:48 +08:00
1db5409d82 Refactor: dataset / kb API to RESTFul style (#13619)
### What problem does this PR solve?

1. Split dataset api to gateway and service, and modify web UI to use
restful http api.
2. Old KB releated APIs are commented.

### Type of change

- [x] Refactoring
2026-03-16 22:51:34 +08:00
a2d72202cf Revert "Refactor dataset / kb API to RESTFul style" (#13614)
Reverts infiniflow/ragflow#13263
2026-03-16 10:44:38 +08:00
7c32e206be Refactor dataset / kb API to RESTFul style (#13263)
### What problem does this PR solve?

1. Split dataset api to gateway and service, and modify web UI to use
restful http api.
2. Old KB releated APIs are commented.

### Type of change

- [x] Refactoring
2026-03-13 20:02:35 +08:00
02070bab2a Feat: record user_id in memory (#13585)
### What problem does this PR solve?

Get user_id from canvas and record it.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-13 15:38:35 +08:00
62cb292635 Feat/tenant model (#13072)
### What problem does this PR solve?

Add id for table tenant_llm and apply in LLMBundle.

### Type of change

- [x] Refactoring

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
Co-authored-by: Liu An <asiro@qq.com>
2026-03-05 17:27:17 +08:00
1027916bfe Fix: inconsistent state handling for multi-user single-canvas access (#13267)
### What problem does this PR solve?

<img width="700" alt="image"
src="https://github.com/user-attachments/assets/1db7412e-4554-44bc-84ba-16421949aacc"
/>

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-02-28 15:09:21 +08:00
6e7bcf58bc Refactor: split message apis to gateway and service (#13126)
### What problem does this PR solve?

Split message apis to gateway and service

### Type of change

- [x] Refactoring
2026-02-12 14:43:52 +08:00
30d5fc1a07 Refactor: split memory API into gateway and service layers (#13111)
### What problem does this PR solve?

Decouple the memory API into a gateway layer (for routing/param parse)
and a service layer (for business logic).

### Type of change

- [x] Refactoring
2026-02-12 10:11:50 +08:00