Mirror of https://github.com/infiniflow/ragflow.git, synced 2026-04-28 14:27:49 +08:00
Feat: Support getting the aggregated parsing status of a dataset via the API (#13481)

### What problem does this PR solve?

Support getting the aggregated parsing status of a dataset via the API.

Issue: #12810

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: heyang.why <heyang.why@alibaba-inc.com>
@@ -835,14 +835,14 @@ Failure:

### List datasets

**GET** `/api/v1/datasets?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={dataset_name}&id={dataset_id}&include_parsing_status={include_parsing_status}`

Lists datasets.

#### Request

- Method: GET
- URL: `/api/v1/datasets?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={dataset_name}&id={dataset_id}&include_parsing_status={include_parsing_status}`
- Headers:
  - `'Authorization: Bearer <YOUR_API_KEY>'`
@@ -854,6 +854,13 @@ curl --request GET \
     --header 'Authorization: Bearer <YOUR_API_KEY>'
```

```bash
# List datasets with parsing status
curl --request GET \
     --url 'http://{address}/api/v1/datasets?include_parsing_status=true' \
     --header 'Authorization: Bearer <YOUR_API_KEY>'
```

##### Request parameters

- `page`: (*Filter parameter*)
@@ -870,6 +877,13 @@ curl --request GET \
  The name of the dataset to retrieve.
- `id`: (*Filter parameter*)
  The ID of the dataset to retrieve.
- `include_parsing_status`: (*Filter parameter*)
  Whether to include document parsing status counts in the response. Defaults to `false`. When set to `true`, each dataset object in the response includes the following additional fields:
  - `unstart_count`: Number of documents that have not started parsing.
  - `running_count`: Number of documents currently being parsed.
  - `cancel_count`: Number of documents whose parsing was cancelled.
  - `done_count`: Number of documents that were parsed successfully.
  - `fail_count`: Number of documents whose parsing failed.
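As a quick sketch of how a client might consume these fields, the helper below aggregates the five counters from one dataset object and checks that they account for every document. The `parsing_summary` function is illustrative only (it is not part of RAGFlow); the field names are the ones documented above, and the sample values are taken from the example response later in this section.

```python
# Status fields added to each dataset object when include_parsing_status=true.
STATUS_FIELDS = ("unstart_count", "running_count", "cancel_count",
                 "done_count", "fail_count")

def parsing_summary(dataset: dict) -> dict:
    """Collect the five status counters and check they cover every document."""
    counts = {field: dataset.get(field, 0) for field in STATUS_FIELDS}
    total = sum(counts.values())
    counts["all_accounted_for"] = (total == dataset.get("document_count", total))
    return counts

# Values taken from the sample response in this section.
sample = {"document_count": 1, "unstart_count": 0, "running_count": 0,
          "cancel_count": 0, "done_count": 1, "fail_count": 0}
print(parsing_summary(sample))
```

Since the five states are mutually exclusive, their sum should normally equal `document_count`; a mismatch would suggest documents changed state between queries.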
#### Response
@@ -917,6 +931,49 @@ Success:
}
```

Success (with `include_parsing_status=true`):

```json
{
    "code": 0,
    "data": [
        {
            "avatar": null,
            "cancel_count": 0,
            "chunk_count": 30,
            "chunk_method": "qa",
            "create_date": "2026-03-09T18:57:13",
            "create_time": 1773053833094,
            "created_by": "928f92a210b911f1ac4cc39e0b8fa3ad",
            "description": null,
            "document_count": 1,
            "done_count": 1,
            "embedding_model": "text-embedding-v2@Tongyi-Qianwen",
            "fail_count": 0,
            "id": "ba6586c21ba611f1a3dc476f0709e75e",
            "language": "English",
            "name": "Test Dataset",
            "parser_config": {
                "graphrag": { "use_graphrag": false },
                "llm_id": "deepseek-chat@DeepSeek",
                "raptor": { "use_raptor": false }
            },
            "permission": "me",
            "running_count": 0,
            "similarity_threshold": 0.2,
            "status": "1",
            "tenant_id": "928f92a210b911f1ac4cc39e0b8fa3ad",
            "token_num": 1746,
            "unstart_count": 0,
            "update_date": "2026-03-09T18:59:32",
            "update_time": 1773053972723,
            "vector_similarity_weight": 0.3
        }
    ],
    "total_datasets": 1
}
```

Failure:

```json
@@ -266,7 +266,8 @@ RAGFlow.list_datasets(
    orderby: str = "create_time",
    desc: bool = True,
    id: str = None,
    name: str = None,
    include_parsing_status: bool = False
) -> list[DataSet]
```
@@ -301,6 +302,16 @@ The ID of the dataset to retrieve. Defaults to `None`.

The name of the dataset to retrieve. Defaults to `None`.

##### include_parsing_status: `bool`

Whether to include document parsing status counts in each returned `DataSet` object. Defaults to `False`. When set to `True`, each `DataSet` object will include the following additional attributes:

- `unstart_count`: `int` Number of documents that have not started parsing.
- `running_count`: `int` Number of documents currently being parsed.
- `cancel_count`: `int` Number of documents whose parsing was cancelled.
- `done_count`: `int` Number of documents that were parsed successfully.
- `fail_count`: `int` Number of documents whose parsing failed.
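One way these attributes could be used is to find datasets that still have parsing work outstanding. The `pending` helper and the `SimpleNamespace` stand-ins below are illustrative only, not part of the SDK; in real use the list would come from `rag_object.list_datasets(include_parsing_status=True)`.

```python
from types import SimpleNamespace

def pending(datasets):
    """Datasets that still have documents waiting to start or running."""
    return [ds for ds in datasets if ds.unstart_count + ds.running_count > 0]

# Stand-in objects with the documented attributes, so the logic is self-contained.
demo = [
    SimpleNamespace(name="finished", unstart_count=0, running_count=0),
    SimpleNamespace(name="busy", unstart_count=2, running_count=1),
]
print([ds.name for ds in pending(demo)])  # ['busy']
```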
#### Returns

- Success: A list of `DataSet` objects.
@@ -322,6 +333,13 @@ dataset = rag_object.list_datasets(id = "id_1")
print(dataset[0])
```

##### List datasets with parsing status

```python
for dataset in rag_object.list_datasets(include_parsing_status=True):
    print(dataset.done_count, dataset.fail_count, dataset.running_count)
```
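Building on the counters above, a caller might poll until parsing settles. The `wait_until_parsed` helper below is a hypothetical sketch, not an SDK function; `fetch` stands in for a call that re-reads one dataset's status, e.g. via `list_datasets(id=..., include_parsing_status=True)`, and here is simulated with canned states so the example is self-contained.

```python
import time

def wait_until_parsed(fetch, interval=0.0, max_polls=100):
    """Poll `fetch()` until no documents are waiting or running."""
    for _ in range(max_polls):
        ds = fetch()
        if ds["running_count"] == 0 and ds["unstart_count"] == 0:
            return ds
        time.sleep(interval)
    raise TimeoutError("parsing did not finish in time")

# Simulated fetches: one document finishes on the second poll.
states = iter([{"running_count": 1, "unstart_count": 0, "done_count": 0},
               {"running_count": 0, "unstart_count": 0, "done_count": 1}])
final = wait_until_parsed(lambda: next(states))
print(final["done_count"])  # 1
```

In production one would use a non-zero `interval` and a bound suited to corpus size, since parsing time grows with document count.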
---

### Update dataset