Feat: Support getting aggregated parsing status for datasets via the API (#13481)

### What problem does this PR solve?

Adds support for retrieving the aggregated document parsing status of each dataset via the API.

Issue: #12810

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: heyang.why <heyang.why@alibaba-inc.com>
Heyang Wang
2026-03-10 18:05:45 +08:00
committed by GitHub
parent 68a623154a
commit 08f83ff331
7 changed files with 654 additions and 309 deletions


@ -835,14 +835,14 @@ Failure:
### List datasets
**GET** `/api/v1/datasets?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={dataset_name}&id={dataset_id}`
**GET** `/api/v1/datasets?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={dataset_name}&id={dataset_id}&include_parsing_status={include_parsing_status}`
Lists datasets.
#### Request
- Method: GET
- URL: `/api/v1/datasets?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={dataset_name}&id={dataset_id}`
- URL: `/api/v1/datasets?page={page}&page_size={page_size}&orderby={orderby}&desc={desc}&name={dataset_name}&id={dataset_id}&include_parsing_status={include_parsing_status}`
- Headers:
- `'Authorization: Bearer <YOUR_API_KEY>'`
@ -854,6 +854,13 @@ curl --request GET \
--header 'Authorization: Bearer <YOUR_API_KEY>'
```
```bash
# List datasets with parsing status
curl --request GET \
--url 'http://{address}/api/v1/datasets?include_parsing_status=true' \
--header 'Authorization: Bearer <YOUR_API_KEY>'
```
##### Request parameters
- `page`: (*Filter parameter*)
@ -870,6 +877,13 @@ curl --request GET \
The name of the dataset to retrieve.
- `id`: (*Filter parameter*)
The ID of the dataset to retrieve.
- `include_parsing_status`: (*Filter parameter*)
Whether to include document parsing status counts in the response. Defaults to `false`. When set to `true`, each dataset object in the response will include the following additional fields:
- `unstart_count`: Number of documents whose parsing has not yet started.
- `running_count`: Number of documents currently being parsed.
- `cancel_count`: Number of documents whose parsing was cancelled.
- `done_count`: Number of documents that have been successfully parsed.
- `fail_count`: Number of documents whose parsing failed.
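As an illustration of how the new query parameter combines with the existing filters, here is a minimal Python sketch that assembles the request URL. The helper name `build_list_datasets_url` is hypothetical and not part of this change; `address` is the same placeholder used in the curl examples, and the lowercase `true` mirrors the curl example above.

```python
from urllib.parse import urlencode


def build_list_datasets_url(address, include_parsing_status=False, **filters):
    """Build a List datasets URL (hypothetical helper, for illustration only)."""
    # Skip filters left as None so they are not sent at all.
    params = {key: value for key, value in filters.items() if value is not None}
    if include_parsing_status:
        # Lowercase "true" matches the curl example in this section.
        params["include_parsing_status"] = "true"
    query = urlencode(params)
    base = f"http://{address}/api/v1/datasets"
    return f"{base}?{query}" if query else base
```

Because the parameter defaults to `false`, the sketch simply omits it unless it is requested, which leaves existing calls unchanged.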
#### Response
@ -917,6 +931,49 @@ Success:
}
```
Success (with `include_parsing_status=true`):
```json
{
"code": 0,
"data": [
{
"avatar": null,
"cancel_count": 0,
"chunk_count": 30,
"chunk_method": "qa",
"create_date": "2026-03-09T18:57:13",
"create_time": 1773053833094,
"created_by": "928f92a210b911f1ac4cc39e0b8fa3ad",
"description": null,
"document_count": 1,
"done_count": 1,
"embedding_model": "text-embedding-v2@Tongyi-Qianwen",
"fail_count": 0,
"id": "ba6586c21ba611f1a3dc476f0709e75e",
"language": "English",
"name": "Test Dataset",
"parser_config": {
"graphrag": { "use_graphrag": false },
"llm_id": "deepseek-chat@DeepSeek",
"raptor": { "use_raptor": false }
},
"permission": "me",
"running_count": 0,
"similarity_threshold": 0.2,
"status": "1",
"tenant_id": "928f92a210b911f1ac4cc39e0b8fa3ad",
"token_num": 1746,
"unstart_count": 0,
"update_date": "2026-03-09T18:59:32",
"update_time": 1773053972723,
"vector_similarity_weight": 0.3
}
],
"total_datasets": 1
}
```
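A client could reduce the five count fields to a single progress figure. The following is a rough sketch rather than anything shipped in this change: `summarize_parsing_status` and `STATUS_FIELDS` are hypothetical names, and the total is computed from the counts themselves because the documentation does not state that they always add up to `document_count`.

```python
# Count fields added to each dataset when include_parsing_status=true.
STATUS_FIELDS = (
    "unstart_count",
    "running_count",
    "cancel_count",
    "done_count",
    "fail_count",
)


def summarize_parsing_status(dataset):
    """Summarize parsing counts for one dataset object (hypothetical helper)."""
    # Treat missing fields as 0, e.g. when the parameter was not set.
    counts = {field: int(dataset.get(field, 0)) for field in STATUS_FIELDS}
    total = sum(counts.values())
    # Guard against division by zero for datasets without counted documents.
    done_ratio = counts["done_count"] / total if total else 0.0
    return {"counts": counts, "total": total, "done_ratio": done_ratio}
```

Applied to the dataset object in the sample response above, this yields a total of 1 and a `done_ratio` of 1.0.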
Failure:
```json