Feat: Support getting aggregated parsing status for datasets via the API (#13481)

### What problem does this PR solve?

Support getting the aggregated parsing status of datasets via the API.

Issue: #12810

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: heyang.why <heyang.why@alibaba-inc.com>
Authored by Heyang Wang, committed via GitHub on 2026-03-10 18:05:45 +08:00
Parent commit: 68a623154a · Commit: 08f83ff331
7 changed files with 654 additions and 309 deletions


@@ -266,7 +266,8 @@ RAGFlow.list_datasets(
    orderby: str = "create_time",
    desc: bool = True,
    id: str = None,
    name: str = None,
    include_parsing_status: bool = False
) -> list[DataSet]
```
@@ -301,6 +302,16 @@ The ID of the dataset to retrieve. Defaults to `None`.
The name of the dataset to retrieve. Defaults to `None`.
##### include_parsing_status: `bool`
Whether to include document parsing status counts in each returned `DataSet` object. Defaults to `False`. When set to `True`, each `DataSet` object will include the following additional attributes:
- `unstart_count`: `int` Number of documents whose parsing has not yet started.
- `running_count`: `int` Number of documents currently being parsed.
- `cancel_count`: `int` Number of documents whose parsing was cancelled.
- `done_count`: `int` Number of documents that have been successfully parsed.
- `fail_count`: `int` Number of documents whose parsing failed.
#### Returns
- Success: A list of `DataSet` objects.
@@ -322,6 +333,13 @@ dataset = rag_object.list_datasets(id = "id_1")
print(dataset[0])
```
##### List datasets with parsing status
```python
for dataset in rag_object.list_datasets(include_parsing_status=True):
    print(dataset.done_count, dataset.fail_count, dataset.running_count)
```
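The five counts can also be combined client-side into a progress summary. A minimal sketch, where the `parsing_summary` helper is hypothetical (not part of the SDK) and assumes the five status attributes together cover every document in the dataset:

```python
def parsing_summary(unstart: int, running: int, cancel: int,
                    done: int, fail: int) -> dict:
    """Aggregate the five parsing-status counts returned when
    include_parsing_status=True (attribute names as documented above)."""
    total = unstart + running + cancel + done + fail
    return {
        "total": total,
        # Fraction of documents parsed successfully; 0.0 for an empty dataset.
        "completion_rate": done / total if total else 0.0,
        "in_progress": running,
        "pending": unstart,
        # Documents no longer in progress, whatever the outcome.
        "settled": done + fail + cancel,
    }

# Example with plain values (e.g. from a DataSet object):
summary = parsing_summary(unstart=1, running=2, cancel=0, done=6, fail=1)
```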
---
### Update dataset