---
sidebar_position: 5
slug: /python_api_reference
sidebar_custom_props: {
  categoryIcon: SiPython
}
---
# Python API

A complete reference for RAGFlow's Python APIs. Before proceeding, please ensure you [have your RAGFlow API key ready for authentication](https://ragflow.io/docs/dev/acquire_ragflow_api_key).

:::tip NOTE
Run the following command to install the Python SDK:

```bash
pip install ragflow-sdk
```

:::

---

## ERROR CODES

---

| Code | Message               | Description                |
|------|-----------------------|----------------------------|
| 400  | Bad Request           | Invalid request parameters |
| 401  | Unauthorized          | Unauthorized access        |
| 403  | Forbidden             | Access denied              |
| 404  | Not Found             | Resource not found         |
| 500  | Internal Server Error | Server internal error      |
| 1001 | Invalid Chunk ID      | Invalid Chunk ID           |
| 1002 | Chunk Update Failed   | Chunk update failed        |

---

## OpenAI-Compatible API

---

### Create chat completion

Creates a model response for the given chat conversation history through RAGFlow's OpenAI-compatible API.

#### Parameters

##### chat_id: `string`, *Required*

Existing chat assistant ID. This value is part of the request path: `/api/v1/openai/<chat_id>/chat/completions`.

##### model: `string`, *Required*

The model used to generate the response. You may also use the legacy placeholder value `"model"` to keep using the chat assistant's configured model.

##### messages: `list[object]`, *Required*

A list of historical chat messages used to generate the response. This must contain at least one message with the `user` role.

##### stream: `boolean`

Whether to receive the response as a stream. Set this to `false` explicitly if you prefer to receive the entire response in one go instead of as a stream.

#### Returns

- Success: A response [message](https://platform.openai.com/docs/api-reference/chat/create) in the same format as OpenAI's.
- Failure: `Exception`

#### Examples

```python
from openai import OpenAI
import json

model = "glm-4-flash@ZHIPU-AI"
# The OpenAI client appends "/chat/completions" to base_url, so stop at <chat_id>.
client = OpenAI(api_key="ragflow-api-key", base_url="http://ragflow_address/api/v1/openai/<chat_id>")

stream = True
reference = True

request_kwargs = dict(
    model=model,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
        {"role": "assistant", "content": "I am an AI assistant named..."},
        {"role": "user", "content": "Can you tell me how to install neovim"},
    ],
    extra_body={
        "reference": reference,
        "reference_metadata": {
            "include": True,
            "fields": ["author", "year", "source"],
        },
    },
)

if stream:
    completion = client.chat.completions.create(stream=True, **request_kwargs)
    for chunk in completion:
        print(chunk)
else:
    resp = client.chat.completions.with_raw_response.create(
        stream=False, **request_kwargs
    )
    print("status:", resp.http_response.status_code)
    raw_text = resp.http_response.text
    print("raw:", raw_text)

    data = json.loads(raw_text)
    print("assistant:", data["choices"][0]["message"].get("content"))
    print("reference:", data["choices"][0]["message"].get("reference"))
```

When `extra_body.reference` is `true`, the streamed final chunk may include `choices[0].delta.reference`, and the non-stream response may include `choices[0].message.reference`.

When `extra_body.reference_metadata.include` is `true`, each reference chunk may include a `document_metadata` object in both streaming and non-streaming responses.
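
As a minimal sketch of how the `reference` field described above might be read defensively, the hypothetical helper below works on already-parsed response dictionaries, so it runs without a server. The helper name `extract_reference` and the sample payloads are assumptions for illustration; only the `choices[0].message.reference` and `choices[0].delta.reference` paths come from this page.

```python
from typing import Any, Optional


def extract_reference(payload: dict[str, Any]) -> Optional[Any]:
    """Return the reference object from a parsed response or stream chunk.

    Checks `choices[0].message.reference` (non-stream) first, then
    `choices[0].delta.reference` (final streamed chunk). Returns None
    when neither is present.
    """
    choices = payload.get("choices") or []
    if not choices:
        return None
    first = choices[0]
    for key in ("message", "delta"):
        part = first.get(key) or {}
        if part.get("reference") is not None:
            return part["reference"]
    return None


# Illustrative payloads only; the real structure of `reference` is not shown here.
non_stream = {"choices": [{"message": {"content": "Hi", "reference": {"placeholder": True}}}]}
final_chunk = {"choices": [{"delta": {"content": None, "reference": {"placeholder": True}}}]}
plain_chunk = {"choices": [{"delta": {"content": "partial text"}}]}

print(extract_reference(non_stream))
print(extract_reference(final_chunk))
print(extract_reference(plain_chunk))
```

Because the field "may" be present rather than "will" be present, the helper returns `None` instead of raising when a chunk carries no reference.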

## DATASET MANAGEMENT

---

### Create dataset

```python
RAGFlow.create_dataset(
    name: str,
    avatar: Optional[str] = None,
    description: Optional[str] = None,
    embedding_model: Optional[str] = "BAAI/bge-large-zh-v1.5@BAAI",
    permission: str = "me",
    chunk_method: str = "naive",
    parser_config: DataSet.ParserConfig = None
) -> DataSet
```

Creates a dataset.

#### Parameters

##### name: `string`, *Required*

The unique name of the dataset to create. It must adhere to the following requirements:

- Maximum 128 characters.
- Case-insensitive.

##### avatar: `string`

Base64 encoding of the avatar. Defaults to `None`.

##### description: `string`

A brief description of the dataset to create. Defaults to `None`.


##### permission: `string`

Specifies who can access the dataset to create. Available options:

- `"me"`: (Default) Only you can manage the dataset.
- `"team"`: All team members can manage the dataset.

##### chunk_method: `string`

The chunking method of the dataset to create. Available options:

- `"naive"`: General (default)
- `"manual"`: Manual
- `"qa"`: Q&A
- `"table"`: Table
- `"paper"`: Paper
- `"book"`: Book
- `"laws"`: Laws
- `"presentation"`: Presentation
- `"picture"`: Picture
- `"one"`: One
- `"email"`: Email

##### parser_config: `DataSet.ParserConfig`

The parser configuration of the dataset. A `ParserConfig` object's attributes vary based on the selected `chunk_method`:

- `chunk_method`=`"naive"`:
  `{"chunk_token_num":512,"delimiter":"\\n","html4excel":False,"layout_recognize":True,"raptor":{"use_raptor":False},"parent_child":{"use_parent_child":False,"children_delimiter":"\\n"}}`
- `chunk_method`=`"qa"`:
  `{"raptor": {"use_raptor": False}}`
- `chunk_method`=`"manual"`:
  `{"raptor": {"use_raptor": False}}`
- `chunk_method`=`"table"`:
  `None`
- `chunk_method`=`"paper"`:
  `{"raptor": {"use_raptor": False}}`
- `chunk_method`=`"book"`:
  `{"raptor": {"use_raptor": False}}`
- `chunk_method`=`"laws"`:
  `{"raptor": {"use_raptor": False}}`
- `chunk_method`=`"picture"`:
  `None`
- `chunk_method`=`"presentation"`:
  `{"raptor": {"use_raptor": False}}`
- `chunk_method`=`"one"`:
  `None`
- `chunk_method`=`"knowledge-graph"`:
  `{"chunk_token_num":128,"delimiter":"\\n","entity_types":["organization","person","location","event","time"]}`
- `chunk_method`=`"email"`:
  `None`
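
The constraints listed above can be checked locally before calling the server. The following is a minimal sketch, not part of the SDK: `build_create_dataset_kwargs` is a hypothetical helper, and the rule that `name` must be non-empty is an assumption (only the 128-character maximum is stated here).

```python
# Chunking methods and permissions listed for `create_dataset` on this page.
CHUNK_METHODS = {
    "naive", "manual", "qa", "table", "paper", "book",
    "laws", "presentation", "picture", "one", "email",
}
PERMISSIONS = {"me", "team"}


def build_create_dataset_kwargs(name, chunk_method="naive", permission="me", description=None):
    """Validate inputs locally and return keyword arguments for `create_dataset`."""
    if not name or len(name) > 128:
        raise ValueError("name must be 1 to 128 characters long")
    if chunk_method not in CHUNK_METHODS:
        raise ValueError(f"unsupported chunk_method: {chunk_method!r}")
    if permission not in PERMISSIONS:
        raise ValueError(f"unsupported permission: {permission!r}")
    kwargs = {"name": name, "chunk_method": chunk_method, "permission": permission}
    if description is not None:
        kwargs["description"] = description
    return kwargs


kwargs = build_create_dataset_kwargs("kb_1", chunk_method="qa")
print(kwargs)
```

The resulting dictionary could then be unpacked into the call, for example `rag_object.create_dataset(**kwargs)`.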

#### Returns

- Success: A `DataSet` object.
- Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.create_dataset(name="kb_1")
```

---

### Delete datasets

```python
RAGFlow.delete_datasets(ids: list[str] | None = None, delete_all: bool = False)
```

Deletes datasets by ID.

#### Parameters

##### ids: `list[str]` or `None`

The IDs of the datasets to delete. Defaults to `None`.

- If omitted, or set to `None` or an empty list, no datasets are deleted.
- If a list of IDs is provided, only the datasets matching those IDs are deleted.

##### delete_all: `bool`

Whether to delete all datasets owned by the current user when `ids` is omitted, or set to `None` or an empty list. Defaults to `False`.
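
The interaction between `ids` and `delete_all` can be summarized as a small decision function. This is only a sketch of the rules stated above; `resolve_delete_scope` is a hypothetical name and does not exist in the SDK.

```python
from typing import Optional


def resolve_delete_scope(ids: Optional[list[str]] = None, delete_all: bool = False) -> str:
    """Describe what a delete call would target, following the rules above."""
    if ids:
        # A non-empty list of IDs always wins: only those items are deleted.
        return "selected"
    # `ids` is omitted, None, or empty: `delete_all` decides.
    return "all" if delete_all else "none"


print(resolve_delete_scope())                        # none
print(resolve_delete_scope(ids=[]))                  # none
print(resolve_delete_scope(delete_all=True))         # all
print(resolve_delete_scope(ids=["id_1"]))            # selected
```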

#### Returns

- Success: No value is returned.
- Failure: `Exception`

#### Examples

```python
rag_object.delete_datasets(ids=["d94a8dc02c9711f0930f7fbc369eab6d","e94a8dc02c9711f0930f7fbc369eab6e"])
rag_object.delete_datasets(delete_all=True)
```

---

### List datasets

```python
RAGFlow.list_datasets(
    page: int = 1,
    page_size: int = 30,
    orderby: str = "create_time",
    desc: bool = True,
    id: str = None,
    name: str = None,
    include_parsing_status: bool = False
) -> list[DataSet]
```

Lists datasets.

#### Parameters

##### page: `int`

Specifies the page on which the datasets will be displayed. Defaults to `1`.

##### page_size: `int`

The number of datasets on each page. Defaults to `30`.

##### orderby: `string`

The field by which datasets should be sorted. Available options:

- `"create_time"` (default)
- `"update_time"`

##### desc: `bool`

Indicates whether the retrieved datasets should be sorted in descending order. Defaults to `True`.

##### id: `string`

The ID of the dataset to retrieve. Defaults to `None`.

##### name: `string`

The name of the dataset to retrieve. Defaults to `None`.

##### include_parsing_status: `bool`

Whether to include document parsing status counts in each returned `DataSet` object. Defaults to `False`. When set to `True`, each `DataSet` object will include the following additional attributes:

- `unstart_count`: `int` Number of documents not yet started parsing.
- `running_count`: `int` Number of documents currently being parsed.
- `cancel_count`: `int` Number of documents whose parsing was cancelled.
- `done_count`: `int` Number of documents that have been successfully parsed.
- `fail_count`: `int` Number of documents whose parsing failed.

#### Returns

- Success: A list of `DataSet` objects.
- Failure: `Exception`.

#### Examples

##### List all datasets

```python
for dataset in rag_object.list_datasets():
    print(dataset)
```

##### Retrieve a dataset by ID

```python
dataset = rag_object.list_datasets(id="id_1")
print(dataset[0])
```

##### List datasets with parsing status

```python
for dataset in rag_object.list_datasets(include_parsing_status=True):
    print(dataset.done_count, dataset.fail_count, dataset.running_count)
```
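
##### Iterate over all pages

Since results are paged with `page` and `page_size`, collecting every item takes a loop. The sketch below is one possible approach, assuming that a page shorter than `page_size` marks the end of the results. `iter_all` and `fake_list_datasets` are hypothetical names; the fake function stands in for `rag_object.list_datasets` so the example runs without a server.

```python
from typing import Any, Callable, Iterator


def iter_all(fetch_page: Callable[..., list[Any]], page_size: int = 30) -> Iterator[Any]:
    """Yield items from every page, stopping at the first short or empty page."""
    page = 1  # pages are 1-based, matching the default `page=1`
    while True:
        items = fetch_page(page=page, page_size=page_size)
        yield from items
        if len(items) < page_size:
            break
        page += 1


# Stand-in for `rag_object.list_datasets`, so the sketch runs without a server.
def fake_list_datasets(page: int = 1, page_size: int = 30) -> list[str]:
    names = [f"kb_{i}" for i in range(1, 8)]
    start = (page - 1) * page_size
    return names[start:start + page_size]


all_names = list(iter_all(fake_list_datasets, page_size=3))
print(all_names)
```

With a live server, the same loop might be driven by `iter_all(rag_object.list_datasets, page_size=30)`.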

---

### Update dataset

```python
DataSet.update(update_message: dict)
```

Updates configurations for the current dataset.

#### Parameters

##### update_message: `dict[str, str|int]`, *Required*

A dictionary representing the attributes to update, with the following keys:

- `"name"`: `string` The revised name of the dataset.
  - Basic Multilingual Plane (BMP) only
  - Maximum 128 characters
  - Case-insensitive
- `"avatar"`: `string` The updated base64 encoding of the avatar.
  - Maximum 65535 characters
- `"embedding_model"`: `string` The updated embedding model name.
  - Ensure that `"chunk_count"` is `0` before updating `"embedding_model"`.
  - Maximum 255 characters
  - Must follow `model_name@model_factory` format
- `"permission"`: `string` The updated dataset permission. Available options:
  - `"me"`: (Default) Only you can manage the dataset.
  - `"team"`: All team members can manage the dataset.
- `"pagerank"`: `int` The page rank of the dataset. Refer to [Set page rank](https://ragflow.io/docs/dev/set_page_rank).
  - Default: `0`
  - Minimum: `0`
  - Maximum: `100`
- `"chunk_method"`: `string` The chunking method for the dataset. Available options:
  - `"naive"`: General (default)
  - `"book"`: Book
  - `"email"`: Email
  - `"laws"`: Laws
  - `"manual"`: Manual
  - `"one"`: One
  - `"paper"`: Paper
  - `"picture"`: Picture
  - `"presentation"`: Presentation
  - `"qa"`: Q&A
  - `"table"`: Table
  - `"tag"`: Tag

#### Returns

- Success: No value is returned.
- Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(name="kb_name")
dataset = dataset[0]
dataset.update({"embedding_model": "BAAI/bge-large-zh-v1.5@BAAI", "chunk_method": "manual"})
```

---

## FILE MANAGEMENT WITHIN DATASET

---

### Upload documents

```python
DataSet.upload_documents(document_list: list[dict])
```

Uploads documents to the current dataset.

#### Parameters

##### document_list: `list[dict]`, *Required*

A list of dictionaries representing the documents to upload, each containing the following keys:

- `"display_name"`: (Optional) The file name to display in the dataset.
- `"blob"`: (Optional) The binary content of the file to upload.
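
A `document_list` of this shape can be assembled from file paths. The following sketch assumes the files are small enough to read into memory; `build_document_list` is a hypothetical helper, and the temporary file exists only so the example runs anywhere.

```python
import os
import tempfile


def build_document_list(paths):
    """Read each file and return the list of dicts `upload_documents` expects."""
    document_list = []
    for path in paths:
        # open() does not expand "~", so resolve it explicitly.
        with open(os.path.expanduser(path), "rb") as f:
            document_list.append({"display_name": os.path.basename(path), "blob": f.read()})
    return document_list


# Demo with a throwaway file so the sketch runs without any fixtures.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "1.txt")
    with open(sample_path, "wb") as f:
        f.write(b"hello ragflow")
    docs = build_document_list([sample_path])

print(docs)
```

The returned list could then be passed directly as `dataset.upload_documents(docs)`.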

#### Returns

- Success: No value is returned.
- Failure: `Exception`

#### Examples

```python
dataset = rag_object.create_dataset(name="kb_name")
dataset.upload_documents([{"display_name": "1.txt", "blob": "<BINARY_CONTENT_OF_THE_DOC>"}, {"display_name": "2.pdf", "blob": "<BINARY_CONTENT_OF_THE_DOC>"}])
```

---

### Update document

```python
Document.update(update_message: dict)
```

Updates configurations for the current document.

#### Parameters

##### update_message: `dict[str, str|dict]`, *Required*

A dictionary representing the attributes to update, with the following keys:

- `"display_name"`: `string` The name of the document to update.
- `"meta_fields"`: `dict[str, Any]` The meta fields of the document.
- `"chunk_method"`: `string` The parsing method to apply to the document.
  - `"naive"`: General
  - `"manual"`: Manual
  - `"qa"`: Q&A
  - `"table"`: Table
  - `"paper"`: Paper
  - `"book"`: Book
  - `"laws"`: Laws
  - `"presentation"`: Presentation
  - `"picture"`: Picture
  - `"one"`: One
  - `"email"`: Email
- `"parser_config"`: `dict[str, Any]` The parsing configuration for the document. Its attributes vary based on the selected `"chunk_method"`:
  - `chunk_method`=`"naive"`:
    `{"chunk_token_num":128,"delimiter":"\\n","html4excel":False,"layout_recognize":True,"raptor":{"use_raptor":False},"parent_child":{"use_parent_child":False,"children_delimiter":"\\n"}}`
  - `chunk_method`=`"qa"`:
    `{"raptor": {"use_raptor": False}}`
  - `chunk_method`=`"manual"`:
    `{"raptor": {"use_raptor": False}}`
  - `chunk_method`=`"table"`:
    `None`
  - `chunk_method`=`"paper"`:
    `{"raptor": {"use_raptor": False}}`
  - `chunk_method`=`"book"`:
    `{"raptor": {"use_raptor": False}}`
  - `chunk_method`=`"laws"`:
    `{"raptor": {"use_raptor": False}}`
  - `chunk_method`=`"presentation"`:
    `{"raptor": {"use_raptor": False}}`
  - `chunk_method`=`"picture"`:
    `None`
  - `chunk_method`=`"one"`:
    `None`
  - `chunk_method`=`"knowledge-graph"`:
    `{"chunk_token_num":128,"delimiter":"\\n","entity_types":["organization","person","location","event","time"]}`
  - `chunk_method`=`"email"`:
    `None`

#### Returns

- Success: No value is returned.
- Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(id='id')
dataset = dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
# update() takes a single dictionary, not a list of dictionaries.
doc.update({"parser_config": {"chunk_token_num": 256}, "chunk_method": "manual"})
```

---

### Download document

```python
Document.download() -> bytes
```

Downloads the current document.

#### Returns

The downloaded document in bytes.

#### Examples

```python
import os

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(id="id")
dataset = dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
# open() does not expand "~", so resolve the home directory explicitly.
with open(os.path.expanduser("~/ragflow.txt"), "wb") as f:
    f.write(doc.download())
print(doc)
```

---

### List documents

```python
DataSet.list_documents(
    id: str = None,
    keywords: str = None,
    page: int = 1,
    page_size: int = 30,
    orderby: str = "create_time",
    desc: bool = True,
    create_time_from: int = 0,
    create_time_to: int = 0
) -> list[Document]
```

Lists documents in the current dataset.

#### Parameters

##### id: `string`

The ID of the document to retrieve. Defaults to `None`.

##### keywords: `string`

The keywords used to match document titles. Defaults to `None`.

##### page: `int`

Specifies the page on which the documents will be displayed. Defaults to `1`.

##### page_size: `int`

The maximum number of documents on each page. Defaults to `30`.

##### orderby: `string`

The field by which documents should be sorted. Available options:

- `"create_time"` (default)
- `"update_time"`

##### desc: `bool`

Indicates whether the retrieved documents should be sorted in descending order. Defaults to `True`.

##### create_time_from: `int`

Unix timestamp for filtering documents created after this time. `0` means no filter. Defaults to `0`.

##### create_time_to: `int`

Unix timestamp for filtering documents created before this time. `0` means no filter. Defaults to `0`.

#### Returns

- Success: A list of `Document` objects.
- Failure: `Exception`.

A `Document` object contains the following attributes:

- `id`: The document ID. Defaults to `""`.
- `name`: The document name. Defaults to `""`.
- `thumbnail`: The thumbnail image of the document. Defaults to `None`.
- `dataset_id`: The dataset ID associated with the document. Defaults to `None`.
- `chunk_method`: The chunking method name. Defaults to `"naive"`.
- `source_type`: The source type of the document. Defaults to `"local"`.
- `type`: Type or category of the document. Defaults to `""`. Reserved for future use.
- `created_by`: `string` The creator of the document. Defaults to `""`.
- `size`: `int` The document size in bytes. Defaults to `0`.
- `token_count`: `int` The number of tokens in the document. Defaults to `0`.
- `chunk_count`: `int` The number of chunks in the document. Defaults to `0`.
- `progress`: `float` The current processing progress as a percentage. Defaults to `0.0`.
- `progress_msg`: `string` A message indicating the current progress status. Defaults to `""`.
- `process_begin_at`: `datetime` The start time of document processing. Defaults to `None`.
- `process_duration`: `float` Duration of the processing in seconds. Defaults to `0.0`.
- `run`: `string` The document's processing status:
  - `"UNSTART"`  (default)
  - `"RUNNING"`
  - `"CANCEL"`
  - `"DONE"`
  - `"FAIL"`
- `status`: `string` Reserved for future use.
- `parser_config`: `ParserConfig` Configuration object for the parser. Its attributes vary based on the selected `chunk_method`:
  - `chunk_method`=`"naive"`:
    `{"chunk_token_num":128,"delimiter":"\\n","html4excel":False,"layout_recognize":True,"raptor":{"use_raptor":False}}`
  - `chunk_method`=`"qa"`:
    `{"raptor": {"use_raptor": False}}`
  - `chunk_method`=`"manual"`:
    `{"raptor": {"use_raptor": False}}`
  - `chunk_method`=`"table"`:
    `None`
  - `chunk_method`=`"paper"`:
    `{"raptor": {"use_raptor": False}}`
  - `chunk_method`=`"book"`:
    `{"raptor": {"use_raptor": False}}`
  - `chunk_method`=`"laws"`:
    `{"raptor": {"use_raptor": False}}`
  - `chunk_method`=`"presentation"`:
    `{"raptor": {"use_raptor": False}}`
  - `chunk_method`=`"picture"`:
    `None`
  - `chunk_method`=`"one"`:
    `None`
  - `chunk_method`=`"email"`:
    `None`
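
The `run` attribute makes it straightforward to summarize where a batch of documents stands. The sketch below uses stand-in objects so it runs without a server; `summarize_runs` and `all_finished` are hypothetical helpers, and treating `"DONE"`, `"FAIL"`, and `"CANCEL"` as the finished states is an assumption based on the status names above.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_runs(documents):
    """Count documents per `run` status."""
    return Counter(doc.run for doc in documents)


def all_finished(documents):
    """True when no document is still waiting or running (assumed terminal states)."""
    return all(doc.run in ("DONE", "FAIL", "CANCEL") for doc in documents)


# Stand-ins for `Document` objects returned by `list_documents()`.
docs = [
    SimpleNamespace(id="doc_1", run="DONE"),
    SimpleNamespace(id="doc_2", run="RUNNING"),
    SimpleNamespace(id="doc_3", run="DONE"),
]

summary = summarize_runs(docs)
print(dict(summary))
print(all_finished(docs))
```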

#### Examples

```python
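import os

from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.create_dataset(name="kb_1")

# open() does not expand "~", so resolve the home directory explicitly.
filename1 = os.path.expanduser("~/ragflow.txt")
with open(filename1, "rb") as f:
    blob = f.read()
# upload_documents() expects the "display_name" key; pages are numbered from 1.
dataset.upload_documents([{"display_name": os.path.basename(filename1), "blob": blob}])
for doc in dataset.list_documents(keywords="rag", page=1, page_size=12):
    print(doc)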
```

---

### Delete documents

```python
DataSet.delete_documents(ids: list[str] | None = None, delete_all: bool = False)
```

Deletes documents by ID.

#### Parameters

##### ids: `list[str]` or `None`

The IDs of the documents to delete. Defaults to `None`.

- If omitted, or set to `None` or an empty list, no documents are deleted.
- If a list of IDs is provided, only the documents matching those IDs are deleted.

##### delete_all: `bool`

Whether to delete all documents in the current dataset when `ids` is omitted, or set to `None` or an empty list. Defaults to `False`.

#### Returns

- Success: No value is returned.
- Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(name="kb_1")
dataset = dataset[0]
dataset.delete_documents(ids=["id_1","id_2"])
dataset.delete_documents(delete_all=True)
```

---

### Parse documents

```python
DataSet.async_parse_documents(document_ids: list[str]) -> None
```

Parses documents in the current dataset.

#### Parameters

##### document_ids: `list[str]`, *Required*

The IDs of the documents to parse.

#### Returns

- Success: No value is returned.
- Failure: `Exception`

#### Examples

```python
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.create_dataset(name="dataset_name")
documents = [
    {'display_name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()},
    {'display_name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()},
    {'display_name': 'test3.txt', 'blob': open('./test_data/test3.txt',"rb").read()}
]
dataset.upload_documents(documents)
documents = dataset.list_documents(keywords="test")
ids = []
for document in documents:
    ids.append(document.id)
dataset.async_parse_documents(ids)
print("Async bulk parsing initiated.")
```

---

### Parse documents (with document status)

```python
DataSet.parse_documents(document_ids: list[str]) -> list[tuple[str, str, int, int]]
```

Parses documents in the current dataset and returns once all parsing tasks have finished.

This method encapsulates `async_parse_documents()`. It awaits the completion of all parsing tasks before returning detailed results, including the parsing status and statistics for each document. If a keyboard interruption occurs (e.g., `Ctrl+C`), all pending parsing tasks will be cancelled gracefully.

#### Parameters

##### document_ids: `list[str]`, *Required*

The IDs of the documents to parse.

#### Returns

A list of tuples with detailed parsing results:

```python
[
  (document_id: str, status: str, chunk_count: int, token_count: int),
  ...
]
```

- `status`: The final parsing state (e.g., `success`, `failed`, `cancelled`).
- `chunk_count`: The number of content chunks created from the document.
- `token_count`: The total number of tokens processed.
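
Once the list of tuples is available, it can be grouped by status for reporting. This is a minimal sketch on made-up data: `summarize_parse_results` is a hypothetical helper, and the status strings in `sample` are placeholders for whatever values the SDK actually returns.

```python
from collections import defaultdict


def summarize_parse_results(results):
    """Group `(document_id, status, chunk_count, token_count)` tuples by status."""
    summary = defaultdict(lambda: {"documents": [], "chunks": 0, "tokens": 0})
    for document_id, status, chunk_count, token_count in results:
        entry = summary[status]
        entry["documents"].append(document_id)
        entry["chunks"] += chunk_count
        entry["tokens"] += token_count
    return dict(summary)


# Made-up results for illustration only.
sample = [
    ("doc_1", "success", 12, 3400),
    ("doc_2", "failed", 0, 0),
    ("doc_3", "success", 8, 2100),
]

summary = summarize_parse_results(sample)
for status, entry in summary.items():
    print(status, entry)
```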

#### Examples

```python
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
# Use an existing dataset that already contains uploaded documents.
dataset = rag_object.list_datasets(name="dataset_name")[0]
documents = dataset.list_documents(keywords="test")
ids = [doc.id for doc in documents]

try:
    finished = dataset.parse_documents(ids)
    for doc_id, status, chunk_count, token_count in finished:
        print(f"Document {doc_id} parsing finished with status: {status}, chunks: {chunk_count}, tokens: {token_count}")
except KeyboardInterrupt:
    print("\nParsing interrupted by user. All pending tasks have been cancelled.")
except Exception as e:
    print(f"Parsing failed: {e}")
```

---

### Stop parsing documents

```python
DataSet.async_cancel_parse_documents(document_ids: list[str]) -> None
```

Stops parsing specified documents.

#### Parameters

##### document_ids: `list[str]`, *Required*

The IDs of the documents for which parsing should be stopped.

#### Returns

- Success: No value is returned.
- Failure: `Exception`

#### Examples

```python
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.create_dataset(name="dataset_name")
documents = [
    {'display_name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()},
    {'display_name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()},
    {'display_name': 'test3.txt', 'blob': open('./test_data/test3.txt',"rb").read()}
]
dataset.upload_documents(documents)
documents = dataset.list_documents(keywords="test")
ids = []
for document in documents:
    ids.append(document.id)
dataset.async_parse_documents(ids)
print("Async bulk parsing initiated.")
dataset.async_cancel_parse_documents(ids)
print("Async bulk parsing cancelled.")
```

---

## CHUNK MANAGEMENT WITHIN DATASET

---

### Add chunk

```python
Document.add_chunk(content:str, important_keywords:list[str] = [], questions:list[str] = [], image_base64:str = None, *, tag_kwd:list[str] = []) -> Chunk
```

Adds a chunk to the current document.

#### Parameters

##### content: `string`, *Required*

The text content of the chunk.

##### important_keywords: `list[str]`

The key terms or phrases to tag with the chunk.

##### questions: `list[str]`

Optional questions to use when embedding the chunk.

##### image_base64: `string`

A base64-encoded image to associate with the chunk. If the chunk already has an image, the new image will be vertically concatenated below the existing one.

##### tag_kwd: `list[str]`

Tag keywords to associate with the chunk.

#### Returns

- Success: A `Chunk` object.
- Failure: `Exception`.

A `Chunk` object contains the following attributes:

- `id`: `string` The chunk ID.
- `content`: `string` The text content of the chunk.
- `important_keywords`: `list[str]` A list of key terms or phrases tagged with the chunk.
- `tag_kwd`: `list[str]` A list of tag keywords associated with the chunk.
- `questions`: `list[str]` A list of questions associated with the chunk.
- `image_id`: `string` The image ID associated with the chunk (empty string if no image).
- `create_time`: `string` The time when the chunk was created (added to the document).
- `create_timestamp`: `float` The timestamp representing the creation time of the chunk, expressed in seconds since January 1, 1970.
- `dataset_id`: `string` The ID of the associated dataset.
- `document_name`: `string` The name of the associated document.
- `document_id`: `string` The ID of the associated document.
- `available`: `bool` The chunk's availability status in the dataset. Value options:
  - `False`: Unavailable
  - `True`: Available (default)

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
datasets = rag_object.list_datasets(id="123")
dataset = datasets[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
chunk = doc.add_chunk(content="xxxxxxx")
```

Adding a chunk with an image:

```python
import base64

with open("image.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()
chunk = doc.add_chunk(content="description of image", image_base64=img_b64)
```

---

### List chunks

```python
Document.list_chunks(keywords: str = None, page: int = 1, page_size: int = 30, id: str = None) -> list[Chunk]
```

Lists chunks in the current document.

#### Parameters

##### keywords: `string`

The keywords used to match chunk content. Defaults to `None`.

##### page: `int`

Specifies the page on which the chunks will be displayed. Defaults to `1`.

##### page_size: `int`

The maximum number of chunks on each page. Defaults to `30`.

##### id: `string`

The ID of the chunk to retrieve. Defaults to `None`.

#### Returns

- Success: A list of `Chunk` objects.
- Failure: `Exception`.

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
# Pass the ID by keyword: the first positional parameter of list_datasets() is `page`.
dataset = rag_object.list_datasets(id="123")
dataset = dataset[0]
docs = dataset.list_documents(keywords="test", page=1, page_size=12)
for chunk in docs[0].list_chunks(keywords="rag", page=1, page_size=12):
    print(chunk)
```

---

### Delete chunks

```python
Document.delete_chunks(ids: list[str] | None = None, delete_all: bool = False)
```

Deletes chunks by ID.

#### Parameters

##### ids: `list[str]` or `None`

The IDs of the chunks to delete. Defaults to `None`.

- If omitted, or set to `None` or an empty list, no chunks are deleted.
- If a list of IDs is provided, only the chunks matching those IDs are deleted.

##### delete_all: `bool`

Whether to delete all chunks in the current document when `ids` is omitted, or set to `None` or an empty list. Defaults to `False`.

#### Returns

- Success: No value is returned.
- Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(id="123")
dataset = dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
chunk = doc.add_chunk(content="xxxxxxx")
doc.delete_chunks(["id_1","id_2"])
doc.delete_chunks(delete_all=True)
```

---

### Update chunk

```python
Chunk.update(update_message: dict)
```

Updates content or configurations for the current chunk.

#### Parameters

##### update_message: `dict[str, str|list[str]|bool]`, *Required*

A dictionary representing the attributes to update, with the following keys:

- `"content"`: `string` The text content of the chunk.
- `"important_keywords"`: `list[str]` A list of key terms or phrases to tag with the chunk.
- `"questions"`: `list[str]` A list of questions associated with the chunk.
- `"tag_kwd"`: `list[str]` A list of tag keywords to associate with the chunk.
- `"positions"`: `list` Updated source positions for the chunk.
- `"available"`: `bool` The chunk's availability status in the dataset. Value options:
  - `False`: Unavailable
  - `True`: Available (default)
- `"image_base64"`: `string` Base64-encoded image content to associate with the chunk.

#### Returns

- Success: No value is returned.
- Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(id="123")
dataset = dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
chunk = doc.add_chunk(content="xxxxxxx")
chunk.update({"content":"sdfx..."})
```

---

### Retrieve chunks

```python
RAGFlow.retrieve(
    question: str = "",
    dataset_ids: list[str] = None,
    document_ids: list[str] = None,
    page: int = 1,
    page_size: int = 30,
    similarity_threshold: float = 0.2,
    vector_similarity_weight: float = 0.3,
    top_k: int = 1024,
    rerank_id: str = None,
    keyword: bool = False,
    cross_languages: list[str] = None,
    metadata_condition: dict = None
) -> list[Chunk]
```

Retrieves chunks from specified datasets.

#### Parameters

##### question: `string`, *Required*

The user query or query keywords. Defaults to `""`.

##### dataset_ids: `list[str]`, *Required*

The IDs of the datasets to search. Defaults to `None`.

##### document_ids: `list[str]`

The IDs of the documents to search. Defaults to `None`. You must ensure all selected documents use the same embedding model. Otherwise, an error will occur.

##### page: `int`

The starting index for the documents to retrieve. Defaults to `1`.

##### page_size: `int`

The maximum number of chunks to retrieve. Defaults to `30`.

##### similarity_threshold: `float`

The minimum similarity score. Defaults to `0.2`.

##### vector_similarity_weight: `float`

The weight of vector cosine similarity. Defaults to `0.3`. If x represents the weight of vector cosine similarity, then (1 - x) is the weight of term similarity.
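
To make the weighting concrete, the sketch below blends two similarity values as a plain linear combination. This is an assumption for illustration: the server's actual scoring may involve further steps (for example reranking), and `hybrid_similarity` is a hypothetical function, not part of the SDK.

```python
def hybrid_similarity(vector_similarity: float, term_similarity: float,
                      vector_similarity_weight: float = 0.3) -> float:
    """Blend the two similarities; the term weight is 1 - vector weight."""
    if not 0.0 <= vector_similarity_weight <= 1.0:
        raise ValueError("vector_similarity_weight must be between 0 and 1")
    term_weight = 1.0 - vector_similarity_weight
    return vector_similarity_weight * vector_similarity + term_weight * term_similarity


# 0.3 * 0.8 + 0.7 * 0.4 = 0.52
score = hybrid_similarity(vector_similarity=0.8, term_similarity=0.4)
print(round(score, 2))

# A chunk scoring below `similarity_threshold` (default 0.2) would be filtered out.
print(score >= 0.2)
```

Raising `vector_similarity_weight` shifts the balance toward semantic matches, while lowering it favors exact term overlap.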

##### top_k: `int`

The number of chunks engaged in vector cosine computation. Defaults to `1024`.

##### rerank_id: `string`

The ID of the rerank model. Defaults to `None`.

##### keyword: `bool`

Indicates whether to enable keyword-based matching:

- `True`: Enable keyword-based matching.
- `False`: Disable keyword-based matching (default).

##### cross_languages: `list[str]`

The languages into which the query should be translated, to enable keyword retrieval across languages. Defaults to `None`.

##### metadata_condition: `dict`

The filter condition for `meta_fields`. Defaults to `None`.

#### Returns

- Success: A list of `Chunk` objects representing the document chunks.
- Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
dataset = rag_object.list_datasets(name="ragflow")
dataset = dataset[0]
path = './test_data/ragflow_test.txt'
with open(path, "rb") as f:
    documents = [{"display_name": "test_retrieve_chunks.txt", "blob": f.read()}]
docs = dataset.upload_documents(documents)
doc = docs[0]
doc.add_chunk(content="This is a chunk addition test")
# `question` is required; the query text here is only an example.
for c in rag_object.retrieve(question="chunk addition test", dataset_ids=[dataset.id], document_ids=[doc.id]):
    print(c)
```

---

## CHAT ASSISTANT MANAGEMENT

---

### Create chat assistant

```python
RAGFlow.create_chat(
    name: str,
    icon: str = "",
    dataset_ids: list[str] | None = None,
    llm_id: str | None = None,
    llm_setting: dict | None = None,
    prompt_config: dict | None = None,
    **kwargs
) -> Chat
```

Creates a chat assistant.

#### Parameters

##### name: `string`, *Required*

The name of the chat assistant.

##### icon: `string`

Base64 encoding of the avatar. Defaults to `""`.

##### dataset_ids: `list[str]`

The IDs of the associated datasets. Defaults to `None`. When omitted or empty, the SDK creates an empty chat assistant and you can attach datasets later.

##### llm_id: `str | None`

The LLM model name/ID to use. If `None`, the user’s default chat model is used. Defaults to `None`.

##### llm_setting: `dict | None`

Configuration for LLM generation parameters. Defaults to `None` (server-side defaults apply). Supported keys:

- `"temperature"`: `float` Controls the randomness of the model's output. Higher values increase creativity, while lower values make responses more deterministic. Defaults to `0.1`.
- `"top_p"`: `float` Sets the nucleus sampling threshold. The model considers only the results of the tokens with `top_p` probability mass. Defaults to `0.3`.
- `"presence_penalty"`: `float` Penalizes tokens based on whether they have appeared in the text so far, increasing the likelihood of the model talking about new topics. Defaults to `0.4`.
- `"frequency_penalty"`: `float` Penalizes tokens based on their existing frequency in the text, decreasing the likelihood of repeating the same lines. Defaults to `0.7`.
- `"max_token"`: `int` The maximum number of tokens to generate in the response. Defaults to `512`.

##### prompt_config: `dict | None`

Instructions and behavioral settings for the LLM. Defaults to `None` (server-side defaults apply). Supported keys:

- `"system"`: `string` The core system prompt or instructions defining the assistant's persona.
- `"empty_response"`: `string` The specific message returned when no relevant information is retrieved. If left blank, the LLM will generate its own response. Defaults to `None`.
- `"prologue"`: `string` The initial greeting displayed to the user. Defaults to `"Hi! I’m your assistant. What can I do for you?"`.
- `"quote"`: `boolean` Determines whether the assistant should include citations or source references in its responses. Defaults to `True`.
- `"parameters"`: `list[dict]` A list of variables utilized within the system prompt. Each entry must include a `"key"` (`string`) and an `"optional"` (`boolean`) status. The `knowledge` key is reserved for retrieved context chunks. Default: `[{"key": "knowledge", "optional": true}]`.
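
As a rough sketch of how these two dictionaries might be assembled, the snippet below builds them from the documented keys and default values. `build_chat_settings` is a hypothetical helper, the persona text is invented for illustration, and the `{knowledge}` placeholder syntax inside the system prompt is an assumption about how the reserved variable is referenced.

```python
def build_chat_settings(persona: str) -> tuple[dict, dict]:
    """Return (llm_setting, prompt_config) built from the documented keys."""
    llm_setting = {
        "temperature": 0.1,
        "top_p": 0.3,
        "presence_penalty": 0.4,
        "frequency_penalty": 0.7,
        "max_token": 512,
    }
    prompt_config = {
        # Assumption: the reserved variable is referenced as {knowledge}.
        "system": f"{persona}\nAnswer using this knowledge base:\n{{knowledge}}",
        "prologue": "Hi! I'm your assistant. What can I do for you?",
        "quote": True,
        "parameters": [{"key": "knowledge", "optional": True}],
    }
    return llm_setting, prompt_config


llm_setting, prompt_config = build_chat_settings("You are a helpful librarian.")
# The dicts would then be passed along, for example:
# rag_object.create_chat("Miss R", llm_setting=llm_setting, prompt_config=prompt_config)
```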

#### Returns

- Success: A `Chat` object representing the chat assistant.
- Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
datasets = rag_object.list_datasets(name="kb_1")
dataset_ids = []
for dataset in datasets:
    dataset_ids.append(dataset.id)
assistant = rag_object.create_chat("Miss R", dataset_ids=dataset_ids)
```

---

### Update chat assistant

```python
Chat.update(update_message: dict)
```

Performs a partial update to the configuration settings for the current chat assistant.

`Chat.update()` utilizes the `PATCH /api/v1/chats/{chat_id}` endpoint. Only the specified keys are modified, while all other existing fields are preserved.

#### Parameters

##### update_message: `dict`, *Required*

A dictionary containing the attributes to be updated. Supported keys include:

- `"name"`: `string` The updated name of the chat assistant.
- `"icon"`: `string` A Base64-encoded string representing the assistant's avatar.
- `"dataset_ids"`: `list[string]` A list of unique identifiers for the datasets associated with the assistant.
- `"llm_id"`: `string` The unique identifier or name of the LLM to be used.
- `"llm_setting"`: `dict` Configuration for LLM generation parameters:
  - `"temperature"`: `float` Controls the randomness of the model's output.
  - `"top_p"`: `float` Sets the nucleus sampling threshold.
  - `"presence_penalty"`: `float` Penalizes tokens based on whether they have already appeared in the text.
  - `"frequency_penalty"`: `float` Penalizes tokens based on their existing frequency in the text.
  - `"max_token"`: `int` The maximum number of tokens to generate in the response.
- `"prompt_config"`: `dict` Instructions and behavioral settings for the LLM:
  - `"system"`: `string` The core system prompt or instructions defining the assistant's persona.
  - `"empty_response"`: `string` The message returned when no relevant information is retrieved. Leave blank to allow the LLM to improvise.
  - `"prologue"`: `string` The initial greeting displayed to the user.
  - `"quote"`: `boolean` Determines whether the assistant should include citations or source references.
  - `"parameters"`: `list[dict]` Variables used within the system prompt (e.g., the reserved `knowledge` key).
- `"similarity_threshold"`: `float` The minimum similarity score required for retrieved context chunks. Defaults to `0.2`.
- `"vector_similarity_weight"`: `float` The weight assigned to vector cosine similarity within the hybrid search score. Defaults to `0.3`.
- `"top_n"`: `int` The number of top-ranked chunks provided to the LLM as context. Defaults to `6`.
- `"top_k"`: `int` The size of the initial candidate pool retrieved for reranking. Defaults to `1024`.
- `"rerank_id"`: `string` The unique identifier for the reranking model. If left empty, standard vector cosine similarity is used for ranking.
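
Because only the supplied keys are modified, a misspelled key is easy to miss. The sketch below checks the top-level keys of an update dictionary before it is sent. It assumes the list above is exhaustive, which this reference does not guarantee, and `check_update_message` is a hypothetical helper, not part of the SDK.

```python
SUPPORTED_UPDATE_KEYS = {
    "name", "icon", "dataset_ids", "llm_id", "llm_setting", "prompt_config",
    "similarity_threshold", "vector_similarity_weight", "top_n", "top_k",
    "rerank_id",
}


def check_update_message(update_message: dict) -> dict:
    """Raise ValueError if a top-level key is not in the documented list."""
    unknown = sorted(set(update_message) - SUPPORTED_UPDATE_KEYS)
    if unknown:
        raise ValueError(f"Unsupported keys: {unknown}")
    return update_message


payload = check_update_message({"name": "Stefan", "top_n": 8})
# The validated payload would then be passed on:
# assistant.update(payload)
```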

#### Returns

- Success: No value is returned.
- Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
datasets = rag_object.list_datasets(name="kb_1")
dataset_id = datasets[0].id
assistant = rag_object.create_chat("Miss R", dataset_ids=[dataset_id])
assistant.update({"name": "Stefan", "llm_setting": {"temperature": 0.8}, "top_n": 8})
```

---

### Delete chat assistants

```python
RAGFlow.delete_chats(ids: list[str] | None = None, delete_all: bool = False)
```

Deletes chat assistants by ID.

#### Parameters

##### ids: `list[str]` or `None`

The IDs of the chat assistants to delete. Defaults to `None`.

- If omitted, or set to `None` or an empty list, no chat assistants are deleted unless `delete_all` is `True`.
- If a list of IDs is provided, only the chat assistants matching those IDs are deleted.

##### delete_all: `bool`

Whether to delete all chat assistants owned by the current user when `ids` is omitted, or set to `None` or an empty list. Defaults to `False`.
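
The interaction between `ids` and `delete_all` can be summarized with a small local model. This is a sketch of the documented rules, not the server's implementation. `resolve_chats_to_delete` is a hypothetical function, and the case where a non-empty `ids` list is combined with `delete_all=True` is assumed to honor `ids` only.

```python
def resolve_chats_to_delete(existing_ids, ids=None, delete_all=False):
    """Return the IDs that would be deleted under the documented rules."""
    if ids:
        # A non-empty list selects exactly the matching assistants.
        wanted = set(ids)
        return [chat_id for chat_id in existing_ids if chat_id in wanted]
    if delete_all:
        # ids is omitted, None, or empty, and delete_all is True.
        return list(existing_ids)
    # Default: nothing is deleted.
    return []
```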

#### Returns

- Success: No value is returned.
- Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
rag_object.delete_chats(ids=["id_1","id_2"])
rag_object.delete_chats(delete_all=True)
```

---

### List chat assistants

```python
RAGFlow.list_chats(
    page: int = 1,
    page_size: int = 30,
    orderby: str = "create_time",
    desc: bool = True,
    id: str | None = None,
    name: str | None = None,
    keywords: str | None = None,
    owner_ids: str | list[str] | None = None,
    parser_id: str | None = None
) -> list[Chat]
```

Lists chat assistants.

#### Parameters

##### page: `int`

Specifies the page on which the chat assistants will be displayed. Defaults to `1`.

##### page_size: `int`

The number of chat assistants on each page. Defaults to `30`.

##### orderby: `string`

The attribute by which the results are sorted. Available options:

- `"create_time"` (default)
- `"update_time"`

##### desc: `bool`

Indicates whether the retrieved chat assistants should be sorted in descending order. Defaults to `True`.

##### id: `string | None`

Exact match on chat assistant ID. Defaults to `None`.

##### name: `string | None`

Filters results by the exact name of the chat assistant. Defaults to `None`.

##### keywords: `string | None`

Performs a case-insensitive fuzzy search against chat assistant names. Defaults to `None`.

##### owner_ids: `string | list[string] | None`

Filters results by one or more owner tenant IDs. Defaults to `None`.

##### parser_id: `string | None`

Filters results by a specific parser type identifier. Defaults to `None`.

If `id` or `name` is specified, exact filtering takes precedence over the fuzzy matching provided by `keywords`.

#### Returns

- Success: A list of `Chat` objects.
- Failure: `Exception`.

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
for assistant in rag_object.list_chats():
    print(assistant)
```
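
Since results are paginated, collecting every chat assistant means requesting successive pages. The generator below is a minimal sketch of that loop. `iter_all_pages` is a hypothetical helper, and it assumes that a page shorter than `page_size` (or an empty page) marks the end of the results rather than raising an error.

```python
from typing import Callable, Iterator


def iter_all_pages(list_fn: Callable[..., list], page_size: int = 30,
                   **filters) -> Iterator:
    """Yield every item from a paginated list call such as `list_chats`."""
    page = 1
    while True:
        batch = list_fn(page=page, page_size=page_size, **filters)
        yield from batch
        if len(batch) < page_size:
            # A short or empty page means there is nothing left to fetch.
            break
        page += 1
```

It could be used as `for assistant in iter_all_pages(rag_object.list_chats, orderby="update_time"): ...`. The same pattern applies to other list methods that accept `page` and `page_size`.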

---

## SESSION MANAGEMENT

---

### Create session with chat assistant

```python
Chat.create_session(name: str = "New session") -> Session
```

Creates a session with the current chat assistant.

#### Parameters

##### name: `string`

The name of the chat session to create.

#### Returns

- Success: A `Session` object containing the following attributes:
  - `id`: `string` The auto-generated unique identifier of the created session.
  - `name`: `string` The name of the created session.
  - `message`: `list[Message]` The opening message of the created session. Default: `[{"role": "assistant", "content": "Hi! I am your assistant, can I help you?"}]`
  - `chat_id`: `string` The ID of the associated chat assistant.
- Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
assistant = rag_object.list_chats(name="Miss R")
assistant = assistant[0]
session = assistant.create_session()
```

---

### Update chat assistant's session

```python
Session.update(update_message: dict)
```

Updates the current session of the current chat assistant.

#### Parameters

##### update_message: `dict[str, Any]`, *Required*

A dictionary representing the attributes to update, with only one key:

- `"name"`: `string` The revised name of the session.

#### Returns

- Success: No value is returned.
- Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
assistant = rag_object.list_chats(name="Miss R")
assistant = assistant[0]
session = assistant.create_session("session_name")
session.update({"name": "updated_name"})
```

---

### List chat assistant's sessions

```python
Chat.list_sessions(
    page: int = 1,
    page_size: int = 30,
    orderby: str = "create_time",
    desc: bool = True,
    id: str = None,
    name: str = None,
    user_id: str = None
) -> list[Session]
```

Lists sessions associated with the current chat assistant.

#### Parameters

##### page: `int`

Specifies the page on which the sessions will be displayed. Defaults to `1`.

##### page_size: `int`

The number of sessions on each page. Defaults to `30`.

##### orderby: `string`

The field by which sessions should be sorted. Available options:

- `"create_time"` (default)
- `"update_time"`

##### desc: `bool`

Indicates whether the retrieved sessions should be sorted in descending order. Defaults to `True`.

##### id: `string`

The ID of the chat session to retrieve. Defaults to `None`.

##### name: `string`

The name of the chat session to retrieve. Defaults to `None`.

##### user_id: `str`

The optional user-defined ID to filter sessions by. Defaults to `None`.

#### Returns

- Success: A list of `Session` objects associated with the current chat assistant.
- Failure: `Exception`.

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
assistant = rag_object.list_chats(name="Miss R")
assistant = assistant[0]
for session in assistant.list_sessions():
    print(session)
```

---

### Delete chat assistant's sessions

```python
Chat.delete_sessions(ids: list[str] | None = None, delete_all: bool = False)
```

Deletes sessions of the current chat assistant by ID.

#### Parameters

##### ids: `list[str]` or `None`

The IDs of the sessions to delete. Defaults to `None`.

- If omitted, or set to `None` or an empty list, no sessions are deleted unless `delete_all` is `True`.
- If a list of IDs is provided, only the sessions matching those IDs are deleted.

##### delete_all: `bool`

Whether to delete all sessions of the current chat assistant when `ids` is omitted, or set to `None` or an empty list. Defaults to `False`.

#### Returns

- Success: No value is returned.
- Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
assistant = rag_object.list_chats(name="Miss R")
assistant = assistant[0]
assistant.delete_sessions(ids=["id_1","id_2"])
assistant.delete_sessions(delete_all=True)
```

---

### Converse with chat assistant

```python
Session.ask(question: str = "", stream: bool = False, **kwargs) -> Optional[Message | iter[Message]]
```

Asks a specified chat assistant a question to start an AI-powered conversation.

:::tip NOTE
In streaming mode, not all responses include a reference, as this depends on the system's judgement.
:::

#### Parameters

##### question: `string`, *Required*

The question to start an AI-powered conversation. Defaults to `""`.

##### stream: `bool`

Indicates whether to output responses in a streaming way:

- `True`: Enable streaming.
- `False`: Disable streaming (default).

##### **kwargs

The parameters used in the system prompt, passed as keyword arguments.

#### Returns

- A `Message` object containing the response to the question if `stream` is set to `False`.
- An iterator containing multiple `Message` objects (`iter[Message]`) if `stream` is set to `True`.

The following shows the attributes of a `Message` object:

##### id: `string`

The auto-generated message ID.

##### content: `string`

The content of the message. Defaults to `"Hi! I am your assistant, can I help you?"`.

##### reference: `list[Chunk]`

A list of `Chunk` objects representing references to the message, each containing the following attributes:

- `id` `string`
  The chunk ID.
- `content` `string`
  The content of the chunk.
- `img_id` `string`
  The ID of the snapshot of the chunk. Applicable only when the source of the chunk is an image, PPT, PPTX, or PDF file.
- `document_id` `string`
  The ID of the referenced document.
- `document_name` `string`
  The name of the referenced document.
- `document_metadata` `dict`
  Optional document metadata, returned only when `extra_body.reference_metadata.include` is `true`.
- `position` `list[str]`
  The location information of the chunk within the referenced document.
- `dataset_id` `string`
  The ID of the dataset to which the referenced document belongs.
- `similarity` `float`
  A composite similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity. It is the weighted sum of `vector_similarity` and `term_similarity`.
- `vector_similarity` `float`
  A vector similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity between vector embeddings.
- `term_similarity` `float`
  A keyword similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity between keywords.
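
As one way to work with these attributes, the sketch below picks the highest-scoring chunk for each referenced document. `best_chunk_per_document` and `_field` are hypothetical helpers. Because this reference does not say whether the items in `reference` arrive as objects or as plain dictionaries, the code accepts either form as a precaution.

```python
from types import SimpleNamespace


def _field(chunk, name):
    """Read a field from a chunk given as either a dict or an object."""
    return chunk[name] if isinstance(chunk, dict) else getattr(chunk, name)


def best_chunk_per_document(reference) -> dict:
    """Map each document name to its highest-scoring chunk."""
    best = {}
    for chunk in reference:
        doc = _field(chunk, "document_name")
        if doc not in best or _field(chunk, "similarity") > _field(best[doc], "similarity"):
            best[doc] = chunk
    return best


# Hypothetical stand-ins for the chunks in `message.reference`.
sample = [
    SimpleNamespace(document_name="a.pdf", similarity=0.42),
    SimpleNamespace(document_name="a.pdf", similarity=0.77),
    {"document_name": "b.txt", "similarity": 0.31},
]
best = best_chunk_per_document(sample)
```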

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
assistant = rag_object.list_chats(name="Miss R")
assistant = assistant[0]
session = assistant.create_session()

print("\n==================== Miss R =====================\n")
print("Hello. What can I do for you?")

while True:
    question = input("\n==================== User =====================\n> ")
    print("\n==================== Miss R =====================\n")

    cont = ""
    for ans in session.ask(question, stream=True):
        print(ans.content[len(cont):], end='', flush=True)
        cont = ans.content
```
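
The slicing in the loop above suggests that each streamed `Message` carries the full text generated so far rather than only the newest fragment. Under that assumption, the incremental text can be factored into a small generator. `stream_deltas` is a hypothetical helper, and the sample objects are stand-ins rather than real SDK messages.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def stream_deltas(messages: Iterable) -> Iterator[str]:
    """Yield only the newly added text from each streamed message."""
    seen = ""
    for message in messages:
        yield message.content[len(seen):]
        seen = message.content


# Stand-in objects mimicking streamed `Message` instances (hypothetical data).
fake_stream = [SimpleNamespace(content=c) for c in ("Hel", "Hello", "Hello, world")]
chunks = list(stream_deltas(fake_stream))
```

With a real session, the loop body would reduce to `for delta in stream_deltas(session.ask(question, stream=True)): print(delta, end="", flush=True)`.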

---

### Create session with agent

```python
Agent.create_session(**kwargs) -> Session
```

Creates a session with the current agent.

#### Parameters

##### **kwargs

The parameters defined in the **Begin** component.

Also supports:

- `release` (`bool | str`, optional): When set to `True` (or `"true"`), creates a session with the published agent app only.

#### Returns

- Success: A `Session` object containing the following attributes:
  - `id`: `string` The auto-generated unique identifier of the created session.
  - `message`: `list[Message]` The messages of the created session assistant. Default: `[{"role": "assistant", "content": "Hi! I am your assistant, can I help you?"}]`
  - `agent_id`: `string` The ID of the associated agent.
- Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow, Agent

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
agent_id = "AGENT_ID"
agent = rag_object.get_agent(agent_id)
session = agent.create_session()
# Or create in release mode:
# session = agent.create_session(release=True)
```

---

### Converse with agent

```python
Session.ask(question: str = "", stream: bool = False, **kwargs) -> Optional[Message | iter[Message]]
```

Asks a specified agent a question through the unified completion endpoint.

:::tip NOTE
In streaming mode, not all responses include a reference, as this depends on the system's judgement.
:::

#### Parameters

##### question: `string`

The user message sent to the agent. If the **Begin** component takes parameters, `question` can be an empty string.

##### stream: `bool`

Indicates whether to output responses in a streaming way:

- `True`: Enable streaming.
- `False`: Disable streaming.

##### kwargs: `dict`

Additional request parameters forwarded to the completion API. Common options:

- `inputs`: Variables defined in the **Begin** component.
- `session_id`: Continue an existing session instead of creating a new one.
- `release`: Use the latest published version of the agent.
- `return_trace`: Include execution trace information in the response.
- Other custom Begin component parameters supported by the current workflow.

#### Returns

- A `Message` object containing the response to the question if `stream` is set to `False`.
- An iterator containing multiple `Message` objects (`iter[Message]`) if `stream` is set to `True`.

The following shows the attributes of a `Message` object:

##### id: `string`

The auto-generated message ID.

##### content: `string`

The content of the message. Defaults to `"Hi! I am your assistant, can I help you?"`.

##### reference: `list[Chunk]`

A list of `Chunk` objects representing references to the message, each containing the following attributes:

- `id` `string`
  The chunk ID.
- `content` `string`
  The content of the chunk.
- `image_id` `string`
  The ID of the snapshot of the chunk. Applicable only when the source of the chunk is an image, PPT, PPTX, or PDF file.
- `document_id` `string`
  The ID of the referenced document.
- `document_name` `string`
  The name of the referenced document.
- `document_metadata` `dict`
  Optional document metadata, returned only when `extra_body.reference_metadata.include` is `true`.
- `position` `list[str]`
  The location information of the chunk within the referenced document.
- `dataset_id` `string`
  The ID of the dataset to which the referenced document belongs.
- `similarity` `float`
  A composite similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity. It is the weighted sum of `vector_similarity` and `term_similarity`.
- `vector_similarity` `float`
  A vector similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity between vector embeddings.
- `term_similarity` `float`
  A keyword similarity score of the chunk ranging from `0` to `1`, with a higher value indicating greater similarity between keywords.

#### Examples

```python
from ragflow_sdk import RAGFlow, Agent

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
AGENT_id = "AGENT_ID"
agent = rag_object.get_agent(AGENT_id)
session = agent.create_session()

print("\n===== Miss R ====\n")
print("Hello. What can I do for you?")

while True:
    question = input("\n===== User ====\n> ")
    print("\n==== Miss R ====\n")

    cont = ""
    for ans in session.ask(question, stream=True):
        print(ans.content[len(cont):], end='', flush=True)
        cont = ans.content
```

Use Begin inputs and request trace output:

```python
from ragflow_sdk import RAGFlow, Agent

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
agent = rag_object.get_agent("AGENT_ID")
session = agent.create_session()

message = session.ask(
    "",
    stream=False,
    inputs={
        "line_var": {
            "type": "line",
            "value": "I am line_var",
        }
    },
    return_trace=True,
)

print(message.content)
print(message.reference)
```
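
When several Begin variables are plain text, the nested `inputs` structure shown above can be generated from keyword arguments. This sketch covers only the `"line"` type that appears in the example. `make_line_inputs` is a hypothetical helper, and other variable types are not handled.

```python
def make_line_inputs(**values: str) -> dict:
    """Wrap plain strings in the {"type": "line", "value": ...} shape."""
    return {name: {"type": "line", "value": value} for name, value in values.items()}


inputs = make_line_inputs(line_var="I am line_var")
# The result would then be forwarded to the agent:
# message = session.ask("", stream=False, inputs=inputs)
```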

---

### List agent sessions

```python
Agent.list_sessions(
    page: int = 1,
    page_size: int = 30,
    orderby: str = "update_time",
    desc: bool = True,
    id: str = None
) -> list[Session]
```

Lists sessions associated with the current agent.

#### Parameters

##### page: `int`

Specifies the page on which the sessions will be displayed. Defaults to `1`.

##### page_size: `int`

The number of sessions on each page. Defaults to `30`.

##### orderby: `string`

The field by which sessions should be sorted. Available options:

- `"create_time"`
- `"update_time"` (default)

##### desc: `bool`

Indicates whether the retrieved sessions should be sorted in descending order. Defaults to `True`.

##### id: `string`

The ID of the agent session to retrieve. Defaults to `None`.

#### Returns

- Success: A list of `Session` objects associated with the current agent.
- Failure: `Exception`.

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
AGENT_id = "AGENT_ID"
agent = rag_object.get_agent(AGENT_id)
sessions = agent.list_sessions()
for session in sessions:
    print(session)
```

---

### Delete agent's sessions

```python
Agent.delete_sessions(ids: list[str] | None = None, delete_all: bool = False)
```

Deletes sessions of an agent by ID.

#### Parameters

##### ids: `list[str]` or `None`

The IDs of the sessions to delete. Defaults to `None`.

- If omitted, or set to `None` or an empty list, no sessions are deleted unless `delete_all` is `True`.
- If a list of IDs is provided, only the sessions matching those IDs are deleted.

##### delete_all: `bool`

Whether to delete all sessions of the current agent when `ids` is omitted, or set to `None` or an empty list. Defaults to `False`.

#### Returns

- Success: No value is returned.
- Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
AGENT_id = "AGENT_ID"
agent = rag_object.get_agent(AGENT_id)
agent.delete_sessions(ids=["id_1","id_2"])
agent.delete_sessions(delete_all=True)
```

---

## AGENT MANAGEMENT

---

### List agents

```python
RAGFlow.list_agents(
    page: int = 1,
    page_size: int = 30,
    orderby: str = "update_time",
    desc: bool = True
) -> list[Agent]
```

Lists agents. This is a collection API and always returns a list.

#### Parameters

##### page: `int`

Specifies the page on which the agents will be displayed. Defaults to `1`.

##### page_size: `int`

The number of agents on each page. Defaults to `30`.

##### orderby: `string`

The attribute by which the results are sorted. Available options:

- `"create_time"`
- `"update_time"` (default)

##### desc: `bool`

Indicates whether the retrieved agents should be sorted in descending order. Defaults to `True`.

#### Returns

- Success: A list of `Agent` objects.
- Failure: `Exception`.

#### Examples

```python
from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
for agent in rag_object.list_agents():
    print(agent)
```

---

### Get agent

```python
RAGFlow.get_agent(agent_id: str) -> Agent
```

Gets a single agent by ID and returns the detailed agent payload.

#### Parameters

##### agent_id: `string`

The ID of the agent to retrieve.

#### Returns

- Success: An `Agent` object.
- Failure: `Exception`.

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
agent = rag_object.get_agent("AGENT_ID")
print(agent)
```

---

### Create agent

```python
RAGFlow.create_agent(
    title: str,
    dsl: dict,
    description: str | None = None
) -> None
```

Create an agent.

#### Parameters

##### title: `string`

Specifies the title of the agent.

##### dsl: `dict`

Specifies the canvas DSL of the agent.

##### description: `string`

The description of the agent. Defaults to `None`.

#### Returns

- Success: Nothing.
- Failure: `Exception`.

#### Examples

```python
from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
rag_object.create_agent(
  title="Test Agent",
  description="A test agent",
  dsl={
    # ... canvas DSL here ...
  }
)
```

---

### Update agent

```python
RAGFlow.update_agent(
    agent_id: str,
    title: str | None = None,
    description: str | None = None,
    dsl: dict | None = None
) -> None
```

Update an agent.

#### Parameters

##### agent_id: `string`

Specifies the ID of the agent to be updated.

##### title: `string`

Specifies the new title of the agent. `None` if you do not want to update this.

##### dsl: `dict`

Specifies the new canvas DSL of the agent. `None` if you do not want to update this.

##### description: `string`

The new description of the agent. `None` if you do not want to update this.

#### Returns

- Success: Nothing.
- Failure: `Exception`.

#### Examples

```python
from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
rag_object.update_agent(
  agent_id="58af890a2a8911f0a71a11b922ed82d6",
  title="Test Agent",
  description="A test agent",
  dsl={
    # ... canvas DSL here ...
  }
)
```

---

### Delete agent

```python
RAGFlow.delete_agent(
    agent_id: str
) -> None
```

Delete an agent.

#### Parameters

##### agent_id: `string`

Specifies the ID of the agent to be deleted.

#### Returns

- Success: Nothing.
- Failure: `Exception`.

#### Examples

```python
from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
rag_object.delete_agent("58af890a2a8911f0a71a11b922ed82d6")
```

---


## Memory Management

### Create Memory

```python
RAGFlow.create_memory(
    name: str,
    memory_type: list[str],
    embd_id: str,
    llm_id: str
) -> Memory
```

Create a new memory.

#### Parameters

##### name: `string`, *Required*

The unique name of the memory to create. It must adhere to the following requirements:

- Basic Multilingual Plane (BMP) only
- Maximum 128 characters

##### memory_type: `list[str]`, *Required*

Specifies the types of memory to extract. Available options:

- `raw`: The raw dialogue content between the user and the agent. *Required by default*.
- `semantic`: General knowledge and facts about the user and world.
- `episodic`: Time-stamped records of specific events and experiences.
- `procedural`: Learned skills, habits, and automated procedures.

##### embd_id: `string`, *Required*

The name of the embedding model to use. For example: `"BAAI/bge-large-zh-v1.5@BAAI"`

- Maximum 255 characters
- Must follow `model_name@model_factory` format

##### llm_id: `string`, *Required*

The name of the chat model to use. For example: `"glm-4-flash@ZHIPU-AI"`

- Maximum 255 characters
- Must follow `model_name@model_factory` format
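
The constraints on `name`, `embd_id`, and `llm_id` can be checked locally before calling the API. The function below is a sketch under two assumptions: "characters" are counted as Unicode code points, and the model ID is split at its last `@`. `validate_memory_args` is a hypothetical helper, and the server remains the authority on what it accepts.

```python
def validate_memory_args(name: str, embd_id: str, llm_id: str) -> None:
    """Raise ValueError if an argument breaks a documented constraint."""
    if len(name) > 128:
        raise ValueError("name exceeds 128 characters")
    # Code points above U+FFFF lie outside the Basic Multilingual Plane.
    if any(ord(ch) > 0xFFFF for ch in name):
        raise ValueError("name contains characters outside the BMP")
    for label, model_id in (("embd_id", embd_id), ("llm_id", llm_id)):
        if len(model_id) > 255:
            raise ValueError(f"{label} exceeds 255 characters")
        model_name, sep, factory = model_id.rpartition("@")
        if not (sep and model_name and factory):
            raise ValueError(f"{label} must follow model_name@model_factory")


# Passes silently when every constraint is met.
validate_memory_args("notes", "BAAI/bge-large-zh-v1.5@BAAI", "glm-4-flash@ZHIPU-AI")
```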

#### Returns

- Success: A `Memory` object.

- Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
memory = rag_object.create_memory("name", ["raw"], "BAAI/bge-large-zh-v1.5@SILICONFLOW", "glm-4-flash@ZHIPU-AI")
```

---


### Update Memory

```python
Memory.update(
    update_dict: dict
) -> Memory
```

Updates configurations for a specified memory.

#### Parameters

##### update_dict: `dict`, *Required*

Configurations to update. Available configurations:

- `name`: `string`, *Optional*

  The revised name of the memory.

  - Basic Multilingual Plane (BMP) only
  - Maximum 128 characters

- `avatar`: `string`, *Optional*

  The updated base64 encoding of the avatar.

  - Maximum 65535 characters

- `permission`: `enum<string>`, *Optional*

  The updated memory permission. Available options:

  - `"me"`: (Default) Only you can manage the memory.
  - `"team"`: All team members can manage the memory.

- `llm_id`: `string`, *Optional*

  The name of the chat model to use. For example: `"glm-4-flash@ZHIPU-AI"`

  - Maximum 255 characters
  - Must follow `model_name@model_factory` format

- `description`: `string`, *Optional*

  The description of the memory. Defaults to `None`.

- `memory_size`: `int`, *Optional*

  The maximum size of the memory, in bytes. Defaults to `5 * 1024 * 1024` (5 MB). The size accounts for each message's content plus its embedding vector (≈ content size + dimensions × 8 bytes). For example, a 1 KB message with a 1024-dimension embedding uses about 9 KB, so the 5 MB default holds roughly 500 such messages.

  - Maximum 10 * 1024 * 1024 Bytes

- `forgetting_policy`: `enum<string>`, *Optional*

  Evicts existing data based on the chosen policy when the size limit is reached, freeing up space for new messages. Available options:

  - `"FIFO"`: (Default) Prioritize messages with the earliest `forget_at` time for removal. When the pool of messages that have `forget_at` set is insufficient, it falls back to selecting messages in ascending order of their `valid_at` (oldest first).

- `temperature`: `float`, *Optional*

  Adjusts output randomness. Lower = more deterministic; higher = more creative.

  - Range [0, 1]

- `system_prompt`: `string`, *Optional*

  Defines the system-level instructions and role for the AI assistant. It is automatically assembled based on the selected `memory_type` by `PromptAssembler` in `memory/utils/prompt_util.py`. This prompt sets the foundational behavior and context for the entire conversation.

  - Keep the `OUTPUT REQUIREMENTS` and `OUTPUT FORMAT` parts unchanged.

- `user_prompt`: `string`, *Optional*

  Represents the user's custom setting, which is the specific question or instruction the AI needs to respond to directly. Defaults to `None`.
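
The `memory_size` estimate above (content size plus 8 bytes per embedding dimension) can be reproduced with a few lines of arithmetic. The helper names below are hypothetical, and the result is an approximation that ignores any storage overhead not mentioned in this reference.

```python
DEFAULT_MEMORY_SIZE = 5 * 1024 * 1024  # documented default, in bytes
BYTES_PER_DIMENSION = 8                # from the estimate in this section


def estimate_message_bytes(content_bytes: int, embedding_dims: int) -> int:
    """Approximate storage used by one message and its embedding."""
    return content_bytes + embedding_dims * BYTES_PER_DIMENSION


def estimate_capacity(memory_size: int, content_bytes: int, embedding_dims: int) -> int:
    """Approximate number of equally sized messages that fit in a memory."""
    return memory_size // estimate_message_bytes(content_bytes, embedding_dims)


per_message = estimate_message_bytes(1024, 1024)               # 9216 bytes (9 KB)
capacity = estimate_capacity(DEFAULT_MEMORY_SIZE, 1024, 1024)  # 568 messages
```

The computed capacity of 568 messages is in line with the rounded figure quoted for the 5 MB default.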

#### Returns

- Success: A `Memory` object.

- Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow, Memory

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
memory_object = Memory(rag_object, {"id": "your memory_id"})
memory_object.update({"name": "New_name"})
```

---


### List Memory

```python
RAGFlow.list_memory(
    page: int = 1,
    page_size: int = 50,
    tenant_id: str | list[str] = None,
    memory_type: str | list[str] = None,
    storage_type: str = None,
    keywords: str = None
) -> dict
```

List memories.

#### Parameters

##### page: `int`, *Optional*

Specifies the page on which the memories will be displayed. Defaults to `1`.

##### page_size: `int`, *Optional*

The number of memories on each page. Defaults to `50`.

##### tenant_id: `string` or `list[str]`, *Optional*

The ID of the owner. Multiple IDs can be supplied as a list.

##### memory_type: `string` or `list[str]`, *Optional*

The type of memory (as set during creation). A memory matches if its type is **included in** the provided value(s). Available options:

- `raw`
- `semantic`
- `episodic`
- `procedural`

##### storage_type: `string`, *Optional*

The storage format of messages. Available options:

- `table`: (Default)

##### keywords: `string`, *Optional*

The name of the memory to retrieve. Supports fuzzy search.

#### Returns

Success: A dict of `Memory` object list and total count.

```json
{"memory_list": list[Memory], "total_count": int}
```

Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
rag_object.list_memory()
```

---


### Get Memory Config

```python
Memory.get_config()
```

Get the configuration of a specified memory.

#### Parameters

None

#### Returns

Success: A `Memory` object.

Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow, Memory
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
memory_object = Memory(rag_object, {"id": "your memory_id"})
memory_object.get_config()
```

---


### Delete Memory

```python
RAGFlow.delete_memory(
    memory_id: str
) -> None
```

Delete a specified memory.

#### Parameters

##### memory_id: `string`, *Required*

The ID of the memory.

#### Returns

Success: Nothing

Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
rag_object.delete_memory("your memory_id")
```

---


### List messages of a memory

```python
Memory.list_memory_messages(
    agent_id: str | list[str]=None,
    keywords: str=None,
    page: int=1,
    page_size: int=50
) -> dict
```

List the messages of a specified memory.

#### Parameters

##### agent_id: `string` or `list[str]`, *Optional*

Filters messages by the ID of their source agent. Supports multiple values.

##### keywords: `string`, *Optional*

Filters messages by their session ID. This field supports fuzzy search.

##### page: `int`, *Optional*

Specifies the page on which the messages will be displayed. Defaults to `1`.

##### page_size: `int`, *Optional*

The number of messages on each page. Defaults to `50`.

#### Returns

Success: A dict containing the messages and their metadata.

```json
{"messages": {"message_list": [{message dict}], "total_count": int}, "storage_type": "table"}
```

Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow, Memory
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
memory_object = Memory(rag_object, {"id": "your memory_id"})
memory_object.list_memory_messages()
```
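
The returned dict nests the message list one level down, under `messages`. As a rough illustration of reading that shape, the sketch below flattens it and derives the number of pages from `total_count`. `summarize_message_page` is a hypothetical helper, and the `message_id` key inside each sample message is an assumption made only for this illustration.

```python
import math
from typing import Any


def summarize_message_page(result: dict[str, Any], page_size: int = 50) -> dict[str, Any]:
    """Flatten the nested dict documented for list_memory_messages()."""
    messages = result["messages"]["message_list"]
    total = result["messages"]["total_count"]
    return {
        "messages": messages,
        "total_count": total,
        # Number of pages needed to read every message at this page size.
        "total_pages": math.ceil(total / page_size) if total else 0,
        "storage_type": result["storage_type"],
    }


# Hand-written sample in the documented shape; the message fields are illustrative.
sample = {
    "messages": {
        "message_list": [{"message_id": 1}, {"message_id": 2}],
        "total_count": 120,
    },
    "storage_type": "table",
}
summary = summarize_message_page(sample, page_size=50)
print(summary["total_pages"], summary["storage_type"])
```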

---


### Add Message

```python
RAGFlow.add_message(
    memory_id: list[str],
    agent_id: str,
    session_id: str,
    user_input: str,
    agent_response: str,
    user_id: str = ""
) -> str
```

Add a message to specified memories.

#### Parameters

##### memory_id: `list[str]`, *Required*

The IDs of the memories in which to save the message.

##### agent_id: `string`, *Required*

The ID of the message's source agent.

##### session_id: `string`, *Required*

The ID of the message's session.

##### user_input: `string`, *Required*

The text input provided by the user.

##### agent_response: `string`, *Required*

The text response generated by the AI agent.

##### user_id: `string`, *Optional*

The user participating in the conversation with the agent. Defaults to `""`.

#### Returns

Success: The string `"All add to task."`

Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
message_payload = {
    "memory_id": ["your memory_id"],
    "agent_id": "your agent_id",
    "session_id": "your session_id",
    "user_id": "",
    "user_input": "Your question here",
    "agent_response": """
Your agent response here
"""
}
rag_object.add_message(**message_payload)
```
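
Since `memory_id` must be a list even when there is only one target memory, it can help to assemble the keyword arguments in one place before calling `add_message()`. The sketch below is one possible way to do that. `build_message_payload` is a hypothetical helper, not an SDK function, and the client-side check for empty required fields is an added convenience rather than documented server behavior.

```python
from typing import Union


def build_message_payload(
    memory_id: Union[str, list[str]],
    agent_id: str,
    session_id: str,
    user_input: str,
    agent_response: str,
    user_id: str = "",
) -> dict:
    """Return keyword arguments shaped like the add_message() parameters."""
    if isinstance(memory_id, str):
        memory_id = [memory_id]  # the documented type is list[str], so wrap a lone ID
    required = {
        "memory_id": memory_id,
        "agent_id": agent_id,
        "session_id": session_id,
        "user_input": user_input,
        "agent_response": agent_response,
    }
    # Client-side guard (an assumption of this sketch): reject empty required fields.
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError(f"Missing required field(s): {', '.join(missing)}")
    return {**required, "user_id": user_id}


payload = build_message_payload("mem-1", "agent-1", "session-1", "Hi", "Hello!")
print(payload["memory_id"])
```

The result can then be unpacked into the call, for example `rag_object.add_message(**payload)`.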

---


### Forget Message

```python
Memory.forget_message(message_id: int) -> bool
```

Forget a specified message. After forgetting, this message will not be retrieved by agents, and it will also be prioritized for cleanup by the forgetting policy.

#### Parameters

##### message_id: `int`, *Required*

The ID of the message to forget.

#### Returns

Success: `True`

Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow, Memory
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
memory_object = Memory(rag_object, {"id": "your memory_id"})
memory_object.forget_message(message_id)
```

---


### Update message status

```python
Memory.update_message_status(message_id: int, status: bool) -> bool
```

Updates the status of a message, enabling or disabling it. A disabled message is not retrieved by agents.

#### Parameters

##### message_id: `int`, *Required*

The ID of the message to enable or disable.

##### status: `bool`, *Required*

The status of the message. `True` = `enabled`, `False` = `disabled`.

#### Returns

Success: `True`

Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow, Memory
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
memory_object = Memory(rag_object, {"id": "your memory_id"})
memory_object.update_message_status(message_id, True)
```
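
Because a failed call raises an `Exception` rather than returning `False`, updating several messages in a row needs a `try`/`except` around each call if one failure should not stop the rest. The sketch below illustrates that pattern under stated assumptions: `set_status_for_all` is a hypothetical helper, and `FakeMemory` is an offline stand-in whose failure condition (a negative ID) is invented purely for the demonstration.

```python
from typing import Any


def set_status_for_all(memory: Any, message_ids: list[int], status: bool):
    """Enable or disable each message, collecting failures instead of stopping."""
    updated: list[int] = []
    failed: dict[int, str] = {}
    for message_id in message_ids:
        try:
            memory.update_message_status(message_id, status)
            updated.append(message_id)
        except Exception as exc:  # the reference documents a plain Exception on failure
            failed[message_id] = str(exc)
    return updated, failed


class FakeMemory:
    """Offline stand-in; treating negative IDs as unknown is purely illustrative."""

    def update_message_status(self, message_id: int, status: bool) -> bool:
        if message_id < 0:
            raise Exception("message not found")
        return True


updated, failed = set_status_for_all(FakeMemory(), [1, 2, -1], False)
print(updated, failed)
```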

---


### Search message

```python
RAGFlow.search_message(
    query: str,
    memory_id: list[str],
    agent_id: str=None,
    session_id: str=None,
    user_id: str=None,
    similarity_threshold: float=0.2,
    keywords_similarity_weight: float=0.7,
    top_n: int=10
) -> list[dict]
```

Searches and retrieves messages from memory based on the provided `query` and other configuration parameters.

#### Parameters

##### query: `string`, *Required*

The search term or natural language question used to find relevant messages.

##### memory_id: `list[str]`, *Required*

The IDs of the memories to search. Supports multiple values.

##### agent_id: `string`, *Optional*

The ID of the message's source agent. Defaults to `None`.

##### session_id: `string`, *Optional*

The ID of the message's session. Defaults to `None`.

##### user_id: `string`, *Optional*

The user participating in the conversation with the agent. Defaults to `None`.

##### similarity_threshold: `float`, *Optional*

The minimum cosine similarity score required for a message to be considered a match. A higher value yields more precise but fewer results. Defaults to `0.2`.

- Range [0.0, 1.0]

##### keywords_similarity_weight: `float`, *Optional*

Controls the influence of keyword matching versus semantic (embedding-based) matching in the final relevance score. A value of 0.5 gives them equal weight. Defaults to `0.7`.

- Range [0.0, 1.0]

##### top_n: `int`, *Optional*

The maximum number of most relevant messages to return. This limits the result set size for efficiency. Defaults to `10`.

#### Returns

Success: A list of `message` dicts.

Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
rag_object.search_message("your question", ["your memory_id"])
```
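
To build intuition for how `similarity_threshold`, `keywords_similarity_weight`, and `top_n` interact, the following client-side sketch ranks a few hand-written candidates. It is an illustration only and does not reproduce the server's scoring: the linear blend, the choice to check the threshold against the cosine similarity, and the `keyword_sim`/`vector_sim`/`id` fields are all assumptions of this sketch.

```python
from typing import Any


def rank_messages(
    candidates: list[dict[str, Any]],
    similarity_threshold: float = 0.2,
    keywords_similarity_weight: float = 0.7,
    top_n: int = 10,
) -> list[tuple[str, float]]:
    """Blend keyword and vector similarity, filter, sort, and truncate."""
    ranked = []
    for item in candidates:
        # Assumption: the threshold is checked against the cosine (vector) similarity.
        if item["vector_sim"] < similarity_threshold:
            continue
        # Assumption: the final score is a linear blend of the two similarities.
        score = (
            keywords_similarity_weight * item["keyword_sim"]
            + (1.0 - keywords_similarity_weight) * item["vector_sim"]
        )
        ranked.append((item["id"], score))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


candidates = [
    {"id": "a", "keyword_sim": 0.9, "vector_sim": 0.5},
    {"id": "b", "keyword_sim": 0.2, "vector_sim": 0.8},
    {"id": "c", "keyword_sim": 1.0, "vector_sim": 0.1},  # dropped: below the threshold
]
keyword_heavy = [mid for mid, _ in rank_messages(candidates)]
vector_only = [mid for mid, _ in rank_messages(candidates, keywords_similarity_weight=0.0)]
print(keyword_heavy, vector_only)
```

Under these assumptions, the default weight of `0.7` favors the candidate with strong keyword overlap, while a weight of `0.0` orders the results by vector similarity alone.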

---


### Get Recent Messages

```python
RAGFlow.get_recent_messages(
    memory_id: list[str],
    agent_id: str=None,
    session_id: str=None,
    limit: int=10
) -> list[dict]
```

Retrieves the most recent messages from the specified memories. Use the `limit` parameter to control the number of messages returned.

#### Parameters

##### memory_id: `list[str]`, *Required*

The IDs of the memories to search. Supports multiple values.

##### agent_id: `string`, *Optional*

The ID of the message's source agent. Defaults to `None`.

##### session_id: `string`, *Optional*

The ID of the message's session. Defaults to `None`.

##### limit: `int`, *Optional*

The maximum number of messages to return. Defaults to `10`.

#### Returns

Success: A list of `message` dicts.

Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
rag_object.get_recent_messages(["your memory_id"])
```

---


### Get Message Content

```python
Memory.get_message_content(message_id: int)
```

Retrieves the full content and embedding vector of a specific message using its unique message ID.

#### Parameters

##### message_id: `int`, *Required*

The ID of the message to retrieve.

#### Returns

Success: A `message` dict.

Failure: `Exception`

#### Examples

```python
from ragflow_sdk import RAGFlow, Memory
rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
memory_object = Memory(rag_object, {"id": "your memory_id"})
memory_object.get_message_content(message_id)
```

---