RAGFlow HTTP Benchmark CLI

Run (from repo root):

  PYTHONPATH=./test uv run -m benchmark [global flags] <chat|retrieval> [command flags]

Global flags can be placed before or after the command.

If you run from another directory:

  PYTHONPATH=/path/to/ragflow/test uv run -m benchmark [global flags] <chat|retrieval> [command flags]

JSON args: for --dataset-payload, --chat-payload, --messages-json, --extra-body, and --payload

  • Pass inline JSON: '{"key": "value"}'
  • Or use a file: '@/path/to/file.json'
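
For example, the same dataset payload can be passed inline or loaded from a file (the file name here is a placeholder):

  --dataset-payload '{"name": "bench_dataset", "embedding_model": "BAAI/bge-small-en-v1.5@Builtin"}'
  --dataset-payload '@dataset_payload.json'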

Global flags

  --base-url
    Base server URL.
    Env: RAGFLOW_BASE_URL or HOST_ADDRESS
  --api-version
    API version string (default: v1).
    Env: RAGFLOW_API_VERSION
  --api-key
    API key for Authorization: Bearer <token>.
  --connect-timeout
    Connect timeout seconds (default: 5.0).
  --read-timeout
    Read timeout seconds (default: 60.0).
  --no-verify-ssl
    Disable SSL verification.
  --iterations
    Iterations per benchmark (default: 1).
  --concurrency
    Number of concurrent requests (default: 1). Uses multiprocessing. See the example after this list.
  --json
    Output JSON report (plain stdout).
  --print-response
    Print response content per iteration (stdout). With --json, responses are included in the JSON output.
  --response-max-chars
    Truncate printed responses to N chars (0 = no limit).
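
For example (the chat ID is a placeholder and the workload values are illustrative), the load and output flags combine like this, with global flags placed before the subcommand:

  PYTHONPATH=./test uv run -m benchmark \
    --base-url http://127.0.0.1:9380 \
    --iterations 10 \
    --concurrency 4 \
    --json \
    chat \
    --chat-id <existing_chat_id> \
    --login-email "qa@infiniflow.org" \
    --login-password "123" \
    --message "What is the purpose of RAGFlow?" \
    --model "glm-4-flash@ZHIPU-AI"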

Auth and bootstrap flags (used when --api-key is not provided)

  --login-email
    Login email.
    Env: RAGFLOW_EMAIL
  --login-nickname
    Nickname for registration. If omitted, defaults to email prefix when registering.
    Env: RAGFLOW_NICKNAME
  --login-password
    Login password (encrypted client-side). Requires pycryptodomex in the test group.
  --allow-register
    Attempt /user/register before login (best effort).
  --token-name
    Optional API token name for /system/new_token.
  --bootstrap-llm
    Ensure LLM factory API key is configured via /llm/set_api_key.
  --llm-factory
    LLM factory name for bootstrap.
    Env: RAGFLOW_LLM_FACTORY
  --llm-api-key
    LLM API key for bootstrap.
    Env: ZHIPU_AI_API_KEY
  --llm-api-base
    Optional LLM API base URL.
    Env: RAGFLOW_LLM_API_BASE
  --set-tenant-info
    Set tenant defaults via /user/set_tenant_info (see the example after this list).
  --tenant-llm-id
    Tenant chat model ID.
    Env: RAGFLOW_TENANT_LLM_ID
  --tenant-embd-id
    Tenant embedding model ID.
    Env: RAGFLOW_TENANT_EMBD_ID
  --tenant-img2txt-id
    Tenant image2text model ID.
    Env: RAGFLOW_TENANT_IMG2TXT_ID
  --tenant-asr-id
    Tenant ASR model ID (default empty).
    Env: RAGFLOW_TENANT_ASR_ID
  --tenant-tts-id
    Tenant TTS model ID.
    Env: RAGFLOW_TENANT_TTS_ID
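
For example (a sketch; the model IDs follow the <model_name>@<provider> convention used elsewhere in this document and must match providers configured on your server), tenant defaults can be set alongside login:

  PYTHONPATH=./test uv run -m benchmark chat \
    --base-url http://127.0.0.1:9380 \
    --login-email "qa@infiniflow.org" \
    --login-password "123" \
    --set-tenant-info \
    --tenant-llm-id "glm-4-flash@ZHIPU-AI" \
    --tenant-embd-id "BAAI/bge-small-en-v1.5@Builtin" \
    --chat-id <existing_chat_id> \
    --message "What is the purpose of RAGFlow?" \
    --model "glm-4-flash@ZHIPU-AI"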

Dataset/document flags (shared by chat and retrieval)

  --dataset-id
    Existing dataset ID.
  --dataset-ids
    Comma-separated dataset IDs.
  --dataset-name
    Dataset name when creating a new dataset.
    Env: RAGFLOW_DATASET_NAME
  --dataset-payload
    JSON body for dataset creation (see API docs).
  --document-path
    Document path to upload (repeatable).
  --document-paths-file
    File containing document paths, one per line (see the example after this list).
  --parse-timeout
    Document parse timeout seconds (default: 120.0).
  --parse-interval
    Document parse poll interval seconds (default: 1.0).
  --teardown
    Delete created resources after run.
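
For example (a sketch assuming the account and LLM bootstrap are already configured; doc_paths.txt is a placeholder file name), document paths can be read from a file and the created resources removed afterwards:

  printf '%s\n' \
    test/benchmark/test_docs/Doc1.pdf \
    test/benchmark/test_docs/Doc2.pdf \
    test/benchmark/test_docs/Doc3.pdf > doc_paths.txt

  PYTHONPATH=./test uv run -m benchmark retrieval \
    --base-url http://127.0.0.1:9380 \
    --login-email "qa@infiniflow.org" \
    --login-password "123" \
    --dataset-name "bench_dataset" \
    --dataset-payload '{"name":"bench_dataset","embedding_model":"BAAI/bge-small-en-v1.5@Builtin"}' \
    --document-paths-file doc_paths.txt \
    --teardown \
    --question "What does RAG mean?"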

Chat command flags

  --chat-id
    Existing chat ID. If omitted, a chat is created.
  --chat-name
    Chat name when creating a new chat.
    Env: RAGFLOW_CHAT_NAME
  --chat-payload
    JSON body for chat creation (see API docs).
  --model
    Model field for OpenAI-compatible completion request.
    Env: RAGFLOW_CHAT_MODEL
  --message
    Single user message (required unless --messages-json is provided).
  --messages-json
    JSON list of OpenAI-format messages (required unless --message is provided); see the example after this list.
  --extra-body
    JSON extra_body for OpenAI-compatible request.
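
For example (the conversation content is illustrative; the chat ID is a placeholder), a multi-turn prompt can be passed with --messages-json instead of --message:

  PYTHONPATH=./test uv run -m benchmark chat \
    --base-url http://127.0.0.1:9380 \
    --chat-id <existing_chat_id> \
    --login-email "qa@infiniflow.org" \
    --login-password "123" \
    --messages-json '[{"role": "user", "content": "What is RAGFlow?"}, {"role": "assistant", "content": "A RAG engine."}, {"role": "user", "content": "What is it used for?"}]' \
    --model "glm-4-flash@ZHIPU-AI"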

Retrieval command flags

  --question
    Retrieval question (required unless provided in --payload).
  --payload
    JSON body for /api/v1/retrieval (see API docs and the example after this list).
  --document-ids
    Comma-separated document IDs for retrieval.
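
For example (a sketch; retrieval_payload.json is a placeholder file whose fields should follow the /api/v1/retrieval API docs), the request body can be supplied from a file, in which case --question may be omitted:

  printf '%s\n' '{"question": "What does RAG mean?"}' > retrieval_payload.json

  PYTHONPATH=./test uv run -m benchmark retrieval \
    --base-url http://127.0.0.1:9380 \
    --login-email "qa@infiniflow.org" \
    --login-password "123" \
    --dataset-id "<dataset_id>" \
    --payload '@retrieval_payload.json'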

Model selection guidance

  • Embedding model is tied to the dataset. Set during dataset creation using --dataset-payload:
      {"name": "...", "embedding_model": "<model_name>@<provider>"}
    Or set tenant defaults via --set-tenant-info with --tenant-embd-id.
  • Chat model is tied to the chat assistant. Set during chat creation using --chat-payload:
      {"name": "...", "llm": {"model_name": "<model_name>@<provider>"}}
    Or set tenant defaults via --set-tenant-info with --tenant-llm-id.
  • --model is required by the OpenAI-compatible endpoint but does not override the chat assistant's configured model on the server.

What this CLI can do

  • This is a benchmark CLI. It always runs either a chat or retrieval benchmark and prints a report.
  • It can create datasets, upload documents, trigger parsing, and create chats as part of a benchmark run (setup for the benchmark).
  • It is not a general admin CLI; there are no standalone "create-only" or "manage" commands. Use the reports to capture created IDs for reuse.

Do I need the dataset ID?

  • If the CLI creates a dataset, it uses the returned dataset ID internally. You do not need to supply it for that same run.
  • The report prints "Created Dataset ID" so you can reuse it later with --dataset-id or --dataset-ids.
  • Dataset name is only used at creation time. Selection is always by ID.
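
For example (placeholder IDs shown), a dataset created in an earlier run can be reused in later runs by passing the printed ID back in:

  --dataset-id "<created_dataset_id>"
  --dataset-ids "<created_dataset_id_1>,<created_dataset_id_2>"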

Examples

Example: chat benchmark creating dataset + upload + parse + chat (login + register)

  PYTHONPATH=./test uv run -m benchmark chat \
    --base-url http://127.0.0.1:9380 \
    --allow-register \
    --login-email "qa@infiniflow.org" \
    --login-password "123" \
    --bootstrap-llm \
    --llm-factory ZHIPU-AI \
    --llm-api-key $ZHIPU_AI_API_KEY \
    --dataset-name "bench_dataset" \
    --dataset-payload '{"name":"bench_dataset","embedding_model":"BAAI/bge-small-en-v1.5@Builtin"}' \
    --document-path test/benchmark/test_docs/Doc1.pdf \
    --document-path test/benchmark/test_docs/Doc2.pdf \
    --document-path test/benchmark/test_docs/Doc3.pdf \
    --chat-name "bench_chat" \
    --chat-payload '{"name":"bench_chat","llm":{"model_name":"glm-4-flash@ZHIPU-AI"}}' \
    --message "What is the purpose of RAGFlow?" \
    --model "glm-4-flash@ZHIPU-AI"

Example: chat benchmark with existing dataset + chat id (no creation)

  PYTHONPATH=./test uv run -m benchmark chat \
    --base-url http://127.0.0.1:9380 \
    --chat-id <existing_chat_id> \
    --login-email "qa@infiniflow.org" \
    --login-password "123" \
    --message "What is the purpose of RAGFlow?" \
    --model "glm-4-flash@ZHIPU-AI"

Example: retrieval benchmark creating dataset + upload + parse

  PYTHONPATH=./test uv run -m benchmark retrieval \
    --base-url http://127.0.0.1:9380 \
    --allow-register \
    --login-email "qa@infiniflow.org" \
    --login-password "123" \
    --bootstrap-llm \
    --llm-factory ZHIPU-AI \
    --llm-api-key $ZHIPU_AI_API_KEY \
    --dataset-name "bench_dataset" \
    --dataset-payload '{"name":"bench_dataset","embedding_model":"BAAI/bge-small-en-v1.5@Builtin"}' \
    --document-path test/benchmark/test_docs/Doc1.pdf \
    --document-path test/benchmark/test_docs/Doc2.pdf \
    --document-path test/benchmark/test_docs/Doc3.pdf \
    --question "What does RAG mean?"

Example: retrieval benchmark with existing dataset IDs

  PYTHONPATH=./test uv run -m benchmark retrieval \
    --base-url http://127.0.0.1:9380 \
    --login-email "qa@infiniflow.org" \
    --login-password "123" \
    --dataset-ids "<dataset_id_1>,<dataset_id_2>" \
    --question "What does RAG mean?"

Example: retrieval benchmark with existing dataset IDs and document IDs

  PYTHONPATH=./test uv run -m benchmark retrieval \
    --base-url http://127.0.0.1:9380 \
    --login-email "qa@infiniflow.org" \
    --login-password "123" \
    --dataset-id "<dataset_id>" \
    --document-ids "<doc_id_1>,<doc_id_2>" \
    --question "What does RAG mean?"

Quick scripts

These scripts create a dataset, upload and parse the documents in test/benchmark/test_docs, run the benchmark, and clean up. The combined script runs retrieval and then chat on the same dataset, then deletes it.

  • Run uv sync --python 3.12 --group test before running the commands.
  • If you plan to use the built-in embedding model, append the following to docker/.env before starting your containers:

      echo -e "TEI_MODEL=BAAI/bge-small-en-v1.5" >> docker/.env
      echo -e "COMPOSE_PROFILES=\${COMPOSE_PROFILES},tei-cpu" >> docker/.env

Chat only:

  ./test/benchmark/run_chat.sh

Retrieval only:

  ./test/benchmark/run_retrieval.sh

Both (retrieval then chat on the same dataset):

  ./test/benchmark/run_retrieval_chat.sh

Requires:

  • ZHIPU_AI_API_KEY exported in your shell.

Defaults used:

  • Base URL: http://127.0.0.1:9380
  • Login: qa@infiniflow.org / 123 (with allow-register)
  • LLM bootstrap: ZHIPU-AI with $ZHIPU_AI_API_KEY
  • Dataset: bench_dataset (BAAI/bge-small-en-v1.5@Builtin)
  • Chat: bench_chat (glm-4-flash@ZHIPU-AI)
  • Chat message: "What is the purpose of RAGFlow?"
  • Retrieval question: "What does RAG mean?"
  • Iterations: 1
  • Concurrency: 4