Commit Graph

1634 Commits

Author SHA1 Message Date
6499bce2a6 fix: Langfuse chat observation (#15026)
### What problem does this PR solve?

Closes #15025

Langfuse-enabled `dialog_service.async_chat()` regressed to
`langfuse_tracer.start_generation(...)` after the earlier Langfuse v4
migration. Langfuse v4 uses `start_observation(as_type="generation")`,
so the remaining `start_generation` call can fail when chat tracing is
enabled.

This restores the migrated `start_observation(as_type="generation")`
call for chat observations while preserving the existing trace context,
model, input payload, and update/end flow. It also adds a regression
test with a fake Langfuse v4-style client that exposes
`start_observation()` but not `start_generation()`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Tests

- `.venv/bin/pytest
test/unit_test/api/db/services/test_dialog_service_final_answer.py -q`
- `.venv/bin/ruff check api/db/services/dialog_service.py
test/unit_test/api/db/services/test_dialog_service_final_answer.py`
2026-05-20 15:01:19 +08:00
ce3402cbb9 Fix: restore saved api_key fallback in add_llm (#14921) (#14941)
## Summary

Closes #14921.

Reconfiguring an existing LLM provider to enable **tool call** or
**vision** fails with `Your API key is invalid. Fail to access model.`
even when the saved API key is correct. The most visible report is
VLLM ("Cannot add vllm model" once `--enable-auto-tool-choice` /
vision is toggled on), but the bug applies to every provider whose
api_key field stays blank in edit mode.

## Root cause

PR #14885 ("Fix: llm add api key overridden") removed the existing-key
lookup in `api/apps/llm_app.py::add_llm`. The intent was correct —
stop the saved key from clobbering a user-provided new one — but the
removal was unconditional, so the edit path now has no fallback at all:

1. `web/src/pages/user-setting/setting-model/hooks.tsx:230` sets the
   initial `api_key` form value to `''` in edit mode (the real key is
   never returned to the browser).
2. The user toggles `is_tools` / `vision` without retyping the key.
3. `hooks.tsx:183-185` strips the empty `api_key` from the payload.
4. `add_llm` defaults to the placeholder `"x"`
   (`api/apps/llm_app.py:182`).
5. The upstream provider rejects `"x"` with `Your API key is invalid`.

## Fix

Restore the fallback **narrowly**, before any factory-specific handler
runs:

- If `req.get("api_key") is None`, look up the tenant's existing record
  (using the correctly suffixed `llm_name` for VLLM /
  OpenAI-API-Compatible / LocalAI / HuggingFace).
- Decode the saved blob with `_decode_api_key_config` and write **only
  the decoded `api_key` string** back into `req["api_key"]`. Never use
  the raw JSON payload — that was the exact thing PR #14885 was trying
  to avoid.
- When the user **does** type a new key, `req.get("api_key")` is not
  `None` and the fallback is skipped, so PR #14885's fix is preserved.

| Scenario | Before this PR | After this PR |
|---|---|---|
| Plain factory (VLLM, Ollama, …), retype key | OK | OK |
| Plain factory, blank key in edit (the bug) | Fails with "API key is
invalid" | Recovers saved key, validates against the real one |
| OpenRouter / Bedrock, change `provider_order` only | Fails |
`apikey_json([...])` rebuilds the JSON with saved `api_key` + new field
|
| User clears the form and types a brand-new key | OK (key replaced) |
OK (key replaced — fallback skipped) |

## Files changed

- `api/apps/llm_app.py` — restored fallback in `add_llm` (no other call
sites touched).

## Test plan

- [ ] Add a VLLM chat model with a valid api_key, no toggles → save
succeeds.
- [ ] Edit the same model, toggle **tool call** on, leave api_key blank
      → save succeeds, validation runs against the saved key.
- [ ] Edit again, toggle **vision** on (model_type → `image2text`),
      leave api_key blank → save succeeds.
- [ ] Edit again and **type a new api_key** → the new key replaces the
      saved one (`is None` check skips the fallback). Verify via the DB
      row or by deliberately typing a wrong key and observing the
      validation failure.
- [ ] Repeat the blank-key edit with **OpenRouter**, changing only
      `provider_order` → resulting api_key JSON contains the saved
      `api_key` and the new `provider_order`.
- [ ] First-time add of a new model name → no existing record, fallback
      no-ops, behaves as before.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2026-05-19 15:32:09 +08:00
f169ab4b39 feat(tts): cache synthesized speech in Redis to avoid redundant calls (#14851)
## What problem does this PR solve?

Closes #12017.

TTS output is deterministic for a given `(model, text)` pair, so
re-running the same text through the same TTS model produces the same
bytes — yet `Canvas.tts` and `dialog_service.tts` re-synthesized on
every request. That's slow and wastes provider quota whenever the same
assistant response is replayed, shared across users, or repeated within
a session.

### Change

New helper `rag/utils/tts_cache.py` with `synthesize_with_cache(tts_mdl,
cleaned_text)`:

- **Key:** `tts:cache:{model_id}:{sha256(text)}` — separate namespace
per model, identical cleaned text reuses a single entry across both call
sites.
- **Value:** the hex-encoded audio blob both call sites already
returned. No format change for downstream consumers.
- **TTL:** 7 days by default, configurable via
`RAGFLOW_TTS_CACHE_TTL_SECONDS`.
- **Failure modes:** a Redis hiccup falls back to direct synthesis; a
failed synthesis still returns `None` (existing contract preserved).


[`Canvas.tts`](https://github.com/infiniflow/ragflow/blob/main/agent/canvas.py#L683-L724)
and
[`dialog_service.tts`](https://github.com/infiniflow/ragflow/blob/main/api/db/services/dialog_service.py#L1367-L1380)
now route through the helper; the per-file bytes-accumulation/hex-encode
loop has been removed in favor of one shared implementation.

## Type of change

- [x] New Feature (non-breaking change which adds functionality)

## Test plan

- [ ] **Cache hit, chat path:** Configure a dialog with TTS enabled, ask
the same question twice with `stream=false`. Verify the second response
returns the same `audio_binary` and that the second invocation doesn't
hit the TTS provider (e.g., observe provider-side logs / usage counters;
check no `LLMBundle.tts can't update token usage` log line on the second
run).
- [ ] **Cache hit, agent path:** Same exercise via a Conversational
Agent that includes a Message component playing back the answer.
- [ ] **Cache isolation per model:** Switch tenant's `tts_id` between
two models, run the same text against each — confirm the second model's
first synthesis still happens (no cross-model hits).
- [ ] **TTL override:** Set `RAGFLOW_TTS_CACHE_TTL_SECONDS=120`, confirm
the entry expires after 2 minutes.
- [ ] **Redis unavailable:** Stop Redis (or break the connection).
Verify the TTS endpoint still works — synthesis falls back to direct
calls, with a `TTS cache lookup failed` / `TTS cache store failed`
warning logged.
- [ ] **Failure path:** Configure a TTS model with an invalid API key,
ensure the response still returns successfully with `audio_binary=None`
(no regression vs. current behavior).
2026-05-19 14:20:40 +08:00
87d22a4415 Fix: agent session log message (#14991)
### What problem does this PR solve?

agent session log message
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-19 12:00:02 +08:00
525a87be0f Misc: fix some typos (#14987)
### What problem does this PR solve?

Fix minor code quality issues:

1. Fix typo in assertion error message: "Can't fine" → "Can't find"
2. Remove duplicate line in common/connection_utils.py

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2026-05-19 10:47:06 +08:00
198f3c4b9a Fix: validate memory tenant model IDs on update and enforce tenant scope in memory pipeline (#14923)
### Related issues

Closes #14922

### What problem does this PR solve?

`POST /memories` already resolves `tenant_llm_id` and `tenant_embd_id`
through `ensure_tenant_model_id_for_params`, but `PUT
/memories/<memory_id>` accepted client-supplied `tenant_llm_id` /
`tenant_embd_id` without checking that those `tenant_llm` rows belong to
the memory owner’s tenant. A caller could persist another tenant’s row
IDs and later trigger extraction or embedding that loaded foreign model
credentials via `get_model_config_by_id(tenant_model_id)` with no tenant
allow-list.

This change aligns the update path with create: updates that change
models must go through `llm_id` / `embd_id` and
`ensure_tenant_model_id_for_params` scoped to the **memory’s**
`tenant_id` (not only the current user, so team-access cases stay
correct). Direct `tenant_*` fields in the body without `llm_id` /
`embd_id` are rejected. As defense in depth, `memory_message_service`
passes `allowed_tenant_ids` / `requester_tenant_id` into
`get_model_config_by_id` for LLM and embedding resolution so mismatched
IDs cannot be used even if bad data existed. A regression test rejects
payloads that set only `tenant_llm_id` / `tenant_embd_id`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: jony376 <jony376@gmail.com>
2026-05-19 10:11:46 +08:00
b69a6a5d80 Feat: full optimization on connector dashboard (#14979)
### What problem does this PR solve?

This PR improves the connector dashboard task management experience and
adds better visibility into connector execution logs.

### Overview:

#### Before
<img width="700" alt="image"
src="https://github.com/user-attachments/assets/e4a8ed6f-2e18-4f0f-8528-41a514550052"
/>

#### Now:
<img width="700" alt="Screenshot from 2026-05-18 16-31-30"
src="https://github.com/user-attachments/assets/d4ca193b-847a-49ae-9e4f-5fbca60ea627"
/>

### 1. Add a new logging page to the connector dashboard

A new logging page has been added so users can view connector task
execution logs directly from the connector dashboard.

### 2. Merge the Resume button into Confirm

The separate **Resume** button has been removed. The **Confirm** button
now represents different actions depending on the current task state:

- **Save**: Save form changes and reschedule tasks.
- **Stop**: Cancel currently scheduled or running tasks.
- **Resume**: Create new scheduled tasks after the previous tasks have
been stopped.
- **Start**: Start tasks when no task has been started yet.

### 3. Separate syncing and pruning tasks

Connector tasks are now separated into **syncing** and **pruning**.

Pruning is controlled by the **Sync deleted files** option:

- When **Sync deleted files** is disabled, only syncing tasks are shown.
- When **Sync deleted files** is enabled, both syncing and pruning tasks
are shown.

**Now: Sync deleted files disabled**

<img width="700" alt="Sync deleted files disabled"
src="https://github.com/user-attachments/assets/dbd9232e-614a-407f-a0b1-c109e5fa567d"
/>

**Now: Sync deleted files enabled**

<img width="700" alt="Sync deleted files enabled"
src="https://github.com/user-attachments/assets/1f527f48-ccb3-4ee8-97ca-086891489296"
/>

### 4. Update logs in backend

<img width="700" alt="image"
src="https://github.com/user-attachments/assets/10a95a3f-98c1-4e67-8afa-ddf6cda5b0b2"
/>

### 5. Remove connector resume API

- Removed: `POST /v1/connectors/<connector_id>/resume`
- Replaced by: `PATCH /v1/connectors/<connector_id>`


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-19 10:07:11 +08:00
93d3deb5e4 Fix admin CLI system variable commands (#14956)
## What

Fixes #12409.

Implements admin CLI support for:

- `list vars;`
- `show var <name-or-prefix>;`
- `set var <name> <value>;`

## Changes

- Wire Go CLI variable commands to the admin API.
- Support integer and quoted string values in `SET VAR`.
- Return variable rows as `data_type`, `name`, `setting_type`, and
`value`.
- Add exact-name lookup with prefix fallback for `SHOW VAR`.
- Validate values by stored data type: `string`, `integer`, `bool`, and
`json`.
- Keep the legacy Python admin CLI/server behavior aligned.
- Update admin CLI docs and add focused tests.

## Verification

- `go test -count=1 ./internal/cli`
- `python3.12 -m py_compile admin/server/services.py
admin/server/routes.py api/db/services/system_settings_service.py
admin/client/parser.py admin/client/ragflow_client.py`
- Python admin CLI parser smoke test for `SET VAR`, quoted values, `SHOW
VAR`, and `LIST VARS`.
- Attempted `./run_go_tests.sh`; local environment is missing native
tokenizer/linker artifacts:
  - `internal/cpp/cmake-build-release/librag_tokenizer_c_api.a`
  - `-lstdc++`

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-18 19:08:45 +08:00
732e4741c4 Bugfix: fix tag show (#14980)
### What problem does this PR solve?

Bugfix: fix tag show

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-18 18:55:01 +08:00
2dbe3b8a62 fix: metadata_condition returning all docs when filter matches nothing (#14967)
### What problem does this PR solve?

When _parse_doc_id_filter_with_metadata returns [], the empty list is
falsy so the WHERE id IN (...) clause was silently skipped, causing the
full dataset to be returned instead of an empty result.

Change `if doc_ids:` to `if doc_ids is not None:` in both get_list() and
get_by_kb_id() to distinguish between no filter (None) and a filter that
matched zero documents ([]).

Fixes #14962

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-18 18:54:30 +08:00
13b422037f Refactor: enhance graphrag - part 2 (#14972)
### What problem does this PR solve?
1. expose batch_chunk_token_size for configuration
2. retrieve chunks when build subgraph for the doc, not retreive all
docs chunks at the begining
3. get all chunks for a document, used to be hard coded 10000
4. delete not used method run_graphrag

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring

Follow on: #14617
2026-05-18 16:10:21 +08:00
dev
b12eaee38b fix(api): enforce tenant access for connector routes (#14747)
### What problem does this PR solve?

Fixes #14746.

Adds tenant access checks for connector-by-id REST routes before reading
connector details, mutating connector config/status, deleting
connectors, rebuilding, or listing sync logs. Unauthorized callers now
receive `RetCode.AUTHENTICATION_ERROR` with `No authorization.` without
reaching the connector/log mutation paths.

Validation:
- `python3 -m pytest
--confcutdir=test/testcases/test_web_api/test_connector_app
test/testcases/test_web_api/test_connector_app/test_connector_routes_unit.py`
- `uvx ruff check api/apps/restful_apis/connector_api.py
api/db/services/connector_service.py
test/testcases/test_web_api/test_connector_app/test_connector_routes_unit.py`

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: dev111-actor <dev111-actor@users.noreply.github.com>
2026-05-18 16:09:26 +08:00
56d73d0c2c Refactor: speed up ragflow server, save startup memory (#14973)
### What problem does this PR solve?

Refactor: speed up ragflow server, save startup memory, saved 200MiB,
and 5-9 seconds start time.

##### Before
1241292  |   |           \_ python3 api/ragflow_server.py
RAGFlow server is ready after 25.61845850944519s initialization.

##### After
1019968  |   |           \_ python3 api/ragflow_server.py
RAGFlow server is ready after 16.205134391784668s initialization.

### Type of change

- [x] Refactoring
2026-05-18 15:55:59 +08:00
fe82a96193 Fix: add SSRF guard for agent test_db_connection endpoint (#14860)
### What problem does this PR solve?

Closes #14858

The `test_db_connection` endpoint in the agent API accepts a
user-supplied `host` and connects to it directly via database drivers
(MySQL/PostgreSQL) without any validation. This allows an attacker to
probe internal network addresses (e.g. `127.0.0.1`, `10.x.x.x`,
link-local, etc.) through the server — a classic Server-Side Request
Forgery (SSRF) vulnerability.

This PR adds an SSRF guard that resolves the host and rejects any
address that is not globally routable before the database connection is
attempted.

**Changes:**
- **`common/ssrf_guard.py`** — Added `assert_host_is_safe()`, a
host-level counterpart of the existing `assert_url_is_safe()`, designed
for non-HTTP protocols (database drivers) where there is no URL to
parse.
- **`api/apps/restful_apis/agent_api.py`** — Call
`assert_host_is_safe(req["host"])` at the top of `test_db_connection` so
that non-public hosts are rejected early with a clear error message.

Fixes #14858

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-18 14:32:44 +08:00
f1d2383572 Push metadata filters down to Infinity (#14974)
### What problem does this PR solve?

Push metadata filters down to Infinity

### Type of change

- [x] Refactoring
2026-05-18 14:22:04 +08:00
7cdc74bbe5 Refactor: Drop the vector fetch for ES (#14970)
## Summary
- Stop pulling chunk vectors (`q_*_vec`) back from Elasticsearch in the
main retrieval path. ES already knows them; shipping them was pure
bandwidth/memory overhead.
- Recover the per-chunk cosine similarity via a second KNN-only ES call
filtered by the candidate chunk ids. The new `_score` is merged with
locally computed term similarity using the user-configured
`vector_similarity_weight`.
- Lazily fetch the chunk embedding only for the chunks
`insert_citations` actually needs.

## Details
**`rag/nlp/search.py`**
- `Dealer.search`: no longer appends `q_*_vec` to the ES select list.
OceanBase still gets it (its rerank path is unchanged).
- New `Dealer._knn_scores(sres, idx_names, kb_ids)`: a `MatchDenseExpr`
over the cached query vector filtered by `id IN sres.ids`, returning
`{chunk_id: cosine_score}` via ES `_score`.
- New `Dealer.rerank_with_knn(...)`: term similarity from
`qryr.token_similarity` plus the ES-supplied KNN score, combined with
`tkweight`/`vtweight` and the existing rank-feature bonus.
- New `Dealer.fetch_chunk_vectors(chunk_ids, tenant_ids, kb_ids, dim)`:
on-demand vector fetch for citation use.
- `Dealer.retrieval` routes Infinity → unchanged, OceanBase → existing
local `rerank`, ES → new KNN-score path.

**`common/doc_store/es_conn_base.py`**
- New `get_scores(res)` helper returning `{_id: _score}` directly from
hit headers (ES doesn't surface `_score` through `get_fields`).

**`api/db/services/dialog_service.py`**
- New top-level `_hydrate_chunk_vectors(...)` helper. On ES it
back-fills `ck["vector"]` from `fetch_chunk_vectors` right before
`insert_citations`. No-op on Infinity / OB (their chunks already carry
vectors).
- Both `decorate_answer` closures became `async` and are `await`-ed at
all call sites in `async_chat` and `async_ask`.

## Backend behavior
| Backend | Returns chunk vec in main search | Sim source | Vectors for
citations |
|---|---|---|---|
| ES | No | second KNN call (`_score`) merged with term sim | fetched on
demand |
| Infinity | No (unchanged) | normalized `_score` | already on chunks |
| OceanBase | Yes (kept) | local hybrid rerank | already on chunks |

## Test plan
2026-05-18 14:21:56 +08:00
9f2fb4611f Fix: guard empty/whitespace embedding inputs in LLMBundle (#14428) (#14924)
Closes #14428 


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-18 14:11:54 +08:00
e98f3e5c0d Fix session deletion leaking chat-upload blobs (#14969)
### What problem does this PR solve?

This fixes a bug where files uploaded in chat were left in storage after
the session was deleted. It now removes those chat-uploaded blobs during
session deletion. fixes #14965

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-18 11:14:27 +08:00
14c0985182 feat: bump Python minimum from 3.12 to 3.13, drop strenum backport (#14767)
Closes #14753

## What changed

| File | Change |
|---|---|
| `pyproject.toml` | `requires-python` → `>=3.13,<3.15`; remove
`strenum==0.4.15` |
| `Dockerfile` | `uv python install 3.13`, `uv sync --python 3.13` |
| `.github/workflows/tests.yml` | `uv sync --python 3.13` on both matrix
legs |
| `CLAUDE.md` | dev setup command + requirements note updated |
| `deepdoc/parser/mineru_parser.py` | `from strenum import StrEnum` →
`from enum import StrEnum` |
| `agent/tools/code_exec.py` | same |

`StrEnum` has been in the stdlib since Python 3.11 — the `strenum`
backport package is no longer needed once the floor is 3.13.

## Why uv.lock is not regenerated

`uv lock --python 3.13` fails because:

1. The infiniflow/graspologic fork pins `numpy>=1.26.4,<2.0.0`
2. `tensorflow-cpu>=2.20.0` (the first release with cp313 wheels)
depends on `ml-dtypes>=0.5.1`, which requires `numpy>=2.1.0`
3. These two constraints are irreconcilable on Python 3.13

The lockfile regeneration requires loosening the `numpy` upper bound in
the `infiniflow/graspologic` fork. Once that fork commit is updated and
the SHA in `pyproject.toml:49` is bumped, `uv lock --python 3.13` will
succeed.

## RFC corrections

Two claims in the original RFC (#14753) did not hold up under code
review:

- **"graspologic hard-blocks 3.13"** — the infiniflow fork at the pinned
commit has no `<3.13` Python constraint. The blocker is the transitive
`numpy<2.0.0` conflict with tensorflow-cpu's test dependency, not a
direct Python version cap.
- **"free-threading throughput gains for I/O-bound workload"** — Python
3.13 free-threading requires a special `--disable-gil` build and
provides no benefit for async I/O code (the GIL is already released
during I/O). The real motivation is forward compatibility and improved
error messages.
2026-05-15 14:40:53 +08:00
c9622d0924 fix(agentbot): aggregate structured output in non-streaming completions (#14848)
## What problem does this PR solve?

Closes #13384.

The `/api/v1/agentbots/<agent_id>/completions` non-streaming path
returned the first yielded SSE chunk and exited:

```python
async for answer in agent_completion(objs[0].tenant_id, agent_id, **req):
    return get_result(data=answer)
```

That meant structured output, the full assistant message, and reference
data were all dropped when an agent was called with `stream=false`.
Streaming worked because each event was forwarded individually;
non-streaming was returning a raw SSE-formatted string from a single
early event.

The v1 endpoint at
[`agent_api.py:1006-1050`](https://github.com/infiniflow/ragflow/blob/main/api/apps/restful_apis/agent_api.py#L1006-L1050)
already handles this correctly. This PR mirrors that aggregation in the
SDK beta endpoint: parse each SSE line, accumulate `content` from
`message` events, merge `reference`, collect `outputs.structured` from
each `node_finished` event keyed by `component_id`, and attach all of
them to the final response.

## Type of change

- [x] Bug fix (non-breaking change which fixes an issue)

## Test plan

- [ ] Build an agent with a node that emits structured output, call
`POST /api/v1/agentbots/<agent_id>/completions` with `stream=false` and
a beta API token, verify `data.structured.<component_id>` is present in
the response.
- [ ] Same agent with `stream=true` — verify behavior is unchanged.
- [ ] Agent without structured output — verify `data.structured` is
omitted, `content` and `reference` still aggregated correctly.
2026-05-15 12:42:33 +08:00
547b8cf9d8 security: always use RestrictedUnpickler in deserialize_b64 (CWE-502) (#14803)
## Summary

Harden `api/utils/configs.deserialize_b64` so that it always routes
pickle data through the existing `RestrictedUnpickler`
(`restricted_loads`) rather than falling back to bare `pickle.loads()`.

- **CWE-502** — Deserialization of Untrusted Data
- **File / function**: `api/utils/configs.py` → `deserialize_b64`
- **Caller**: `SerializedField.python_value` in `api/db/db_models.py`
(invoked by Peewee whenever a pickled DB column is read)

## The issue

Before this change, `deserialize_b64` consulted a
`use_deserialize_safe_module` config flag that **defaults to `False`**
and is not set anywhere in the repository:

```python
use_deserialize_safe_module = get_base_config('use_deserialize_safe_module', False)
if use_deserialize_safe_module:
    return restricted_loads(src)
return pickle.loads(src)   # <-- default path
```

So the default code path was unrestricted `pickle.loads()` on bytes read
from a MySQL `SerializedField(serialized_type=PICKLE)` column. Any
attacker who can influence those bytes (SQL injection elsewhere,
compromised DB credentials, a backup restored from an untrusted source,
or a compromised replication peer) can craft a pickle payload that
achieves arbitrary code execution on the ragflow application server when
the field is next read.

Today no model in-tree instantiates a `SerializedField` with the default
PICKLE type — only `JsonSerializedField` is used in practice — so the
attack surface is currently **latent** rather than actively reachable
through an HTTP endpoint. But the insecure-by-default behaviour is a
sharp edge: any future field that uses the default PICKLE serialization
would silently inherit RCE-on-read semantics.

## The fix

```diff
-    use_deserialize_safe_module = get_base_config(
-        'use_deserialize_safe_module', False)
-    if use_deserialize_safe_module:
-        return restricted_loads(src)
-    return pickle.loads(src)
+    return restricted_loads(src)
```

`restricted_loads` is the existing `RestrictedUnpickler` already defined
in the same file, which limits permitted modules to `numpy` and
`rag_flow`. The config flag (and the now-dead `get_base_config` import)
are removed.

Diff is 1 insertion / 6 deletions, scoped to a single function.

## Testing

- Built a malicious pickle whose `__reduce__` resolves to
`posix.system('id')`. Pre-fix: executes. Post-fix: `restricted_loads`
raises `UnpicklingError: global 'posix.system' is forbidden`.
- Round-tripped a benign `numpy.ndarray` through `serialize_b64` →
`deserialize_b64`. Values preserved bit-for-bit.
- Confirmed `use_deserialize_safe_module` is not set in any config file
in the tree, so removing the flag does not change any operator-facing
knob that was actually in use.

## A note on `restricted_loads` itself

The existing `SECURITY.md` notes that `restricted_loads`'s `numpy`
allow-list can still be reached via `numpy.f2py.diagnose.run_command`.
This PR does **not** attempt to fix that — it is a separate hardening
question about tightening the allow-list to specific symbols rather than
whole modules. The change here strictly improves on the status quo (bare
`pickle.loads`) and brings the default path in line with what the
`restricted_loads` helper was clearly designed for. Happy to follow up
with a separate PR narrowing the allow-list if that direction is
welcome.

## Adversarial review

Before submitting, we tried to argue this finding away. The two
strongest objections are (1) "no field uses PICKLE today, so this is
unreachable" — true, but the default behaviour of a security-sensitive
helper still matters because new fields silently inherit it; and (2)
"the attacker already needs DB write access, which is game over" —
partially true, but pickle-RCE meaningfully escalates *data tampering*
into *code execution on the application host* (filesystem, internal
network, in-process secrets), which is not equivalent. The fix is one
line of real code, has no behavioural cost for legitimate callers, and
removes an insecure default. We decided it was worth filing.

---

<sub>_Submitted by Sebastion — autonomous open-source security research
from [Foundation Machines](https://foundationmachines.ai). Free for
public repos via the [Sebastion AI GitHub
App](https://github.com/marketplace/sebastion-ai)._</sub>
2026-05-15 10:58:27 +08:00
58819f5d3e fix: add document download endpoint and refactor existing download function (#14927)
### What problem does this PR solve?

add document download endpoint and refactor existing download function

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-15 09:36:58 +08:00
a98994ff91 fix: close db connections reliably in test_db_connection (#14777)
## Summary

- Fixes resource-management bugs in the `POST
/agents/test_db_connection` endpoint where database connections could be
left open on error (part of #14750)

## Changes

- `api/apps/restful_apis/agent_api.py` — `test_db_connection`:
- mysql / mariadb / oceanbase / postgres: replaced bare `db.connect()` /
`db.close()` fallthrough with `with db.connection_context()` and a probe
`SELECT 1` — guaranteed close on both success and exception
- mssql: nested `try/finally` blocks so `cursor.close()` and
`db.close()` are always called even when `cursor.execute()` raises
  - trino: wrapped cursor ops in `try/finally` for the same reason
- Removed the `if req["db_type"] != "mssql": db.connect(); db.close()`
shared fallthrough block — each branch now owns its teardown
- Consolidated to a single `return get_json_result(...)` after the
if/elif chain
2026-05-14 16:45:44 +08:00
bd99a22661 fix: atomic chunk/token counter updates for documents and knowledge b… (#14867)
### What problem does this PR solve?

Fixes #14866.

Previously, `DocumentService.increment_chunk_num` and
`decrement_chunk_num` updated the `Document` row and its parent
`Knowledgebase` row in two separate, non-transactional statements. If
the second update failed (DB error, connection drop, etc.) after the
first one succeeded, the document and knowledge base chunk/token
counters would drift apart and stay inconsistent.

There was also a behavioral asymmetry between the two methods:

- `increment_chunk_num` only logged a warning when the document row was
missing and returned a value that callers usually treated as success.
- `decrement_chunk_num` raised `LookupError` in the same situation.

This PR makes the counter updates atomic and aligns the missing-document
behavior between the two methods:

- Wrap the `Document` and `Knowledgebase` updates in
`increment_chunk_num` / `decrement_chunk_num` inside a `DB.atomic()`
block so both succeed or both roll back together.
- Raise `LookupError` from `increment_chunk_num` when the target
document no longer exists, matching `decrement_chunk_num`.
- Update `reset_document_for_reparse` in `document_api_service.py` to
catch the new `LookupError` and return a proper "Document not found!"
API error instead of propagating the exception.

No schema changes, no API contract changes for the success path; only
the failure mode for a missing document during reparse is now a clean
error response instead of an uncaught exception.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-14 14:48:52 +08:00
ba8cb9dd4a fix: replace mutable default arguments with None in LLM chat models (#13513)
## Summary
- Replace `gen_conf={}` with `gen_conf=None` + guard in
`rag/llm/chat_model.py` (12 instances across Base, BaiChuanChat,
LocalLLM, MistralChat, ReplicateChat, BaiduYiyanChat, GoogleChat
classes)
- Replace `doc_ids=[]` with `doc_ids=None` + guard in
`api/db/services/document_service.py` (1 instance)
- Mutable default arguments are shared across all calls, causing
potential cross-request state contamination
- See Python docs:
https://docs.python.org/3/faq/programming.html#why-are-default-values-shared-between-objects

## Test plan
- [x] Verify LLM calls work with and without explicit gen_conf
- [x] No behavior change for existing callers — `None` is replaced with
`{}` at function entry

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2026-05-14 14:46:47 +08:00
714f777fa0 Fix: missing authentication on agent file upload and download endpoints (#14854)
### What problem does this PR solve?

Closes #14853

The `/agents/download` and `/agents/<agent_id>/upload` endpoints in the
agent API are missing `@login_required` and `@add_tenant_id_to_kwargs`
decorators, allowing unauthenticated access. This is a security issue —
any user can upload files to or download files from an agent without
being logged in. Additionally, the upload endpoint bypasses canvas
access control (`@_require_canvas_access_async`).
This PR adds the missing authentication and authorization decorators to
both endpoints and replaces the manual `user_id` / `created_by` lookups
with the `tenant_id` provided by the auth middleware, making these
endpoints consistent with the rest of the agent API.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-14 13:48:41 +08:00
48b4aa3e93 Fix WebDriver resource leak in HTML-to-PDF conversion (#14310)
### What problem does this PR solve?

In `api/utils/web_utils.py`, `__get_pdf_from_html()` creates a Chrome
WebDriver but only calls `driver.quit()` inside the `TimeoutException`
handler. If the page element becomes stale before the timeout (no
exception raised), the WebDriver is never quit, leaking the Chrome
browser process and returning `None`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Changes

- Move the PDF printing logic and `driver.quit()` outside the `except`
block so they execute on all code paths
- Use `try/finally` to ensure `driver.quit()` is always called, even if
the `Page.printToPDF` DevTools call fails

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-14 13:28:58 +08:00
d46bbd30f7 Fix: send input and output token usage to Langfuse (#13294)
### What problem does this PR solve?

Closes #9837

The Langfuse integration currently only sends the output text to
`langfuse_generation.update()` without including token usage
information. This means Langfuse cannot track input/output token
consumption for cost analysis and monitoring.

### Solution

Add the `usage` parameter to `langfuse_generation.update()` with:
- `input`: approximate input token count from `message_fit_in()`
- `output`: approximate output token count from
`num_tokens_from_string(answer)`
- `total`: sum of input and output

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2026-05-14 13:11:37 +08:00
b89878c593 Fix: dataset document download route (#14910)
### What problem does this PR solve?

dataset document download route
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-14 10:59:06 +08:00
dd76653dc1 feat: add tag management for Agents with filtering and sorting (#14774) (#14799)
## Summary
Closes #14774.

Adds free-form tags on agents (UserCanvas) with full UI + API:
- Stored as comma-separated `tags` column on `UserCanvas` with online
migration.
- New endpoints: `GET /v1/agents/tags` (aggregate counts) and `PUT
/v1/agent/<id>/tags` (write). `GET /v1/agents` accepts a `tags=` query.
- "Edit tags" item in agent dropdown opens a chip-style editor dialog;
tags render as badges on each agent card.
- New "Tags" facet in the agents filter bar, with counts.

## Implementation notes
- **Tag matching is exact-token**: the SQL filter wraps stored tags as
`,…,` and matches `,ml,` so `ml` doesn't match `ml-ops`.
- **Server-side normalization** in `UserCanvasService.update_tags`:
dedup (case-insensitive), per-tag cap of 64 chars, total length capped
at 512 chars to fit the column, commas inside tag values are replaced
with spaces.
- **Tenant authorization**: `PUT /v1/agent/<id>/tags` gates on
`UserCanvasService.accessible(canvas_id, tenant_id)`.
- **Tag listing scope**: `UserCanvasService.list_tags` follows the same
own + team-shared rule as `get_by_tenant_ids`.
- **i18n**: keys added to `en.ts` and `zh.ts` only (per project
convention; other locales fall back).
- **`HomeCard`** gets a non-breaking `extra?: ReactNode` slot for the
chip row; no `src/components/ui/` files modified.

## Test plan
- [ ] Backend boot runs `migrate_db` → confirm `user_canvas.tags` column
exists (`DESCRIBE user_canvas`).
- [ ] Agents page renders cards normally (no console error from missing
field).
- [ ] `⋯ → Edit tags` opens a dialog that stays open (regression: dialog
was unmounting with the dropdown).
- [ ] Typing a tag without pressing Enter and clicking Save persists it
(regression: last typed tag was being dropped).
- [ ] Chip input supports Enter/comma to commit, Backspace on empty to
remove, `×` to remove individual chip.
- [ ] Tag containing a comma sent via API is stored with the comma
replaced by a space.
- [ ] 20 long tags sent via API does not error (length cap silently
truncates).
- [ ] "Tags" filter in the filter bar shows counts and narrows the list.
- [ ] Filtering by `ml` does **not** return agents tagged `ml-ops`.
- [ ] UI in Chinese shows 编辑标签 / 添加标签以整理和筛选你的智能体 etc.
- [ ] `PUT /v1/agent/<other-tenant-id>/tags` returns `Agent not found or
no permission.`
2026-05-13 21:41:32 +08:00
8c5845f6ca fix: use context manager for pdfplumber to prevent resource leak (#13512)
## Summary
- Convert `pdfplumber.open()` to use `with` context manager in
`api/utils/file_utils.py` (`thumbnail_img` function)
- If any exception occurs between `open()` and `close()`, the PDF file
handle leaks
- The rest of the codebase (e.g. `read_potential_broken_pdf` in the same
file) already uses `with pdfplumber.open(...)` correctly

## Test plan
- [x] PDF thumbnail generation works correctly with context manager
- [x] Resources properly cleaned up on exceptions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-13 21:09:51 +08:00
e994051eb9 Feature/generic api connector (#13545)
# feat: Add Generic REST API Connector

## What problem does this PR solve?

RAGFlow supports many specific data source connectors (MySQL, Slack,
Google Drive, etc.), but there was no way to connect an arbitrary REST
API as a data source. Users with custom or third-party APIs had to write
a new connector class for each one.

This PR adds a **generic, configuration-driven REST API connector** that
lets users connect any REST API as a data source entirely through the UI
— no code changes needed per API.

---

## Features

### Core Connector (`common/data_source/rest_api_connector.py`)

- Implements `LoadConnector` and `PollConnector` interfaces for full and
incremental sync
- **Configurable authentication:** None, API Key (custom header), Bearer
Token, Basic Auth
- **Pluggable pagination:** Page-based, Offset-based, Cursor-based, or
None
- Smart page-size inference from user's query parameters to avoid
duplicate/conflicting params
- Configurable request delay between pages to prevent API rate limiting
- Auto-detection of the items array in JSON responses (`items`,
`results`, `data`, `records`, or first list found)
- **Advanced field mapping** with dot-notation (`country.name`), array
wildcards (`newsType[*].name`), type hints, and default values
- Optional content template rendering (`"Title: {title}\nBody: {body}"`)
- HTML stripping for content fields
- Stable document IDs via `hash128` from a configurable ID field or
auto-generated from item content
- Pydantic configuration schema with automatic coercion of UI string
inputs to dicts/lists

### Backend Registration (`rag/svr/sync_data_source.py`,
`common/constants.py`, `common/data_source/config.py`)

- `REST_API` sync class wired into RAGFlow's `func_factory`
- Full sync (`load_from_state`) and incremental polling (`poll_source`)
support
- Credentials and config passed from task to connector following
existing patterns (MySQL, SeaFile, etc.)

### Test Connection Endpoint (`api/apps/connector_app.py`)

- `POST /v1/connector/<id>/test` validates config schema,
authentication, and API connectivity without triggering a sync
- Clear error messages for auth failures vs. config issues

### Frontend UI (`web/src/pages/user-setting/data-source/constant/`)

- **Postman-style configuration:** Base URL, Query Parameters (key=value
per line), Auth, Content Fields, Metadata Fields, Pagination Type
- Auth-type-aware form: fields for API key header/value, Bearer token,
or Basic username/password appear only when relevant
- **Advanced Settings** toggle for: Custom Headers, Max Pages, Request
Delay, Poll Timestamp Field, Request Body (POST)
- Connector icon (SVG) and i18n strings (English)
- **"Test Connection"** button to validate before syncing

---

## Controls & Safety

- Configurable max pages safety cap (default: 1000, adjustable in UI)
- Configurable request delay between pages (default: 0.5s, adjustable in
UI)
- Auth errors (401/403) fail immediately without retries; transient
errors retry with exponential backoff
- Diagnostic logging: auth setup confirmation, request details on
failure, content field extraction status

---

## Type of change

- [x] New Feature (non-breaking change which adds functionality)


##Visual Screenshots of Features
<img width="482" height="510" alt="Screenshot 2026-03-11 at 5 19 52 PM"
src="https://github.com/user-attachments/assets/dcb7ab4a-1622-44f3-bb02-d6f0527314c4"
/>
(Connector can be configured within the external data sources tab)

Configuration Parameters:
<img width="661" height="682" alt="Screenshot 2026-03-11 at 5 20 46 PM"
src="https://github.com/user-attachments/assets/5e154e71-4ab5-4872-bfb2-04f02b73c18a"
/>
<img width="661" height="682" alt="Screenshot 2026-03-11 at 5 20 54 PM"
src="https://github.com/user-attachments/assets/00cb14b7-0bcf-4b94-9d71-34e93369ecb2"
/>

Connection can be tested before attaching to dataset:
<img width="981" height="681" alt="Screenshot 2026-03-11 at 5 21 40 PM"
src="https://github.com/user-attachments/assets/aaa6eeeb-89a7-4349-bc34-2423bf8be9ee"
/>

Ingestion tested with API connector (works perfectly fine):
<img width="1062" height="705" alt="Screenshot 2026-03-11 at 5 22 30 PM"
src="https://github.com/user-attachments/assets/afcd0d58-cadd-4152-badc-d2f14d96fbec"
/>

Search & Retrieval works as well with metadata flow:
<img width="1062" height="705" alt="Screenshot 2026-03-11 at 5 23 05 PM"
src="https://github.com/user-attachments/assets/d41ee935-dcf7-4456-b317-22a76ca032c0"
/>

---------

Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-13 20:35:01 +08:00
7f699d1202 Fix: enforce tenant authorization for tenant_rerank_id in retrieval flows (#14782)
### Related issues

Closes #14781 

### What problem does this PR solve?

Some retrieval endpoints accepted caller-supplied `tenant_rerank_id` and
resolved it through `get_model_config_by_id(...)`. That helper loaded
`TenantLLM` rows by global database id and returned decoded model
configuration without checking whether the model belonged to the
authenticated tenant or the dataset owner tenant.

This meant dataset access was validated, but rerank-model selection was
not. A caller who knew or could guess another tenant's
`tenant_rerank_id` could attempt retrieval with a foreign rerank model
config, creating a cross-tenant authorization gap for model usage.

This PR closes that gap by making `tenant_rerank_id` resolution
tenant-aware across the retrieval paths that accept it.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

### Solution

- Extend `get_model_config_by_id(...)` to accept an optional
`allowed_tenant_ids` set and reject `TenantLLM` rows whose `tenant_id`
is outside that set.
- Pass the allowed tenant scope from retrieval endpoints that accept
`tenant_rerank_id`:
  - `api/apps/sdk/doc.py`
  - `api/apps/sdk/session.py`
  - `api/apps/services/dataset_api_service.py`
- Use the authenticated tenant plus dataset-owner tenant ids already
derived by each retrieval flow as the authorization boundary for rerank
model selection.
- Add focused unit coverage to assert unauthorized `tenant_rerank_id`
values are rejected and that the allowed tenant set is propagated
correctly.

### Testing

- `python -m py_compile` on:
  - `api/db/joint_services/tenant_model_service.py`
  - `api/apps/services/dataset_api_service.py`
  - `api/apps/sdk/doc.py`
  - `api/apps/sdk/session.py`
- Added unit tests in:
-
`test/testcases/test_http_api/test_file_management_within_dataset/test_doc_sdk_routes_unit.py`
-
`test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py`

### Notes for reviewers

- This change is intentionally narrow: it affects only the
`tenant_rerank_id` path, not the normal `rerank_id` name-based
resolution path.
- Local lint/syntax checks passed.
- Full pytest execution could not be completed in this environment
because the local test runtime is missing `strenum`, so the route-test
files fail during collection before exercising the updated cases.

---------

Co-authored-by: jony376 <jony376@gmail.com>
2026-05-13 19:53:08 +08:00
f3b3596c29 Speed up ragflow server (#14894)
### What problem does this PR solve?

Speed up ragflow server

### Type of change

- [ ] Refactoring
2026-05-13 18:01:33 +08:00
8cb2bf04fb Fix: llm add api key overridden (#14885)
### What problem does this PR solve?

llm add api key overridden

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-13 17:15:32 +08:00
ff685d3131 Delete duplicate route (#14883)
### What problem does this PR solve?

The delete /graph is duplicated of
`/datasets/<dataset_id>/<index_type>`, delete it.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-13 15:57:44 +08:00
45d676bc05 Fix delete graphrag not take effect in UI (#14879)
### What problem does this PR solve?

Fix delete graphrag not take effect in UI

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-13 13:49:16 +08:00
64bd0130d3 Add REST API backward compatibility (#14872)
### What problem does this PR solve?

Add REST API backward compatibility

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-13 11:44:40 +08:00
5a5e766386 fix(api): authorize owner_ids for list chats and search apps (#14775)
Closes #14768
### What problem does this PR solve?
The `list_chats` and `list_searches` REST API endpoints did not enforce
authorization on the `owner_ids` query parameter. Any authenticated user
could pass arbitrary tenant IDs to `owner_ids` and retrieve chats or
search apps belonging to other tenants they are not a member of.

This PR resolves the issue by:
1. Looking up the current user's authorized tenants via
`TenantService.get_joined_tenants_by_user_id` and rejecting any
`owner_ids` that fall outside that set.
2. When no `owner_ids` are provided, scoping the query to only the
user's authorized tenants instead of returning an unfiltered result.
3. Adding unit tests that verify unauthorized `owner_ids` are rejected
with `OPERATING_ERROR`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-13 09:43:44 +08:00
2717ee283f feat(raptor): add Psi tree builder with original-space ranking and safe migration (#14679)
### What problem does this PR solve?

Closes #14674.

This PR improves RAPTOR configuration and tree construction while
preserving the existing RAPTOR behavior as the default.

RAPTOR currently builds summary layers with the original UMAP + GMM
clustering path. This PR keeps that default path, and adds:

- A hidden backend tree-builder option:
  - `tree_builder="raptor"`: default, existing RAPTOR behavior.
- `tree_builder="psi"`: rank-aware Psi-style tree builder using original
embedding-space cosine ranking.
- A user-facing clustering method option for the default RAPTOR builder:
  - `clustering_method="gmm"`: existing default.
- `clustering_method="ahc"`: agglomerative hierarchical clustering path.
- A RAPTOR UI setting for `Clustering method` and `Max cluster`.

### What changed

#### Backend

- Added `tree_builder` support for RAPTOR/Psi.
- Added `clustering_method` support for GMM/AHC.
- Kept existing RAPTOR + GMM as the default.
- Added Psi tree building from original-space cosine similarity.
- Added bucketed Psi building controls for large inputs:
  - `raptor.ext.psi_exact_max_leaves`
  - `raptor.ext.psi_bucket_size`
- Added method-aware RAPTOR summary metadata using existing
`extra.raptor_method`.
- Avoided adding a dedicated DB schema field for experimental method
tracking.
- Added cleanup/migration logic to avoid mixing stale RAPTOR summary
trees.
- Added defensive checks for Psi tree construction and summary failures.

#### Frontend/UI

- Added `Clustering method` in RAPTOR settings with `GMM` and `AHC`.
- Added/kept `Max cluster` in RAPTOR settings.
- Enlarged max cluster UI limit to `1024`, matching backend validation.
- Kept AHC editable even when a RAPTOR task has already finished.
- Fixed the UI save payload so `clustering_method` and `tree_builder`
are serialized through `parser_config.raptor.ext`, avoiding backend
validation errors for extra top-level RAPTOR fields.

Example saved RAPTOR config:

```json
{
  "raptor": {
    "max_cluster": 317,
    "ext": {
      "clustering_method": "ahc",
      "tree_builder": "raptor"
    }
  }
}

Co-authored-by: CaptainTimon <CaptainTimon@users.noreply.github.com>
2026-05-12 09:42:31 +08:00
415169d497 fix(dify): add GET method support to /dify/retrieval for health check (#13837)
## Summary
- Add GET method handler to `/api/v1/dify/retrieval` endpoint for Dify
external knowledge base connectivity verification
- GET requests return a simple success response; POST requests retain
existing retrieval logic unchanged

## Problem
When Dify integrates with RAGFlow as an external knowledge base, it
sends periodic GET requests to the retrieval endpoint for
health/connectivity checks. The endpoint only accepted POST, causing
werkzeug to return `405 Method Not Allowed`. After several successful
POST retrievals, the failing GET health checks trigger Dify's circuit
breaker, causing all subsequent requests to fail.

Traceback from the issue:
```
werkzeug.exceptions.MethodNotAllowed: 405 Method Not Allowed: The method is not allowed for the requested URL.
```

## Changes
- `api/apps/sdk/dify_retrieval.py`: Added a separate GET route handler
(`retrieval_health_check`) that returns `get_json_result(data=True)`

## Test plan
- [ ] Verify `GET /api/v1/dify/retrieval` returns `{"code": 0,
"message": "success", "data": true}`
- [ ] Verify `POST /api/v1/dify/retrieval` with valid API key and body
still works as before
- [ ] Verify Dify external knowledge base integration no longer returns
405 errors

Closes #13788

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Asksksn <Asksksn@noreply.gitcode.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2026-05-12 09:37:07 +08:00
663fc1d42c fix(opensearch): implement doc-meta dispatch surface on OSConnection (#14577)
### What problem does this PR solve?

Fixes #14570. On OpenSearch backends (`DOC_ENGINE=opensearch`) every
document-metadata write failed with `'OSConnection' object has no
attribute 'create_doc_meta_idx'`, so both `PATCH
/api/v1/datasets/{ds}/documents/{doc}` with `meta_fields` and `POST
/api/v1/datasets/{ds}/metadata/update` were unusable while every other
document operation (retrieval, parsing, name update, chunk management)
worked correctly on the same OpenSearch cluster.

The bug runs deeper than the missing method name in the error message
suggests. `DocMetadataService` also reached into
`settings.docStoreConn.es.*` directly for the index refresh, the
scripted partial update, and the count call, which means that even after
adding `create_doc_meta_idx` to `OSConnection` the very next call in the
same metadata flow would still raise `AttributeError` because
`OSConnection` exposes `self.os` rather than `self.es`. Fixing only the
reported symptom would have moved the failure one line down without
restoring the feature.

This PR adds a uniform document-metadata dispatch surface to both
connection classes so they present the same abstract API, and routes the
service layer through that surface via `getattr` guards instead of
poking at backend-specific attributes. The four new methods on
`OSConnection` and `ESConnectionBase` are `create_doc_meta_idx`,
`refresh_idx`, `count_idx`, and `replace_meta_fields`.
`OSConnection.create_doc_meta_idx` reuses the existing
`conf/doc_meta_es_mapping.json` schema in the OpenSearch `body=` form
because OpenSearch and Elasticsearch share the same index-creation
payload, and `replace_meta_fields` emits a full scripted assignment
(`ctx._source.meta_fields = params.meta_fields`) on both backends so
removed keys actually disappear instead of being preserved by deep-merge
semantics.

The `getattr`-guarded dispatch in `DocMetadataService` keeps the
existing fall-through paths intact for Infinity and OceanBase, which
continue to rely on their search-based count fallback and on the
delete-then-insert metadata replacement they used before, so this change
is strictly additive for those two backends.

Verification: `pytest
test/unit_test/rag/utils/test_opensearch_doc_meta.py` runs 16 new unit
tests that pass locally and pin the `OSConnection` dispatch surface, the
`create_doc_meta_idx` short-circuit when the index already exists, the
mapping-file payload routing, the `IndicesClient.create` failure path,
the `refresh_idx` and `count_idx` success and error sentinels, and the
full-assignment script emitted by `replace_meta_fields`. The test module
stubs `common.settings` and `rag.nlp` at import time so the suite runs
without the heavy backend SDKs that the rest of the repository pulls in
transitively.


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: tmimmanuel <tmimmanuel@users.noreply.github.com>
2026-05-11 17:04:28 +08:00
292b0b8bce chore: fix some comments to improve readability (#14756)
### What problem does this PR solve?

fix some comments to improve readability

### Type of change

- [x] Documentation Update

---------

Signed-off-by: box4wangjing <box4wangjing@outlook.com>
2026-05-11 16:48:48 +08:00
592dba1489 Refact: Added a private helper _visibility_and_status_filter (#13627)
### What problem does this PR solve?

Added a private helper _visibility_and_status_filter(joined_tenant_ids,
user_id) that returns the Peewee condition: visible to user (team or
own) and status is VALID.

### Type of change
- [x] Refactoring

---------

Co-authored-by: Serobabov Aleksandr <40SerobabovAS@region.cbr.ru>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-05-11 15:21:41 +08:00
6ce014c23b fix: offload blocking DB/Redis calls to thread pool for high-concurrency support (#13825) (#13941)
### What problem does this PR solve?

Addresses event-loop blocking under high concurrency reported in #13825.
When multiple requests hit the API simultaneously, synchronous DB/Redis
calls block the async event loop, preventing Quart from handling other
requests and causing cascading 502/504 timeouts.

This PR wraps all remaining blocking DB/Redis calls in `canvas_app.py`,
`chat_api.py`, `session.py`, and `canvas_service.py` with `await
thread_pool_exec()`
- Offload all synchronous `Service.*`, `REDIS_CONN.*`, and
`APIToken.query` calls to the thread pool
- Convert sync endpoint handlers (`list_chats`, `get_chat`, `templates`,
`sessions`, etc.) to `async def`
- Convert sync helper functions (`_ensure_owned_chat`,
`_validate_llm_id`, `_validate_dataset_ids`, etc.) to async - no
duplicate sync/async pairs
- Wrap `CanvasReplicaService` Redis IO calls (`bootstrap`,
`replace_for_set`, `commit_after_run`)
- Use `asyncio.gather()` for concurrent file uploads and chat response
building

**Note:** This fixes the code-level event-loop blocking, which is a
prerequisite for handling concurrent requests. For the full "30
concurrent requests without 502/504" goal described in the issue, users
should also tune deployment config:
- `WS=4` or higher (HTTP worker processes, default 1)
- `MAX_CONCURRENT_CHATS=50` (default 10)
- `SANDBOX_EXECUTOR_MANAGER_POOL_SIZE` for workflow-heavy workloads

### Performance verification

Reviewer asked for a before-vs-after comparison
([comment](https://github.com/infiniflow/ragflow/pull/13941#issuecomment-4393667231)).
I built a self-contained microbenchmark that reproduces the exact
failure mode this PR targets: an async handler that performs blocking
DB/Redis-style calls (50 ms each, 3 per request, 30 concurrent requests)
is run twice — once with the pre-PR pattern (sync call directly inside
the async handler) and once with the post-PR pattern (`await
thread_pool_exec(...)`). The benchmark imports nothing from RAGFlow
except `thread_pool_exec` itself, so it is hermetic and reproducible
(`THREAD_POOL_MAX_WORKERS=128`, Python 3.13.12).

**Throughput — wall-clock for 30 concurrent requests (lower is better)**

| flavour | wall(s) | p50(s) | p95(s) | max(s) |
|---|---:|---:|---:|---:|
| before | 4.986 | 0.158 | 0.207 | 0.269 |
| after  | 0.248 | 0.181 | 0.230 | 0.231 |

The pre-PR handler serializes the entire load on the event-loop thread,
so 30 × 3 × 50 ms ≈ 4.5 s shows up as the wall time. The post-PR handler
parallelizes the blocking work across the thread pool and finishes the
same load in 248 ms — a **~20× speedup** on this workload.

**Event-loop responsiveness — latency of an unrelated probe coroutine
while the 30 slow requests are running (lower is better)**

| flavour | samples | probe p50 (ms) | probe p95 (ms) | probe max (ms) |
|---|---:|---:|---:|---:|
| before | 1 | 5442.26 | 5442.26 | 5442.26 |
| after  | 28 | 0.88 | 11.53 | 98.02 |

This is the metric that maps directly to "the API still answers other
requests while one is busy". A 5 ms-interval probe was scheduled while
the 30 slow handlers ran. With the pre-PR code the event loop was frozen
for the entire duration of the blocking work, so only one probe sample
was ever picked up and it waited **5,442 ms**. After the PR, 28 probe
samples landed with **p50 0.88 ms / p95 11.53 ms**, meaning unrelated
requests are no longer starved by the slow ones. That is the regression
mode behind the cascading 502/504s reported in #13825.

<details>
<summary>Raw benchmark output</summary>

```
config: 30 concurrent requests, 3 blocking calls of 50ms each per request, THREAD_POOL_MAX_WORKERS=128

=== Throughput (lower wall is better) ===
flavour       wall(s)     p50(s)     p95(s)     max(s)
before          4.986      0.158      0.207      0.269
after           0.248      0.181      0.230      0.231

=== Event-loop responsiveness (lower probe latency is better) ===
flavour       samples    probe p50(ms)    probe p95(ms)    probe max(ms)
before              1          5442.26          5442.26          5442.26
after              28             0.88            11.53            98.02
```

</details>

The benchmark script is included as a comment on the PR for
reproducibility.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Performance Improvement

Closes [#13825](https://github.com/infiniflow/ragflow/issues/13825)

---------

Co-authored-by: tmimmanuel <tmimmanuel@users.noreply.github.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2026-05-11 15:08:55 +08:00
a0efc453f3 Fix: safe argument guard and remove redundant redis call (#14060)
### What problem does this PR solve?

- Moved if not all([email, new_pwd, new_pwd2]) guard to the top, before
any decryption that could crash on None value
- Removed the redundant REDIS_CONN.get() call — one call is sufficient

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2026-05-11 15:02:24 +08:00
5ef7f50eef fix: use context manager for ThreadPoolExecutor in file_service.py (#14144)
## Summary
- Wrap 2 `ThreadPoolExecutor` instances in `file_service.py` with `with`
statement
- Ensures threads are properly shut down after all futures complete

## Problem

`parse_docs()` (line 532) and the file processing method (line 694)
create `ThreadPoolExecutor` instances that are never shut down. In a
long-running server process, this leaks thread resources on every
invocation — threads remain alive consuming memory even after all
submitted work is complete.

## Fix

Replace bare `ThreadPoolExecutor()` with `with ThreadPoolExecutor() as
exe:` context manager, which calls `executor.shutdown(wait=True)` on
exit.

## Test plan
- [x] Verified both call sites use `with` statement after fix
- [x] No remaining bare `ThreadPoolExecutor` in `file_service.py`
- [x] `document_service.py:1066` is a module-level executor (different
pattern, not changed in this PR)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2026-05-11 14:02:45 +08:00
a03b95f8c4 Fix: shared dataset chunk index lookup (#14764)
### What problem does this PR solve?

shared dataset chunk index lookup

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-11 13:50:08 +08:00
024c8cb0b5 Fix: dataset search rerank id type (#14759)
### What problem does this PR solve?
issue: https://github.com/infiniflow/ragflow/issues/14748

change: dataset search rerank id type
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-11 13:48:05 +08:00
46897d6fa4 Fix: bind memory message user_id to authenticated user for JWT auth (#14745)
### Related issues

Closes #14744

### What problem does this PR solve?

The Memory REST endpoint `POST /api/v1/messages` previously persisted
whatever `user_id` the client sent in the JSON body. Memory rows were
therefore attributed to an arbitrary string, even when the caller
authenticated as a normal workspace user via JWT (browser/session-style
bearer token decoded into an access token). That broke attribution and
audit semantics for shared memories (team visibility): any authorized
writer could spoof another subject id.

The Python SDK already sends an optional `user_id` for integrations
using **API keys** (`APIToken`) to tag an external subject distinct from
the tenant owner user.

### Solution

- Record **`g.auth_via_api_token`** in `_load_user`
(`api/apps/__init__.py`): set `True` only when authentication resolves
via `APIToken`, otherwise `False` after JWT-based login succeeds.
- In **`POST /messages`** (`memory_api.add_message`): if the request was
authenticated with an API key, keep accepting optional `user_id` from
the body (default empty string). For JWT-authenticated users, **always**
set stored `user_id` to **`current_user.id`** and ignore the client
field.
- Guard reads of `g` with **`RuntimeError`** handling so isolated
imports or tests without a Quart application context do not fail when
resolving `user_id`.
- Document on **`RAGFlow.add_message`** that `user_id` is only
meaningful for API-key authentication.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

### Testing

- `python -m py_compile` on modified modules (`api/apps/__init__.py`,
`api/apps/restful_apis/memory_api.py`).
- Recommended: run web/SDK memory message tests (`test_add_message`,
`test_message_routes_unit`) against a full environment with `quart` and
configured services.

### Notes for reviewers

- Behavior change **only** for callers using JWT-style authorization on
`POST /messages`; API-key callers keep prior optional `user_id`
semantics.

Co-authored-by: jony376 <jony376@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-05-11 13:26:05 +08:00