### What problem does this PR solve?
Closes #15088.
Adds Groq support to the Go model-provider layer so Groq instances can
be routed through the Go API server with the same OpenAI-compatible
chat, streaming, model listing, and connection-check flow used by other
SaaS providers.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
## Summary
- Added a Groq Go model driver.
- Added the Groq provider catalog and default OpenAI-compatible API URL.
- Registered Groq in the model factory.
- Added focused provider tests.
## What changed
- Implemented chat completions, SSE streaming, ListModels, and
CheckConnection for Groq.
- Covered request shape, stream termination, reasoning fallback, model
listing, custom base URLs, safe transport setup, and unsupported
methods.
- Kept the provider catalog scoped to current Groq chat-capable model
IDs.
- Cleaned up pre-existing Go model package validation blockers so the
package can be tested normally with vet enabled.
## Why
The existing Python/provider catalog path includes Groq, but the Go
model-provider layer did not have a Groq driver, so the Go API server
could not instantiate or use Groq as requested in #15088.
## Notes
The model package now validates without disabling vet.
---------
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Move the agent attachment download API to the correct route and update
the frontend callers.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Notes
- Move the attachment download endpoint from document routes to agent
routes.
- Update frontend download callers to use the agent attachment endpoint.
- Reuse the shared file response header helper instead of duplicating it
in `agent_api.py`.
## Summary
- Adds a `TokenPony` Go driver so the new API server can route TokenPony
chat instances, matching the existing Python `TokenPonyChat`
(`rag/llm/chat_model.py:1210`). Follows the same SaaS-driver shape used
for Astraflow, Avian, Novita, TogetherAI, Replicate, DeepInfra, Upstage,
and LongCat.
Closes #15086
---------
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
## Summary
Implements the TODO in `evaluation_service.py`: **Track token usage** in
evaluation results.
## Changes
- **Import** `num_tokens_from_string` from `common.token_utils`
- **Prompt tokens**: Use the full prompt returned by `async_chat` when
available (includes system prompt + knowledge base + query), otherwise
fall back to the question token count
- **Completion tokens**: Count tokens in the generated answer
- **Storage**: Store `token_usage` as `{prompt_tokens,
completion_tokens, total_tokens}` in each `EvaluationResult` instead of
`None`
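
The fallback rule above can be sketched as follows. This is a minimal illustration, not the code in `evaluation_service.py`: `build_token_usage` and `whitespace_count` are hypothetical names, and the whitespace counter only stands in for `num_tokens_from_string` so the example runs without the RAGFlow package.

```python
from typing import Callable, Optional


def build_token_usage(
    question: str,
    answer: str,
    full_prompt: Optional[str],
    count_tokens: Callable[[str], int],
) -> dict:
    """Return prompt/completion/total token counts for one evaluation result."""
    # Prefer the full prompt (system prompt + knowledge base + query) when the
    # chat call returned it; otherwise fall back to the bare question.
    prompt_text = full_prompt if full_prompt else question
    prompt_tokens = count_tokens(prompt_text)
    completion_tokens = count_tokens(answer)
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }


def whitespace_count(text: str) -> int:
    """Stand-in counter for this example only; not the real tokenizer."""
    return len(text.split())
```

Passing the counter as a parameter keeps the sketch independent of any tokenizer; in the real service the counter would presumably be `num_tokens_from_string`.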
## Why
The evaluation pipeline previously saved `token_usage: None` for every
result. This change allows downstream consumers (e.g. evaluation
dashboards, cost tracking) to see approximate token usage per test case
using the same tokenizer (tiktoken cl100k_base) used elsewhere in
RAGFlow.
## Testing
- No new tests added; existing evaluation flow unchanged
- Token counting uses existing `num_tokens_from_string` utility
---------
Co-authored-by: kiannidev <kiannidev@users.noreply.github.com>
### What problem does this PR solve?
Fixes #15066
OpenRouter now exposes an official speech-to-text endpoint at `POST
/api/v1/audio/transcriptions`, but the Go model driver still returned
`openrouter, no such method` from `TranscribeAudio`. This left
OpenRouter ASR models unavailable through the Go API server even though
the provider already has OpenRouter audio support for TTS.
Related provider-tracking context: #14736
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
The agent API currently does not pass chat_template_kwargs to the
underlying LLM call path, so clients cannot control template-level model
behavior (such as thinking-mode toggles) when invoking
/agents/chat/completion. This PR adds passthrough support for
chat_template_kwargs across agent execution flows (session and
non-session, streaming and non-streaming) by propagating it through
canvas runtime state and into LLM invocation kwargs. This addresses the
feature gap raised in [Issue
#14182](https://github.com/infiniflow/ragflow/issues/14182).
Closes #14182
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Closes #14789
### What problem does this PR solve?
User API endpoints (`login`, `user_profile`, `user_add`,
`forget_reset_password`) were returning full user objects via
`to_json()` / `to_dict()`, which included sensitive fields like
`password` and `access_token` in the response body. This leaks
credentials to the client.
This PR adds a `to_safe_dict()` method on the `User` model that strips
sensitive fields (`password`, `access_token`) and replaces all affected
call sites to use it.
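
A minimal sketch of the idea, assuming a plain dataclass in place of the real ORM-backed `User` model. The field list and the `SENSITIVE_FIELDS` constant are illustrative; only `to_safe_dict()`, `password`, and `access_token` come from the description above.

```python
from dataclasses import asdict, dataclass

# Fields that must never reach an API response body.
SENSITIVE_FIELDS = frozenset({"password", "access_token"})


@dataclass
class User:
    """Hypothetical stand-in for the real user model."""

    id: str
    email: str
    nickname: str
    password: str
    access_token: str

    def to_dict(self) -> dict:
        # Full serialization, including sensitive fields.
        return asdict(self)

    def to_safe_dict(self) -> dict:
        # Same serialization with the sensitive fields stripped.
        return {k: v for k, v in self.to_dict().items() if k not in SENSITIVE_FIELDS}
```

Endpoints such as `login` would then return `user.to_safe_dict()` instead of `user.to_dict()`.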
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
1. Enhance retry and timeout handling, and adjust the default timeout
2. NER: spacy do not batch chunks
3. Extract `_has_cancel_and_exit`
4. Enhance log messages
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
Closes #4310
### What problem does this PR solve?
Issue #4310 requests practical examples for the RAGFlow SDK and HTTP API
to help developers get started faster. The existing `example/sdk/`
folder only contains `dataset_example.py`. This PR fills the remaining
gaps by adding examples for three key API areas not yet covered in
`main` or by other open PRs (#13904, #13284):
- **Chunk management** — add, list, update, delete, and retrieve chunks
within a dataset
- **Chat assistant** — create a chat assistant, open a session, send
messages (streaming and non-streaming), and clean up
- **Retrieval** — perform semantic retrieval across one or multiple
datasets
### Type of change
- [x] Documentation Update
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Closes #14865
`download_img` in `common/misc_utils.py` is used for OAuth avatar URLs.
The previous implementation called `async_request` from
`common.http_client`, which followed redirects without re-validating
each hop and did not apply the SSRF protections this path needs.
That made it possible to reach non-public or disallowed targets (for
example via redirects or unsafe URLs) when fetching avatars.
This change replaces that flow with an explicit, bounded fetch: each URL
(including every redirect target) is checked with
`common.ssrf_guard.assert_url_is_safe`, DNS is pinned with
`pin_dns_global`, `httpx` streams the body with `follow_redirects=False`
and a manual redirect loop (capped by
`RAGFLOW_OAUTH_AVATAR_MAX_REDIRECTS`), and total response size is capped
(`RAGFLOW_OAUTH_AVATAR_MAX_BYTES`). Timeouts, proxy, and user agent
align with `HTTP_CLIENT_*` env vars without importing `http_client`, so
lightweight tests stay simple.
Unit tests cover empty/None URLs, loopback, cloud metadata-style
addresses, and disallowed schemes so SSRF regressions are caught early.
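
The bounded fetch described above can be sketched as a manual redirect loop that re-validates every hop. This is a simplified illustration under stated assumptions: `is_url_allowed`, `fetch_avatar`, and the injected `fetch_once` callable are hypothetical names, the check only covers schemes and literal IP hosts (the real guard also resolves and pins DNS), and the size cap is applied after the body is read rather than while streaming.

```python
import ipaddress
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# (status code, Location header or None, body bytes)
FetchResult = Tuple[int, Optional[str], bytes]

REDIRECT_CODES = (301, 302, 303, 307, 308)


def is_url_allowed(url: Optional[str]) -> bool:
    """Reject empty URLs, non-HTTP schemes, and literal non-public IP hosts."""
    if not url:
        return False
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    if parts.hostname == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(parts.hostname)
    except ValueError:
        # A DNS name: a real guard would resolve it and check every address.
        return True
    return ip.is_global


def fetch_avatar(
    url: Optional[str],
    fetch_once: Callable[[str], FetchResult],
    max_redirects: int = 3,
    max_bytes: int = 1_000_000,
) -> Optional[bytes]:
    """Fetch one URL, following at most max_redirects validated redirects."""
    for _ in range(max_redirects + 1):
        if not is_url_allowed(url):
            return None  # Applies to the first URL and to every redirect target.
        status, location, body = fetch_once(url)
        if status in REDIRECT_CODES:
            if not location:
                return None
            url = urljoin(url, location)  # Resolve relative Location headers.
            continue
        if status != 200 or len(body) > max_bytes:
            return None
        return body
    return None  # Redirect budget exhausted.
```

In the real change, `fetch_once` would correspond to an `httpx` request made with `follow_redirects=False`.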
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
This PR implements ASR and TTS support for the ZhipuAI Go driver.
The ZhipuAI model config already advertises `glm-asr-2512` as an ASR
model, but the Go driver returned `zhipu, no such method` from
`TranscribeAudio`. This adds the documented audio transcription endpoint
suffix and sends multipart transcription requests with `model`,
`stream=false`, and `file` fields.
Per maintainer review, this also adds the ZhipuAI TTS endpoint suffix
and implements `AudioSpeech` / `AudioSpeechWithSender` for `glm-tts`.
Closes #15133
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Closes #15089.
Adds PPIO support to the Go model-provider layer so PPIO instances can
be routed through the Go API server with the same OpenAI-compatible
chat, streaming, model listing, and connection-check flow used by other
SaaS providers.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
## Summary
- Added a PPIO Go model driver.
- Added the PPIO provider catalog and default OpenAI-compatible API URL.
- Registered PPIO in the model factory.
- Added focused provider and provider-manager tests.
## What changed
- Implemented chat completions, SSE streaming, ListModels, and
CheckConnection for PPIO.
- Covered request shape, stream termination, reasoning fallback, model
listing, custom base URLs, safe transport setup, unsupported methods,
and provider config loading.
- Kept the provider catalog aligned with the existing RAGFlow PPIO
factory model set.
- Cleaned up pre-existing Go model package validation blockers so the
scoped provider tests can run normally with vet enabled.
## Why
The existing Python/provider catalog path includes PPIO, but the Go
model-provider layer did not have a PPIO driver, so the Go API server
could not instantiate or use PPIO as requested in #15089.
### What problem does this PR solve?
Implement rerank, ASR, and TTS for TogetherAI.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
1. Update the Python version to 3.13
2. Upgrade ormsgpack to 1.6.0
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Implement ASR and TTS for Xinference.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
## Summary
Fixes 10 unguarded `response.choices[0]` accesses that cause
`IndexError` or `AttributeError` when the LLM returns an empty `choices`
list — the scenario described in #14711.
- `rag/llm/cv_model.py`
- `rag/llm/chat_model.py`
Each access site is now guarded with:
```python
if not response.choices:
    raise ValueError("LLM returned empty response")
```
## Verification
Detected and verified by [pact](https://github.com/qizwiz/pact) — a
sheaf-cohomological LLM contract checker using Z3 as a local theory
solver.
**pact sheaf-cohomological proof status after fix:**
| File | Ȟ¹ (after) | Z3 |
|------|-----------|-----|
| `rag/llm/cv_model.py` | 0 | UNSAT ✓ |
| `rag/llm/chat_model.py` | 0 | UNSAT ✓ |
All access sites proven safe (Z3 UNSAT certificate).
The checker was also used to verify the autogen streaming-None fix in
[microsoft/autogen#7711](https://github.com/microsoft/autogen/pull/7711).
## Test plan
- [ ] Existing test suite passes
- [ ] Manually test with a provider that returns empty `choices` under
load (e.g. Vertex AI)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Signed-off-by: Jonathan Hill <jonathan.f.hill@gmail.com>
`GET /agents/<agent_id>/sessions/<session_id>` crashed with
`AttributeError: 'NoneType' object has no attribute 'to_dict'` when the
session lookup failed: `_, conv =
API4ConversationService.get_by_id(...)` returned `(False, None)`, then
`conv.to_dict()` was called unconditionally.
This is reachable in multi-instance deployments: the session row may not
yet be visible on the node servicing the immediate follow-up GET after a
session is created on a different node.
Add the same `if not exists` guard already used by every other call site
of `API4ConversationService.get_by_id` (see agent_api.py:1147,
sdk/session.py:179, conversation_service.py:248, canvas_service.py:323).
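
The guard pattern looks roughly like this. It is a sketch with stand-ins: `FakeConversation`, the dict-backed `get_by_id`, and the error payload are hypothetical; only the `(exists, conv)` return shape and the `if not exists` check come from the description above.

```python
from typing import Optional, Tuple


class FakeConversation:
    """Hypothetical stand-in for a stored session row."""

    def __init__(self, session_id: str):
        self.session_id = session_id

    def to_dict(self) -> dict:
        return {"id": self.session_id}


def get_by_id(store: dict, session_id: str) -> Tuple[bool, Optional[FakeConversation]]:
    # Mirrors the (exists, row) shape: (False, None) when the row is not visible.
    conv = store.get(session_id)
    return (conv is not None, conv)


def get_session(store: dict, session_id: str) -> dict:
    exists, conv = get_by_id(store, session_id)
    if not exists:
        # Return an error payload instead of calling to_dict() on None.
        return {"code": 404, "message": "Session not found", "data": None}
    return {"code": 0, "message": "", "data": conv.to_dict()}
```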
Closes #14989
Replace the RuntimeError with a warning + first-address fallback so a
single email whose From header contains multiple addresses no longer
crashes the entire IMAP sync task. Also add regression tests covering:
- #14963: RFC 5322 quoted display names with commas (e.g. "Schlüter,
Sabine" <s@x>) parsed as one address, not two.
- #14964: multi-address headers warn instead of raising.
Closes #14964
Refs #14963
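
Both behaviours can be illustrated with the standard library's `email.utils.getaddresses`. This is a sketch, not the connector's actual code: `pick_sender` is a hypothetical helper and the addresses are made-up examples.

```python
import logging
from email.utils import getaddresses
from typing import Optional

logger = logging.getLogger(__name__)


def pick_sender(from_header: str) -> Optional[str]:
    """Return the first address in a From header, warning if there are several."""
    # getaddresses treats a comma inside a quoted display name as part of the
    # name rather than as an address separator.
    addresses = [addr for _name, addr in getaddresses([from_header]) if addr]
    if not addresses:
        return None
    if len(addresses) > 1:
        # Warn and fall back to the first address instead of raising.
        logger.warning(
            "From header has %d addresses; using the first: %s",
            len(addresses),
            addresses[0],
        )
    return addresses[0]
```

With this shape, one malformed or unusual message degrades to a log line rather than aborting the whole sync task.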
## Summary
- Bump pinned nginx in `Dockerfile` from `1.29.5-1~noble` (vulnerable)
to `1.31.0-1~noble` to remediate **CVE-2026-42945**.
## Root Cause
`Dockerfile:58` pinned `ARG NGINX_VERSION=1.29.5-1~noble`. Per the
official nginx security advisory, **CVE-2026-42945** is a buffer
overflow in `ngx_http_rewrite_module` triggered via the `rewrite` and
`set` directives, affecting nginx **0.6.27 through 1.30.0**. `1.29.5`
falls inside that range, so the shipped image is vulnerable.
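
The range check is simple enough to verify mechanically. A small sketch, assuming versions compare as integer tuples and that the Debian packaging suffix (`-1~noble`) can be dropped; `parse_version` and `is_affected` are hypothetical helpers:

```python
# Affected range quoted above: 0.6.27 through 1.30.0, inclusive.
AFFECTED_MIN = (0, 6, 27)
AFFECTED_MAX = (1, 30, 0)


def parse_version(text: str) -> tuple:
    # Drop the packaging suffix, e.g. "1.29.5-1~noble" -> "1.29.5".
    upstream = text.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def is_affected(version: str) -> bool:
    return AFFECTED_MIN <= parse_version(version) <= AFFECTED_MAX
```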
References:
- nginx security advisories:
https://nginx.org/en/security_advisories.html
- Vendor advisory: https://my.f5.com/manage/s/article/K000161019
- Fixed versions: `1.31.0` (mainline) and `1.30.1` (stable)
## Fix
Single-line change in `Dockerfile:58`:
```diff
-ARG NGINX_VERSION=1.29.5-1~noble
+ARG NGINX_VERSION=1.31.0-1~noble
```
### What problem does this PR solve?
Fixes #14997.
RAPTOR builds on the Infinity backend have been broken since v0.25.2
introduced the `extra` field in code (`rag/svr/task_executor.py:1011`)
without declaring it in `conf/infinity_mapping.json`. Every RAPTOR job
fails with:
```
infinity.common.InfinityException: (3013, 'Fail to bind the expression: extra@src/planner/expression_binder_impl.cpp:99')
```
The auto-migration in
`common/doc_store/infinity_conn_base.py:_migrate_db()` adds any columns
it finds in the mapping JSON to existing tables — so the only thing
standing between users and a working RAPTOR build is that one missing
declaration. OceanBase, ES, and OpenSearch were unaffected because they
store `extra` as a native JSON type; only Infinity (which has a strict
`varchar`/`integer`/`float` schema) needed the addition.
### The fix
Two-part change:
1. **`conf/infinity_mapping.json`**: declare `"extra": {"type":
"varchar", "default": ""}`. On next startup, `_migrate_db()` adds the
column to all existing chunk tables — no manual DDL needed for upgrading
installations.
2. **`rag/utils/infinity_conn.py` `insert()`**: serialize the `extra`
dict to a JSON string at write time, since Infinity's `varchar` can't
store a Python dict directly. Modelled on the existing `chunk_data`
handling a few lines above.
The read path (`rag/utils/raptor_utils.py:_as_extra_dict`) already
normalises both dict and JSON-string inputs, so no read-side change is
needed. Other backends are untouched — `task_executor.py` still writes
the dict, and the OceanBase/ES/OpenSearch insert paths handle dicts
natively.
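
The write-side serialization and the tolerant read side can be sketched as follows. The function names are hypothetical and the bodies are an approximation of the behaviour described above, not the code in `infinity_conn.py` or `raptor_utils.py`.

```python
import json
from typing import Any


def serialize_extra_for_varchar(row: dict) -> dict:
    """Return a copy of the row with a dict-valued `extra` stored as a JSON string."""
    out = dict(row)
    if isinstance(out.get("extra"), dict):
        # A varchar column cannot hold a Python dict, so store JSON text.
        out["extra"] = json.dumps(out["extra"], ensure_ascii=False)
    return out


def as_extra_dict(value: Any) -> dict:
    """Accept either a dict or a JSON string; anything else becomes {}."""
    if isinstance(value, dict):
        return value
    if isinstance(value, str) and value:
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            return {}
        return parsed if isinstance(parsed, dict) else {}
    return {}
```

Because the reader accepts both shapes, backends that store the dict natively and the backend that stores a string can share the same read path.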
### Verification
Tested on a v0.25.4 deployment with the Infinity backend by applying the
same two changes via mounted-volume override:
- Confirmed `_migrate_db()` adds the `extra` column to all pre-existing
chunk tables on startup (column visible via Infinity's
`show_columns()`).
- Triggered RAPTOR builds on four datasets (~21k chunks total) via `POST
/api/v1/datasets/<id>/index?type=raptor`.
- All four progressed past the previously-failing
`get_raptor_chunk_methods()` call into actual entity-extraction and
clustering work without the (3013) error.
- GraphRAG builds (which can trigger the same path indirectly via
`task_executor.py:857`) also progressed cleanly.
### Type of change
- [X] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
`UpstageModel.ChatStreamlyWithSender` (in the driver merged via #14819)
only extracted `delta.content` from each SSE event. For the `solar-pro3`
reasoning family (and any future Upstage model that follows the same
wire shape), the chain-of-thought is streamed in a **separate
`delta.reasoning` field**, and the driver was silently dropping all of
it.
The non-streaming path already extracts `message.reasoning` into
`ChatResponse.ReasonContent` (added earlier in this PR's history), so
the same model produced **inconsistent behavior** between streaming and
non-streaming: a tenant calling `solar-pro3` with `reasoning_effort:
high` would see the reasoning trace if they used `ChatWithMessages` but
not if they used `ChatStreamlyWithSender`.
### Live evidence
Probed against `api.upstage.ai/v1/chat/completions` with `solar-pro3` +
`reasoning_effort: high` + `stream: true` (8000-token budget so the
reasoning has room to finish):
```
$ curl -sN -H "Authorization: Bearer <key>" -H "Content-Type: application/json" \
-X POST https://api.upstage.ai/v1/chat/completions \
-d '{"model":"solar-pro3","messages":[{"role":"user","content":"Compute 15% of 80."}],
"max_tokens":8000,"stream":true,"reasoning_effort":"high"}'
# across 168 SSE events:
# delta keys seen: [content reasoning role]
# delta.content total len: 121 chars (the visible answer)
# delta.reasoning total len: 159 chars (the chain-of-thought) <- driver dropped this
```
A representative event showing both fields side by side:
```json
data: {"choices":[{"index":0,"delta":{"reasoning":"15% = 0.15."}}]}
data: {"choices":[{"index":0,"delta":{"content":"15% of 80 is "}}]}
```
The 159 chars of reasoning were arriving on the wire and being thrown
away. `solar-pro2` was also probed (625 events); it does **not** emit
`delta.reasoning` — its reasoning is inlined into `delta.content` — so
this change is a no-op for it and for `solar-mini`.
### What this PR includes
- `internal/entity/models/upstage.go`: in the SSE scanner loop, extract
`delta.reasoning` before `delta.content` and forward each non-empty
chunk via the sender's second arg (the existing `reasonContent` channel
the non-stream path already populates).
The ordering contract is documented inline: reasoning chunks within a
single SSE event are emitted before content chunks, so a UI that pipes
both sees the chain-of-thought start before the answer for that token,
matching the wire order Upstage emits.
- `internal/entity/models/upstage_test.go`: three new tests pinning the
new behavior:
  - `TestUpstageStreamExtractsReasoningDelta`: reasoning + content
    forwarded to the right sender args; one-of invariant per call
  - `TestUpstageStreamReasoningChunksArriveBeforeContent`: ordering
    pinned within a single SSE event that carries both fields
  - `TestUpstageStreamWithoutReasoningStillWorks`: regression net;
    non-reasoning models (`solar-mini`, `solar-pro2`) continue to work and
    the reason callback never fires
No interface change. No factory change. No config change.
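
The extraction order can be sketched in a few lines. The driver itself is Go; this Python version only illustrates the logic, and `forward_stream`, the `sender(content, reasoning)` callback shape, and the `[DONE]` terminator are assumptions modelled on common OpenAI-compatible streams rather than taken from `upstage.go`.

```python
import json
from typing import Callable, Iterable


def forward_stream(lines: Iterable[str], sender: Callable[[str, str], None]) -> None:
    """Call sender(content, reasoning) once per non-empty chunk."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # Skip blank lines and non-data SSE fields.
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # Assumed stream terminator.
            break
        event = json.loads(payload)
        for choice in event.get("choices", []):
            delta = choice.get("delta") or {}
            reasoning = delta.get("reasoning")
            content = delta.get("content")
            # Reasoning goes out before content, and each call carries
            # exactly one non-empty argument.
            if reasoning:
                sender("", reasoning)
            if content:
                sender(content, "")
```

Feeding it the two representative events shown earlier yields one reasoning call followed by one content call.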
### How was this tested?
```
$ go test -vet=off -run TestUpstage -count=1 -v ./internal/entity/models/...
... (existing tests 1..9 still pass) ...
=== RUN TestUpstageStreamExtractsReasoningDelta
--- PASS: TestUpstageStreamExtractsReasoningDelta (0.01s)
=== RUN TestUpstageStreamReasoningChunksArriveBeforeContent
--- PASS: TestUpstageStreamReasoningChunksArriveBeforeContent (0.01s)
=== RUN TestUpstageStreamWithoutReasoningStillWorks
--- PASS: TestUpstageStreamWithoutReasoningStillWorks (0.00s)
PASS
ok ragflow/internal/entity/models 0.034s
```
12/12 Upstage tests pass on go 1.25. `go build
./internal/entity/models/...` exits 0.
**Live integration test** (smoke test not committed) — the patched
driver was run directly against `api.upstage.ai/v1` with the same prompt
that produced the curl evidence above:
```
=== RUN TestUpstageStreamReasoningLiveSmoke
[OK] visible content: 50 chunks, 84 chars
[OK] reasoning: 39 chunks, 90 chars
content head 200: "\\(15\\% = \\frac{15}{100}=0.15\\).\n\n\\[\n0.15 \\times 80 = 12.\n\\]\n\n**15 % of 80 is 12.**"
reasoning head 200: "We need to compute 15% of 80. That's 0.15 * 80 = 12. So answer is 12. Provide explanation."
UPSTAGE STREAM REASONING SMOKE PASSED
--- PASS: TestUpstageStreamReasoningLiveSmoke (1.97s)
```
Before this fix, the same call would have produced **0 reasoning
chunks**. The 90 chars of reasoning that the patched driver now surfaces
are the chain-of-thought solar-pro3 emits when reasoning_effort is high.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
`MistralModel.ChatWithMessages` (in the driver merged via #14807)
assumes that `choices[0].message.content` from `/v1/chat/completions` is
always a string and falls through to `return nil, fmt.Errorf("invalid
content format")` on anything else.
That assumption breaks for the **magistral reasoning family**
(`magistral-small-*`, `magistral-medium-*`). When the model needs a
chain-of-thought to answer, Mistral returns `content` as a **structured
array of typed parts**:
```json
"content": [
{"type": "thinking",
"thinking": [{"type": "text", "text": "Combined speed is 150 mph. 300 / 150 = 2 hours."}],
"closed": true},
{"type": "text", "text": "They will meet after **2 hours**."}
]
```
Concretely, this is what the live API returns today (probed against
`api.mistral.ai/v1`):
```
$ curl -H "Authorization: Bearer <key>" -H "Content-Type: application/json" \
-X POST https://api.mistral.ai/v1/chat/completions \
-d '{"model":"magistral-medium-latest",
"messages":[{"role":"user","content":"two trains 60mph and 90mph, 300mi apart, when do they meet? step by step."}],
"max_tokens":1024}'
HTTP 200
{ "choices":[{"message":{
"role":"assistant",
"content":[
{"type":"thinking","thinking":[{"type":"text","text":"Okay, let's see..."}],"closed":true},
{"type":"text","text":"To determine when the two trains meet..."}
]}}] }
```
With the current driver, every call like that returns the generic
`"invalid content format"` error. Trivial prompts that happen to fit in
a string answer still succeed, so the breakage is **non-deterministic
from the tenant's POV**: same model, same provider, sometimes works,
sometimes 500s with no useful error.
A secondary issue: `conf/models/mistral.json` does not include any
magistral model. The picker hid the broken path, which is why this
wasn't caught during #14807's review.
### What this PR includes
- New helper `extractMistralContent(raw interface{}) (answer,
reasonContent string, err error)` in
`internal/entity/models/mistral.go`, which normalizes both shapes
Mistral can return:
  - `string` → historical path. `Answer = content`, `ReasonContent = ""`.
    Preserves behavior for every non-reasoning model (`mistral-large-*`,
    `mistral-small-*`, `ministral-*`, `codestral-*`, `pixtral-*`,
    `open-mistral-nemo`).
  - `[]interface{}` → walk the parts. Concatenate every `{"type":"text",
    "text":...}` part into `Answer`; concatenate the inner text inside every
    `{"type":"thinking", "thinking":[...]}` part into `ReasonContent`.
- `ChatWithMessages` now calls the helper instead of doing the raw
`.(string)` cast.
- Unknown part types are **skipped, not failed**. Mistral has been
adding new content variants quickly (audio chunks, citations, etc.);
this driver should not 500 every call when a new part type appears.
- `conf/models/mistral.json`: add `magistral-medium-latest` and
`magistral-small-latest`. Both are visible in `/v1/models` today.
No interface change. No factory change. No new dependencies.
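
The normalization rules can be sketched as below. The helper in the PR is Go; this Python version only mirrors the described logic. `extract_content` is a hypothetical name, and rejecting `None` and other root types with an error is an assumption, since the description does not say how the Go helper treats `nil`.

```python
from typing import Any, Tuple


def extract_content(raw: Any) -> Tuple[str, str]:
    """Return (answer, reason_content) for a string or structured content value."""
    if isinstance(raw, str):
        return raw, ""  # Historical shape: plain string, no reasoning.
    if not isinstance(raw, list):
        # Assumption: any other root type (including None) is an error.
        raise ValueError("invalid content format")
    answer, reasoning = [], []
    for part in raw:
        if not isinstance(part, dict):
            continue
        kind = part.get("type")
        if kind == "text":
            if isinstance(part.get("text"), str):
                answer.append(part["text"])
        elif kind == "thinking":
            # Collect the inner text of each thinking sub-part.
            for inner in part.get("thinking") or []:
                if isinstance(inner, dict) and isinstance(inner.get("text"), str):
                    reasoning.append(inner["text"])
        # Unknown part types are skipped, not treated as failures.
    return "".join(answer), "".join(reasoning)
```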
### How was this tested?
**Unit tests** — 5 new tests in `internal/entity/models/mistral_test.go`
on top of the 27 already shipped via #14807:
- `TestMistralChatHandlesStringContent` — regression net for the
historical path
- `TestMistralChatExtractsReasoningFromStructuredContent` — the fixture
body is a trimmed copy of the actual `magistral-medium-latest` response
captured above; asserts both `Answer` and `ReasonContent` are populated
correctly
- `TestMistralChatHandlesStructuredContentWithoutThinking` —
`magistral-*` with a trivial answer returns a structured shape that has
only a `text` part; `ReasonContent` must stay empty
- `TestMistralChatIgnoresUnknownContentPartTypes` — `audio_url` and
`future_part_type` parts are skipped, `text` parts still flow through
- `TestExtractMistralContent` — table-driven unit coverage of the helper
for string, empty string, nil, empty array, text-only, thinking+text,
unsupported root type
```
$ go test -vet=off -run "TestMistral|TestExtractMistralContent" -count=1 -v ./internal/entity/models/...
=== RUN TestMistralChatHandlesStringContent
--- PASS: TestMistralChatHandlesStringContent (0.00s)
=== RUN TestMistralChatExtractsReasoningFromStructuredContent
--- PASS: TestMistralChatExtractsReasoningFromStructuredContent (0.00s)
=== RUN TestMistralChatHandlesStructuredContentWithoutThinking
--- PASS: TestMistralChatHandlesStructuredContentWithoutThinking (0.00s)
=== RUN TestMistralChatIgnoresUnknownContentPartTypes
--- PASS: TestMistralChatIgnoresUnknownContentPartTypes (0.00s)
=== RUN TestExtractMistralContent
=== RUN TestExtractMistralContent/plain_string
=== RUN TestExtractMistralContent/empty_string
=== RUN TestExtractMistralContent/nil
=== RUN TestExtractMistralContent/empty_array
=== RUN TestExtractMistralContent/text_only
=== RUN TestExtractMistralContent/thinking_then_text
=== RUN TestExtractMistralContent/unknown_root_type
--- PASS: TestExtractMistralContent (0.00s)
PASS
ok ragflow/internal/entity/models 0.046s
```
All 32 Mistral tests pass on go 1.25. `go build
./internal/entity/models/...` exits 0.
**Live integration test** — driver exercised against `api.mistral.ai/v1`
with the patched code:
```
=== RUN TestMistralMagistralSmoke
[OK] "magistral-small-latest" present upstream
[OK] "magistral-medium-latest" present upstream
[OK trivial] Answer="7" ReasonContent=""
[OK reasoning] Answer len=797 head="To determine when the two trains meet, we can follow these steps:\n\n1. **Identify..."
ReasonContent len=1069 head="Okay, let's see. There are two trains, one going 60 mph and the other going 90 mph. They're moving towards each other, s..."
MAGISTRAL SMOKE PASSED
--- PASS: TestMistralMagistralSmoke (18.09s)
PASS
ok ragflow/internal/entity/models 18.112s
```
What the live run proves on the wire:
- `magistral-small-latest` with a trivial prompt still uses the
string-content shape; the regression-net path is exercised against the
real server, not just the mock.
- `magistral-medium-latest` with a reasoning prompt uses the
structured-array shape; the new code path extracts a 1069-character
reasoning trace into `ChatResponse.ReasonContent` and a 797-character
visible answer into `ChatResponse.Answer`. Before this fix, the same
call returned `"invalid content format"` and the caller saw nothing.
The smoke-test file itself is not committed (live tests live outside the
PR diff, same convention used for prior provider PRs).
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Problem
The Go server build pipeline (`build.sh` + CMake + CGO bindings) was
tested on Ubuntu only. On macOS arm64 with Homebrew it fails in five
orthogonal places. None of these require platform-specific code paths —
the same source builds on both Linux and Darwin after these fixes.
## Reproduction (before)
```
$ uname -a
Darwin … 25.4.0 arm64
$ brew install cmake pcre2 simde
$ bash build.sh
…
error: 'simde/x86/sse4.1.h' file not found
error: implicit instantiation of undefined template 'std::basic_istringstream<char>'
error: no matching function for call to 'Join'
…
clang: error: no such file or directory: '/usr/local/lib/libpcre2-8.a'
```
## Fix (5 small, orthogonal changes)
### 1. `internal/cpp/CMakeLists.txt` — find Homebrew + libpcre2-8
portably
- Detect Apple platforms via `if(APPLE)`, call `brew --prefix` once, add
`${HOMEBREW_PREFIX}/include` and `${HOMEBREW_PREFIX}/lib`. No effect on
Linux.
- Replace the literal `libpcre2-8.a` link token (which only the Linux
linker finds in `/usr/local/lib` by default) with
`find_library(PCRE2_LIB NAMES pcre2-8 REQUIRED)`. Works on
`/usr/lib/x86_64-linux-gnu` (Debian/Ubuntu), `/usr/local/lib` (Intel Mac
& legacy Linux), `/opt/homebrew/lib` (Apple Silicon).
### 2. `internal/cpp/wordnet_lemmatizer.cpp` +
`internal/cpp/rag_analyzer.cpp` — explicit `#include <sstream>`
libstdc++ (Linux) pulls `<sstream>` in transitively via `<fstream>`;
libc++ (Apple Clang) doesn't, so the existing `std::istringstream` /
`std::ostringstream` uses fail to compile on macOS. One-line include in
each file.
### 3. `internal/cpp/rag_analyzer.cpp` — `Join` template overload fix
`Join(tokens, start, tokens.size(), delim)` at line 146 passes `size_t`
to an `int` parameter. C++23 strict mode in Apple Clang refuses the
implicit narrowing and reports the 4-arg overload as a substitution
failure, leaving the call ambiguous between the 3-arg and 4-arg
templates. Fix: explicit `static_cast<int>(tokens.size())`. Behaviour
identical on libstdc++ — the narrowing was always intentional.
### 4. `internal/binding/rag_analyzer.go` — split darwin CGO LDFLAGS
The existing `#cgo darwin LDFLAGS: ... /usr/local/lib/libpcre2-8.a` only
matches Intel Macs. Apple Silicon Homebrew installs to `/opt/homebrew`.
Split into `darwin,arm64` and `darwin,amd64` build constraints with the
right absolute path on each.
### 5. `build.sh` — accept Homebrew path in the pcre2 sanity check
The sanity check looked at two Linux paths only and then fell through to
`sudo apt -y install libpcre2-dev` on failure. Added
`/opt/homebrew/lib/libpcre2-8.a`, and on Darwin failure now exits
cleanly with the right `brew install pcre2` hint instead of trying
`apt`.
## Verified
- `bash build.sh` now completes on macOS arm64 (Apple Silicon, brew 4.x,
cmake 4.x, Apple Clang 17, Go 1.25, pcre2 10.x, simde 0.8.x).
- Produced binaries: `bin/server_main`, `bin/admin_server`,
`bin/ragflow_cli`.
- `bin/server_main` boots, connects MySQL, runs migrations, loads the 64
model provider configs cleanly.
- Still builds on Linux — the CMake additions are inside an `if(APPLE)`
guard, the `find_library` call matches Linux paths too, the build.sh
check still tries `apt` when not on Darwin.
## Out of scope
The Go server itself currently fails at runtime when not pointing at
Elasticsearch (`Failed to initialize doc engine: failed to ping
Elasticsearch`), but that's the placeholder Infinity engine documented
in `internal/engine/README.md` — unrelated to this build patchset.
---
Happy to split this into smaller PRs if you'd prefer (one per file). The
five changes are independent.
## What
- Add Perplexity as a chat and embedding provider backed by its
OpenAI-compatible `/chat/completions` and `/v1/embeddings` APIs
- Register Perplexity in the Go model factory and provider config
- Support non-streaming chat, SSE streaming chat, embeddings, model
listing, and connection checks
Refs #14736
---------
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
- Adds an `Astraflow` Go driver so the new API server can route
Astraflow (UCloud ModelVerse) chat instances, matching the existing
Python `AstraflowChat` (`rag/llm/chat_model.py:1237`). Follows the same
SaaS-driver shape used for Avian, Novita, TogetherAI, Replicate,
DeepInfra, Upstage, and LongCat.
Closes #15062
---------
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Closes #15044.
Avian was listed unchecked in the Go-rewrite tracker #14736 and already
had an llm_factories.json entry with 4 preconfigured chat models
(deepseek-v3.2, kimi-k2.5, glm-5, minimax-m2.5), but the Go API server
had no driver to route them. The Python side has supported Avian at
rag/llm/chat_model.py:1220 (AvianChat) via the LiteLLM openai/ provider
with default base https://api.avian.io/v1.
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
`ReplicateModel.Embed` in `internal/entity/models/replicate.go` was a
`"replicate, no such method"` stub. Tracking issue #14736 lists
Replicate's embedding surface as not implemented. This PR wires it up
against Replicate's documented embedding schema.
Until this PR, a tenant who selected a Replicate embedding model got the
sentinel error on every embed call.
Co-authored-by: sxxtony <sxxtony@users.noreply.github.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
This PR adds a new `Browser` operator to Agent workflows, enabling
prompt-driven browser automation in RAGFlow. Technically, it is based on
Browser-Use.
It includes:
- Backend browser component execution with tenant LLM integration
- Upload source support (file IDs, URLs, variables, CSV/JSON array)
- Downloaded file persistence to RAGFlow storage
- Frontend node/operator integration, form config, icon, and i18n
updates
- Unit tests for upload/download and ID parsing logic
- Dependency and Docker updates for browser-use runtime support
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
## Summary
- Adds a lightweight `@tool` decorator and `FunctionToolSession` adapter
in `rag/llm/tool_decorator.py` that let callers register plain Python
functions as LLM tools without hand-writing OpenAI function schemas or
building an MCP-style session.
- Refactors `Base.bind_tools` and `LiteLLMBase.bind_tools` in
`rag/llm/chat_model.py` to accept either the new decorator form
`bind_tools(tools=[fn1, fn2])` or the existing `(toolcall_session,
tools_schemas)` form, so existing agent/dialog call-sites in
`agent/component/agent_with_tools.py`, `api/db/services/llm_service.py`,
and `api/db/services/dialog_service.py` are unaffected.
- Adds 8 unit tests in `test/unit_test/rag/llm/test_tool_decorator.py`
covering schema shape, required/optional inference, sync + async
dispatch, and bad-input rejection.
## Usage
```python
from rag.llm.tool_decorator import tool


@tool
def get_weather(city: str) -> str:
    """Get current weather for a city.

    :param city: City name to look up.
    """
    return f"{city}: 21 C, partly cloudy"


# Run inside an async function, with chat_mdl, system, and history already defined.
chat_mdl.bind_tools(tools=[get_weather])
ans, tk = await chat_mdl.async_chat_with_tools(system, history)
```
The decorator introspects `inspect.signature` + type hints + the
docstring (`:param name:` style) and attaches an OpenAI-format
`openai_schema` to the callable. `FunctionToolSession` duck-types the
existing `ToolCallSession` protocol, dispatching async callables
directly and sync ones through `thread_pool_exec` so the event loop is
never blocked.
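
To make the introspection step concrete, here is a minimal sketch of how a decorator could derive such a schema from a signature, type hints, and `:param name:` docstring lines. `sketch_tool` and the `_JSON_TYPES` mapping are hypothetical names for illustration; the real `rag/llm/tool_decorator.py` may differ in structure and in the types it supports.

```python
import inspect
import re
from typing import Callable, get_type_hints

# Assumed mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_tool(fn: Callable) -> Callable:
    """Attach an OpenAI-format function schema to fn (illustrative only)."""
    hints = get_type_hints(fn)
    doc = inspect.getdoc(fn) or ""
    # Collect ":param name: description" lines from the docstring.
    param_docs = dict(re.findall(r":param\s+(\w+):\s*(.+)", doc))
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {
            "type": _JSON_TYPES.get(hints.get(name), "string"),
            "description": param_docs.get(name, "").strip(),
        }
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value means the argument is required
    fn.openai_schema = {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": doc.split("\n", 1)[0],
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }
    return fn


@sketch_tool
def get_weather(city: str, units: str = "metric") -> str:
    """Get current weather for a city.

    :param city: City name to look up.
    :param units: Unit system for the reply.
    """
    return f"{city}: 21 C, partly cloudy"
```

In this sketch `units` has a default, so only `city` ends up in `required`.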
## Design notes
- `tool_decorator.py` deliberately does **not** live inside
`rag/llm/__init__.py` to avoid forcing every consumer through the heavy
provider auto-discovery loop and to sidestep a circular import
(`__init__.py` imports `chat_model`, which would otherwise need symbols
from `__init__.py`).
- `FunctionToolSession` is duck-typed against
`common.mcp_tool_call_conn.ToolCallSession` rather than explicitly
inheriting from it, so importing the decorator doesn't pull the MCP
client SDK into the import graph.
- Docstring parsing is intentionally minimal (`:param name:` only) to
keep this dependency-free; Google/NumPy styles can be added later via
`docstring_parser` if needed.
## Test plan
- [x] `python -m pytest test/unit_test/rag/llm/test_tool_decorator.py
-v` — 8 passed
- [x] `python -m pytest test/unit_test/rag/llm/
--ignore=test/unit_test/rag/llm/test_perplexity_embed.py` — 11 passed
(the ignored test has a pre-existing `numpy` import that's unrelated)
- [ ] Reviewer: smoke-test the new path end-to-end with a live model via
`chat_mdl.bind_tools(tools=[my_fn])` to confirm the OpenAI-format
schemas pass through unchanged
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
### What problem does this PR solve?
Closes #15048.
Several SDK session routes in `api/apps/sdk/session.py` called
`.split()` directly on `request.headers.get("Authorization")`. When
clients omitted the header, the handlers raised `AttributeError` before
returning the existing `Authorization is not valid!` response.
This PR centralizes SDK Authorization parsing in a small helper and
keeps the existing error response for missing, empty, or malformed
headers.
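
A minimal sketch of what such a helper might look like follows. The name `parse_authorization_token` and the rule that a valid header has exactly two whitespace-separated parts are assumptions for illustration; the helper in `api/apps/sdk/session.py` may be shaped differently.

```python
from typing import Optional


def parse_authorization_token(header: Optional[str]) -> Optional[str]:
    """Return the token from a "<scheme> <token>" header, or None if unusable."""
    if not header:
        return None  # missing or empty header: no .split() on None
    parts = header.split()
    if len(parts) != 2:
        return None  # malformed: scheme without token, or extra fields
    return parts[1]


# A caller would map None to the existing "Authorization is not valid!" response.
print(parse_authorization_token("Bearer abc123"))
print(parse_authorization_token(None))
```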
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Tests
- `ZHIPU_AI_API_KEY=dummy uv run --python 3.13 --group test pytest
test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py::test_sdk_session_routes_missing_authorization_unit
-q`
- `uv run --python 3.13 --group test ruff check api/apps/sdk/session.py
test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py`
- `python3 -m py_compile api/apps/sdk/session.py
test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py`
- `git diff --check`
### What problem does this PR solve?
Remove duplicate function definitions in
`api/db/services/dialog_service.py`.
**Problem:** Two helper functions were defined twice in the same file,
but with different parameter orders:
- First definition (line 57):
`_resolve_reference_metadata(request_payload=None, config=None)`
- Second definition (line 136): `_resolve_reference_metadata(config,
request_payload=None)`
**Solution:** Keep the second definition (which is actually used by
other modules) and remove the first one to avoid confusion.
Additionally, remove duplicate `_enrich_chunks_with_document_metadata`
definition (keep line 140 version).
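
The reason the second definition is the one in effect is that a later `def` with the same name rebinds it at module load time. The sketch below shows this with a hypothetical stand-in function named `resolve`, not the real helper:

```python
def resolve(request_payload=None, config=None):
    return ("first", request_payload, config)


def resolve(config, request_payload=None):
    # This definition rebinds the name, so the first body can no longer be called.
    return ("second", config, request_payload)


# A positional argument lands in `config`, following the second signature.
print(resolve({"k": 1}))
```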
<img width="1584" height="313" alt="image"
src="https://github.com/user-attachments/assets/7daee832-244f-4bb2-8488-e3b65012a3f9"
/>
<img width="1672" height="359" alt="image"
src="https://github.com/user-attachments/assets/4fd2f523-273c-4b20-a7c9-ab35740b7834"
/>
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
## Summary
- Align **GET `/api/v1/documents/<doc_id>/download`** with
**`/preview`**: resolve extension and MIME type from the stored document
name when the **`ext` query parameter is omitted**, instead of
defaulting to `markdown`.
- When **`?ext=`** is present, behavior stays the same as before
(explicit extension / `Content-Type` mapping).
- Enforce the same access + document lookup pattern as preview
(**`accessible`** + **`get_by_id`**).
- Extend unit tests for the no-`ext` PDF filename case.
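
The resolution rule described above can be sketched with the standard library as follows. `resolve_download_type` is a hypothetical name, and using `mimetypes` with an `application/octet-stream` fallback is an assumption; the actual handler reuses the project's own extension-to-`Content-Type` mapping.

```python
import mimetypes
from pathlib import Path
from typing import Optional, Tuple


def resolve_download_type(doc_name: str, ext: Optional[str] = None) -> Tuple[str, str]:
    """Return (extension, content_type); an explicit ext wins over the stored name."""
    chosen = (ext or Path(doc_name).suffix.lstrip(".")).lower()
    content_type, _ = mimetypes.guess_type(f"file.{chosen}")
    # Fall back to a generic binary type when the extension is unknown or absent.
    return chosen, content_type or "application/octet-stream"


print(resolve_download_type("report.PDF"))
print(resolve_download_type("report.pdf", ext="txt"))
```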
## Test plan
- [x] `uv run pytest
test/testcases/test_web_api/test_document_app/test_document_metadata.py::TestDocumentMetadataUnit::test_download_attachment_success_and_exception_unit`
- [x] Optional: `curl -sSI` against
`/api/v1/documents/<pdf_doc_id>/download` without `ext` and confirm
`Content-Type: application/pdf`
Fixes #15052.
POST /api/v1/dify/retrieval resolved the caller via @apikey_required
(injecting tenant_id) but then fetched the requested knowledge_id with
no tenant filter and ran the full retrieval pipeline against
kb.tenant_id (the owner). Any valid Dify-compatible API key could
retrieve chunks from any tenant whose KB UUID was known. Adds the
missing ownership check.
## Root Cause
api/apps/sdk/dify_retrieval.py line 253:
KnowledgebaseService.get_by_id(kb_id) fetched the KB by id alone, then
the handler used kb.tenant_id (the OWNER) to build the embedding model
and call the retriever. The caller tenant_id was only used downstream at
line 278 for retrieval_by_children, well after cross-tenant data was
already retrieved.
grep confirmed there was no KnowledgebaseService.accessible call
anywhere in the handler.
## Fix
Two-line guard immediately after the existing get_by_id lookup,
mirroring the pattern PR #14749 lands for the sibling sdk/doc.py routes
(download, parse, stop_parsing, retrieval_test):
      e, kb = KnowledgebaseService.get_by_id(kb_id)
      if not e:
          return build_error_result(message="Knowledgebase not found!", code=RetCode.NOT_FOUND)
    + if not KnowledgebaseService.accessible(kb_id, tenant_id):
    +     return build_error_result(message="No authorization.", code=RetCode.AUTHENTICATION_ERROR)
      if kb.tenant_embd_id:
          ...
KnowledgebaseService.accessible already handles solo-tenant ownership,
team membership via TenantService.get_joined_tenants_by_user_id, and the
permission=ME distinction. No behavior change for legitimate callers;
cross-tenant callers now receive RetCode.AUTHENTICATION_ERROR (109).
## Test Plan
- [x] Regression test added:
test/unit_test/api/apps/sdk/test_dify_retrieval.py
  - test_cross_tenant_request_is_rejected -- attacker tenant calling owner
    tenant KB gets 109; retriever is not invoked
  - test_same_tenant_request_succeeds -- owner tenant gets the records
    back
  - test_missing_knowledge_base_returns_not_found -- missing KB returns
    404 BEFORE the access check fires (legit callers see the clearer
    message)
- [x] All 3 tests pass after the fix
- [x] Cross-tenant test FAILS on pre-fix main (KeyError on result[code]
because handler leaks records dict instead of returning auth error)
- [x] ruff check clean on both changed files
- [x] No drive-by reformatting in dify_retrieval.py -- only the 2 added
lines
### Post-fix output
test_cross_tenant_request_is_rejected PASSED [ 33%]
test_same_tenant_request_succeeds PASSED [ 66%]
test_missing_knowledge_base_returns_not_found PASSED [100%]
============================== 3 passed in 0.04s ===============================
Closes #15027
### What problem does this PR solve?
Closes #15076
Two endpoints in `api/apps/restful_apis/chat_api.py` accepted a
`user_id` field from the request body and used it directly when creating
a session:
```python
# before (vulnerable)
"user_id": req.get("user_id", current_user.id) # create_session
conv = await _create_session_for_completion(chat_id, dia, req.get("user_id", current_user.id)) # session_completion
```
Any authenticated caller could supply an arbitrary `user_id` and have
the new session attributed to a different user — effectively spoofing
session ownership. Both call sites are now fixed to always use
`current_user.id`, which is set by the authentication middleware and
cannot be tampered with via the request payload.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Changes
| File | Change |
|------|--------|
| `api/apps/restful_apis/chat_api.py` | Remove `req.get("user_id", ...)` fallback in `create_session` and `session_completion`; always use `current_user.id` |
| `test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py` | Add `test_create_session_user_id_not_spoofable` and `test_session_completion_user_id_not_spoofable` (both `@pytest.mark.p2`) |
### Testing
Two new unit tests assert that a `user_id` value supplied in the request
body is silently ignored and the session is always owned by the
authenticated user:
```
test_create_session_user_id_not_spoofable
test_session_completion_user_id_not_spoofable
```
Run with:
```bash
uv run pytest test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py -k "spoofable" -v
```
## What problem does this PR solve?
Closes #15021.
The Go model-provider layer had no support for **Azure OpenAI**. Azure
OpenAI is *not* a drop-in base-URL swap of the OpenAI driver — it
differs in authentication, endpoint structure, and how models are listed
— so it needs its own `ModelDriver` implementation.
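
To illustrate the authentication and endpoint differences, the sketch below builds an Azure OpenAI chat-completions URL and headers. It is written in Python rather than the driver's Go, `azure_chat_request` is a hypothetical helper, and the resource name, deployment name, and `api-version` value are placeholders; the PR's actual driver may assemble these differently.

```python
from typing import Dict, Tuple
from urllib.parse import urlencode


def azure_chat_request(endpoint: str, deployment: str, api_key: str, api_version: str) -> Tuple[str, Dict[str, str]]:
    """Build the URL and headers for an Azure OpenAI chat completion call."""
    base = endpoint.rstrip("/")
    query = urlencode({"api-version": api_version})
    # The deployment name is part of the path, unlike OpenAI's model-in-body routing.
    url = f"{base}/openai/deployments/{deployment}/chat/completions?{query}"
    # Azure key auth uses an "api-key" header instead of "Authorization: Bearer".
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    return url, headers


url, headers = azure_chat_request(
    "https://example-resource.openai.azure.com/",  # placeholder resource endpoint
    "my-gpt-deployment",                           # placeholder deployment name
    "dummy-key",
    "2024-02-01",                                  # placeholder API version
)
print(url)
```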
## Type of change
- [x] New feature (non-breaking change which adds functionality)
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fixes #15023
GPUStack is listed as unchecked in the Go-rewrite tracker #14736, and
`internal/service/llm.go:171` already classifies it as a self-deployed
provider alongside Ollama, Xinference, LocalAI, and LM Studio — but
`internal/entity/models/` had no `gpustack.go` driver, so the new Go API
server could not route GPUStack instances. This PR adds the chat surface
for GPUStack so it lines up with the existing self-hosted Go drivers.
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
## Summary
- Replaces the `"no such method"` stub on `XinferenceModel.Embed`
(`internal/entity/models/xinference.go`) with a real implementation
against Xinference's OpenAI-compatible `/v1/embeddings` endpoint.
- Adds the `"embedding": "v1/embeddings"` URL suffix to
`conf/models/xinference.json`.
- Mirrors the Python `XinferenceEmbed` class in
`rag/llm/embedding_model.py:407` for payload shape (OpenAI-compatible
`model + input` → `data[*].index + data[*].embedding`) and tolerates the
same no-auth default Xinference deployments use. Authorization is only
sent when a non-empty API key is configured, via the existing
`setXinferenceAuth` helper.
- Reuses the existing `normalizeXinferenceBaseURL` + `baseURLForRegion`
helpers so both `http://127.0.0.1:9997` and `http://127.0.0.1:9997/v1`
resolve to the same `/v1/embeddings` target without doubled `/v1`.
- Validates response indices — duplicate, missing, or out-of-range
`data[*].index` values fail with a clear error rather than silently
producing misaligned vectors.
- Returns `[]EmbeddingData` in original input order (placed by `Index`)
so downstream callers can index positionally without re-sorting.
- Forwards `EmbeddingConfig.Dimension` as `dimensions` when `> 0`,
matching the OpenAI cluster pattern.
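
The index validation and positional placement described above can be sketched as follows. This is Python rather than the driver's Go, and `order_embeddings` is a hypothetical name; it shows the idea under the assumption that each response item carries an integer `index` and an `embedding` list.

```python
from typing import Dict, List, Optional


def order_embeddings(data: List[Dict], expected: int) -> List[List[float]]:
    """Place each embedding at its declared index, rejecting bad indices."""
    ordered: List[Optional[List[float]]] = [None] * expected
    for item in data:
        idx = item["index"]
        if not 0 <= idx < expected:
            raise ValueError(f"embedding index {idx} out of range for {expected} inputs")
        if ordered[idx] is not None:
            raise ValueError(f"duplicate embedding index {idx}")
        ordered[idx] = item["embedding"]
    missing = [i for i, vec in enumerate(ordered) if vec is None]
    if missing:
        raise ValueError(f"missing embeddings for indices {missing}")
    return ordered


# The response arrives out of order; the result follows the input order.
response_data = [
    {"index": 1, "embedding": [0.3, 0.4]},
    {"index": 0, "embedding": [0.1, 0.2]},
]
print(order_embeddings(response_data, expected=2))
```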
Closes #14810
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fixes #15012
The Novita Go driver landed in #14850 and shipped a stub `Rerank` method
that returned `"novita, no such method"`, so Novita could not be used as
a rerank provider in RAGFlow. This PR fills that gap, in the same way
#14895 filled the Embed gap on the same driver.
Novita exposes a public rerank endpoint at `POST
https://api.novita.ai/openai/v1/rerank` that accepts the
Cohere-compatible request shape (`{model, query, documents, top_n}`)
with `Authorization: Bearer <api_key>`. `baai/bge-reranker-v2-m3` is
documented in Novita's model library with a 1024-token limit.
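
For reference, a request in that shape could be assembled as in the sketch below, which only constructs the request and does not send it. It uses Python's standard library rather than the driver's Go, and `build_rerank_request` is a hypothetical helper name.

```python
import json
import urllib.request
from typing import List

# Endpoint as quoted in the description above.
NOVITA_RERANK_URL = "https://api.novita.ai/openai/v1/rerank"


def build_rerank_request(api_key: str, model: str, query: str, documents: List[str], top_n: int) -> urllib.request.Request:
    """Build (but do not send) a Cohere-shaped rerank request."""
    body = json.dumps(
        {"model": model, "query": query, "documents": documents, "top_n": top_n}
    ).encode("utf-8")
    return urllib.request.Request(
        NOVITA_RERANK_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
    )


req = build_rerank_request("dummy-key", "baai/bge-reranker-v2-m3", "what is RAG?", ["doc a", "doc b"], 2)
print(req.get_method(), req.full_url)
```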
### What problem does this PR solve?
Fixes #14816
The Xinference Go driver landed chat in #14938 and Embed is in review in
#14932, but `Rerank` shipped as a stub that returns `"xinference, no
such method"`. Tenants who launch a rerank model with `--model-type
rerank` on their Xinference instance cannot route it through the Go API
server. This PR fills the gap.
Xinference exposes an OpenAI-compatible REST API. The rerank endpoint is
at `POST <base>/v1/rerank` and accepts the Cohere-shaped body `{model,
query, documents, top_n}`, returning `{results: [{index,
relevance_score}]}` — the same wire shape used by the merged NVIDIA
(#14778), Aliyun (#14676), Gitee (#14656), ZhipuAI (#14608), Novita
(#15014), and LocalAI (#14813) Rerank implementations. Documented in
[Xinference rerank
docs](https://inference.readthedocs.io/en/v1.6.1/models/model_abilities/rerank.html);
the [builtin rerank model
catalog](https://inference.readthedocs.io/en/stable/models/builtin/rerank/)
lists `bge-reranker-base`, `bge-reranker-large`, `bge-reranker-v2-m3`,
and others.
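
As an illustration of consuming that wire shape, the sketch below maps `{results: [{index, relevance_score}]}` back to one score per input document. It is Python rather than the driver's Go, `scores_in_input_order` is a hypothetical name, and leaving documents cut off by `top_n` at `0.0` is an assumption, not necessarily what the driver does.

```python
from typing import Dict, List


def scores_in_input_order(response: Dict, num_documents: int) -> List[float]:
    """Return one relevance score per input document, in input order."""
    scores = [0.0] * num_documents  # documents absent from results keep 0.0 (assumed)
    for result in response.get("results", []):
        scores[result["index"]] = result["relevance_score"]
    return scores


# Hand-written sample: results come back sorted by score, not by input position.
sample_response = {
    "results": [
        {"index": 2, "relevance_score": 0.91},
        {"index": 0, "relevance_score": 0.42},
    ]
}
print(scores_in_input_order(sample_response, num_documents=3))
```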