### What problem does this PR solve?
This PR implements ASR and TTS support for the ZhipuAI Go driver.
The ZhipuAI model config already advertises `glm-asr-2512` as an ASR
model, but the Go driver returned `zhipu, no such method` from
`TranscribeAudio`. This adds the documented audio transcription endpoint
suffix and sends multipart transcription requests with `model`,
`stream=false`, and `file` fields.
Per maintainer review, this also adds the ZhipuAI TTS endpoint suffix
and implements `AudioSpeech` / `AudioSpeechWithSender` for `glm-tts`.
Closes#15133
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Closes#15089.
Adds PPIO support to the Go model-provider layer so PPIO instances can
be routed through the Go API server with the same OpenAI-compatible
chat, streaming, model listing, and connection-check flow used by other
SaaS providers.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
## Summary
- Added a PPIO Go model driver.
- Added the PPIO provider catalog and default OpenAI-compatible API URL.
- Registered PPIO in the model factory.
- Added focused provider and provider-manager tests.
## What changed
- Implemented chat completions, SSE streaming, ListModels, and
CheckConnection for PPIO.
- Covered request shape, stream termination, reasoning fallback, model
listing, custom base URLs, safe transport setup, unsupported methods,
and provider config loading.
- Kept the provider catalog aligned with the existing RAGFlow PPIO
factory model set.
- Cleaned up pre-existing Go model package validation blockers so the
scoped provider tests can run normally with vet enabled.
## Why
The existing Python/provider catalog path includes PPIO, but the Go
model-provider layer did not have a PPIO driver, so the Go API server
could not instantiate or use PPIO as requested in #15089.
### What problem does this PR solve?
implement rerank, asr, tts for TogetherAI
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
1. update python version to 3.13
2. upgrade ormsgpack to 1.6.0
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
implement ASR and TTS for Xinference
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
## Summary
Fixes 10 unguarded `response.choices[0]` accesses that cause
`IndexError` or `AttributeError` when the LLM returns an empty `choices`
list — the scenario described in #14711.
- `rag/llm/cv_model.py`
- `rag/llm/chat_model.py`
Each access site is now guarded with:
```python
if not response.choices:
raise ValueError("LLM returned empty response")
```
## Verification
Detected and verified by [pact](https://github.com/qizwiz/pact) — a
sheaf-cohomological LLM contract checker using Z3 as a local theory
solver.
**pact sheaf-cohomological proof status after fix:**
| File | Ȟ¹ (after) | Z3 |
|------|-----------|-----|
| `rag/llm/cv_model.py` | 0 | UNSAT ✓ |
| `rag/llm/chat_model.py` | 0 | UNSAT ✓ |
All access sites proven safe (Z3 UNSAT certificate).
The checker was also used to verify the autogen streaming-None fix in
[microsoft/autogen#7711](https://github.com/microsoft/autogen/pull/7711).
## Test plan
- [ ] Existing test suite passes
- [ ] Manually test with a provider that returns empty `choices` under
load (e.g. Vertex AI)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Signed-off-by: Jonathan Hill <jonathan.f.hill@gmail.com>
`GET /agents/<agent_id>/sessions/<session_id>` crashed with
`AttributeError: 'NoneType' object has no attribute 'to_dict'` when the
session lookup failed: `_, conv =
API4ConversationService.get_by_id(...)` returned `(False, None)`, then
`conv.to_dict()` was called unconditionally.
This is reachable in multi-instance deployments: the session row may not
yet be visible on the node servicing the immediate follow-up GET after a
session is created on a different node.
Add the same `if not exists` guard already used by every other call site
of `API4ConversationService.get_by_id` (see agent_api.py:1147,
sdk/session.py:179, conversation_service.py:248, canvas_service.py:323).
Closes#14989
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Replace the RuntimeError with a warning + first-address fallback so a
single email whose From header contains multiple addresses no longer
crashes the entire IMAP sync task. Also add regression tests covering:
- #14963: RFC 5322 quoted display names with commas (e.g. "Schlüter,
Sabine" <s@x>) parsed as one address, not two.
- #14964: multi-address headers warn instead of raising.
Closes#14964
Refs #14963
## Summary
- Bump pinned nginx in `Dockerfile` from `1.29.5-1~noble` (vulnerable)
to `1.31.0-1~noble` to remediate **CVE-2026-42945**.
## Root Cause
`Dockerfile:58` pinned `ARG NGINX_VERSION=1.29.5-1~noble`. Per the
official nginx security advisory, **CVE-2026-42945** is a buffer
overflow in `ngx_http_rewrite_module` triggered via the `rewrite` and
`set` directives, affecting nginx **0.6.27 through 1.30.0**. `1.29.5`
falls inside that range, so the shipped image is vulnerable.
References:
- nginx security advisories:
https://nginx.org/en/security_advisories.html
- Vendor advisory: https://my.f5.com/manage/s/article/K000161019
- Fixed versions: `1.31.0` (mainline) and `1.30.1` (stable)
## Fix
Single-line change in `Dockerfile:58`:
```diff
-ARG NGINX_VERSION=1.29.5-1~noble
+ARG NGINX_VERSION=1.31.0-1~noble
### What problem does this PR solve?
Fixes#14997.
RAPTOR builds on the Infinity backend have been broken since v0.25.2
introduced the `extra` field in code (`rag/svr/task_executor.py:1011`)
without declaring it in `conf/infinity_mapping.json`. Every RAPTOR job
fails with:
```
infinity.common.InfinityException: (3013, 'Fail to bind the expression: extra@src/planner/expression_binder_impl.cpp:99')
```
The auto-migration in
`common/doc_store/infinity_conn_base.py:_migrate_db()` adds any columns
it finds in the mapping JSON to existing tables — so the only thing
standing between users and a working RAPTOR build is that one missing
declaration. OceanBase, ES, and OpenSearch were unaffected because they
store `extra` as a native JSON type; only Infinity (which has a strict
`varchar`/`integer`/`float` schema) needed the addition.
### The fix
Two-part change:
1. **`conf/infinity_mapping.json`**: declare `"extra": {"type":
"varchar", "default": ""}`. On next startup, `_migrate_db()` adds the
column to all existing chunk tables — no manual DDL needed for upgrading
installations.
2. **`rag/utils/infinity_conn.py` `insert()`**: serialize the `extra`
dict to a JSON string at write time, since Infinity's `varchar` can't
store a Python dict directly. Modelled on the existing `chunk_data`
handling a few lines above.
The read path (`rag/utils/raptor_utils.py:_as_extra_dict`) already
normalises both dict and JSON-string inputs, so no read-side change is
needed. Other backends are untouched — `task_executor.py` still writes
the dict, and the OceanBase/ES/OpenSearch insert paths handle dicts
natively.
### Verification
Tested on a v0.25.4 deployment with the Infinity backend by applying the
same two changes via mounted-volume override:
- Confirmed `_migrate_db()` adds the `extra` column to all pre-existing
chunk tables on startup (column visible via Infinity's
`show_columns()`).
- Triggered RAPTOR builds on four datasets (~21k chunks total) via `POST
/api/v1/datasets/<id>/index?type=raptor`.
- All four progressed past the previously-failing
`get_raptor_chunk_methods()` call into actual entity-extraction and
clustering work without the (3013) error.
- GraphRAG builds (which can trigger the same path indirectly via
`task_executor.py:857`) also progressed cleanly.
### Type of change
- [X] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
`UpstageModel.ChatStreamlyWithSender` (in the driver merged via #14819)
only extracted `delta.content` from each SSE event. For the `solar-pro3`
reasoning family (and any future Upstage model that follows the same
wire shape), the chain-of-thought is streamed in a **separate
`delta.reasoning` field**, and the driver was silently dropping all of
it.
The non-streaming path already extracts `message.reasoning` into
`ChatResponse.ReasonContent` (added earlier in this PR's history), so
the same model produced **inconsistent behavior** between streaming and
non-streaming: a tenant calling `solar-pro3` with `reasoning_effort:
high` would see the reasoning trace if they used `ChatWithMessages` but
not if they used `ChatStreamlyWithSender`.
### Live evidence
Probed against `api.upstage.ai/v1/chat/completions` with `solar-pro3` +
`reasoning_effort: high` + `stream: true` (8000-token budget so the
reasoning has room to finish):
```
$ curl -sN -H "Authorization: Bearer <key>" -H "Content-Type: application/json" \
-X POST https://api.upstage.ai/v1/chat/completions \
-d '{"model":"solar-pro3","messages":[{"role":"user","content":"Compute 15% of 80."}],
"max_tokens":8000,"stream":true,"reasoning_effort":"high"}'
# across 168 SSE events:
# delta keys seen: [content reasoning role]
# delta.content total len: 121 chars (the visible answer)
# delta.reasoning total len: 159 chars (the chain-of-thought) <- driver dropped this
```
A representative event showing both fields side by side:
```json
data: {"choices":[{"index":0,"delta":{"reasoning":"15% = 0.15."}}]}
data: {"choices":[{"index":0,"delta":{"content":"15% of 80 is "}}]}
```
The 159 chars of reasoning were arriving on the wire and being thrown
away. `solar-pro2` was also probed (625 events); it does **not** emit
`delta.reasoning` — its reasoning is inlined into `delta.content` — so
this change is a no-op for it and for `solar-mini`.
### What this PR includes
- `internal/entity/models/upstage.go`: in the SSE scanner loop, extract
`delta.reasoning` before `delta.content` and forward each non-empty
chunk via the sender's second arg (the existing `reasonContent` channel
the non-stream path already populates).
The ordering contract is documented inline: reasoning chunks within a
single SSE event are emitted before content chunks, so a UI that pipes
both sees the chain-of-thought start before the answer for that token,
matching the wire order Upstage emits.
- `internal/entity/models/upstage_test.go`: three new tests pinning the
new behavior:
- `TestUpstageStreamExtractsReasoningDelta` — reasoning + content
forwarded to the right sender args; one-of invariant per call
- `TestUpstageStreamReasoningChunksArriveBeforeContent` — ordering
pinned within a single SSE event that carries both fields
- `TestUpstageStreamWithoutReasoningStillWorks` — regression net:
non-reasoning models (`solar-mini`, `solar-pro2`) continue to work; the
reason callback never fires
No interface change. No factory change. No config change.
### How was this tested?
```
$ go test -vet=off -run TestUpstage -count=1 -v ./internal/entity/models/...
... (existing tests 1..9 still pass) ...
=== RUN TestUpstageStreamExtractsReasoningDelta
--- PASS: TestUpstageStreamExtractsReasoningDelta (0.01s)
=== RUN TestUpstageStreamReasoningChunksArriveBeforeContent
--- PASS: TestUpstageStreamReasoningChunksArriveBeforeContent (0.01s)
=== RUN TestUpstageStreamWithoutReasoningStillWorks
--- PASS: TestUpstageStreamWithoutReasoningStillWorks (0.00s)
PASS
ok ragflow/internal/entity/models 0.034s
```
12/12 Upstage tests pass on go 1.25. `go build
./internal/entity/models/...` exits 0.
**Live integration test** (smoke test not committed) — the patched
driver was run directly against `api.upstage.ai/v1` with the same prompt
that produced the curl evidence above:
```
=== RUN TestUpstageStreamReasoningLiveSmoke
[OK] visible content: 50 chunks, 84 chars
[OK] reasoning: 39 chunks, 90 chars
content head 200: "\\(15\\% = \\frac{15}{100}=0.15\\).\n\n\\[\n0.15 \\times 80 = 12.\n\\]\n\n**15 % of 80 is 12.**"
reasoning head 200: "We need to compute 15% of 80. That's 0.15 * 80 = 12. So answer is 12. Provide explanation."
UPSTAGE STREAM REASONING SMOKE PASSED
--- PASS: TestUpstageStreamReasoningLiveSmoke (1.97s)
```
Before this fix, the same call would have produced **0 reasoning
chunks**. The 90 chars of reasoning that the patched driver now surfaces
are the chain-of-thought solar-pro3 emits when reasoning_effort is high.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
`MistralModel.ChatWithMessages` (in the driver merged via #14807)
assumes that `choices[0].message.content` from `/v1/chat/completions` is
always a string and falls through to `return nil, fmt.Errorf("invalid
content format")` on anything else.
That assumption breaks for the **magistral reasoning family**
(`magistral-small-*`, `magistral-medium-*`). When the model needs a
chain-of-thought to answer, Mistral returns `content` as a **structured
array of typed parts**:
```json
"content": [
{"type": "thinking",
"thinking": [{"type": "text", "text": "Combined speed is 150 mph. 300 / 150 = 2 hours."}],
"closed": true},
{"type": "text", "text": "They will meet after **2 hours**."}
]
```
Concretely, this is what the live API returns today (probed against
`api.mistral.ai/v1`):
```
$ curl -H "Authorization: Bearer <key>" -H "Content-Type: application/json" \
-X POST https://api.mistral.ai/v1/chat/completions \
-d '{"model":"magistral-medium-latest",
"messages":[{"role":"user","content":"two trains 60mph and 90mph, 300mi apart, when do they meet? step by step."}],
"max_tokens":1024}'
HTTP 200
{ "choices":[{"message":{
"role":"assistant",
"content":[
{"type":"thinking","thinking":[{"type":"text","text":"Okay, let's see..."}],"closed":true},
{"type":"text","text":"To determine when the two trains meet..."}
]}}] }
```
With the current driver, every call like that returns the generic
`"invalid content format"` error. Trivial prompts that happen to fit in
a string answer still succeed, so the breakage is **non-deterministic
from the tenant's POV**: same model, same provider, sometimes works,
sometimes 500s with no useful error.
A secondary issue: `conf/models/mistral.json` does not include any
magistral model. The picker hid the broken path, which is why this
wasn't caught during #14807's review.
### What this PR includes
- New helper `extractMistralContent(raw interface{}) (answer,
reasonContent string, err error)` in
`internal/entity/models/mistral.go`, which normalizes both shapes
Mistral can return:
- `string` → historical path. `Answer = content`, `ReasonContent = ""`.
Preserves behavior for every non-reasoning model (`mistral-large-*`,
`mistral-small-*`, `ministral-*`, `codestral-*`, `pixtral-*`,
`open-mistral-nemo`).
- `[]interface{}` → walk the parts. Concatenate every `{"type":"text",
"text":...}` part into `Answer`; concatenate the inner text inside every
`{"type":"thinking", "thinking":[...]}` part into `ReasonContent`.
- `ChatWithMessages` now calls the helper instead of doing the raw
`.(string)` cast.
- Unknown part types are **skipped, not failed**. Mistral has been
adding new content variants quickly (audio chunks, citations, etc.);
this driver should not 500 every call when a new part type appears.
- `conf/models/mistral.json`: add `magistral-medium-latest` and
`magistral-small-latest`. Both are visible in `/v1/models` today.
No interface change. No factory change. No new dependencies.
### How was this tested?
**Unit tests** — 5 new tests in `internal/entity/models/mistral_test.go`
on top of the 27 already shipped via #14807:
- `TestMistralChatHandlesStringContent` — regression net for the
historical path
- `TestMistralChatExtractsReasoningFromStructuredContent` — the fixture
body is a trimmed copy of the actual `magistral-medium-latest` response
captured above; asserts both `Answer` and `ReasonContent` are populated
correctly
- `TestMistralChatHandlesStructuredContentWithoutThinking` —
`magistral-*` with a trivial answer returns a structured shape that has
only a `text` part; `ReasonContent` must stay empty
- `TestMistralChatIgnoresUnknownContentPartTypes` — `audio_url` and
`future_part_type` parts are skipped, `text` parts still flow through
- `TestExtractMistralContent` — table-driven unit coverage of the helper
for string, empty string, nil, empty array, text-only, thinking+text,
unsupported root type
```
$ go test -vet=off -run "TestMistral|TestExtractMistralContent" -count=1 -v ./internal/entity/models/...
=== RUN TestMistralChatHandlesStringContent
--- PASS: TestMistralChatHandlesStringContent (0.00s)
=== RUN TestMistralChatExtractsReasoningFromStructuredContent
--- PASS: TestMistralChatExtractsReasoningFromStructuredContent (0.00s)
=== RUN TestMistralChatHandlesStructuredContentWithoutThinking
--- PASS: TestMistralChatHandlesStructuredContentWithoutThinking (0.00s)
=== RUN TestMistralChatIgnoresUnknownContentPartTypes
--- PASS: TestMistralChatIgnoresUnknownContentPartTypes (0.00s)
=== RUN TestExtractMistralContent
=== RUN TestExtractMistralContent/plain_string
=== RUN TestExtractMistralContent/empty_string
=== RUN TestExtractMistralContent/nil
=== RUN TestExtractMistralContent/empty_array
=== RUN TestExtractMistralContent/text_only
=== RUN TestExtractMistralContent/thinking_then_text
=== RUN TestExtractMistralContent/unknown_root_type
--- PASS: TestExtractMistralContent (0.00s)
PASS
ok ragflow/internal/entity/models 0.046s
```
All 32 Mistral tests pass on go 1.25. `go build
./internal/entity/models/...` exits 0.
**Live integration test** — driver exercised against `api.mistral.ai/v1`
with the patched code:
```
=== RUN TestMistralMagistralSmoke
[OK] "magistral-small-latest" present upstream
[OK] "magistral-medium-latest" present upstream
[OK trivial] Answer="7" ReasonContent=""
[OK reasoning] Answer len=797 head="To determine when the two trains meet, we can follow these steps:\n\n1. **Identify..."
ReasonContent len=1069 head="Okay, let's see. There are two trains, one going 60 mph and the other going 90 mph. They're moving towards each other, s..."
MAGISTRAL SMOKE PASSED
--- PASS: TestMistralMagistralSmoke (18.09s)
PASS
ok ragflow/internal/entity/models 18.112s
```
What the live run proves on the wire:
- `magistral-small-latest` with a trivial prompt still uses the
string-content shape; the regression-net path is exercised against the
real server, not just the mock.
- `magistral-medium-latest` with a reasoning prompt uses the
structured-array shape; the new code path extracts a 1069-character
reasoning trace into `ChatResponse.ReasonContent` and a 797-character
visible answer into `ChatResponse.Answer`. Before this fix, the same
call returned `"invalid content format"` and the caller saw nothing.
The smoke-test file itself is not committed (live tests live outside the
PR diff, same convention used for prior provider PRs).
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Problem
The Go server build pipeline (`build.sh` + CMake + CGO bindings) was
tested on Ubuntu only. On macOS arm64 with Homebrew it fails in five
orthogonal places. None of these require platform-specific code paths —
the same source builds on both Linux and Darwin after these fixes.
## Reproduction (before)
```
$ uname -a
Darwin … 25.4.0 arm64
$ brew install cmake pcre2 simde
$ bash build.sh
…
error: 'simde/x86/sse4.1.h' file not found
error: implicit instantiation of undefined template 'std::basic_istringstream<char>'
error: no matching function for call to 'Join'
…
clang: error: no such file or directory: '/usr/local/lib/libpcre2-8.a'
```
## Fix (5 small, orthogonal changes)
### 1. `internal/cpp/CMakeLists.txt` — find Homebrew + libpcre2-8
portably
- Detect Apple platforms via `if(APPLE)`, call `brew --prefix` once, add
`${HOMEBREW_PREFIX}/include` and `${HOMEBREW_PREFIX}/lib`. No effect on
Linux.
- Replace the literal `libpcre2-8.a` link token (which only the Linux
linker finds in `/usr/local/lib` by default) with
`find_library(PCRE2_LIB NAMES pcre2-8 REQUIRED)`. Works on
`/usr/lib/x86_64-linux-gnu` (Debian/Ubuntu), `/usr/local/lib` (Intel Mac
& legacy Linux), `/opt/homebrew/lib` (Apple Silicon).
### 2. `internal/cpp/wordnet_lemmatizer.cpp` +
`internal/cpp/rag_analyzer.cpp` — explicit `#include <sstream>`
libstdc++ (Linux) pulls `<sstream>` in transitively via `<fstream>`;
libc++ (Apple Clang) doesn't, so the existing `std::istringstream` /
`std::ostringstream` uses fail to compile on macOS. One-line include in
each file.
### 3. `internal/cpp/rag_analyzer.cpp` — `Join` template overload fix
`Join(tokens, start, tokens.size(), delim)` at line 146 passes `size_t`
to an `int` parameter. C++23 strict mode in Apple Clang refuses the
implicit narrowing and reports the 4-arg overload as a substitution
failure, leaving the call ambiguous between the 3-arg and 4-arg
templates. Fix: explicit `static_cast<int>(tokens.size())`. Behaviour
identical on libstdc++ — the narrowing was always intentional.
### 4. `internal/binding/rag_analyzer.go` — split darwin CGO LDFLAGS
The existing `#cgo darwin LDFLAGS: ... /usr/local/lib/libpcre2-8.a` only
matches Intel Macs. Apple Silicon Homebrew installs to `/opt/homebrew`.
Split into `darwin,arm64` and `darwin,amd64` build constraints with the
right absolute path on each.
### 5. `build.sh` — accept Homebrew path in the pcre2 sanity check
The sanity check looked at two Linux paths only and then fell through to
`sudo apt -y install libpcre2-dev` on failure. Added
`/opt/homebrew/lib/libpcre2-8.a`, and on Darwin failure now exits
cleanly with the right `brew install pcre2` hint instead of trying
`apt`.
## Verified
- `bash build.sh` now completes on macOS arm64 (Apple Silicon, brew 4.x,
cmake 4.x, Apple Clang 17, Go 1.25, pcre2 10.x, simde 0.8.x).
- Produced binaries: `bin/server_main`, `bin/admin_server`,
`bin/ragflow_cli`.
- `bin/server_main` boots, connects MySQL, runs migrations, loads the 64
model provider configs cleanly.
- Still builds on Linux — the CMake additions are inside an `if(APPLE)`
guard, the `find_library` call matches Linux paths too, the build.sh
check still tries `apt` when not on Darwin.
## Out of scope
The Go server itself currently fails at runtime when not pointing at
Elasticsearch (`Failed to initialize doc engine: failed to ping
Elasticsearch`), but that's the placeholder Infinity engine documented
in `internal/engine/README.md` — unrelated to this build patchset.
---
Happy to split this into smaller PRs if you'd prefer (one per file). The
five changes are independent.
## What
- Add Perplexity as a chat and embedding provider backed by its
OpenAI-compatible `/chat/completions` and `/v1/embeddings` APIs
- Register Perplexity in the Go model factory and provider config
- Support non-streaming chat, SSE streaming chat, embeddings, model
listing, and connection checks
Refs #14736
---------
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
- Adds an `Astraflow` Go driver so the new API server can route
Astraflow (UCloud ModelVerse) chat instances, matching the existing
Python `AstraflowChat` (`rag/llm/chat_model.py:1237`). Follows the same
SaaS-driver shape used for Avian, Novita, TogetherAI, Replicate,
DeepInfra, Upstage, and LongCat.
Closes#15062
---------
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Closes#15044.
Avian was listed unchecked in the Go-rewrite tracker #14736 and already
had an llm_factories.json entry with 4 preconfigured chat models
(deepseek-v3.2, kimi-k2.5, glm-5, minimax-m2.5), but the Go API server
had no driver to route them. The Python side has supported Avian at
rag/llm/chat_model.py:1220 (AvianChat) via the LiteLLM openai/ provider
with default base https://api.avian.io/v1.
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
`ReplicateModel.Embed` in `internal/entity/models/replicate.go` was a
`"replicate, no such method"` stub. Tracking issue #14736 lists
Replicate's embedding surface as not implemented. This PR wires it up
against Replicate's documented embedding schema.
Until this PR, a tenant who selected a Replicate embedding model got the
sentinel error on every embed call.
Co-authored-by: sxxtony <sxxtony@users.noreply.github.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
This PR adds a new `Browser` operator to Agent workflows, enabling
prompt-driven browser automation in RAGFlow.Technically based
‘Browser-Use’
It includes:
- Backend browser component execution with tenant LLM integration
- Upload source support (file IDs, URLs, variables, CSV/JSON array)
- Downloaded file persistence to RAGFlow storage
- Frontend node/operator integration, form config, icon, and i18n
updates
- Unit tests for upload/download and ID parsing logic
- Dependency and Docker updates for browser-use runtime support
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
## Summary
- Adds a lightweight `@tool` decorator and `FunctionToolSession` adapter
in `rag/llm/tool_decorator.py` that let callers register plain Python
functions as LLM tools without hand-writing OpenAI function schemas or
building an MCP-style session.
- Refactors `Base.bind_tools` and `LiteLLMBase.bind_tools` in
`rag/llm/chat_model.py` to accept either the new decorator form
`bind_tools(tools=[fn1, fn2])` or the existing `(toolcall_session,
tools_schemas)` form, so existing agent/dialog call-sites in
`agent/component/agent_with_tools.py`, `api/db/services/llm_service.py`,
and `api/db/services/dialog_service.py` are unaffected.
- Adds 8 unit tests in `test/unit_test/rag/llm/test_tool_decorator.py`
covering schema shape, required/optional inference, sync + async
dispatch, and bad-input rejection.
## Usage
```python
from rag.llm.tool_decorator import tool
@tool
def get_weather(city: str) -> str:
"""Get current weather for a city.
:param city: City name to look up.
"""
return f"{city}: 21 C, partly cloudy"
chat_mdl.bind_tools(tools=[get_weather])
ans, tk = await chat_mdl.async_chat_with_tools(system, history)
```
The decorator introspects `inspect.signature` + type hints + the
docstring (`:param name:` style) and attaches an OpenAI-format
`openai_schema` to the callable. `FunctionToolSession` duck-types the
existing `ToolCallSession` protocol, dispatching async callables
directly and sync ones through `thread_pool_exec` so the event loop is
never blocked.
## Design notes
- `tool_decorator.py` deliberately does **not** live inside
`rag/llm/__init__.py` to avoid forcing every consumer through the heavy
provider auto-discovery loop and to sidestep a circular import
(`__init__.py` imports `chat_model`, which would otherwise need symbols
from `__init__.py`).
- `FunctionToolSession` is duck-typed against
`common.mcp_tool_call_conn.ToolCallSession` rather than explicitly
inheriting from it, so importing the decorator doesn't pull the MCP
client SDK into the import graph.
- Docstring parsing is intentionally minimal (`:param name:` only) to
keep this dependency-free; Google/NumPy styles can be added later via
`docstring_parser` if needed.
## Test plan
- [x] `python -m pytest test/unit_test/rag/llm/test_tool_decorator.py
-v` — 8 passed
- [x] `python -m pytest test/unit_test/rag/llm/
--ignore=test/unit_test/rag/llm/test_perplexity_embed.py` — 11 passed
(the ignored test has a pre-existing `numpy` import that's unrelated)
- [ ] Reviewer: smoke-test the new path end-to-end with a live model via
`chat_mdl.bind_tools(tools=[my_fn])` to confirm the OpenAI-format
schemas pass through unchanged
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
### What problem does this PR solve?
Closes#15048.
Several SDK session routes in `api/apps/sdk/session.py` called
`.split()` directly on `request.headers.get("Authorization")`. When
clients omitted the header, the handlers raised `AttributeError` before
returning the existing `Authorization is not valid!` response.
This PR centralizes SDK Authorization parsing in a small helper and
keeps the existing error response for missing, empty, or malformed
headers.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Tests
- `ZHIPU_AI_API_KEY=dummy uv run --python 3.13 --group test pytest
test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py::test_sdk_session_routes_missing_authorization_unit
-q`
- `uv run --python 3.13 --group test ruff check api/apps/sdk/session.py
test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py`
- `python3 -m py_compile api/apps/sdk/session.py
test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py`
- `git diff --check`
### What problem does this PR solve?
Remove duplicate function definitions in
`api/db/services/dialog_service.py`.
**Problem:** Two helper functions were defined twice in the same file,
but with different parameter orders:
- First definition (line 57):
`_resolve_reference_metadata(request_payload=None, config=None)`
- Second definition (line 136): `_resolve_reference_metadata(config,
request_payload=None)`
**Solution:** Keep the second definition (which is actually used by
other modules) and remove the first one to avoid confusion.
Additionally, remove duplicate `_enrich_chunks_with_document_metadata`
definition (keep line 140 version).
<img width="1584" height="313" alt="image"
src="https://github.com/user-attachments/assets/7daee832-244f-4bb2-8488-e3b65012a3f9"
/>
<img width="1672" height="359" alt="image"
src="https://github.com/user-attachments/assets/4fd2f523-273c-4b20-a7c9-ab35740b7834"
/>
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
## Summary
- Align **GET `/api/v1/documents/<doc_id>/download`** with
**`/preview`**: resolve extension and MIME type from the stored document
name when the **`ext` query parameter is omitted**, instead of
defaulting to `markdown`.
- When **`?ext=`** is present, behavior stays the same as before
(explicit extension / `Content-Type` mapping).
- Enforce the same access + document lookup pattern as preview
(**`accessible`** + **`get_by_id`**).
- Extend unit tests for the no-`ext` PDF filename case.
## Test plan
- [x] `uv run pytest
test/testcases/test_web_api/test_document_app/test_document_metadata.py::TestDocumentMetadataUnit::test_download_attachment_success_and_exception_unit`
- [x] Optional: `curl -sSI` against
`/api/v1/documents/<pdf_doc_id>/download` without `ext` and confirm
`Content-Type: application/pdf`
Fixes#15052.
POST /api/v1/dify/retrieval resolved the caller via @apikey_required
(injecting tenant_id) but then fetched the requested knowledge_id with
no tenant filter and ran the full retrieval pipeline against
kb.tenant_id (the owner). Any valid Dify-compatible API key could
retrieve chunks from any tenant whose KB UUID was known. Adds the
missing ownership check.
## Root Cause
api/apps/sdk/dify_retrieval.py line 253:
KnowledgebaseService.get_by_id(kb_id) fetched the KB by id alone, then
the handler used kb.tenant_id (the OWNER) to build the embedding model
and call the retriever. The caller tenant_id was only used downstream at
line 278 for retrieval_by_children, well after cross-tenant data was
already retrieved.
grep confirmed there was no KnowledgebaseService.accessible call
anywhere in the handler.
## Fix
Two-line guard immediately after the existing get_by_id lookup,
mirroring the pattern PR #14749 lands for the sibling sdk/doc.py routes
(download, parse, stop_parsing, retrieval_test):
e, kb = KnowledgebaseService.get_by_id(kb_id)
if not e:
return build_error_result(message="Knowledgebase not found!",
code=RetCode.NOT_FOUND)
+ if not KnowledgebaseService.accessible(kb_id, tenant_id):
+ return build_error_result(message="No authorization.",
code=RetCode.AUTHENTICATION_ERROR)
if kb.tenant_embd_id:
...
KnowledgebaseService.accessible already handles solo-tenant ownership,
team membership via TenantService.get_joined_tenants_by_user_id, and the
permission=ME distinction. No behavior change for legitimate callers;
cross-tenant callers now receive RetCode.AUTHENTICATION_ERROR (109).
## Test Plan
- [x] Regression test added:
test/unit_test/api/apps/sdk/test_dify_retrieval.py
- test_cross_tenant_request_is_rejected -- attacker tenant calling owner
tenant KB gets 109; retriever is not invoked
- test_same_tenant_request_succeeds -- owner tenant gets the records
back
- test_missing_knowledge_base_returns_not_found -- missing KB returns
404 BEFORE the access check fires (legit callers see the clearer
message)
- [x] All 3 tests pass after the fix
- [x] Cross-tenant test FAILS on pre-fix main (KeyError on result[code]
because handler leaks records dict instead of returning auth error)
- [x] ruff check clean on both changed files
- [x] No drive-by reformatting in dify_retrieval.py -- only the 2 added
lines
### Post-fix output
test_cross_tenant_request_is_rejected PASSED [ 33%]
test_same_tenant_request_succeeds PASSED [ 66%]
test_missing_knowledge_base_returns_not_found PASSED [100%]
============================== 3 passed in 0.04s
===============================
Closes#15027
### What problem does this PR solve?
Closes#15076
Two endpoints in `api/apps/restful_apis/chat_api.py` accepted a
`user_id` field from the request body and used it directly when creating
a session:
```python
# before (vulnerable)
"user_id": req.get("user_id", current_user.id) # create_session
conv = await _create_session_for_completion(chat_id, dia, req.get("user_id", current_user.id)) # session_completion
```
Any authenticated caller could supply an arbitrary `user_id` and have
the new session attributed to a different user — effectively spoofing
session ownership. Both call sites are now fixed to always use
`current_user.id`, which is set by the authentication middleware and
cannot be tampered with via the request payload.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Changes
| File | Change |
|------|--------|
| `api/apps/restful_apis/chat_api.py` | Remove `req.get("user_id", ...)`
fallback in `create_session` and `session_completion`; always use
`current_user.id` |
|
`test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py`
| Add `test_create_session_user_id_not_spoofable` and
`test_session_completion_user_id_not_spoofable` (both `@pytest.mark.p2`)
|
### Testing
Two new unit tests assert that a `user_id` value supplied in the request
body is silently ignored and the session is always owned by the
authenticated user:
```
test_create_session_user_id_not_spoofable
test_session_completion_user_id_not_spoofable
```
Run with:
```bash
uv run pytest test/testcases/test_http_api/test_session_management/test_session_sdk_routes_unit.py -k "spoofable" -v
```
## What problem does this PR solve?
Closes#15021.
The Go model-provider layer had no support for **Azure OpenAI**. Azure
OpenAI is *not* a drop-in base-URL swap of the OpenAI driver — it
differs in authentication, endpoint structure, and how models are listed
— so it needs its own `ModelDriver` implementation.
## Type of change
- [x] New feature (non-breaking change which adds functionality)
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fixes#15023
GPUStack is listed as unchecked in the Go-rewrite tracker #14736, and
`internal/service/llm.go:171` already classifies it as a self-deployed
provider alongside Ollama, Xinference, LocalAI, and LM Studio — but
`internal/entity/models/` had no `gpustack.go` driver, so the new Go API
server could not route GPUStack instances. This PR adds the chat surface
for GPUStack so it lines up with the existing self-hosted Go drivers.
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
## Summary
- Replaces the `"no such method"` stub on `XinferenceModel.Embed`
(`internal/entity/models/xinference.go`) with a real implementation
against Xinference's OpenAI-compatible `/v1/embeddings` endpoint.
- Adds the `"embedding": "v1/embeddings"` URL suffix to
`conf/models/xinference.json`.
- Mirrors the Python `XinferenceEmbed` class in
`rag/llm/embedding_model.py:407` for payload shape (OpenAI-compatible
`model + input` → `data[*].index + data[*].embedding`) and tolerates the
same no-auth default Xinference deployments use. Authorization is only
sent when a non-empty API key is configured, via the existing
`setXinferenceAuth` helper.
- Reuses the existing `normalizeXinferenceBaseURL` + `baseURLForRegion`
helpers so both `http://127.0.0.1:9997` and `http://127.0.0.1:9997/v1`
resolve to the same `/v1/embeddings` target without doubled `/v1`.
- Validates response indices — duplicate, missing, or out-of-range
`data[*].index` values fail with a clear error rather than silently
producing misaligned vectors.
- Returns `[]EmbeddingData` in original input order (placed by `Index`)
so downstream callers can index positionally without re-sorting.
- Forwards `EmbeddingConfig.Dimension` as `dimensions` when `> 0`,
matching the OpenAI cluster pattern.
Closes#14810
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fixes#15012
The Novita Go driver landed in #14850 and shipped a stub `Rerank` method
that returned `"novita, no such method"`, so Novita could not be used as
a rerank provider in RAGFlow. This PR fills that gap, in the same way
#14895 filled the Embed gap on the same driver.
Novita exposes a public rerank endpoint at `POST
https://api.novita.ai/openai/v1/rerank` that accepts the
Cohere-compatible request shape (`{model, query, documents, top_n}`)
with `Authorization: Bearer <api_key>`. `baai/bge-reranker-v2-m3` is
documented in Novita's model library with a 1024-token limit.
### What problem does this PR solve?
Fixes#14816
The Xinference Go driver landed chat in #14938 and Embed is in review in
#14932, but `Rerank` shipped as a stub that returns `"xinference, no
such method"`. Tenants who launch a rerank model with `--model-type
rerank` on their Xinference instance cannot route it through the Go API
server. This PR fills the gap.
Xinference exposes an OpenAI-compatible REST API. The rerank endpoint is
at `POST <base>/v1/rerank` and accepts the Cohere-shaped body `{model,
query, documents, top_n}`, returning `{results: [{index,
relevance_score}]}` — the same wire shape used by the merged NVIDIA
(#14778), Aliyun (#14676), Gitee (#14656), ZhipuAI (#14608), Novita
(#15014), and LocalAI (#14813) Rerank implementations. Documented in
[Xinference rerank
docs](https://inference.readthedocs.io/en/v1.6.1/models/model_abilities/rerank.html);
the [builtin rerank model
catalog](https://inference.readthedocs.io/en/stable/models/builtin/rerank/)
lists `bge-reranker-base`, `bge-reranker-large`, `bge-reranker-v2-m3`,
and others.
### What problem does this PR solve?
Add a Go driver for **n1n.ai** (https://docs.n1n.ai), one of the
unchecked providers on the umbrella tracking issue #14736. n1n.ai is an
OpenAI-compatible aggregator hosting a 450+ model catalog (GPT, Claude,
Gemini, DeepSeek, Kimi, Qwen, embedding + reranker families) under
`https://api.n1n.ai/v1`.
Until this PR, a tenant who configured `n1n` as a model provider in the
Go layer fell through to the default branch of
`internal/entity/models/factory.go` and got the dummy driver.
---------
Co-authored-by: sxxtony <sxxtony@users.noreply.github.com>
### What problem does this PR solve?
Fixes#15015
The TogetherAI Go driver in `internal/entity/models/togetherai.go`
shipped a stub `Embed` method that returned `"TogetherAI, no such
method"`, so TogetherAI could not be used as an embedding provider in
RAGFlow. This PR fills that gap.
TogetherAI exposes a public OpenAI-compatible embeddings endpoint at
`POST https://api.together.ai/v1/embeddings` that accepts the standard
`{model, input}` shape with `Authorization: Bearer <api_key>` (confirmed
in TogetherAI's official docs:
https://docs.together.ai/docs/embeddings-overview). Documented embedding
models include `intfloat/multilingual-e5-large-instruct`,
`BAAI/bge-large-en-v1.5`, and `BAAI/bge-base-en-v1.5`.
### Changes
- `internal/entity/models/togetherai.go`: implement
`TogetherAIModel.Embed`.
- Validate inputs (api key, model name) and short-circuit on empty
texts.
- Resolve region with the existing `baseURLForRegion` helper.
- Build URL from `URLSuffix.Embedding`.
- Send `{model, input}` POST body, add `dimensions` when
`embeddingConfig.Dimension > 0` (matches the pattern in #14735).
- Bearer auth + JSON content type, mirroring the chat path.
- Parse `{data: [{embedding, index}]}` and reorder by `index`, rejecting
out-of-range indices, duplicates, and missing entries so the output
always lines up with the input. Same shape as the merged Mistral,
Upstage, and Novita Embed implementations.
- `conf/models/togetherai.json`:
- Add `"embedding": "embeddings"` to `url_suffix`.
- Add default embedding model entries for
`intfloat/multilingual-e5-large-instruct`, `BAAI/bge-large-en-v1.5`, and
`BAAI/bge-base-en-v1.5`.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix: The logs on the data source details page are not fully displayed.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
1. Add model types when add model
---
```
RAGFlow(user)> add model 'pipeline' to provider 'mineru_local' instance 'test' with tokens 131072 doc_parse;
SUCCESS
```
2. implement provider: MinerU_Local
---
**Verified from CLI**
```
RAGFlow(user)> parse with 'pipeline@test@mineru_local' file './internal/test.pdf'
+--------------------------------------+
| task_id |
+--------------------------------------+
| c7260e31-b6e2-4b36-955d-e9c60510c669 |
+--------------------------------------+
RAGFlow(user)> show 'test@mineru_local' task 'c7260e31-b6e2-4b36-955d-e9c60510c669'
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
| content | index |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+
| # Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
Bingxin Ke Anton Obukhov Shengyu Huang Nando Metzger Rodrigo Caye Daudt Konrad Schindler Photogrammetry and Remote Sensing, ETH Zurich ¨

### What problem does this PR solve?
RuntimeError: Cannot run the event loop while another loop is running
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: add local & ssh provider in admin panel
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Closes#15025
Langfuse-enabled `dialog_service.async_chat()` regressed to
`langfuse_tracer.start_generation(...)` after the earlier Langfuse v4
migration. Langfuse v4 uses `start_observation(as_type="generation")`,
so the remaining `start_generation` call can fail when chat tracing is
enabled.
This restores the migrated `start_observation(as_type="generation")`
call for chat observations while preserving the existing trace context,
model, input payload, and update/end flow. It also adds a regression
test with a fake Langfuse v4-style client that exposes
`start_observation()` but not `start_generation()`.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Tests
- `.venv/bin/pytest
test/unit_test/api/db/services/test_dialog_service_final_answer.py -q`
- `.venv/bin/ruff check api/db/services/dialog_service.py
test/unit_test/api/db/services/test_dialog_service_final_answer.py`
### What problem does this PR solve?
Fix: The folder tree menu for moving folders cannot be scrolled.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Closes#15029.
Some custom `base_url` paths in `ModelProviderService` call
`NewInstance(newURL)` and then immediately invoke methods on the
returned driver. Several real Go model drivers still return `nil` from
`NewInstance`, so those paths can panic instead of returning a normal
error.
This PR:
- centralizes custom base URL driver creation in `model_service.go`
- skips request-local driver creation when `base_url` is blank or
whitespace
- preserves the existing region key behavior when building the
request-local base URL map
- returns a clear error when the provider driver is missing or
`NewInstance` returns `nil`
- routes list/check/task and active model paths through the guarded
helper
- adds focused unit coverage for empty-region preservation, regional
base URLs, blank base URLs, nil drivers, and nil `NewInstance` results
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Test plan
- [x] `git diff --check upstream/main...HEAD`
- [x] `/root/go/bin/gofmt -w internal/service/model_service.go
internal/service/model_service_test.go`
- [x] `GOPATH=/root/gopath GOTOOLCHAIN=local /root/go/bin/go test
./internal/service -run TestNewModelDriverForBaseURL -count=1 -vet=off`
- [x] `GOPATH=/root/gopath GOTOOLCHAIN=local /root/go/bin/go build
./internal/service/... ./internal/entity/models/...`
Note: the same targeted `go test` command without `-vet=off` is
currently blocked by an existing unrelated vet finding in
`internal/service/llm.go:355` (`non-constant format string in call to
fmt.Errorf`).
### What problem does this PR solve?
extend restful api suite
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Other (please describe): test
### What problem does this PR solve?
This PR implement implement provider 302.AI and JieKouAI
**The following functionalities are now supported:**
**302.ai**
- [x] chat / think chat / stream chat / stream think chat
- [x] Embedding
- [x] ASR
- [x] ListModels
- [x] Provider connection checking
- [x] Balance
- [x] Rerank
- [x] OCR
- [x] Doc Parse
- [x] Show task
- [ ] ~~List Tasks!~~
- [ ] TTS
**JieKouAI**
- [x] chat / think chat / stream chat / stream think chat
- [x] Embedding
- [x] Rerank
- [x] ListModels
**Verified examples from the CLI:**
```palintext
# jiekouAI
RAGFlow(user)> stream think chat with 'zai-org/glm-4.5@test@jiekouai' message 'Hi'
Thinking: Let me think about how to respond to this simple greeting. The user just said "Hi", which is a basic and friendly way to start a conversation. I should respond in a similarly warm and welcoming manner.First, I need to acknowledge their greeting and reciprocate with enthusiasm. Something like "Hello!" or "Hi there!" would work well to create a positive atmosphere right from the start.Next, I should make it clear that I'm ready to help. Since they haven't asked anything specific yet, I'll keep it open-ended and inviting. Perhaps offering assistance with a question or task would encourage them to engage further.I should also maintain a professional yet approachable tone. Being an AI assistant, I want to convey that I'm knowledgeable and capable, but also friendly and easy to talk to.Let me put this all together into a concise response. I'll start with a cheerful greeting, express my readiness to help, and finish with an open invitation for them to share what's on their mind. This should create a welcoming environment for whatever they want to discuss next.
Answer: ! I'm Claude, an AI assistant created by Anthropic. I'm here to help you with information, answer questions, or assist you with tasks. What can I help you with today?
RAGFlow(user)> think chat with 'zai-org/glm-4.5@test@jiekouai' message 'Hi'
Thinking: Let me consider how to respond to this greeting. The user initiated with a simple "Hi," so a friendly and open response would be most appropriate to encourage further conversation. I should maintain a welcoming tone while offering assistance.
The response should accomplish a few key things: return the greeting warmly, show openness to conversation, and offer specific ways I can help. This approach demonstrates both approachability and usefulness.
I'll start with a greeting in return, then express my availability to help, and finish by suggesting some areas where I can provide assistance. This creates a natural flow from acknowledgment to support.
It's important to keep the response concise but inviting. Since the user hasn't specified their needs yet, I'll present a few broad categories of assistance to spark their thinking about what they might want to discuss or ask about.
The response should end with an encouraging note that prompts them to share what's on their mind, keeping the conversational ball in their court while making it clear I'm ready to engage with whatever they need.
Answer: Hello! How can I help you today? Whether you have questions, need information, or just want to chat, I'm here to assist.
RAGFlow(user)> embed text 'walkerwhat' 'jumperwho' with 'text-embedding-3-large@test@jiekouai' dimension 16
+-----------+-------+
| dimension | index |
+-----------+-------+
| 3072 | 0 |
| 3072 | 1 |
+-----------+-------+
RAGFlow(user)> rerank query 'what is rag' document 'rag is retrieval augment generation' 'rag need llm' 'famous rag project includes ragflow' with 'baai/bge-reranker-v2-m3@test@jiekouai' top 3
+-------+-----------------+
| index | relevance_score |
+-------+-----------------+
| 0 | 0.9830034 |
| 2 | 0.06399203 |
| 1 | 0.04665664 |
+-------+-----------------+
# 302.ai
RAGFlow(user)> think chat with 'kimi-k2.6@test@302.ai' message 'who r u'
Thinking: The user is asking "who r u" which is a casual way of asking "who are you." I need to identify myself as an AI assistant created by Moonshot AI. I should be friendly, concise, and helpful.
Key points to include:
- I am Kimi, an AI assistant made by Moonshot AI
- I can help with various tasks like answering questions, writing, analysis, coding, etc.
- Keep it casual but informative since the user used "r u" (text speak)
I should not:
- Pretend to be human
- Claim to have personal experiences or emotions
- Be overly formal or robotic
Simple, friendly response is best.
Answer: I'm Kimi, an AI assistant made by Moonshot AI. I can help you with answering questions, writing, coding, analysis, or just chatting. What can I do for you?
Time: 17.687750
RAGFlow(user)> stream think chat with 'kimi-k2.6@test@302.ai' message 'who r u'
Thinking: user asked "who r u" which is a casual way of asking "who are you." I should introduce myself as Kimi, an AI assistant developed by Moonshot AI. I need to be friendly, concise, and accurate. I should mention my capabilities briefly and keep the tone helpful. Since the user used casual text speak ("r u"), I can match that energy with a friendly but still informative tone.Key points:- I'm Kimi, an AI assistant made by Moonshot AI- I can help with various tasks like answering questions, writing, coding, analysis, etc.- Keep it brief but warm- Don't claim to be human- Don't over-explainDraft:"I'm Kimi, an AI assistant created by Moonshot AI. I can help with answering questions, writing, coding, analysis, brainstorming, and lots of other tasks. What can I do for you?"This is good - direct, accurate, and inviting.
Answer: Kimi, an AI assistant made by Moonshot AI. I can help with answering questions, writing, coding, analysis, brainstorming, and lots of other stuff. What can I do for you?
Time: 14.912576
RAGFlow(user)> asr with 'whisper-v3-turbo@test@302.ai' audio './internal/test.wav' param ''
+---------------------------------------------------------------------------------------------------------------------+
| text |
+---------------------------------------------------------------------------------------------------------------------+
| The examination and testimony of the experts enabled the Commission to conclude that five shots may have been fired |
+---------------------------------------------------------------------------------------------------------------------+
RAGFlow(user)> ocr with 'mistral-ocr-latest@test@302.ai' file './internal/test.pdf'
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| text |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| # Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
Bingxin Ke
Nando Metzger
Anton Obukhov
Rodrigo Caye Daudt
Shengyu Huang
Konrad Schindler
Photogrammetry and Remote Sensing, ETH Zürich

Figur... |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
RAGFlow(user)> parse with 'vlm@test@302.ai' file 'https://arxiv.org/pdf/2505.09358'
+--------------------------------------+
| task_id |
+--------------------------------------+
| 6de6eae6-c122-4b67-91e8-b061a0b8c087 |
+--------------------------------------+
RAGFlow(user)> show 'test@302.ai' task '6de6eae6-c122-4b67-91e8-b061a0b8c087'
+----------------------------------------------------------------------------+-------+
| content | index |
+----------------------------------------------------------------------------+-------+
| https://file.302.ai/gpt/imgs/20260519/b340fdff4774699c287fe4ee4658b317.zip | 0 |
+----------------------------------------------------------------------------+-------+
RAGFlow(user)> embed text 'walkerwhat' 'jumperwho' with 'jina-embeddings-v3@test@302.ai' dimension 16
+-----------+-------+
| dimension | index |
+-----------+-------+
| 1024 | 0 |
| 1024 | 1 |
+-----------+-------+
RAGFlow(user)> rerank query 'what is rag' document 'rag is retrieval augment generation' 'rag need llm' 'famous rag project includes ragflow' with 'jina-reranker-v2-base-multilingual@test@302.ai' top 3;
+-------+-----------------+
| index | relevance_score |
+-------+-----------------+
| 0 | 0.74167407 |
| 2 | 0.18832397 |
| 1 | 0.15713684 |
+-------+-----------------+
```
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring