ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-01 05:17:51 +08:00

Author	SHA1	Message	Date
bitloi	d69518ea42	fix(go): guard custom base URL driver creation (#15030 ) ### What problem does this PR solve? Closes #15029. Some custom `base_url` paths in `ModelProviderService` call `NewInstance(newURL)` and then immediately invoke methods on the returned driver. Several real Go model drivers still return `nil` from `NewInstance`, so those paths can panic instead of returning a normal error. This PR: - centralizes custom base URL driver creation in `model_service.go` - skips request-local driver creation when `base_url` is blank or whitespace - preserves the existing region key behavior when building the request-local base URL map - returns a clear error when the provider driver is missing or `NewInstance` returns `nil` - routes list/check/task and active model paths through the guarded helper - adds focused unit coverage for empty-region preservation, regional base URLs, blank base URLs, nil drivers, and nil `NewInstance` results ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### Test plan - [x] `git diff --check upstream/main...HEAD` - [x] `/root/go/bin/gofmt -w internal/service/model_service.go internal/service/model_service_test.go` - [x] `GOPATH=/root/gopath GOTOOLCHAIN=local /root/go/bin/go test ./internal/service -run TestNewModelDriverForBaseURL -count=1 -vet=off` - [x] `GOPATH=/root/gopath GOTOOLCHAIN=local /root/go/bin/go build ./internal/service/... ./internal/entity/models/...` Note: the same targeted `go test` command without `-vet=off` is currently blocked by an existing unrelated vet finding in `internal/service/llm.go:355` (`non-constant format string in call to fmt.Errorf`).	2026-05-20 14:58:20 +08:00
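The guarded helper described in the commit above could be sketched roughly as follows. This is a minimal illustration, not the code from `model_service.go`: the `ModelDriver` interface shape, the `stubDriver` type, the helper's signature, and the order of the checks are all assumptions made for the example. Only the name `NewInstance` and the behaviors (skip blank URLs, keep the region key, error on a missing driver or a `nil` result) come from the commit message.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ModelDriver is a hypothetical, trimmed-down stand-in for the real driver interface.
type ModelDriver interface {
	// NewInstance returns a driver bound to the given region -> base URL map,
	// or nil when the provider does not support request-local instances.
	NewInstance(baseURL map[string]string) ModelDriver
}

// stubDriver is an illustrative driver used only to exercise the helper.
type stubDriver struct {
	baseURL   map[string]string
	returnNil bool
}

func (d *stubDriver) NewInstance(baseURL map[string]string) ModelDriver {
	if d.returnNil {
		return nil
	}
	return &stubDriver{baseURL: baseURL}
}

// newModelDriverForBaseURL (assumed signature) centralizes the nil checks so
// callers never invoke methods on a nil driver.
func newModelDriverForBaseURL(driver ModelDriver, region, baseURL string) (ModelDriver, error) {
	if driver == nil {
		return nil, errors.New("model provider driver is not available")
	}
	// A blank or whitespace-only base URL means no request-local driver is needed.
	if strings.TrimSpace(baseURL) == "" {
		return driver, nil
	}
	// The region key is kept as-is, even when it is empty.
	instance := driver.NewInstance(map[string]string{region: baseURL})
	if instance == nil {
		return nil, fmt.Errorf("provider cannot create a driver for base URL %q", baseURL)
	}
	return instance, nil
}

func main() {
	_, err := newModelDriverForBaseURL(&stubDriver{returnNil: true}, "", "http://localhost:9997")
	fmt.Println("nil NewInstance result:", err)
}
```

Returning an error instead of the `nil` driver lets the list/check/task paths surface a normal failure rather than panicking.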
Haruko386	2836a934b5	Go: implement provider: 302.AI and JieKou-AI (#15034 ) ### What problem does this PR solve? This PR implements the 302.AI and JieKouAI providers. The following functionalities are now supported: 302.ai - [x] chat / think chat / stream chat / stream think chat - [x] Embedding - [x] ASR - [x] ListModels - [x] Provider connection checking - [x] Balance - [x] Rerank - [x] OCR - [x] Doc Parse - [x] Show task - [ ] ~~List Tasks!~~ - [ ] TTS JieKouAI - [x] chat / think chat / stream chat / stream think chat - [x] Embedding - [x] Rerank - [x] ListModels Verified examples from the CLI: ```plaintext # jiekouAI RAGFlow(user)> stream think chat with 'zai-org/glm-4.5@test@jiekouai' message 'Hi' Thinking: Let me think about how to respond to this simple greeting. The user just said "Hi", which is a basic and friendly way to start a conversation. I should respond in a similarly warm and welcoming manner.First, I need to acknowledge their greeting and reciprocate with enthusiasm. Something like "Hello!" or "Hi there!" would work well to create a positive atmosphere right from the start.Next, I should make it clear that I'm ready to help. Since they haven't asked anything specific yet, I'll keep it open-ended and inviting. Perhaps offering assistance with a question or task would encourage them to engage further.I should also maintain a professional yet approachable tone. Being an AI assistant, I want to convey that I'm knowledgeable and capable, but also friendly and easy to talk to.Let me put this all together into a concise response. I'll start with a cheerful greeting, express my readiness to help, and finish with an open invitation for them to share what's on their mind. This should create a welcoming environment for whatever they want to discuss next. Answer: ! I'm Claude, an AI assistant created by Anthropic. I'm here to help you with information, answer questions, or assist you with tasks. What can I help you with today? 
RAGFlow(user)> think chat with 'zai-org/glm-4.5@test@jiekouai' message 'Hi' Thinking: Let me consider how to respond to this greeting. The user initiated with a simple "Hi," so a friendly and open response would be most appropriate to encourage further conversation. I should maintain a welcoming tone while offering assistance. The response should accomplish a few key things: return the greeting warmly, show openness to conversation, and offer specific ways I can help. This approach demonstrates both approachability and usefulness. I'll start with a greeting in return, then express my availability to help, and finish by suggesting some areas where I can provide assistance. This creates a natural flow from acknowledgment to support. It's important to keep the response concise but inviting. Since the user hasn't specified their needs yet, I'll present a few broad categories of assistance to spark their thinking about what they might want to discuss or ask about. The response should end with an encouraging note that prompts them to share what's on their mind, keeping the conversational ball in their court while making it clear I'm ready to engage with whatever they need. Answer: Hello! How can I help you today? Whether you have questions, need information, or just want to chat, I'm here to assist. 
RAGFlow(user)> embed text 'walkerwhat' 'jumperwho' with 'text-embedding-3-large@test@jiekouai' dimension 16 +-----------+-------+ \| dimension \| index \| +-----------+-------+ \| 3072 \| 0 \| \| 3072 \| 1 \| +-----------+-------+ RAGFlow(user)> rerank query 'what is rag' document 'rag is retrieval augment generation' 'rag need llm' 'famous rag project includes ragflow' with 'baai/bge-reranker-v2-m3@test@jiekouai' top 3 +-------+-----------------+ \| index \| relevance_score \| +-------+-----------------+ \| 0 \| 0.9830034 \| \| 2 \| 0.06399203 \| \| 1 \| 0.04665664 \| +-------+-----------------+ # 302.ai RAGFlow(user)> think chat with 'kimi-k2.6@test@302.ai' message 'who r u' Thinking: The user is asking "who r u" which is a casual way of asking "who are you." I need to identify myself as an AI assistant created by Moonshot AI. I should be friendly, concise, and helpful. Key points to include: - I am Kimi, an AI assistant made by Moonshot AI - I can help with various tasks like answering questions, writing, analysis, coding, etc. - Keep it casual but informative since the user used "r u" (text speak) I should not: - Pretend to be human - Claim to have personal experiences or emotions - Be overly formal or robotic Simple, friendly response is best. Answer: I'm Kimi, an AI assistant made by Moonshot AI. I can help you with answering questions, writing, coding, analysis, or just chatting. What can I do for you? Time: 17.687750 RAGFlow(user)> stream think chat with 'kimi-k2.6@test@302.ai' message 'who r u' Thinking: user asked "who r u" which is a casual way of asking "who are you." I should introduce myself as Kimi, an AI assistant developed by Moonshot AI. I need to be friendly, concise, and accurate. I should mention my capabilities briefly and keep the tone helpful. 
Since the user used casual text speak ("r u"), I can match that energy with a friendly but still informative tone.Key points:- I'm Kimi, an AI assistant made by Moonshot AI- I can help with various tasks like answering questions, writing, coding, analysis, etc.- Keep it brief but warm- Don't claim to be human- Don't over-explainDraft:"I'm Kimi, an AI assistant created by Moonshot AI. I can help with answering questions, writing, coding, analysis, brainstorming, and lots of other tasks. What can I do for you?"This is good - direct, accurate, and inviting. Answer: Kimi, an AI assistant made by Moonshot AI. I can help with answering questions, writing, coding, analysis, brainstorming, and lots of other stuff. What can I do for you? Time: 14.912576 RAGFlow(user)> asr with 'whisper-v3-turbo@test@302.ai' audio './internal/test.wav' param '' +---------------------------------------------------------------------------------------------------------------------+ \| text \| +---------------------------------------------------------------------------------------------------------------------+ \| The examination and testimony of the experts enabled the Commission to conclude that five shots may have been fired \| +---------------------------------------------------------------------------------------------------------------------+ RAGFlow(user)> ocr with 'mistral-ocr-latest@test@302.ai' file './internal/test.pdf' +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| text \| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| # Repurposing Diffusion-Based Image 
Generators for Monocular Depth Estimation Bingxin Ke Nando Metzger Anton Obukhov Rodrigo Caye Daudt Shengyu Huang Konrad Schindler Photogrammetry and Remote Sensing, ETH Zürich ![img-0.jpeg](img-0.jpeg) Figur... \| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ RAGFlow(user)> parse with 'vlm@test@302.ai' file 'https://arxiv.org/pdf/2505.09358' +--------------------------------------+ \| task_id \| +--------------------------------------+ \| 6de6eae6-c122-4b67-91e8-b061a0b8c087 \| +--------------------------------------+ RAGFlow(user)> show 'test@302.ai' task '6de6eae6-c122-4b67-91e8-b061a0b8c087' +----------------------------------------------------------------------------+-------+ \| content \| index \| +----------------------------------------------------------------------------+-------+ \| https://file.302.ai/gpt/imgs/20260519/b340fdff4774699c287fe4ee4658b317.zip \| 0 \| +----------------------------------------------------------------------------+-------+ RAGFlow(user)> embed text 'walkerwhat' 'jumperwho' with 'jina-embeddings-v3@test@302.ai' dimension 16 +-----------+-------+ \| dimension \| index \| +-----------+-------+ \| 1024 \| 0 \| \| 1024 \| 1 \| +-----------+-------+ RAGFlow(user)> rerank query 'what is rag' document 'rag is retrieval augment generation' 'rag need llm' 'famous rag project includes ragflow' with 'jina-reranker-v2-base-multilingual@test@302.ai' top 3; +-------+-----------------+ \| index \| relevance_score \| +-------+-----------------+ \| 0 \| 0.74167407 \| \| 2 \| 0.18832397 \| \| 1 \| 0.15713684 \| +-------+-----------------+ ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-05-20 14:10:15 +08:00
qinling0210	77834870fc	Refact functions in engine in GO (#14981 ) ### What problem does this PR solve? Refactors functions in the Go engine. ### Type of change - [x] Refactoring	2026-05-19 17:34:59 +08:00
tmimmanuel	243d9ed281	Add TogetherAI chat provider (#14957 ) ## What - Add TogetherAI as a chat provider backed by its OpenAI-compatible `/v1/chat/completions` API - Register TogetherAI in the Go model factory and provider config - Support non-streaming chat, SSE streaming chat, model listing, and connection checks ## Notes - Uses the current TogetherAI OpenAI-compatible base URL `https://api.together.ai/v1` - Forwards documented chat parameters from `ChatConfig`: `max_tokens`, `temperature`, `top_p`, `stop`, and GPT-OSS `reasoning_effort` - Routes Together reasoning traces from `reasoning` / `reasoning_content` into `ReasonContent` ## Tests - `go test -vet=off -run TestTogetherAI -count=1 ./internal/entity/models` - `go test -vet=off -count=1 ./internal/entity/models` Refs #14736	2026-05-19 15:10:42 +08:00
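The reasoning-trace routing mentioned in the TogetherAI notes above might be sketched like this. It is an illustration under stated assumptions: the `chatMessage` and `ChatResponse` struct shapes, the function name `parseAssistantMessage`, and the choice to prefer `reasoning` over `reasoning_content` when both are present are all hypothetical. Only the two JSON field names and the `ReasonContent` target come from the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// chatMessage mirrors the fields of an assistant message that matter here (assumed shape).
type chatMessage struct {
	Content          string `json:"content"`
	Reasoning        string `json:"reasoning"`
	ReasoningContent string `json:"reasoning_content"`
}

// ChatResponse is a hypothetical, reduced response type for this sketch.
type ChatResponse struct {
	Answer        string
	ReasonContent string
}

// parseAssistantMessage routes whichever reasoning field is populated into ReasonContent.
// Preferring "reasoning" over "reasoning_content" is an assumption of this example.
func parseAssistantMessage(raw []byte) (ChatResponse, error) {
	var msg chatMessage
	if err := json.Unmarshal(raw, &msg); err != nil {
		return ChatResponse{}, fmt.Errorf("decode assistant message: %w", err)
	}
	reason := msg.Reasoning
	if reason == "" {
		reason = msg.ReasoningContent
	}
	return ChatResponse{Answer: msg.Content, ReasonContent: reason}, nil
}

func main() {
	resp, err := parseAssistantMessage([]byte(`{"content":"Hello","reasoning_content":"greeting detected"}`))
	fmt.Println(resp.Answer, resp.ReasonContent, err)
}
```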
tmimmanuel	09a06f1b00	Go: implement provider: Xinference (#14938 ) ### What problem does this PR solve? Closes #14808. Adds a Go model driver for Xinference so self-hosted Xinference chat models can be used through the Go provider layer instead of falling through to the dummy driver. Xinference exposes an OpenAI-compatible API under `/v1`; the driver accepts either a root endpoint such as `http://127.0.0.1:9997` or an OpenAI-compatible endpoint such as `http://127.0.0.1:9997/v1` and normalizes it before calling chat or model-listing routes. ### What is changed? - Add `internal/entity/models/xinference.go` implementing `ModelDriver` for Xinference chat. - Route provider name `xinference` in `internal/entity/models/factory.go`. - Add `conf/models/xinference.json` as a local provider config. - Add focused unit tests in `internal/entity/models/xinference_test.go`. Initial method coverage: - `ChatWithMessages`: POST `/v1/chat/completions`. - `ChatStreamlyWithSender`: SSE streaming from `/v1/chat/completions`. - `ListModels`: GET `/v1/models`. - `CheckConnection`: lightweight `ListModels` probe. - Optional auth: send `Authorization: Bearer <api_key>` only when a non-empty key is configured, matching Xinference no-auth and auth-enabled deployments. - `Balance`, `Embed`, `Rerank`, ASR, TTS, and OCR return `no such method` for this initial chat-provider PR. 
### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Bug Fix (non-breaking change which fixes an issue) ### Tests - `go test -vet=off -run TestXinference -count=1 ./internal/entity/models/...` - `go test -vet=off -count=1 ./internal/entity/models/...` ### References - Xinference docs: https://inference.readthedocs.io/zh-cn/latest/index.html - OpenAI-compatible chat usage: https://inference.readthedocs.io/zh-cn/latest/getting_started/using_xinference.html - API key auth: https://inference.readthedocs.io/zh-cn/latest/user_guide/auth_system.html --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-19 15:10:13 +08:00
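The endpoint normalization and optional auth described in the Xinference commit above can be sketched as below. This is a rough illustration rather than the contents of `xinference.go`: the function names `normalizeBaseURL` and `newListModelsRequest` are hypothetical, and the exact trimming rules are assumptions. The commit message only states that both root and `/v1` endpoints are accepted and that the `Authorization` header is sent only for a non-empty key.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// normalizeBaseURL accepts either a root endpoint or one that already ends in /v1
// and returns a form that always ends in /v1 without a trailing slash.
func normalizeBaseURL(raw string) string {
	base := strings.TrimRight(strings.TrimSpace(raw), "/")
	if base == "" {
		return ""
	}
	if !strings.HasSuffix(base, "/v1") {
		base += "/v1"
	}
	return base
}

// newListModelsRequest builds GET <base>/models and attaches a bearer token
// only when an API key is configured, so no-auth deployments keep working.
func newListModelsRequest(baseURL, apiKey string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, normalizeBaseURL(baseURL)+"/models", nil)
	if err != nil {
		return nil, err
	}
	if key := strings.TrimSpace(apiKey); key != "" {
		req.Header.Set("Authorization", "Bearer "+key)
	}
	return req, nil
}

func main() {
	for _, in := range []string{"http://127.0.0.1:9997", "http://127.0.0.1:9997/v1/"} {
		req, err := newListModelsRequest(in, "")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(req.Method, req.URL.String())
	}
}
```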
OrbisAI Security	f17a66d4f0	fix: the opencc c library uses fgets() to read dicti... in text.c (#13970 ) ## Summary Fix critical severity security issue in `internal/cpp/opencc/dictionary/text.c`. ## Vulnerability \| Field \| Value \| \|-------\|-------\| \| ID \| V-001 \| \| Severity \| CRITICAL \| \| Scanner \| multi_agent_ai \| \| Rule \| `V-001` \| \| File \| `internal/cpp/opencc/dictionary/text.c:107` \| Description: The OpenCC C library uses fgets() to read dictionary and configuration files without proper bounds validation on subsequent buffer operations. While fgets() itself is bounds-checked, the sprintf() call at config_reader.c:174 constructs file paths by concatenating home_path and filename without verifying the result fits in pkg_filename buffer. An attacker providing malformed OpenCC configuration files with excessively long path components can overflow the fixed-size buffer, overwriting adjacent memory including return addresses and function pointers. ## Changes - `internal/cpp/opencc/config_reader.c` - `internal/cpp/opencc/dictionary/text.c` - `internal/cpp/opencc/utils.c` ## Verification - [x] Build passes - [x] Scanner re-scan confirms fix - [x] LLM code review passed --- Automated security fix by [OrbisAI Security](https://orbisappsec.com) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * Bug Fixes * Improved error detection and handling for malformed configuration and dictionary entries during file parsing. * Enhanced memory cleanup in error recovery paths to prevent potential issues. * Strengthened robustness of string operations and buffer handling throughout the library. <!-- end of auto-generated comment: release notes by coderabbit.ai --> Co-authored-by: Ubuntu <ubuntu@ip-172-31-32-15.us-west-2.compute.internal>	2026-05-19 13:55:33 +08:00
tmimmanuel	4c9529ef36	Add Replicate chat provider (#14958 ) ## What - Add Replicate as a chat provider backed by the documented predictions API - Register Replicate in the Go model factory and provider config - Support non-streaming chat through sync predictions, polling fallback, streaming through `urls.stream`, model listing, and connection checks ## Notes - Uses `POST /v1/predictions` with Replicate model identifiers in `version`, which supports official and community model identifiers - Maps RAGFlow messages into Replicate prompt-shaped inputs (`prompt`, optional `system_prompt`) and forwards common documented LLM inputs: `max_new_tokens`, `temperature`, `top_p` - Preserves whitespace in SSE output chunks and emits RAGFlow `[DONE]` at stream completion ## Tests - `go test -vet=off -run TestReplicate -count=1 ./internal/entity/models` - `go test -vet=off -count=1 ./internal/entity/models` Refs #14736	2026-05-19 11:10:36 +08:00
Haruko386	db9e782747	Go: implement provider: MinerU (#14990 ) ### What problem does this PR solve? Implement MinerU Provider The following functionalities are now supported: MinerU ---- - [x] Parse file - [x] Show task - [ ] ~~List tasks~~ Verified examples from the CLI: ```plaintext RAGFlow(user)> parse with 'vlm@test@mineru' file 'https://arxiv.org/pdf/2505.09358' +--------------------------------------+ \| task_id \| +--------------------------------------+ \| 142ac8ea-d9d0-4a68-a2d1-d3af67635dc9 \| +--------------------------------------+ RAGFlow(user)> show 'test@mineru' task '142ac8ea-d9d0-4a68-a2d1-d3af67635dc9' +--------------------------------------------+-------+ \| content \| index \| +--------------------------------------------+-------+ \| Task is running... Progress: 17 / 18 pages \| 0 \| +--------------------------------------------+-------+ RAGFlow(user)> show 'test@mineru' task '142ac8ea-d9d0-4a68-a2d1-d3af67635dc9' +--------------------------------------------------------------------------------------------+-------+ \| content \| index \| +--------------------------------------------------------------------------------------------+-------+ \| https://cdn-mineru.openxlab.org.cn/pdf/2026-05-18/142ac8ea-d9d0-4a68-a2d1-d3af67635dc9.zip \| 0 \| +--------------------------------------------------------------------------------------------+-------+ ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-05-19 10:49:33 +08:00
buua436	41a9fc0030	Go: add dataset graph api (#14984 ) ### What problem does this PR solve? add dataset graph api ### Type of change - [x] Refactoring	2026-05-18 20:02:53 +08:00
buua436	d7fb4bdb4e	Go: align document list response (#14982 ) ### What problem does this PR solve? align document list response ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-18 20:00:11 +08:00
buua436	3290257014	Go: fix forgetting policy validation and fix memory update diff checks (#14976 ) ### What problem does this PR solve? fix forgetting policy validation and fix memory update diff checks ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-18 19:21:47 +08:00
Jake Armstrong	93d3deb5e4	Fix admin CLI system variable commands (#14956 ) ## What Fixes #12409. Implements admin CLI support for: - `list vars;` - `show var <name-or-prefix>;` - `set var <name> <value>;` ## Changes - Wire Go CLI variable commands to the admin API. - Support integer and quoted string values in `SET VAR`. - Return variable rows as `data_type`, `name`, `setting_type`, and `value`. - Add exact-name lookup with prefix fallback for `SHOW VAR`. - Validate values by stored data type: `string`, `integer`, `bool`, and `json`. - Keep the legacy Python admin CLI/server behavior aligned. - Update admin CLI docs and add focused tests. ## Verification - `go test -count=1 ./internal/cli` - `python3.12 -m py_compile admin/server/services.py admin/server/routes.py api/db/services/system_settings_service.py admin/client/parser.py admin/client/ragflow_client.py` - Python admin CLI parser smoke test for `SET VAR`, quoted values, `SHOW VAR`, and `LIST VARS`. - Attempted `./run_go_tests.sh`; local environment is missing native tokenizer/linker artifacts: - `internal/cpp/cmake-build-release/librag_tokenizer_c_api.a` - `-lstdc++` Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-18 19:08:45 +08:00
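Validating a `SET VAR` value against the stored data type, as listed in the commit above, might look like the following sketch. The function name `validateVarValue` and the exact acceptance rules are assumptions. In particular, `strconv.ParseBool` accepts spellings such as `1`, `t`, and `TRUE`, which may be broader or narrower than what the actual admin CLI allows.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// validateVarValue checks a raw value against one of the four stored data types
// named in the commit: string, integer, bool, and json.
func validateVarValue(dataType, value string) error {
	switch dataType {
	case "string":
		return nil // any string is acceptable
	case "integer":
		if _, err := strconv.ParseInt(value, 10, 64); err != nil {
			return fmt.Errorf("value %q is not a valid integer", value)
		}
	case "bool":
		if _, err := strconv.ParseBool(value); err != nil {
			return fmt.Errorf("value %q is not a valid bool", value)
		}
	case "json":
		if !json.Valid([]byte(value)) {
			return fmt.Errorf("value %q is not valid JSON", value)
		}
	default:
		return fmt.Errorf("unsupported data type %q", dataType)
	}
	return nil
}

func main() {
	fmt.Println(validateVarValue("integer", "42"))
	fmt.Println(validateVarValue("json", "{"))
}
```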
Haruko386	92145dc764	Go: implement provider: DeepInfra, XunFei (#14978 ) ### What problem does this PR solve? This PR implements the Mistral, DeepInfra, and XunFei providers. The following functionalities are now supported: DeepInfra - [x] chat / think chat / stream chat / stream think chat - [x] Embedding - [x] ASR - [x] TTS - [x] ListModels - [x] Provider connection checking - [x] Balance - [ ] ~~Rerank~~ XunFei - [x] chat / think chat / stream chat / stream think chat ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-05-18 16:57:42 +08:00
buua436	b8ac997606	Go: add restful api route aliases (#14977 ) ### What problem does this PR solve? add restful api route aliases ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-18 16:57:14 +08:00
buua436	b40b0bf996	Go: fix siliconflow embedding response (#14975 ) ### What problem does this PR solve? fix siliconflow embedding response ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-18 15:07:07 +08:00
tmimmanuel	b09da6e347	Go: implement provider: CometAPI (#14930 ) ### What problem does this PR solve? Adds the Go model provider driver for CometAPI, which is listed as unchecked in the Go provider tracking issue #14736 and requested in #14804. Without this, the Go layer falls back to the dummy driver for the `cometapi` provider. Fixes #14804 ### What this PR includes - New `internal/entity/models/cometapi.go` implementing `ModelDriver` for CometAPI. - New `conf/models/cometapi.json` with CometAPI base URLs and representative chat / embedding models from the public catalog. - `factory.go`: route `"cometapi"` to `NewCometAPIModel`. - Unit tests in `internal/entity/models/cometapi_test.go`. ### Method coverage - `ChatWithMessages`: `POST /v1/chat/completions`. - `ChatStreamlyWithSender`: SSE streaming on the same endpoint. - `Embed`: `POST /v1/embeddings`, including optional `dimensions`. - `ListModels`: `GET /api/models` public catalog. - `Balance`: `GET https://query.cometapi.com/user/quota?key=...`. - `CheckConnection`: delegates to the quota query to verify the key. - `Rerank`, ASR, TTS, OCR: return `no such method` for now. No ModelDriver interface change. No new dependencies. ### How was this tested? ```bash go test -vet=off -run TestCometAPI -count=1 ./internal/entity/models/... go test -vet=off -count=1 ./internal/entity/models/... 
``` --------- Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Jin Hai <haijin.chn@gmail.com> Signed-off-by: majiayu000 <1835304752@qq.com> Co-authored-by: 加帆 <Jiafan@users.noreply.github.com> Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: bulexu <baiheng527@gmail.com> Co-authored-by: xubh <xubh@wikiflyer.cn> Co-authored-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: Carve_ <75568342+Rynzie02@users.noreply.github.com> Co-authored-by: Paul Y Hui <paulhui@seismic.com> Co-authored-by: LIRUI YU <128563231+LiruiYu33@users.noreply.github.com> Co-authored-by: yun.kou <koopking@gmail.com> Co-authored-by: Yun.kou <yunkou@deepglint.com> Co-authored-by: Ahmad Intisar <168020872+ahmadintisar@users.noreply.github.com> Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local> Co-authored-by: chanx <1243304602@qq.com> Co-authored-by: Syed Shahmeer Ali <syedshahmeerali196@gmail.com> Co-authored-by: Octopus <liyuan851277048@icloud.com> Co-authored-by: lif <1835304752@qq.com>	2026-05-18 14:31:16 +08:00
carlos4s	2eba2c4d75	Add Anthropic Go model provider (#14940 ) ### What problem does this PR solve? Adds the missing Anthropic provider implementation for the Go model provider layer. Closes #14939 ### What changed - Add `conf/models/anthropic.json` with Anthropic Claude chat/vision models and API endpoints. - Add `internal/entity/models/anthropic.go` implementing non-streaming Messages API chat, model listing, and connection checking. - Register `anthropic` in the Go model factory. - Add httptest coverage for headers, payload mapping, response parsing, validation errors, provider errors, model listing, connection checking, factory registration, and unsupported methods. ### Notes Streaming chat is left as an explicit `no such method` follow-up because this initial provider focuses on non-streaming chat and connection checking. ### Tests - `docker run --rm -v /home/ubuntu/Documents/gitTensor_repos/carlos/ragflow:/work -v /tmp/ragflow-go-cache:/go/pkg/mod -v /tmp/ragflow-go-build:/root/.cache/go-build -w /work golang:1.25 go test -vet=off ./internal/entity/models -run Anthropic -count=1 -v` - `docker run --rm -v /home/ubuntu/Documents/gitTensor_repos/carlos/ragflow:/work -v /tmp/ragflow-go-cache:/go/pkg/mod -v /tmp/ragflow-go-build:/root/.cache/go-build -w /work golang:1.25 go test -vet=off ./internal/entity -count=1` - `git diff --check` - `jq . conf/models/anthropic.json >/dev/null` Plain `go test ./internal/entity/models` currently hits pre-existing unrelated vet findings in other provider files (`baidu.go`, `cohere.go`, `fishaudio.go`, `openrouter.go`). --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-18 12:03:33 +08:00
Jake Armstrong	fe1433d1ff	Go: add Jina chat completions support (#14935 ) ### What problem does this PR solve? This PR adds non-streaming chat support for the Jina Go model provider. The Jina provider was added with embedding, rerank, model listing, and connection checking, but `ChatWithMessages` still returned a not-implemented error even though Jina exposes an OpenAI-compatible `/v1/chat/completions` endpoint. Closes #14933 The following functionalities are now supported: ### Jina: - [x] Chat - [ ] Stream Chat - [x] Embedding - [x] Rerank - [x] Model listing - [x] Provider connection checking - [ ] Balance ### Implementation details: - Implements `JinaModel.ChatWithMessages` - Sends `Authorization: Bearer <api-key>` and JSON chat completion requests - Validates API key, model name, messages, and configured region before making requests - Forwards supported chat config fields: `max_tokens`, `temperature`, `top_p`, and `stop` - Parses the first chat completion choice into `ChatResponse.Answer` - Adds `jina-ai/jina-vlm` as a chat-capable model in `conf/models/jina.json` - Adds focused unit tests for request construction, auth, response parsing, validation errors, provider errors, and region handling Verification: ```plaintext docker run --rm -v $PWD:/repo -w /repo golang:1.25 sh -c '/usr/local/go/bin/gofmt -w internal/entity/models/jina.go internal/entity/models/jina_test.go && /usr/local/go/bin/go test -vet=off ./internal/entity/models -run TestJina -count=1' ok ragflow/internal/entity/models 0.037s ``` Note: `go test ./internal/entity/models -run TestJina -count=1` currently hits unrelated existing vet findings in other provider files, so the focused Jina tests were run with `-vet=off`. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-18 12:03:12 +08:00
Panda Dev	6794ad2f70	Go: implement Embed (embeddings) in Novita driver (#14895 ) ### What problem does this PR solve? Fixes #14893 The Novita Go driver landed in #14850 and shipped a stub `Embed` method that returned `"novita, no such method"`, so Novita could not be used as an embedding provider in RAGFlow. This PR fills that gap. Novita exposes a public embeddings endpoint at `POST https://api.novita.ai/v3/embeddings` that accepts the standard OpenAI-compatible request shape (`{model, input}`) with `Authorization: Bearer <api_key>`. Two embedding models are documented in Novita's model library: `baai/bge-m3` (multilingual, 8192 tokens) and `baai/bge-large-en-v1.5`. ### Changes - `internal/entity/models/novita.go`: implement `NovitaModel.Embed`. - Validate inputs (api key, model name) and short-circuit on empty texts. - Resolve region with the existing `baseURLForRegion` helper. - Build URL from `URLSuffix.Embedding` (the embeddings path lives under `/v3/`, separate from the chat path under `/openai/v1/`). - Send `{model, input}` POST body, add `dimensions` when `embeddingConfig.Dimension > 0` (matches the pattern in #14735). - Bearer auth + JSON content type, mirroring the chat path. - Parse `{data: [{embedding, index}]}` and reorder by `index`, rejecting out-of-range indices, duplicates, and missing entries so the output always lines up with the input. Same shape as the merged Mistral and Upstage Embed implementations. - `conf/models/novita.json`: - Add `"embedding": "v3/embeddings"` to `url_suffix`. - Add default embedding model entries for `baai/bge-m3` and `baai/bge-large-en-v1.5` so they appear in the model picker. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-18 12:02:28 +08:00
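The index-based reordering described in the Novita commit above (rejecting out-of-range indices, duplicates, and missing entries) can be sketched as follows. The type `embeddingItem` and the function `orderEmbeddings` are hypothetical names for this example. Only the `{embedding, index}` response shape and the three rejection rules come from the commit message.

```go
package main

import "fmt"

// embeddingItem mirrors one entry of the {data: [{embedding, index}]} response.
type embeddingItem struct {
	Embedding []float64 `json:"embedding"`
	Index     int       `json:"index"`
}

// orderEmbeddings places each vector at the position of its input text and fails
// if the response does not cover every input exactly once.
func orderEmbeddings(items []embeddingItem, inputCount int) ([][]float64, error) {
	out := make([][]float64, inputCount)
	seen := make([]bool, inputCount)
	for _, item := range items {
		if item.Index < 0 || item.Index >= inputCount {
			return nil, fmt.Errorf("embedding index %d out of range for %d inputs", item.Index, inputCount)
		}
		if seen[item.Index] {
			return nil, fmt.Errorf("duplicate embedding index %d", item.Index)
		}
		seen[item.Index] = true
		out[item.Index] = item.Embedding
	}
	for i, ok := range seen {
		if !ok {
			return nil, fmt.Errorf("missing embedding for input %d", i)
		}
	}
	return out, nil
}

func main() {
	items := []embeddingItem{
		{Embedding: []float64{0.2}, Index: 1},
		{Embedding: []float64{0.1}, Index: 0},
	}
	vectors, err := orderEmbeddings(items, 2)
	fmt.Println(vectors, err)
}
```

Tracking a `seen` slice keeps the check linear in the number of inputs and catches a short response even when every returned index is individually valid.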
Haruko386	bf41d35729	Go: implement PaddleOCR provider and implement ASR for CoHere (#14954 ) ### What problem does this PR solve? This PR implements OCR for Baidu and Mistral, implements the PaddleOCR provider, and implements ASR for Cohere. Verified examples from the CLI: ``` RAGFlow(user)> ocr with 'mistral-ocr-2512@test@mistral' file './internal/text.jpg' +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| text \| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| Parallel to these organizational innovations there were significant complementary technical innovations (e.g., improved methods of manufacturing cast-iron pipe and of coating interiors for pressure maintenance, and newer paving and construction material... 
\| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ RAGFlow(user)> ocr with 'paddleocr-vl-0.9b@test@baidu' file './internal/text.jpg' +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| text \| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| Parallel to these organizational innovations there were significant complementary technical innovations (e.g., improved methods of manufacturing cast-iron pipe and of coating interiors for pressure maintenance, and newer paving and construction material... 
\| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ # PaddleOCR RAGFlow(user)> ocr with 'PaddleOCR-VL-1.5@test@paddleocr' file './internal/test.pdf' +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| text \| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| # Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation Bingxin Ke Nando Metzger Photogra Anton Obukhov Rodrigo Caye Daudt netry and Remote Sensing, Shengyu Huang Konrad Schindler ETH Zürich <div style="text-align: c... \| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ # Cohere RAGFlow(user)> asr with 'cohere-transcribe-03-2026@test@cohere' audio './internal/test.wav' param '{"language": "en"}' +-----------------------------------------------------------------------------------------------------------------------+ \| text \| +-----------------------------------------------------------------------------------------------------------------------+ \| The examination and testimony of the experts enabled the Commission to conclude that five shots may have been fired. 
\| +-----------------------------------------------------------------------------------------------------------------------+ ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-05-15 18:41:43 +08:00
Haruko386	c2863173b0	Go: implement TTS, ASR for Siliconflow and TTs for StepFun (#14944 ) ### What problem does this PR solve? This PR implements TTS and ASR for SiliconFlow and TTS for StepFun. The following functionalities are now supported: SiliconFlow: - [x] Text To Speech - [x] Audio To Text - [x] Stream Audio To Text StepFun: - [x] Audio To Text - [x] Stream Audio To Text Verified examples from the CLI: ```plaintext # SiliconFlow RAGFlow(user)> tts with 'FunAudioLLM/CosyVoice2-0.5B@test@Siliconflow' text 'hello? show yourself' play format 'wav' param '{"voice": "fnlp/MOSS-TTSD-v0.5:alex"}' SUCCESS RAGFlow(user)> asr with 'FunAudioLLM/SenseVoiceSmall@test@siliconflow' audio './internal/test.wav' param '' +----------------------------------------------------------------------------------------------------------------------+ \| text \| +----------------------------------------------------------------------------------------------------------------------+ \| The examination and testimony of the experts enabled the commission to conclude that five shots may have been fired. \| +----------------------------------------------------------------------------------------------------------------------+ RAGFlow(user)> stream asr with 'FunAudioLLM/SenseVoiceSmall@test@siliconflow' audio './internal/test.wav' param '' +----------------------------------------------------------------------------------------------------------------------+ \| text \| +----------------------------------------------------------------------------------------------------------------------+ \| The examination and testimony of the experts enabled the commission to conclude that five shots may have been fired. \| +----------------------------------------------------------------------------------------------------------------------+ ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2026-05-15 14:03:33 +08:00
Jin Hai	335dd5a263	Go: add cli command, list dataset documents (#14948 ) ### What problem does this PR solve? ``` +---------------------+----------------------------------+-------------+-----------------+---------+--------+------+ \| created_at \| id \| meta_fields \| name \| size \| status \| type \| +---------------------+----------------------------------+-------------+-----------------+---------+--------+------+ \| 2026-05-08 19:35:08 \| f6aa38bb4ad111f1ba6338a74640adcc \| map[] \| abc.pdf \| 3387987 \| 1 \| pdf \| +---------------------+----------------------------------+-------------+-----------------+---------+--------+------+ ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-15 14:00:45 +08:00
Hunnyboy1217	86bcf9767d	Go: implement Rerank in vLLM driver (#14878 ) (#14880 ) ### What problem does this PR solve? Closes #14878. `VllmModel.Rerank()` in [internal/entity/models/vllm.go:551](internal/entity/models/vllm.go#L551) is currently a stub returning `nil, fmt.Errorf("%s, Rerank not implemented", z.Name())`, and [conf/models/vllm.json](conf/models/vllm.json) is missing a `rerank` entry in `url_suffix`. Chat (long-standing) and embeddings (#14688) already work, so rerank is the last missing leg of the retrieval pipeline for operators running everything on a single self-hosted vLLM server — today they have to point rerank at a different provider, which defeats the point of a fully local deployment. Upstream vLLM has supported a Jina/Cohere-compatible `POST /v1/rerank` endpoint since v0.7 ([vllm-project/vllm#12376](https://github.com/vllm-project/vllm/pull/12376)). The request/response shape is essentially identical to the NVIDIA driver landed in #14778, so this PR mirrors that structure with two vLLM-specific adjustments. This PR replaces the stub with a real implementation against vLLM's `/v1/rerank`: - `POST {baseURL}/rerank` - Request body: `{"model": "<modelName>", "query": "<query>", "documents": [...], "top_n": <int>}` — documents are a flat `[]string`, not wrapped as `{text: "..."}` like NVIDIA's `/ranking`. - Response body: `{"results": [{"index": int, "relevance_score": float}, ...]}` (Jina-compatible; the optional `document` field is ignored since callers reconstruct text via `Index`). - `Authorization: Bearer <ApiKey>` is set only when `APIConfig.ApiKey` is non-empty, matching the existing `Embed`/`ListModels` behaviour in this file. vLLM is a local driver and can be deployed without an API key. 
The return shape matches the existing `RerankResponse` contract used by the NVIDIA ([nvidia.go:461](internal/entity/models/nvidia.go#L461)), Aliyun ([aliyun.go:507](internal/entity/models/aliyun.go#L507)), and ZhipuAI ([zhipu-ai.go:554](internal/entity/models/zhipu-ai.go#L554)) drivers, i.e. `Data []RerankResult` carrying `{Index, RelevanceScore}` in the API's ranking order. Callers that need original-input order sort by `Index`. Behaviour requirements from the issue, all covered: 1. Empty `documents` → returns `&RerankResponse{}` without an HTTP call. 2. Missing `modelName` → `"model name is required"` validation error. 3. `rerankConfig.TopN` honored when `0 < TopN < len(documents)`; otherwise `top_n` defaults to `len(documents)` so callers get a score per input. 4. Non-200 responses return an error including upstream status and body (`"vLLM rerank API error: <status>, body: <body>"`). 5. Response `index` values are bounds-checked against `len(documents)`. Scope: - [internal/entity/models/vllm.go](internal/entity/models/vllm.go) — replaces the `Rerank` stub at line 551 with a real implementation; adds `vllmRerankRequest`/`vllmRerankResponse` types for the slim subset of the payload we need. Region/baseURL resolution, 30s context timeout, conditional bearer header, and error wrapping all follow the existing patterns in this file. - [conf/models/vllm.json](conf/models/vllm.json) — adds `"rerank": "rerank"` to `url_suffix`, joined to the operator-configured vLLM base URL the same way the NVIDIA driver joins at [nvidia.go:485](internal/entity/models/nvidia.go#L485). 
- [internal/entity/models/vllm_rerank_test.go](internal/entity/models/vllm_rerank_test.go) — adds 7 `httptest`-backed tests mirroring `nvidia_rerank_test.go`: happy path (out-of-order ranking → Index preservation), `top_n` clamp to `RerankConfig.TopN`, empty-documents short-circuit, missing-model-name validation, HTTP error propagation, out-of-range index rejection, and a vLLM-specific `TestVllmRerankWithoutAPIKey` locking in the optional-auth behaviour that distinguishes this driver from NVIDIA. Out of scope: no interface change, no DDL, no frontend change. Chat, embeddings, and balance paths are untouched. No new user-facing docs required beyond the existing rerank model setup page — vLLM joins the list of providers whose rerank model can be selected once `/v1/rerank` is exposed by the server. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-15 13:27:22 +08:00
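The request shape, `top_n` defaulting, and index bounds check described in the vLLM rerank entry above can be sketched as follows. This is a minimal illustration, not the code from `vllm.go`: the type and function names (`rerankRequest`, `buildRerankRequest`, `parseRerankResponse`) are hypothetical, the HTTP call itself is left out, and the empty-`documents` short-circuit is not shown.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// rerankRequest mirrors the JSON body described in the PR text.
// The Go type name is an assumption made for this sketch.
type rerankRequest struct {
	Model     string   `json:"model"`
	Query     string   `json:"query"`
	Documents []string `json:"documents"` // flat strings, not {text: "..."} objects
	TopN      int      `json:"top_n"`
}

// rerankResult and rerankResponse cover only the fields the PR says are read.
type rerankResult struct {
	Index          int     `json:"index"`
	RelevanceScore float64 `json:"relevance_score"`
}

type rerankResponse struct {
	Results []rerankResult `json:"results"`
}

// buildRerankRequest validates the model name and applies the top_n rule:
// use topN only when 0 < topN < len(docs), otherwise ask for every document.
func buildRerankRequest(model, query string, docs []string, topN int) (rerankRequest, error) {
	if model == "" {
		return rerankRequest{}, errors.New("model name is required")
	}
	n := len(docs)
	if topN > 0 && topN < len(docs) {
		n = topN
	}
	return rerankRequest{Model: model, Query: query, Documents: docs, TopN: n}, nil
}

// parseRerankResponse decodes the body and rejects any index outside [0, numDocs).
func parseRerankResponse(body []byte, numDocs int) ([]rerankResult, error) {
	var resp rerankResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode rerank response: %w", err)
	}
	for _, r := range resp.Results {
		if r.Index < 0 || r.Index >= numDocs {
			return nil, fmt.Errorf("rerank index %d out of range [0, %d)", r.Index, numDocs)
		}
	}
	return resp.Results, nil
}

func main() {
	docs := []string{"a", "b", "c"}
	req, err := buildRerankRequest("my-reranker", "which doc?", docs, 0)
	if err != nil {
		panic(err)
	}
	payload, _ := json.Marshal(req)
	fmt.Println(string(payload))

	sample := []byte(`{"results":[{"index":2,"relevance_score":0.9},{"index":0,"relevance_score":0.1}]}`)
	results, err := parseRerankResponse(sample, len(docs))
	fmt.Println(results, err)
}
```

Results stay in the API's ranking order here; a caller that wants input order would sort by `Index`, as the entry notes.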
Jin Hai	3a5df08c76	Go: add file parse command (#14892 ) ### What problem does this PR solve? ``` RAGFlow(user)> ocr with 'hunyuanocr@test@gitee' file './picture.png' +----------------------------------------------------------+ \| text \| +----------------------------------------------------------+ \| 生活不是等待风暴过去，而是学会在雨中翩翩起舞。 ——佚名 \| +----------------------------------------------------------+ RAGFlow(user)> list 'test@gitee' tasks; +---------+----------------------------------+ \| status \| task_id \| +---------+----------------------------------+ \| success \| C3FX4MQNKY5MGC6ZFMIXIAMJKHCEBQB5 \| +---------+----------------------------------+ RAGFlow(user)> show 'test@gitee' task 'C3FX4MQNKY5MGC6ZFMIXIAMJKHCEBQB5'; +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+ \| content \| index \| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+ \| # PDF 1: Purpose of RAGFlow RAGFlow is an open source Retrieval-Augmented Generation (RAG) engine designed to turn raw documents into reliable context for large language models.Its purpose is to make it practical to build an Al assistant that can ans... \| 1 \| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+ ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-15 12:29:52 +08:00
Haruko386	106f4b777e	Go: implement TTS for fishaudio, openrouter and asr for fishaudio (#14926 ) ### What problem does this PR solve? This PR implements TTS for the FishAudio and OpenRouter providers and ASR for FishAudio. The following functionalities are now supported: FishAudio: - [x] Text To Speech - [x] Stream Text To Speech - [x] Audio To Text OpenRouter: - [x] Text To Speech Verified examples from the CLI: ```plaintext FishAudio RAGFlow(user)> tts with 's1@test@fishaudio' text 'He who desires but acts not, breeds pestilence.' play format 'wav' save './internal' param '{"reference_id": "90e65eaaf50e4470b8e6d43ee6afd7d5", "temperature": 0.7, "top_p": 0.7, "prosody": {"speed": 1, "volume": 0, "normalize_loudness": true}, "chunk_length": 300, "normalize": true, "sample_rate": 44100, "mp3_bitrate": 128, "latency": "normal", "max_new_tokens": 1024, "repetition_penalty": 1.2, "min_chunk_length": 50, "condition_on_previous_chunks": true, "early_stop_threshold": 1}' Saved to directory: /home/infiniflow/Documents/development/ragflow/internal/s1_output.wav SUCCESS RAGFlow(user)> stream tts with 's1@test@fishaudio' text 'He who desires but acts not, breeds pestilence.' 
play format 'wav' save './internal' param '{"reference_id": "90e65eaaf50e4470b8e6d43ee6afd7d5", "temperature": 0.7, "top_p": 0.7, "prosody": {"speed": 1, "volume": 0, "normalize_loudness": true}, "chunk_length": 300, "normalize": true, "sample_rate": 44100, "mp3_bitrate": 128, "latency": "normal", "max_new_tokens": 1024, "repetition_penalty": 1.2, "min_chunk_length": 50, "condition_on_previous_chunks": true, "early_stop_threshold": 1}' Saved to directory: /home/infiniflow/Documents/development/ragflow/internal/s1_output.wav SUCCESS RAGFlow(user)> asr with 'transcribe-1@test@fishaudio' audio './internal/test.wav' param '{"language": "en", "ignore_timestamps": true}' +----------------------------------------------------------------------------------------------------------------------+ \| text \| +----------------------------------------------------------------------------------------------------------------------+ \| The examination and testimony of the experts enabled the commission to conclude that five shots may have been fired. \| +----------------------------------------------------------------------------------------------------------------------+ ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-05-14 18:58:00 +08:00
buua436	3c68ad03be	Go: update user settings fields (#14918 ) ### What problem does this PR solve? update user settings fields ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-14 14:47:15 +08:00
buua436	0450400efd	Go: fix LastLoginTime update (#14917 ) ### What problem does this PR solve? fix LastLoginTime update ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-14 14:46:39 +08:00
buua436	f0122179dd	GO: align time units with Python and centralize timestamp injection in BaseModel (#14875 ) ### What problem does this PR solve? align time units with Python and centralize timestamp injection in BaseModel ### Type of change - [x] Refactoring	2026-05-14 13:46:46 +08:00
Haruko386	ef46005ef1	Go: implement TTS for MiniMax provider and CLI testing for TTS (#14911 ) ### What problem does this PR solve? This PR implements TTS for the MiniMax provider and adds CLI testing for TTS. The following functionalities are now supported: MiniMax: - [x] Chat / Stream Chat - [x] Embedding - [x] Rerank - [x] Model listing - [x] Provider connection checking - [x] Text To Speech - [ ] OCRFile - [ ] ~~Audio To Text~~ - [ ] ~~Balance~~ Verified examples from the CLI: ```plaintext RAGFlow(user)> tts with 'speech-2.8-hd@test@minimax' text 'He who desires but acts not, breeds pestilence.' play format 'wav' save './internal' param '{"voice_setting": {"voice_id": "English_radiant_girl", "speed": 1, "vol": 1, "pitch": 0}, "audio_setting": {"sample_rate": 32000, "bitrate": 128000, "format": "wav", "channel": 1}, "output_format": "hex"}' Saved to directory: /home/infiniflow/Documents/development/ragflow/internal/speech-2.8-hd_output.wav SUCCESS RAGFlow(user)> stream tts with 'speech-2.8-hd@test@minimax' text 'He who desires but acts not, breeds pestilence.' play format 'wav' save './internal' param '{"voice_setting": {"voice_id": "English_radiant_girl", "speed": 1, "vol": 1, "pitch": 0}, "audio_setting": {"sample_rate": 32000, "bitrate": 128000, "format": "wav", "channel": 1}, "output_format": "hex"}' Saved to directory: /home/infiniflow/Documents/development/ragflow/internal/speech-2.8-hd_output.wav SUCCESS ``` CLI options: `play` plays the audio in the CLI; `save` followed by `PATH_TO_SAVE` saves the file to that directory; `format` selects wav or mp3 for the saved file; `param` takes a JSON object aligned with the provider's official request body. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-14 13:19:31 +08:00
tmimmanuel	cb01529d8b	Go: implement provider: Voyage AI (#14811 ) ### What problem does this PR solve? Add a Go driver for Voyage AI (https://voyageai.com), one of the unchecked providers on the umbrella tracking issue #14736. Voyage AI is embed + rerank only — no chat, no streaming, no `/v1/models` endpoint. It's the first provider in the Go layer of this shape. Until this PR, a tenant who configured `voyage` as a model provider in the Go layer fell through to the default branch of `internal/entity/models/factory.go` and got the dummy driver. ### What this PR includes - New `internal/entity/models/voyage.go` with a `VoyageModel` implementing the `ModelDriver` interface. - New `conf/models/voyage.json` with 6 embedding models (`voyage-3.5`, `voyage-3.5-lite`, `voyage-3-large`, `voyage-code-3`, `voyage-law-2`, `voyage-finance-2`) and 2 rerank models (`rerank-2`, `rerank-2-lite`). - `factory.go`: route `"voyage"` to `NewVoyageModel`. - `internal/entity/models/voyage_test.go`: 19 unit tests. ### How the driver works - Embed: `POST /v1/embeddings`. Response is OpenAI-shaped (`{data: [{embedding, index, object, text}], model, usage}`). Driver reorders by `index`, rejects duplicate / out-of-range / missing slots, and short-circuits empty input without an HTTP call. - Rerank: `POST /v1/rerank`. Voyage uses `top_k` as the request param name (not `top_n` like Aliyun/SiliconFlow); the driver translates `RerankConfig.TopN` → `top_k`. Response is Cohere-shaped (`{data: [{relevance_score, index}], model}`), so the existing `RerankResponse{Data: []RerankResult{Index, RelevanceScore}}` shape fits cleanly. - `ListModels`: returns a hardcoded list of `voyageKnownModels`. Voyage does not expose `/v1/models` (probed live, returns 404), so the driver synthesizes the list from the same set the config ships. New upstream models are added by extending one slice. - `CheckConnection`: pings a 1-input embed call against `voyage-3.5`. 
Without `/v1/models`, this is the cheapest way to verify the API key + network path before a tenant tries a real workload. - `ChatWithMessages` / `ChatStreamlyWithSender` / `Balance` / `TranscribeAudio` / `AudioSpeech` / `OCRFile`: all return `"no such method"`. Voyage does not host any of these surfaces. No interface change. No new dependencies. ### How was this tested? 19 unit tests in `internal/entity/models/voyage_test.go` — all pass on go 1.25: ``` $ go test -vet=off -run TestVoyage -count=1 ./internal/entity/models/... ok ragflow/internal/entity/models 0.036s ``` Coverage: Name; Embed (happy path, reorder, empty-input, missing key/model, duplicate index, out-of-range index, missing slot); Rerank (happy path with `top_k` assertion, default-to-len-documents, empty documents, out-of-range index); ListModels (static list, missing key); CheckConnection (happy, 401); chat methods sentinels; Balance sentinel; audio/OCR sentinels. `go build ./internal/entity/models/...` exits 0. Live integration test against `api.voyageai.com`: ``` === RUN TestVoyageLiveSmoke [OK] Name() = "voyage" [OK] ListModels (static): 8 models -> [voyage-3.5 voyage-3.5-lite voyage-3-large voyage-code-3 voyage-law-2 voyage-finance-2 rerank-2 rerank-2-lite] [OK] CheckConnection [OK] Embed vectors=3 dim=1024 indices=[0 1 2] [OK] Embed(empty) -> 0 vectors [OK] Rerank results=3 scores=[0.8125 0.59765625 0.39453125] [OK] ChatWithMessages returns voyage, no such method [OK] Balance returns voyage, no such method VOYAGE LIVE SMOKE PASSED --- PASS: TestVoyageLiveSmoke (0.81s) ``` What the live run proves on the wire: - Auth (`Bearer <key>`) accepted by `api.voyageai.com`. - Embed `voyage-3.5` on 3 inputs returns 3 vectors at dim 1024 with `index` field preserved as `[0, 1, 2]` — the reorder-by-index code is exercised on real data. - Empty input short-circuits without an HTTP call (mock server would have been hit if it did). 
- Rerank `rerank-2` on 3 docs returns 3 real `relevance_score` floats `[0.8125, 0.598, 0.395]`. The `top_k` translation works on the live wire. - All sentinel methods return the documented `"no such method"` strings. ### Note on PR history This branch was previously named for LocalAI Embed work which is now consolidated into PR #14813. The branch was reset to `upstream/main` and rebuilt for Voyage. Diff against `main` is a clean +838 lines across 4 files. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Tracking: #14736 --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-14 09:46:54 +08:00
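The reorder-by-`index` step described for the Voyage `Embed` path above can be sketched as follows. This is an illustration under assumptions rather than the code in `voyage.go`: `embeddingItem` and `reorderByIndex` are hypothetical names, and the sketch starts from an already-decoded `data` array.

```go
package main

import "fmt"

// embeddingItem is one decoded entry of the response "data" array.
// The type name is an assumption made for this sketch.
type embeddingItem struct {
	Index     int
	Embedding []float64
}

// reorderByIndex places each vector at the slot named by its index and
// rejects out-of-range indices, duplicates, and inputs left without a vector.
func reorderByIndex(items []embeddingItem, numInputs int) ([][]float64, error) {
	out := make([][]float64, numInputs)
	seen := make([]bool, numInputs)
	for _, it := range items {
		if it.Index < 0 || it.Index >= numInputs {
			return nil, fmt.Errorf("embedding index %d out of range [0, %d)", it.Index, numInputs)
		}
		if seen[it.Index] {
			return nil, fmt.Errorf("duplicate embedding index %d", it.Index)
		}
		seen[it.Index] = true
		out[it.Index] = it.Embedding
	}
	for i, ok := range seen {
		if !ok {
			return nil, fmt.Errorf("missing embedding for input %d", i)
		}
	}
	return out, nil
}

func main() {
	// The upstream may return items in any order; index decides the slot.
	items := []embeddingItem{
		{Index: 1, Embedding: []float64{0.2}},
		{Index: 0, Embedding: []float64{0.1}},
	}
	vectors, err := reorderByIndex(items, 2)
	fmt.Println(vectors, err)
}
```

Tracking a separate `seen` slice lets the function tell a missing slot from a vector that is legitimately empty.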
Jin Hai	30d1c1dc28	Fix go compilation (#14900 ) ### What problem does this PR solve? As title ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-13 20:05:56 +08:00
tmimmanuel	0a4b733b2a	Go: implement Rerank in LocalAI driver (#14813 ) ### What problem does this PR solve? The LocalAI Go driver landed in #14809 and Embed landed in #14811. `Rerank` was left as a stub that returns `"not implemented"`. This PR fills the gap. LocalAI exposes a public rerank endpoint at `<tenant-url>/v1/rerank` with a Cohere-shaped request and response (`{model, query, documents, top_n}` → `{results: [{index, relevance_score}]}`). The Python side has had `LocalAIRerank` in `rag/llm/rerank_model.py` for a long time. Until this PR, a tenant who wanted to use LocalAI for reranking in the Go layer got `"not implemented"`. ### What this PR includes - `conf/models/localai.json`: add `"rerank": "rerank"` under `url_suffix` so the driver can build the URL from config. This matches the `URLSuffix.Rerank` field already used by aliyun and siliconflow. - `internal/entity/models/localai.go`: replace the `Rerank` stub with a real implementation that POSTs to `/v1/rerank`. Adds local request/response types `localAIRerankRequest` and `localAIRerankResponse`. No factory change. No interface change. ### How the implementation works - Validate the model name and resolve the tenant-supplied base URL with the existing `resolveBaseURL` helper. - Wrap the request with `context.WithTimeout(nonStreamCallTimeout)` so the call has a clear deadline. Same pattern `ChatWithMessages`, `ListModels`, and `Embed` already use in this file. - Only set the `Authorization` header when a non-empty API key was supplied. LocalAI accepts an empty key by default, so this preserves the optional-auth contract. - Default `top_n` to `len(documents)` when `rerankConfig.TopN == 0`, matching the existing Aliyun and SiliconFlow rerank implementations. - Validate every `results[].index` against `len(documents)`. If the upstream returns an out-of-range index, fail clearly instead of silently writing past the slice. - An empty `documents` slice returns `&RerankResponse{}` with no HTTP call. 
- Non-200 responses propagate the upstream status line and body. ### Note on stacking This PR builds on #14809 (LocalAI driver) and #14811 (LocalAI Embed). Until both merge, this PR's diff on GitHub will include all three commits. After #14809 and #14811 land on `main`, GitHub will auto-reduce this PR to only the `Rerank` changes (one commit, ~99 line diff in `localai.go` plus 1 line in `localai.json`). ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? - `go build ./internal/entity/models/...` returns exit 0 on go 1.25 (the `go.mod` minimum). - The full method set on `LocalAIModel` still matches the `ModelDriver` interface. - Pattern parity with the existing Aliyun Rerank (`internal/entity/models/aliyun.go`) and SiliconFlow Rerank (`internal/entity/models/siliconflow.go`) implementations. Closes #14812 Depends on #14809, #14811 Tracking: #14736 Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-13 19:35:19 +08:00
Jin Hai	b18640d228	Go: fix OCR command (#14891 ) ### What problem does this PR solve? RAGFlow(user)> ocr with 'hunyuanocr@test@gitee' file './picture.png' +----------------------------------------------------------+ \| text \| +----------------------------------------------------------+ \| 生活不是等待风暴过去，而是学会在雨中翩翩起舞。 ——佚名 \| +----------------------------------------------------------+ ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-13 17:29:53 +08:00
Panda Dev	bf90c8948a	Go: implement ListModels in ZhipuAI driver (#14886 ) ### What problem does this PR solve? Fixes #14884 The ZhipuAI Go driver in `internal/entity/models/zhipu-ai.go` had a stub `ListModels` method that always returned `"zhipu-ai, no such method"`. The DeepSeek, Gitee, NVIDIA, OpenAI, SiliconFlow, and OpenRouter drivers in the same package already implement `ListModels` against the OpenAI-compatible `/models` endpoint, and the model picker UI relies on it. This PR brings ZhipuAI in line with that pattern. ### Changes - `internal/entity/models/zhipu-ai.go`: implement `ZhipuAIModel.ListModels`. - Resolve region with default fallback. - GET `${BaseURL[region]}/${URLSuffix.Models}` (resolves to `https://open.bigmodel.cn/api/paas/v4/models` with the default region). - Send `Authorization: Bearer <api_key>` when an API key is configured. Omit the header when the key is empty, so an unauthenticated caller gets a clear `401` from upstream. - Surface non-200 responses with the upstream status line and body, matching the other Go drivers. - Parse the response via the package-level `DSModelList` / `DSModel` types already used by DeepSeek, Gitee, and SiliconFlow. - When the response includes `owned_by`, render the entry as `id@owned_by`, matching the convention of Gitee and SiliconFlow. - `conf/models/zhipu-ai.json`: add `"models": "models"` to `url_suffix`. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-13 16:39:14 +08:00
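The `id@owned_by` rendering described for the ZhipuAI `ListModels` entry above can be sketched as follows. This is a minimal illustration: `modelList`, `modelEntry`, and `renderModelNames` are hypothetical stand-ins (the real driver parses into the package-level `DSModelList` / `DSModel` types, whose fields are not shown in this log), and the sample payload assumes the usual OpenAI-compatible `/models` shape.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelEntry and modelList are stand-ins for the package-level types the
// PR mentions; the names and fields here are assumptions for this sketch.
type modelEntry struct {
	ID      string `json:"id"`
	OwnedBy string `json:"owned_by"`
}

type modelList struct {
	Data []modelEntry `json:"data"`
}

// renderModelNames decodes a /models response and renders each entry as
// "id@owned_by" when owned_by is present, or as the bare id otherwise.
func renderModelNames(body []byte) ([]string, error) {
	var list modelList
	if err := json.Unmarshal(body, &list); err != nil {
		return nil, fmt.Errorf("decode model list: %w", err)
	}
	names := make([]string, 0, len(list.Data))
	for _, m := range list.Data {
		if m.OwnedBy != "" {
			names = append(names, m.ID+"@"+m.OwnedBy)
		} else {
			names = append(names, m.ID)
		}
	}
	return names, nil
}

func main() {
	sample := []byte(`{"object":"list","data":[{"id":"model-a","owned_by":"example-org"},{"id":"model-b"}]}`)
	names, err := renderModelNames(sample)
	fmt.Println(names, err)
}
```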
tmimmanuel	8b53960819	Go: implement provider: LongCat (#14809 ) ### What problem does this PR solve? Add a Go driver for LongCat (Meituan, https://longcat.chat), one of the unchecked providers on the umbrella tracking issue #14736. LongCat exposes an OpenAI-compatible REST API at `https://api.longcat.chat/openai/v1` with three public chat models including `LongCat-Flash-Thinking`, a reasoning model that returns chain-of-thought in `reasoning_content` (OpenAI o-series shape). Until this PR, a tenant who configured `longcat` as a model provider in the Go layer fell through to the default branch of `internal/entity/models/factory.go` and got the dummy driver. ### What this PR includes - New `internal/entity/models/longcat.go` with a `LongCatModel` implementing the `ModelDriver` interface. - New `conf/models/longcat.json` with the 3 public chat models (Flash-Chat, Flash-Lite, Flash-Thinking) and `url_suffix` for `chat` and `models`. - `factory.go`: route `"longcat"` to `NewLongCatModel`. Method coverage: - `ChatWithMessages`: `POST /openai/v1/chat/completions`, non-streaming - `ChatStreamlyWithSender`: SSE stream against the same endpoint - `ListModels` / `CheckConnection`: `GET /openai/v1/models` - Reasoning extraction: `message.reasoning_content` (non-stream) and `delta.reasoning_content` (stream) flow into `ChatResponse.ReasonContent` / the sender's second arg. Matches the OpenAI o-series convention also used by kimi-k2.6 and DeepSeek-R1. - `reasoning_effort` propagation: `ChatConfig.Effort` → request body `reasoning_effort` (LongCat-Flash-Thinking honors it; non-reasoning models ignore it). - `Embed` / `Rerank` / `Balance` / `TranscribeAudio` / `AudioSpeech` / `OCRFile` return `"no such method"` (LongCat does not expose any of these surfaces). No interface change. No new dependencies. ### How was this tested? 21 unit tests in `internal/entity/models/longcat_test.go` — all pass: ``` $ go test -vet=off -run TestLongCat -count=1 -v ./internal/entity/models/... 
=== RUN TestLongCatName --- PASS: TestLongCatName (0.00s) === RUN TestLongCatChatHappyPath --- PASS: TestLongCatChatHappyPath (0.00s) === RUN TestLongCatChatExtractsReasoningContent --- PASS: TestLongCatChatExtractsReasoningContent (0.00s) === RUN TestLongCatChatPropagatesReasoningEffort --- PASS: TestLongCatChatPropagatesReasoningEffort (0.00s) === RUN TestLongCatChatOmitsReasoningEffortWhenUnset --- PASS: TestLongCatChatOmitsReasoningEffortWhenUnset (0.00s) === RUN TestLongCatChatRequiresAPIKey --- PASS: TestLongCatChatRequiresAPIKey (0.00s) === RUN TestLongCatChatRequiresMessages --- PASS: TestLongCatChatRequiresMessages (0.00s) === RUN TestLongCatChatRejectsHTTPError --- PASS: TestLongCatChatRejectsHTTPError (0.00s) === RUN TestLongCatStreamHappyPath --- PASS: TestLongCatStreamHappyPath (0.00s) === RUN TestLongCatStreamExtractsReasoningContent --- PASS: TestLongCatStreamExtractsReasoningContent (0.00s) === RUN TestLongCatStreamRejectsExplicitFalse --- PASS: TestLongCatStreamRejectsExplicitFalse (0.00s) === RUN TestLongCatStreamRequiresSender --- PASS: TestLongCatStreamRequiresSender (0.00s) === RUN TestLongCatStreamFailsWithoutTerminal --- PASS: TestLongCatStreamFailsWithoutTerminal (0.00s) === RUN TestLongCatListModelsHappyPath --- PASS: TestLongCatListModelsHappyPath (0.00s) === RUN TestLongCatListModelsRequiresAPIKey --- PASS: TestLongCatListModelsRequiresAPIKey (0.00s) === RUN TestLongCatCheckConnectionDelegatesToListModels --- PASS: TestLongCatCheckConnectionDelegatesToListModels (0.00s) === RUN TestLongCatEmbedReturnsNoSuchMethod --- PASS: TestLongCatEmbedReturnsNoSuchMethod (0.00s) === RUN TestLongCatRerankReturnsNoSuchMethod --- PASS: TestLongCatRerankReturnsNoSuchMethod (0.00s) === RUN TestLongCatBalanceReturnsNoSuchMethod --- PASS: TestLongCatBalanceReturnsNoSuchMethod (0.00s) === RUN TestLongCatAudioOCRReturnNoSuchMethod --- PASS: TestLongCatAudioOCRReturnNoSuchMethod (0.00s) PASS ok ragflow/internal/entity/models 0.020s ``` `go build 
./internal/entity/models/...` exits 0 on go 1.25. Live integration test against `api.longcat.chat`: ``` === RUN TestLongCatLiveSmoke [OK] Name() = "longcat" [OK] CheckConnection [OK] ListModels: 5 models -> [LongCat-Flash-Lite LongCat-Flash-Chat LongCat-Flash-Thinking-2601 LongCat-Flash-Omni-2603 LongCat-2.0-Preview] [OK] Chat (Flash-Chat) answer="Got it! Let me know if you" reason="" [OK] Chat (Flash-Thinking) answer len=443 head="To find 15 % of 80, follow these steps:\n\n1. **Convert the percentage to a frac..." ReasonContent len=557 head="The user asks: \"15% of 80?\" They want step by step reasoning and final answer in \\boxed{}. So we need to compute 15% of ..." [OK] Stream content: 78 chunks, 351 chars [OK] Stream reasoning: 107 chunks, 537 chars [OK] Balance returns longcat, no such method [OK] Embed returns longcat, no such method [OK] Rerank returns longcat, no such method LONGCAT LIVE SMOKE PASSED --- PASS: TestLongCatLiveSmoke (31.01s) ``` What the live run proves on the wire: - Auth header (`Bearer <key>`) is accepted by `api.longcat.chat`. - `/openai/v1/models` parser handles the real 5-model response (note: live API returns versioned aliases `LongCat-Flash-Thinking-2601`, `LongCat-Flash-Omni-2603`, `LongCat-2.0-Preview` plus the un-versioned `LongCat-Flash-Chat` and `LongCat-Flash-Lite`). - Non-stream chat against `LongCat-Flash-Chat`: visible answer parses correctly, `ReasonContent` correctly empty. - Non-stream chat against `LongCat-Flash-Thinking`: 443-char answer flows into `Answer`, 557-char chain-of-thought flows into `ReasonContent` via the new `message.reasoning_content` extraction. - Streaming chat against `LongCat-Flash-Thinking`: 107 reasoning chunks (537 chars) reach the sender's second arg via `delta.reasoning_content`; 78 content chunks (351 chars) reach the first arg. Before this code, the reasoning chunks would have been silently dropped. 
- All sentinel methods (Balance, Embed, Rerank, audio/OCR) return the documented `"no such method"` strings. ### Note on PR history This branch was previously named for LocalAI work which is now consolidated into PR #14813. The branch was reset to `upstream/main` and rebuilt for LongCat. The diff against `main` is a clean +969 lines across 4 files. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Tracking: #14736 --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-13 16:27:56 +08:00
tmimmanuel	d63d3bb7d2	Go: implement provider: Novita.ai (#14850 ) ### What problem does this PR solve? Add a Go driver for Novita.ai (https://novita.ai), one of the unchecked providers on the umbrella tracking issue #14736. Novita exposes an OpenAI-compatible REST API at `https://api.novita.ai/v3/openai` and proxies a large catalog of third-party models (DeepSeek, Llama, Qwen3, Kimi, Gemma, Mistral, MiniMax, GLM, etc.) behind a single OpenAI-shaped surface — 102 models live at the time of writing. Until this PR, a tenant who configured `novita` as a model provider in the Go layer fell through to the default branch of `internal/entity/models/factory.go` and got the dummy driver. ### What this PR includes - New `internal/entity/models/novita.go` with a `NovitaModel` implementing the `ModelDriver` interface (~520 lines). - New `conf/models/novita.json` with 7 representative chat models (DeepSeek-V4, Llama-3.3-70B, Qwen3-30B/235B reasoning, Kimi-K2, Gemma-3-27B, Mistral-Nemo). - `factory.go`: route `"novita"` to `NewNovitaModel`. - `internal/entity/models/novita_test.go`: 23 unit tests. ### Notable design point: `<think>...</think>` reasoning extraction Novita-routed reasoning models like `qwen3-` and `deepseek-r1-` embed their chain-of-thought inline inside content as `<think>...</think>` tags, rather than in a separate `reasoning_content` field. Verified live by probing `api.novita.ai`: ``` content head 200: <think> Okay, let's see. I need to find 15% of 80. Hmm, percentages can sometimes be tricky, but I think content tail 100: h, that works. Alternatively, 0.15 × 80. If I move the decimal two places to the left for </think> ``` Without handling, a tenant picking qwen3 via Novita would see raw `<think>` tags in their UI answer — different from every other reasoning provider in the Go layer. 
The driver detects those tags and routes the inner text to `ChatResponse.ReasonContent` (non-stream) or the sender's second arg (stream), keeping the visible answer clean of tag clutter: - `splitNovitaThink` — scans a complete content string. Used by the non-streaming path. Handles multiple `<think>` blocks, unclosed tags (the model got cut off mid-reasoning), pure-text content with no tags. - `novitaThinkExtractor` — stateful streaming version. Buffers trailing bytes that might be the start of a tag (e.g. `<thi` held back when the next chunk completes `nk>`), then emits segments in routing order so callers can pipe them to a UI. Tested with byte-level chunk boundaries and tag-spanning scenarios. ### Method coverage \| Method \| Behavior \| \|---\|---\| \| `ChatWithMessages` \| `POST /v3/openai/chat/completions`, `<think>` extraction on response \| \| `ChatStreamlyWithSender` \| SSE stream, stateful `<think>` extraction across deltas \| \| `ListModels` / `CheckConnection` \| `GET /v3/openai/models` (102 live) \| \| `Embed` / `Rerank` / `Balance` / `TranscribeAudio` / `AudioSpeech` / `OCRFile` \| `"no such method"` — Novita's OpenAI-compatible surface does not expose any \| No interface change. No new dependencies. ### How was this tested? 23 unit tests in `internal/entity/models/novita_test.go` — all pass: ``` $ go test -vet=off -run "TestNovita\|TestSplitNovita" -count=1 ./internal/entity/models/... 
ok ragflow/internal/entity/models 0.020s ``` Coverage: - `splitNovitaThink` (5 cases: pure text, single block, leading text, multiple blocks, unclosed tag) - `novitaThinkExtractor` (6 cases: single-chunk, opening tag span, closing tag span, byte-level chunking, no tags, lone `<` not as tag start) - `ChatWithMessages`: pure text, with `<think>` tags, missing API key, empty messages, HTTP error - `ChatStreamlyWithSender`: tag-stripping with spanning deltas, pure content, sender-required, stream-true-required - `ListModels` / `CheckConnection` (happy paths) - All sentinel methods `go build ./internal/entity/models/...` exits 0 on go 1.25. Live integration test against `api.novita.ai/v3/openai`: ``` === RUN TestNovitaLiveSmoke [OK] Name() = "novita" [OK] CheckConnection [OK] ListModels: 102 models (showing first 6) [deepseek/deepseek-v4-pro deepseek/deepseek-v4-flash deepseek/deepseek-v3.2 xiaomimimo/mimo-v2.5-pro moonshotai/kimi-k2.6 zai-org/glm-5.1] [OK] Chat (llama-3.3) answer="ok" reason="" [OK] Chat (qwen3) answer len=0 head="" ReasonContent len=1657 head="Okay, so I need to figure out what 15% of 80 is. Hmm, percentages can sometimes trip me up, but let ..." [OK] Stream content: 0 chunks, 0 chars; reasoning: 600 chunks, 1667 chars [OK] Embed/Rerank/Balance/TranscribeAudio/AudioSpeech/OCRFile all return "novita, no such method" NOVITA LIVE SMOKE PASSED --- PASS: TestNovitaLiveSmoke (26.18s) ``` What the live run proves on the wire: - Auth (`Bearer <key>`) accepted by `api.novita.ai`. - `/v3/openai/models` parser handles the real 102-model response. - Non-stream chat against `meta-llama/llama-3.3-70b-instruct`: clean string answer, empty ReasonContent (non-reasoning model, pure-text path). - Non-stream chat against `qwen/qwen3-30b-a3b-fp8`: 1657-char reasoning extracted from `<think>...</think>` and routed to `ChatResponse.ReasonContent`. 
Visible answer is 0 chars in this run because qwen3 spent its 600-token budget entirely on reasoning before reaching the answer phase — that's the model's behavior, not a driver bug. The important thing: no `<think>` tags leaked into the visible Answer field. - Streaming against qwen3: 600 reasoning chunks (1667 chars) emitted via the sender's 2nd arg across SSE deltas; no `<think>` tag fragments leaked into the content channel despite tag boundaries crossing chunk boundaries on the wire. - All 6 sentinel methods return the documented `"no such method"` strings. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Tracking: #14736	2026-05-13 14:10:50 +08:00
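The non-streaming `<think>` split described in the Novita commit above can be sketched roughly as follows. This is a minimal illustration, not the PR's `splitNovitaThink`: the function name `splitThink`, its return shape, and the whitespace trimming are assumptions made for the example. It covers the cases the commit lists: pure text, multiple blocks, and an unclosed tag.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	openTag  = "<think>"
	closeTag = "</think>"
)

// splitThink is a hypothetical helper: it separates inline
// <think>...</think> blocks from the visible answer text.
// An unclosed <think> treats the remainder as reasoning.
func splitThink(content string) (answer, reasoning string) {
	var ans, rea strings.Builder
	rest := content
	for {
		start := strings.Index(rest, openTag)
		if start < 0 {
			// No more tags: everything left is visible text.
			ans.WriteString(rest)
			break
		}
		ans.WriteString(rest[:start])
		rest = rest[start+len(openTag):]

		end := strings.Index(rest, closeTag)
		if end < 0 {
			// Model was cut off mid-reasoning.
			rea.WriteString(rest)
			break
		}
		rea.WriteString(rest[:end])
		rest = rest[end+len(closeTag):]
	}
	return strings.TrimSpace(ans.String()), strings.TrimSpace(rea.String())
}

func main() {
	a, r := splitThink("<think>15% of 80 is 0.15 * 80</think>12")
	fmt.Printf("answer=%q reasoning=%q\n", a, r)
}
```

A streaming variant would additionally need to hold back a trailing partial tag (such as `<thi`) until the next chunk arrives, which this sketch does not attempt.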
Joseff	733d75d6a7	Fix(Go): make Baidu Encode fail loudly on malformed responses (#14721 ) ### What problem does this PR solve? The Baidu (Qianfan) `Encode` method silently swallowed malformed responses. If a `data[]` item from the API was missing a field (`index`, `embedding`, or unexpected shape), the loop did `continue` instead of returning an error, leaving `nil` entries in the result slice. Callers got back partial results with no indication anything went wrong, which then crashes downstream consumers when they try to use a `nil` vector. Concrete gaps fixed: - No count-mismatch check between `data` length and input texts (only checked for empty) - No duplicate-index detection (a duplicate would silently overwrite) - No missing-index final scan - No empty-embedding rejection - No per-call context timeout - `EmbeddingConfig.Dimension` (added in #14735) was not propagated This PR replaces `map[string]interface{}` parsing with a typed `baiduEmbeddingResponse` struct, applies the standard four-layer validation (count → out-of-range → duplicate → empty → final missing-index scan), adds `context.WithTimeout(nonStreamCallTimeout)`, and forwards `embeddingConfig.Dimension` as the `dimensions` parameter (Baidu Qianfan v2 uses an OpenAI-compatible API). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-13 12:54:00 +08:00
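The validation order named in the Baidu commit above (count, out-of-range, duplicate, empty, final missing-index scan) might look something like the sketch below. The type `embeddingItem` and the function `orderEmbeddings` are hypothetical names for illustration; the actual `baiduEmbeddingResponse` struct in the PR is not reproduced here.

```go
package main

import "fmt"

// embeddingItem is an assumed shape for one data[] entry.
type embeddingItem struct {
	Index     int
	Embedding []float64
}

// orderEmbeddings places each vector at its declared index and
// returns an error instead of leaving nil slots in the result.
func orderEmbeddings(items []embeddingItem, inputs int) ([][]float64, error) {
	// 1. Count mismatch between response items and input texts.
	if len(items) != inputs {
		return nil, fmt.Errorf("expected %d embeddings, got %d", inputs, len(items))
	}
	out := make([][]float64, inputs)
	for _, it := range items {
		// 2. Index outside the input range.
		if it.Index < 0 || it.Index >= inputs {
			return nil, fmt.Errorf("index %d out of range [0,%d)", it.Index, inputs)
		}
		// 3. Duplicate index would silently overwrite a vector.
		if out[it.Index] != nil {
			return nil, fmt.Errorf("duplicate index %d", it.Index)
		}
		// 4. Empty embedding.
		if len(it.Embedding) == 0 {
			return nil, fmt.Errorf("empty embedding at index %d", it.Index)
		}
		out[it.Index] = it.Embedding
	}
	// 5. Defensive final scan: every input slot must hold a vector.
	for i, v := range out {
		if v == nil {
			return nil, fmt.Errorf("missing embedding for input %d", i)
		}
	}
	return out, nil
}

func main() {
	vecs, err := orderEmbeddings([]embeddingItem{
		{Index: 1, Embedding: []float64{0.2}},
		{Index: 0, Embedding: []float64{0.1}},
	}, 2)
	fmt.Println(vecs, err)
}
```

Given the earlier checks, the final scan cannot fail in this sketch; it is kept because the commit lists it as a separate layer and it guards against later edits to the loop.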
Jin Hai	ad4717f40a	Go: fix model type check when use the model (#14843 ) ### What problem does this PR solve? ``` RAGFlow(user)> chat with 'glm-ocr@test@zhipu-ai' message 'what is this' CLI error: expect model glm-ocr@zhipu-ai is a chat or multimodal model ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-12 19:44:01 +08:00
Haruko386	45ee5ca9cd	Go: implement provider: Jina (#14838 ) ### What problem does this PR solve? This PR completes the Jina provider The following functionalities are now supported: Jina: - [ ] Chat / Stream Chat (Not available for now: [(Jina chat API docs)](https://api.jina.ai/docs#/Search%20Foundation%20Models/chat_completions_v1_chat_completions_post)) - [x] Embedding - [x] Rerank - [x] Model listing - [x] Provider connection checking - [ ] ~~Balance~~ Verified examples from the CLI: ```plaintext RAGFlow(user)> embed text 'walkerwhat' 'jumperwho' with 'jina-embeddings-v2-base-en@test@jina' dimension 16 +-----------+-------+ \| dimension \| index \| +-----------+-------+ \| 768 \| 0 \| \| 768 \| 1 \| +-----------+-------+ RAGFlow(user)> rerank query 'what is rag' document 'rag is retrieval augment generation' 'rag need llm' 'famous rag project includes ragflow' with 'jina-reranker-v2-base-multilingual@test@jina' top 3; +-------+-----------------+ \| index \| relevance_score \| +-------+-----------------+ \| 0 \| 0.74316794 \| \| 2 \| 0.18713269 \| \| 1 \| 0.15817434 \| +-------+-----------------+ RAGFlow(user)> list supported models from 'jina' 'test' +---------------------------------------------+ \| model_name \| +---------------------------------------------+ \| Jina AI: Jina VLM \| \| Jina AI: Jina Reranker v3 \| \| Jina AI: Jina Code Embeddings 0.5b \| \| Jina AI: Jina Code Embeddings 1.5b \| \| Jina AI: Jina Embeddings v4 \| \| Jina AI: Jina Reranker M0 \| \| Jina AI: ReaderLM v2 \| \| Jina AI: Jina Clip v2 \| \| Jina AI: Jina Embeddings v3 \| \| Jina AI: Jina Colbert v2 \| \| Jina AI: Reader LM 0.5b \| \| Jina AI: Reader LM 1.5b \| \| Jina AI: Jina Reranker v2 Base Multilingual \| \| Jina AI: Jina Clip v1 \| \| Jina AI: Jina Reranker v1 Tiny EN \| \| Jina AI: Jina Reranker v1 Turbo EN \| \| Jina AI: Jina Reranker v1 Base EN \| \| Jina AI: Jina Colbert v1 EN \| \| Jina AI: Jina Embeddings v2 Base ES \| \| Jina AI: Jina Embeddings v2 Base Code \| \| Jina 
AI: Jina Embeddings v2 Base DE \| \| Jina AI: Jina Embeddings v2 Base ZH \| \| Jina AI: Jina Embeddings v2 Base EN \| \| Jina AI: Jina Embedding B EN v1 \| \| Jina AI: Jina Embeddings v5 Text Small \| \| Jina AI: Jina Embeddings v5 Omni Small \| \| Jina AI: Jina Embeddings v5 Omni Nano \| \| Jina AI: Jina Embeddings v5 Text Nano \| +---------------------------------------------+ RAGFlow(user)> check instance 'test' from 'jina' SUCCESS ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-12 18:03:05 +08:00
tmimmanuel	7d3836907a	Go: implement Embed (embeddings) in Mistral driver (#14807 ) ### What problem does this PR solve? The Mistral Go driver landed in #14805 with chat, list models, and check connection. `Embed` was left as a stub that returns `"not implemented"`. This PR fills the gap. `conf/models/mistral.json` did not list any embedding model out of the box, so a tenant who wanted to use Mistral end to end (chat + embeddings) could not run an embedding call. This PR adds `mistral-embed` to the config and a real `/v1/embeddings` implementation. ### What this PR includes - `conf/models/mistral.json`: add `"embedding": "embeddings"` under `url_suffix` so the driver can build the URL from config (matches the `URLSuffix.Embedding` field already used by openai, siliconflow, zhipu-ai), and add a `mistral-embed` entry under `models` (1024-dimensional vectors, 8192 max input tokens). - `internal/entity/models/mistral.go`: replace the `Embed` stub with a real implementation that POSTs to `/v1/embeddings`. Adds local response types `mistralEmbeddingData` and `mistralEmbeddingResponse`. No factory change. No interface change. ### How the implementation works - Validate `apiConfig`, the API key, and the model name. Use the existing `baseURLForRegion` helper so an unknown region fails fast with a clear error. - Wrap the request with `context.WithTimeout(nonStreamCallTimeout)` so the call has a clear deadline. Same pattern as `ChatWithMessages` and `ListModels` already use in this file. - Send all input texts in one request. The Mistral API accepts the `input` field as an array. - Parse `data[].embedding` and copy each slice into a `[]EmbeddingData` indexed by `data[].index` so the output order matches the input order even if the API returns items in a different order. - An empty input slice returns `[]EmbeddingData{}` with no HTTP call. - Non-200 responses propagate the upstream status line and body. - A final pass checks that every input slot got a vector. 
If any slot is still empty, return a clear error so the caller does not silently use a zero vector. ### Note on stacking This PR builds on #14805 (the Mistral driver). Until #14805 merges, this PR's diff on GitHub will include both that PR's commits and this one. After #14805 lands on `main`, GitHub will auto-reduce this PR to only the `Embed` changes (one commit, ~111 line diff in `mistral.go` plus 8 lines in `mistral.json`). ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? - `go build ./internal/entity/models/...` returns exit 0 on go 1.25 (the `go.mod` minimum). - The full method set on `MistralModel` still matches the `ModelDriver` interface. - Pattern parity with the existing OpenAI Embed implementation (`internal/entity/models/openai.go`). Closes #14806 Depends on #14805 Tracking: #14736 --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-12 17:45:48 +08:00
buua436	14332dd75c	Go: fix dataset time unit (#14837 ) ### What problem does this PR solve? fix dataset time unit ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-12 17:22:16 +08:00
Jin Hai	d08bf02d9b	Go: add ASR, TTS, OCR command (#14836 ) ### What problem does this PR solve? ``` RAGFlow(user)> asr with 'glm-asr-2512@test@zhipu-ai' audio './speech.wav'; CLI error: zhipu, no such method RAGFlow(user)> stream asr with 'glm-asr-2512@test@zhipu-ai' audio './speech.wav'; CLI error: zhipu, no such method RAGFlow(user)> tts with 'glm-tts@test@zhipu-ai' text 'how are you'; CLI error: zhipu, no such method RAGFlow(user)> stream tts with 'glm-tts@test@zhipu-ai' text 'how are you'; CLI error: zhipu, no such method RAGFlow(user)> ocr with 'glm-ocr@test@zhipu-ai' file './test.log'; CLI error: zhipu, no such method ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-12 17:17:44 +08:00
buua436	9ee481807f	GO: implement GET /api/v1/datasets/:dataset_id (#14834 ) ### What problem does this PR solve? implement GET /api/v1/datasets/:dataset_id ### Type of change - [x] Refactoring	2026-05-12 17:16:48 +08:00
tmimmanuel	eaa2e46b1e	Go: implement Embed (embeddings) in Upstage driver (#14819 ) ### What problem does this PR solve? The Upstage Go driver landed in #14817 with chat, list models, and check connection. `Embed` was left as a stub that returns `"not implemented"`. This PR fills the gap. Upstage exposes an OpenAI-compatible embeddings endpoint at `https://api.upstage.ai/v1/solar/embeddings` via the `solar-embedding-1-large` family (`solar-embedding-1-large-query` for queries, `solar-embedding-1-large-passage` for passages), and the Python side has had `UpstageEmbed(OpenAIEmbed)` in `rag/llm/embedding_model.py` for a long time targeting this same path. The existing `conf/models/upstage.json` did not list any embedding model out of the box, so a tenant who wanted to use Upstage end to end could not run an embedding call. This PR fills the gap. ### What this PR includes - `conf/models/upstage.json`: add `"embedding": "embeddings"` under `url_suffix` so the driver can build the URL from config (matches the `URLSuffix.Embedding` field already used by openai, mistral, siliconflow, zhipu-ai), and add `solar-embedding-1-large-query` and `solar-embedding-1-large-passage` entries under `models`. - `internal/entity/models/upstage.go`: replace the `Embed` stub with a real implementation that POSTs to `/v1/solar/embeddings`. Adds local response types `upstageEmbeddingData` and `upstageEmbeddingResponse`. No factory change. No interface change. ### How the implementation works - Validate `apiConfig`, the API key, and the model name. Use the existing `baseURLForRegion` helper so an unknown region fails fast with a clear error. - Wrap the request with `context.WithTimeout(nonStreamCallTimeout)` so the call has a clear deadline. Same pattern as `ChatWithMessages` and `ListModels` already use in this file. - Send all input texts in one request. The Upstage API accepts the `input` field as an array. 
- Parse `data[].embedding` and copy each slice into a `[]EmbeddingData` indexed by `data[].index` so the output order matches the input order even if the API returns items in a different order. - An empty input slice returns `[]EmbeddingData{}` with no HTTP call. - Non-200 responses propagate the upstream status line and body. - A final pass checks that every input slot got a vector. If any slot is still empty, return a clear error so the caller does not silently use a zero vector. ### Note on stacking This PR builds on #14817 (the Upstage driver). Until #14817 merges, this PR's diff on GitHub will include both that PR's commits and this one. After #14817 lands on `main`, GitHub will auto-reduce this PR to only the `Embed` changes (one commit, ~119 line diff in `upstage.go` plus ~15 lines in `upstage.json`). ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? - `go build ./internal/entity/models/...` returns exit 0 on go 1.25 (the `go.mod` minimum). - The full method set on `UpstageModel` still matches the `ModelDriver` interface. - Pattern parity with the existing Mistral Embed (`internal/entity/models/mistral.go`) and OpenAI Embed (`internal/entity/models/openai.go`) implementations. Closes #14818 Depends on #14817 Tracking: #14736 --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-12 16:11:06 +08:00
Haruko386	ebab3513c4	Go: implement provider: Baichuan (#14832 ) ### What problem does this PR solve? This PR completes the Baichuan provider. The following functionalities are now supported: Baichuan: - [x] Chat / Stream Chat - [x] Embedding - [ ] ~~Rerank~~ - [ ] ~~Model listing~~ - [ ] ~~Provider connection checking~~ - [ ] ~~Balance~~ Verified examples from the CLI: ```plaintext # Baichuan RAGFlow(user)> embed text 'walkerwhat' 'jumperwho' with 'Baichuan-Text-Embedding@test@baichuan' dimension 16; +-----------+-------+ \| dimension \| index \| +-----------+-------+ \| 1024 \| 0 \| \| 1024 \| 1 \| +-----------+-------+ RAGFlow(user)> chat with 'Baichuan-M2@test@baichuan' message 'who r u' Answer: I'm BaiChuan, a helpful AI assistant created by Baichuan-AI. I'm designed to be a knowledgeable, friendly, and reliable assistant for various tasks like answering questions, explaining concepts, writing content, and more. Feel free to ask me anything! 😊 Time: 1.637975 RAGFlow(user)> stream chat with 'Baichuan-M2@test@baichuan' message 'who r u' Answer: I'm BaiChuan-m2, an AI assistant developed by Baichuan-AI. My purpose is to help you with a wide range of tasks by providing information, answering questions, solving problems, and assisting with creative projects. Think of me as a helpful digital companion! If you have any questions or need assistance, just let me know.😊 Time: 1.692321 ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-05-12 16:10:32 +08:00
tmimmanuel	558ea51a0f	Go: implement provider: StepFun (#14815 ) ### What problem does this PR solve? Add a Go driver for StepFun (阶跃星辰), one of the unchecked providers on the umbrella tracking issue #14736. Until this PR, a tenant who configured `stepfun` as a model provider in the Go layer fell through to the default branch of `internal/entity/models/factory.go` and got the dummy driver. Chat, list models, and check connection all returned `"not implemented"` instead of reaching the StepFun API. The Python side has had StepFun registered in `rag/llm/__init__.py` as a `SupportedLiteLLMProvider` with base URL `https://api.stepfun.com/v1`, plus `StepFunCV` for vision and `StepFunSeq2txt` for ASR, but no Go path. StepFun's chat API is OpenAI-compatible, so the implementation pattern is the same as the merged Moonshot driver (#14433) and OpenAI driver (#14605). ### What this PR includes - New file `internal/entity/models/stepfun.go` with a `StepFunModel` that implements the `ModelDriver` interface. - `factory.go`: route the `"stepfun"` provider name to `NewStepFunModel`. - New `conf/models/stepfun.json` with the public StepFun chat models (step-2-16k, step-1 family in 8k/32k/128k/256k context lengths, step-1-flash, and the step-1v / step-1o vision models) and `url_suffix` entries for `chat` and `models`. ### How the driver works - StepFun exposes the OpenAI-compatible API at `https://api.stepfun.com/v1`. - `ChatWithMessages` and `ChatStreamlyWithSender` post to `/chat/completions` in the same shape as the merged moonshot, openrouter, and openai drivers. - `ListModels` and `CheckConnection` call `/models` to list available ids and confirm the API key works. - `Embed` is left as `"not implemented"`. 
StepFun has not advertised a public embeddings endpoint in the API reference linked from the umbrella issue (`https://platform.stepfun.com/docs/en/api-reference/chat/chat-completion-create` is the chat endpoint), so any real implementation belongs in a separate follow-up only after the endpoint is verified. - `Rerank` and `Balance` return `"no such method"` because StepFun does not expose either. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? - `go build ./internal/entity/models/...` returns exit 0 with no errors on go 1.25 (the `go.mod` minimum). - Method set of `StepFunModel` matches the `ModelDriver` interface: `NewInstance`, `Name`, `ChatWithMessages`, `ChatStreamlyWithSender`, `Embed`, `Rerank`, `ListModels`, `Balance`, `CheckConnection`. - Pattern parity with the merged moonshot (#14433), openai (#14605), openrouter (#14652), and xai (#14550) drivers. Closes #14814 Tracking: #14736	2026-05-12 13:49:35 +08:00
Haruko386	128a64eae5	Refactor(Go): remove hardcode in huggingface provider (#14822 ) ### What problem does this PR solve? remove hardcode in `huggingface` provider ### Type of change - [x] Refactoring	2026-05-12 11:35:26 +08:00
Jin Hai	2f2d1569e6	Go: fix retrieval test error (#14794 ) ### What problem does this PR solve? 1. Add region check in zhipu-ai embed method 2. Fix retrieval test ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-11 20:19:08 +08:00
Haruko386	3e90d303e0	Go: implement provider: CoHere and FishAudio (#14790 ) ### What problem does this PR solve? This PR completes the Cohere provider integration (upgrading to the new Cohere V2 API) and enhances the Fish Audio provider in RAGFlow. The following functionalities are now supported: Cohere: - [x] Chat / Think Chat / Stream Chat / Stream Think Chat - [x] Embedding - [x] Rerank - [x] Model listing - [x] Provider connection checking - [ ] Balance Fish Audio: - [x] Model listing (`ListModels`) - [x] Balance (`Balance`) ----- Verified examples from the CLI: ```plaintext # Cohere RAGFlow(user)> think chat with 'command-a-reasoning-08-2025@test3@cohere' message 'jumperwho' Thinking: Okay, the user wrote "jumperwho". Let me try to figure out what they might be asking. First, I'll check if it's a misspelling. "Jumper" ...... Hmm. Since the query is unclear, the best approach is to ask the user to provide more context or correct any possible typos. Answer: It seems there might be a typo or missing context in your query "jumperwho." Could you clarify what you're referring to? For example: - Are you asking about a jumper (a type of sweater, a person who jumps, or a component in electronics)? - Is this related to a specific context, like a movie (e.g., the 2008 film Jumper) or a game? - Did you mean to ask about a person ("who") associated with jumping (e.g., a parachutist)? Let me know so I can provide a helpful response! 😊 Time: 6.710331 RAGFlow(user)> stream think chat with 'command-a-reasoning-08-2025@test3@cohere' message 'jumperwho' Thinking: , the user mentioned "jumperwho". Let me try to figure out what they're referring to. First, I'll check if it's a misspelling. "Jumper" could be a typo for "jumper" or maybe a username. Alternatively, it might be a combination of words like "jumper who",....... the best approach is to inform the user that I don't recognize the term and ask if they can provide more context or clarify what they mean by "jumperwho". 
That way, I can assist them better once I have more information. Answer: seems "jumperwho" isn't a widely recognized term, proper noun, or acronym in common usage. Could you provide more context or clarify what you mean by "jumperwho"? This will help me understand your question or request better! Time: 4.513596 RAGFlow(user)> embed text 'walkerwhat' 'jumperwho' with 'embed-v4.0@test3@cohere' dimension 16; +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+ \| embedding \| index \| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+ \| [-0.016643638 -0.001957038 0.0055713872 0.009027058 0.05275187 -0.024542313 -0.044006906 0.024119169 0.0014192933 0.006558722 0.0019129605 -0.021016119 -0.026516981 -0.017489925 0.021298215 0.017772019 0.04569948 0.008886009 0.012059584 -0.0014721862 0.... \| 0 \| \| [0.018778935 -0.0063459855 -0.0006839742 0.0046623563 0.0067668925 -0.018001877 -0.03963003 0.035744734 -0.014246088 -0.0020721585 -0.006313608 0.025124922 -0.010749322 0.01217393 -0.010231283 -0.025254432 0.021498645 -0.028880708 0.019167464 -0.0058279... 
\| 1 \| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+ RAGFlow(user)> rerank query 'what is rag' document 'rag is retrieval augment generation' 'rag need llm' 'famous rag project includes ragflow' with 'rerank-v4.0-pro@test@cohere' top 3; +-------+-----------------+ \| index \| relevance_score \| +-------+-----------------+ \| 0 \| 0.91744334 \| \| 1 \| 0.7458429 \| \| 2 \| 0.68729424 \| +-------+-----------------+ RAGFlow(user)> list supported models from 'cohere' 'test' +-------------------------------------+ \| model_name \| +-------------------------------------+ \| c4ai-aya-expanse-32b \| \| c4ai-aya-vision-32b \| \| cohere-transcribe-03-2026 \| \| command-a-03-2025 \| \| command-a-reasoning-08-2025 \| \| command-a-translate-08-2025 \| \| command-a-vision-07-2025 \| \| command-r-08-2024 \| \| command-r-plus-08-2024 \| \| command-r7b-12-2024 \| \| command-r7b-arabic-02-2025 \| \| embed-english-light-v3.0 \| \| embed-english-light-v3.0-image \| \| embed-english-v3.0 \| \| embed-english-v3.0-image \| \| embed-multilingual-light-v3.0 \| \| embed-multilingual-light-v3.0-image \| \| embed-multilingual-v3.0 \| \| embed-multilingual-v3.0-image \| \| embed-v4.0 \| +-------------------------------------+ RAGFlow(user)> check instance 'test' from 'cohere' SUCCESS # FishAudio RAGFlow(user)> list supported models from 'fishaudio' 'test' +----------------------------------------+ \| model_name \| +----------------------------------------+ \| Valentino Narración Biblica Fer \| \| Super Smash Bros. 
4/Ultimate Announcer \| \| Farid Dieck \| \| عصام الشوالي \| \| ALEX_CHIKNA \| \| Energetic Male \| \| voz de locutor k \| \| يي \| \| ELITE \| \| Mortal Kombat \| +----------------------------------------+ RAGFlow(user)> show balance from 'fishaudio' 'test' +----------------------------------+-----------------------------+--------+-----------------+------------------+-----------------------------+----------------------------------+ \| _id \| created_at \| credit \| has_free_credit \| has_phone_sha256 \| updated_at \| user_id \| +----------------------------------+-----------------------------+--------+-----------------+------------------+-----------------------------+----------------------------------+ \| 82ffec12cf984d88a30ec504d7909812 \| 2026-05-09T07:52:16.119000Z \| 0 \| \| false \| 2026-05-09T07:52:16.119000Z \| 2578ab1126804d6eaa630552400d7ff3 \| +----------------------------------+-----------------------------+--------+-----------------+------------------+-----------------------------+----------------------------------+ ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-05-11 20:18:38 +08:00
Renzo	39ee2fb120	Go: implement Rerank in NVIDIA driver (#14778 ) ## Summary - Replaces the `"no such method"` stub on `NvidiaModel.Rerank` (`internal/entity/models/nvidia.go`) with a real implementation against NVIDIA NIM's `/ranking` endpoint. - Mirrors the existing Python `NvidiaRerank` class at `rag/llm/rerank_model.py:149-190` for behavior parity: same `passages`/`query.text`/`logit` payload shape; `top_n` set to `len(documents)` so every input gets a score returned in original order (the issue body's spec omitted `top_n`, which would cause silent data loss). - Adds the `"rerank": "ranking"` URL suffix and two NIM rerank model entries (`nvidia/nv-rerankqa-mistral-4b-v3`, `nvidia/llama-3.2-nv-rerankqa-1b-v2`) to `conf/models/nvidia.json` so the picker exposes them. - Follows the same shape as the recently merged Aliyun (#14676), Gitee (#14656), and ZhipuAI (#14608) Rerank implementations: lowercase per-driver request/response types, conversion to the project-wide `RerankResponse{Data: []RerankResult}`, per-call `context.WithTimeout` of 30s. Closes #14720 ## Test plan - [x] `gofmt -l internal/entity/models/nvidia.go` — clean - [x] `go vet ./internal/entity/models/...` — no new errors introduced (the two pre-existing vet errors in `baidu.go:642` and `openrouter.go:566` are unrelated to this PR) - [x] `go build ./internal/entity/models/...` — succeeds - [x] `python3 -c "import json; json.load(open('conf/models/nvidia.json'))"` — JSON valid - [ ] Live smoke test against NVIDIA NIM with a real API key (requires reviewer with NIM credentials) ## Notes for reviewers - The issue body suggested omitting `top_n`. The Python reference includes it (`top_n: len(texts)`), and without it NVIDIA returns only the default top-K rankings rather than scores for every input. This PR follows the Python. - The URL host is `integrate.api.nvidia.com` (kept consistent with the existing chat/embeddings BaseURL in `nvidia.go`), not the legacy `ai.api.nvidia.com` host the Python uses. 
NIM's unified endpoint accepts the model names as-is, so no per-model URL transform is needed.	2026-05-11 17:21:16 +08:00
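The payload shape described in the NVIDIA Rerank commit above (`passages`, `query.text`, `top_n` set to the number of documents, `logit` scores returned in original order) can be illustrated with the sketch below. The struct and function names are hypothetical, and the response field `index` is an assumption about the response shape; only `passages`, `query.text`, `top_n`, and `logit` come from the commit text.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type rerankText struct {
	Text string `json:"text"`
}

// rerankRequest is an assumed request body for the /ranking endpoint.
type rerankRequest struct {
	Model    string       `json:"model"`
	Query    rerankText   `json:"query"`
	Passages []rerankText `json:"passages"`
	TopN     int          `json:"top_n"`
}

// ranking is an assumed shape for one scored entry in the response.
type ranking struct {
	Index int     `json:"index"`
	Logit float64 `json:"logit"`
}

// newRerankRequest sets top_n to len(docs) so that every input
// document is scored rather than only a default top-K.
func newRerankRequest(model, query string, docs []string) rerankRequest {
	passages := make([]rerankText, len(docs))
	for i, d := range docs {
		passages[i] = rerankText{Text: d}
	}
	return rerankRequest{
		Model:    model,
		Query:    rerankText{Text: query},
		Passages: passages,
		TopN:     len(docs),
	}
}

// scoresInInputOrder maps rankings back to the original document order
// and fails if any document did not receive a score.
func scoresInInputOrder(rankings []ranking, n int) ([]float64, error) {
	scores := make([]float64, n)
	seen := make([]bool, n)
	for _, r := range rankings {
		if r.Index < 0 || r.Index >= n {
			return nil, fmt.Errorf("ranking index %d out of range [0,%d)", r.Index, n)
		}
		scores[r.Index] = r.Logit
		seen[r.Index] = true
	}
	for i, ok := range seen {
		if !ok {
			return nil, fmt.Errorf("no score returned for document %d", i)
		}
	}
	return scores, nil
}

func main() {
	req := newRerankRequest("nvidia/llama-3.2-nv-rerankqa-1b-v2", "what is rag",
		[]string{"rag is retrieval augment generation", "rag need llm"})
	body, _ := json.Marshal(req)
	fmt.Println(string(body))

	scores, err := scoresInInputOrder([]ranking{{Index: 1, Logit: -3.2}, {Index: 0, Logit: 4.5}}, 2)
	fmt.Println(scores, err)
}
```

Without `top_n` equal to the input count, the missing-score check in `scoresInInputOrder` is what would surface the silent data loss the commit warns about.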


199 Commits