ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-08 08:07:21 +08:00

Author	SHA1	Message	Date
Jin Hai	2f2d1569e6	Go: fix retrieval test error (#14794 ) ### What problem does this PR solve? 1. Add region check in zhipu-ai embed method 2. Fix retrieval test ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-11 20:19:08 +08:00
Haruko386	3e90d303e0	Go: implement provider: CoHere and FishAudio (#14790 ) ### What problem does this PR solve? This PR completes the Cohere provider integration (upgrading to the new Cohere V2 API) and enhances the Fish Audio provider in RAGFlow. The following functionalities are now supported: Cohere: - [x] Chat / Think Chat / Stream Chat / Stream Think Chat - [x] Embedding - [x] Rerank - [x] Model listing - [x] Provider connection checking - [ ] Balance Fish Audio: - [x] Model listing (`ListModels`) - [x] Balance (`Balance`) ----- Verified examples from the CLI: ```plaintext # Cohere RAGFlow(user)> think chat with 'command-a-reasoning-08-2025@test3@cohere' message 'jumperwho' Thinking: Okay, the user wrote "jumperwho". Let me try to figure out what they might be asking. First, I'll check if it's a misspelling. "Jumper" ...... Hmm. Since the query is unclear, the best approach is to ask the user to provide more context or correct any possible typos. Answer: It seems there might be a typo or missing context in your query "jumperwho." Could you clarify what you're referring to? For example: - Are you asking about a jumper (a type of sweater, a person who jumps, or a component in electronics)? - Is this related to a specific context, like a movie (e.g., the 2008 film Jumper) or a game? - Did you mean to ask about a person ("who") associated with jumping (e.g., a parachutist)? Let me know so I can provide a helpful response! 😊 Time: 6.710331 RAGFlow(user)> stream think chat with 'command-a-reasoning-08-2025@test3@cohere' message 'jumperwho' Thinking: , the user mentioned "jumperwho". Let me try to figure out what they're referring to. First, I'll check if it's a misspelling. "Jumper" could be a typo for "jumper" or maybe a username. Alternatively, it might be a combination of words like "jumper who",....... the best approach is to inform the user that I don't recognize the term and ask if they can provide more context or clarify what they mean by "jumperwho". 
That way, I can assist them better once I have more information. Answer: seems "jumperwho" isn't a widely recognized term, proper noun, or acronym in common usage. Could you provide more context or clarify what you mean by "jumperwho"? This will help me understand your question or request better! Time: 4.513596 RAGFlow(user)> embed text 'walkerwhat' 'jumperwho' with 'embed-v4.0@test3@cohere' dimension 16; +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+ \| embedding \| index \| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+ \| [-0.016643638 -0.001957038 0.0055713872 0.009027058 0.05275187 -0.024542313 -0.044006906 0.024119169 0.0014192933 0.006558722 0.0019129605 -0.021016119 -0.026516981 -0.017489925 0.021298215 0.017772019 0.04569948 0.008886009 0.012059584 -0.0014721862 0.... \| 0 \| \| [0.018778935 -0.0063459855 -0.0006839742 0.0046623563 0.0067668925 -0.018001877 -0.03963003 0.035744734 -0.014246088 -0.0020721585 -0.006313608 0.025124922 -0.010749322 0.01217393 -0.010231283 -0.025254432 0.021498645 -0.028880708 0.019167464 -0.0058279... 
\| 1 \| +------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+ RAGFlow(user)> rerank query 'what is rag' document 'rag is retrieval augment generation' 'rag need llm' 'famous rag project includes ragflow' with 'rerank-v4.0-pro@test@cohere' top 3; +-------+-----------------+ \| index \| relevance_score \| +-------+-----------------+ \| 0 \| 0.91744334 \| \| 1 \| 0.7458429 \| \| 2 \| 0.68729424 \| +-------+-----------------+ RAGFlow(user)> list supported models from 'cohere' 'test' +-------------------------------------+ \| model_name \| +-------------------------------------+ \| c4ai-aya-expanse-32b \| \| c4ai-aya-vision-32b \| \| cohere-transcribe-03-2026 \| \| command-a-03-2025 \| \| command-a-reasoning-08-2025 \| \| command-a-translate-08-2025 \| \| command-a-vision-07-2025 \| \| command-r-08-2024 \| \| command-r-plus-08-2024 \| \| command-r7b-12-2024 \| \| command-r7b-arabic-02-2025 \| \| embed-english-light-v3.0 \| \| embed-english-light-v3.0-image \| \| embed-english-v3.0 \| \| embed-english-v3.0-image \| \| embed-multilingual-light-v3.0 \| \| embed-multilingual-light-v3.0-image \| \| embed-multilingual-v3.0 \| \| embed-multilingual-v3.0-image \| \| embed-v4.0 \| +-------------------------------------+ RAGFlow(user)> check instance 'test' from 'cohere' SUCCESS # FishAudio RAGFlow(user)> list supported models from 'fishaudio' 'test' +----------------------------------------+ \| model_name \| +----------------------------------------+ \| Valentino Narración Biblica Fer \| \| Super Smash Bros. 
4/Ultimate Announcer \| \| Farid Dieck \| \| عصام الشوالي \| \| ALEX_CHIKNA \| \| Energetic Male \| \| voz de locutor k \| \| يي \| \| ELITE \| \| Mortal Kombat \| +----------------------------------------+ RAGFlow(user)> show balance from 'fishaudio' 'test' +----------------------------------+-----------------------------+--------+-----------------+------------------+-----------------------------+----------------------------------+ \| _id \| created_at \| credit \| has_free_credit \| has_phone_sha256 \| updated_at \| user_id \| +----------------------------------+-----------------------------+--------+-----------------+------------------+-----------------------------+----------------------------------+ \| 82ffec12cf984d88a30ec504d7909812 \| 2026-05-09T07:52:16.119000Z \| 0 \| \| false \| 2026-05-09T07:52:16.119000Z \| 2578ab1126804d6eaa630552400d7ff3 \| +----------------------------------+-----------------------------+--------+-----------------+------------------+-----------------------------+----------------------------------+ ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-05-11 20:18:38 +08:00
Renzo	39ee2fb120	Go: implement Rerank in NVIDIA driver (#14778 ) ## Summary - Replaces the `"no such method"` stub on `NvidiaModel.Rerank` (`internal/entity/models/nvidia.go`) with a real implementation against NVIDIA NIM's `/ranking` endpoint. - Mirrors the existing Python `NvidiaRerank` class at `rag/llm/rerank_model.py:149-190` for behavior parity: same `passages`/`query.text`/`logit` payload shape; `top_n` set to `len(documents)` so every input gets a score returned in original order (the issue body's spec omitted `top_n`, which would cause silent data loss). - Adds the `"rerank": "ranking"` URL suffix and two NIM rerank model entries (`nvidia/nv-rerankqa-mistral-4b-v3`, `nvidia/llama-3.2-nv-rerankqa-1b-v2`) to `conf/models/nvidia.json` so the picker exposes them. - Follows the same shape as the recently merged Aliyun (#14676), Gitee (#14656), and ZhipuAI (#14608) Rerank implementations: lowercase per-driver request/response types, conversion to the project-wide `RerankResponse{Data: []RerankResult}`, per-call `context.WithTimeout` of 30s. Closes #14720 ## Test plan - [x] `gofmt -l internal/entity/models/nvidia.go` — clean - [x] `go vet ./internal/entity/models/...` — no new errors introduced (the two pre-existing vet errors in `baidu.go:642` and `openrouter.go:566` are unrelated to this PR) - [x] `go build ./internal/entity/models/...` — succeeds - [x] `python3 -c "import json; json.load(open('conf/models/nvidia.json'))"` — JSON valid - [ ] Live smoke test against NVIDIA NIM with a real API key (requires reviewer with NIM credentials) ## Notes for reviewers - The issue body suggested omitting `top_n`. The Python reference includes it (`top_n: len(texts)`), and without it NVIDIA returns only the default top-K rankings rather than scores for every input. This PR follows the Python. - The URL host is `integrate.api.nvidia.com` (kept consistent with the existing chat/embeddings BaseURL in `nvidia.go`), not the legacy `ai.api.nvidia.com` host the Python uses. 
NIM's unified endpoint accepts the model names as-is, so no per-model URL transform is needed.	2026-05-11 17:21:16 +08:00
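The payload shape described in the NVIDIA Rerank commit (`query.text`, `passages`, `top_n` set to the number of documents, and a `logit` per result) could be sketched as follows. This is a minimal illustration, not the code from `nvidia.go`: the type and function names are hypothetical, and the `model`, `rankings`-style item layout and `index` field names are assumptions not confirmed by the commit message.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// textField wraps a string the way the payload nests query.text and passages[].text.
type textField struct {
	Text string `json:"text"`
}

// rerankRequest is a hypothetical request type; "model" is an assumed field name.
type rerankRequest struct {
	Model    string      `json:"model"`
	Query    textField   `json:"query"`
	Passages []textField `json:"passages"`
	TopN     int         `json:"top_n"`
}

// ranking is a hypothetical response item; "index" is an assumed field name.
type ranking struct {
	Index int     `json:"index"`
	Logit float64 `json:"logit"`
}

// buildRerankRequest sets top_n to len(documents) so every input gets a score.
func buildRerankRequest(model, query string, documents []string) rerankRequest {
	passages := make([]textField, len(documents))
	for i, d := range documents {
		passages[i] = textField{Text: d}
	}
	return rerankRequest{
		Model:    model,
		Query:    textField{Text: query},
		Passages: passages,
		TopN:     len(documents),
	}
}

// scoresInInputOrder maps rankings back to the original document order and
// returns an error if any document is left without a score.
func scoresInInputOrder(rankings []ranking, n int) ([]float64, error) {
	scores := make([]float64, n)
	seen := make([]bool, n)
	for _, r := range rankings {
		if r.Index < 0 || r.Index >= n {
			return nil, fmt.Errorf("ranking index %d out of range [0,%d)", r.Index, n)
		}
		if seen[r.Index] {
			return nil, fmt.Errorf("duplicate ranking index %d", r.Index)
		}
		scores[r.Index] = r.Logit
		seen[r.Index] = true
	}
	for i, ok := range seen {
		if !ok {
			return nil, fmt.Errorf("no score returned for document %d", i)
		}
	}
	return scores, nil
}

func main() {
	req := buildRerankRequest("nvidia/nv-rerankqa-mistral-4b-v3", "what is rag",
		[]string{"rag need llm", "rag is retrieval augment generation"})
	body, err := json.Marshal(req)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(string(body))

	scores, err := scoresInInputOrder([]ranking{{Index: 1, Logit: 4.2}, {Index: 0, Logit: -1.5}}, 2)
	fmt.Println(scores, err)
}
```

With `top_n` omitted, a missing slot would surface here as an explicit error rather than as a silently dropped document.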
Jin Hai	9b3850339b	Go: add development guide document (#14785 ) ### What problem does this PR solve? As the title suggests. ### Type of change - [x] Documentation Update Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-11 17:20:41 +08:00
Jin Hai	c55e23e7e2	Go: refactor embedding interface (#14757 ) ### What problem does this PR solve? Provide embedding index according to the input text ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-11 14:45:30 +08:00
Joseff	13e6554901	Fix(Go): make OpenRouter Encode fail loudly on malformed responses (#14717 ) ### What problem does this PR solve? The OpenRouter `Encode` method silently swallowed malformed responses. If a `data[]` item from the API was missing a field (`index`, `embedding`, or unexpected shape), the loop did `continue` instead of returning an error — leaving `nil` entries in the result slice. Callers got back partial results with no indication anything went wrong, which then crashes downstream consumers when they try to use a `nil` vector. There were three concrete gaps: - No count-mismatch check between `data` length and input texts (only checked for empty) - No duplicate-index detection (a duplicate would silently overwrite) - Parse failures on individual items returned partial slices instead of erroring This PR replaces `map[string]interface{}` parsing with a typed `openrouterEmbeddingResponse` struct and applies the same 3-layer validation used in the other drivers (count mismatch → out-of-range index → duplicate index), so any malformed response produces a clear error instead of corrupted data. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-11 12:57:11 +08:00
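The three-layer validation named in the OpenRouter fix (count mismatch, then out-of-range index, then duplicate index) might look roughly like the sketch below. The names `embeddingItem` and `orderEmbeddings` are hypothetical and chosen for illustration; the actual `openrouterEmbeddingResponse` type is not shown in the commit message.

```go
package main

import "fmt"

// embeddingItem is a hypothetical typed view of one data[] entry.
type embeddingItem struct {
	Index     int       `json:"index"`
	Embedding []float64 `json:"embedding"`
}

// orderEmbeddings places each vector at its declared index and returns an
// error on any malformed response instead of leaving nil slots behind.
func orderEmbeddings(items []embeddingItem, inputCount int) ([][]float64, error) {
	// Layer 1: the response must contain exactly one item per input text.
	if len(items) != inputCount {
		return nil, fmt.Errorf("expected %d embeddings, got %d", inputCount, len(items))
	}
	out := make([][]float64, inputCount)
	for _, it := range items {
		// Layer 2: the declared index must address a valid input slot.
		if it.Index < 0 || it.Index >= inputCount {
			return nil, fmt.Errorf("embedding index %d out of range [0,%d)", it.Index, inputCount)
		}
		// Layer 3: a slot may be filled only once.
		if out[it.Index] != nil {
			return nil, fmt.Errorf("duplicate embedding index %d", it.Index)
		}
		if it.Embedding == nil {
			return nil, fmt.Errorf("missing embedding vector at index %d", it.Index)
		}
		out[it.Index] = it.Embedding
	}
	return out, nil
}

func main() {
	items := []embeddingItem{
		{Index: 1, Embedding: []float64{0.2}},
		{Index: 0, Embedding: []float64{0.1}},
	}
	vecs, err := orderEmbeddings(items, 2)
	fmt.Println(vecs, err)
}
```

Because the count check runs first and duplicates are rejected, a successful return implies every input slot holds a vector.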
Panda Dev	530edbac99	Go: implement Encode (embeddings) in LM Studio driver (#14694 ) ### What problem does this PR solve? The LM Studio Go driver shipped with a stub `Encode` method that returned `no such method`, even though LM Studio is one of the most common local LLM runners on macOS and Windows and exposes an OpenAI-compatible embeddings endpoint at `/v1/embeddings`. LM Studio users routinely load local embedding models such as `nomic-ai/nomic-embed-text-v1.5`, `mixedbread-ai/mxbai-embed-large-v1`, or `BAAI/bge-m3`. They run on the same `/v1` namespace as chat. The existing `ListModels` already discovers them, but because `Encode` was a stub, a tenant who picked one of these models in the Go layer could not actually run an embedding call. This finishes the local-LLM trio: Ollama Encode (#14664) and vLLM Encode (#14688) are already in flight, both using the same OpenAI-compatible `/embeddings` shape. ### What this PR includes - `conf/models/lmstudio.json`: add `"embedding": "embeddings"` under `url_suffix` so the driver can build the URL from config. - `internal/entity/models/lmstudio.go`: replace the `Encode` stub with a real implementation. Adds a small local response type that matches the OpenAI-compatible shape. No factory change. No interface change. ### How the driver works - Validate the model name. The API key is optional for local LM Studio, so the Authorization header is only set when both `apiConfig` and `ApiKey` are non-nil and non-empty, the same pattern the recently merged CheckConnection PR (#14614) uses. - Resolve the region with a default fallback. Return a clear "missing base URL" error when the user has not configured the local access address yet. - Use a per-call `context.WithTimeout(30s)` and `http.NewRequestWithContext`, the same pattern the merged Aliyun Encode (#14647) and the in-flight Ollama Encode (#14664) and vLLM Encode (#14688) use. - Send `{model, input: [texts]}` in one request. 
- Parse `data[].embedding` and copy each slice into a `[][]float64` indexed by `data[].index`, so the output order matches the input order. - Handle both `float64` and `float32` element types. - Empty input returns `[][]float64{}` with no HTTP call. - Length mismatch between input and result, out-of-range index, and any missing slot all return clear errors instead of silent zero vectors. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? - `go build ./internal/entity/models/...` in a clean go 1.25 image returns exit 0. - The full method set on `LmStudioModel` still matches the `ModelDriver` interface. - Pattern parity with the merged Aliyun Encode (#14647), the in-flight Ollama Encode (#14664) and vLLM Encode (#14688), and the existing SiliconFlow Encode. Closes #14693	2026-05-11 12:55:57 +08:00
Joseff	0580c137fa	Perf(Go): batch SiliconFlow Encode requests with 32-item chunking (#14719 ) ### What problem does this PR solve? The SiliconFlow `Encode` method sent one HTTP request per text, which is wasteful and slow when indexing many documents (e.g., 100 docs = 100 round-trips). SiliconFlow's `/v1/embeddings` is OpenAI-compatible and accepts an array of strings in `input` (officially documented at https://docs.siliconflow.cn/en/api-reference/embeddings/create-embeddings, with a documented max array size of 32). This PR batches the requests up to that limit, reducing 100 docs to ~4 round-trips, and replaces `map[string]interface{}` parsing with a typed struct using the same 3-layer validation (count mismatch, out-of-range index, duplicate index) used in the other drivers. ### Type of change - [x] Performance Improvement	2026-05-11 12:55:27 +08:00
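The 32-item chunking from the SiliconFlow commit can be illustrated with a small helper. This is only a sketch of the batching arithmetic; `chunkTexts` is a hypothetical name and the real driver's HTTP handling is not reproduced here.

```go
package main

import "fmt"

// chunkTexts splits texts into consecutive batches of at most size items.
func chunkTexts(texts []string, size int) [][]string {
	if size <= 0 {
		return nil
	}
	var batches [][]string
	for start := 0; start < len(texts); start += size {
		end := start + size
		if end > len(texts) {
			end = len(texts)
		}
		batches = append(batches, texts[start:end])
	}
	return batches
}

func main() {
	texts := make([]string, 100)
	batches := chunkTexts(texts, 32)
	// 100 texts become batches of 32, 32, 32 and 4.
	fmt.Println(len(batches), len(batches[len(batches)-1]))
}
```

When results from several batches are merged, each batch's `index` values would need to be offset by the batch's starting position so that output order still matches input order.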
BitToby	4b96362092	Go: implement Encode (embeddings) in NVIDIA driver (#14700 ) ### What problem does this PR solve? The NVIDIA Go driver in `internal/entity/models/nvidia.go` shipped with a stub `Encode` method that returned `no such method`. `conf/models/nvidia.json` already lists `nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1` as an embedding model, but the conf had no `embedding` URL suffix, so the picker had nothing wired even if `Encode` worked. A tenant who wanted to use NVIDIA NIM for chat (already working) and embeddings from a single provider could not, even though the upstream endpoint is public at `https://integrate.api.nvidia.com/v1/embeddings` and uses an OpenAI-compatible request body extended with the NVIDIA-specific `input_type` and `truncate` fields. Several other Go drivers already implement `Encode` (siliconflow, zhipu-ai, aliyun), so the interface and the pattern are well-established. This PR fills the gap. ### What this PR includes * `conf/models/nvidia.json`: declare the `embedding` URL suffix alongside the existing `chat` and `models` entries. The embedding model entry was already present, so no model addition is needed. * `internal/entity/models/nvidia.go`: replace the `Encode` stub with a real implementation. Adds a small local response type that matches the OpenAI-compatible shape NVIDIA NIM returns. No factory change. No interface change. ### How the driver works * Validates `apiConfig` and the API key, validates the model name, resolves the region with a default fallback (matching the pattern the merged `ListModels` and `CheckConnection` paths in this driver already use), and builds the URL from `BaseURL[region] + URLSuffix.Embedding`. * Sends all input texts in one request as the `input` array, with the NVIDIA-specific `input_type: "query"`, `encoding_format: "float"`, and `truncate: "END"` fields, mirroring the Python `NvidiaEmbed` reference. 
* Parses `data[].embedding` and copies each slice into `[][]float64` indexed by `data[].index` so the output order matches the input order even if the API returns items in a different order. * Handles both `float64` and `float32` element types. * Empty input returns `[][]float64{}` with no HTTP call. * Non-200 responses propagate the upstream status line and body. * A final pass checks every input slot got a vector and returns a clear error if any slot is still nil. * Per-call 30s context deadline so a slow call cannot block forever. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? * `go build ./internal/entity/models/...` returns exit 0. * `go vet ./internal/entity/models/...` is clean. * `gofmt -l internal/entity/models/nvidia.go` is clean. * The full method set on `NvidiaModel` still matches the `ModelDriver` interface. * Pattern parity with the just-merged Aliyun `Encode` (#14647). Closes #14699	2026-05-11 12:50:50 +08:00
Jack Storment	8ff623fbc4	Go: implement Encode (embeddings) in Ollama driver (#14664 ) ### What problem does this PR solve? The Ollama Go driver shipped with a stub `Encode` method that returned `no such method`, even though Ollama is one of the most common local LLM runners and exposes an OpenAI-compatible embeddings endpoint at `/v1/embeddings`. Ollama users routinely run local embedding models such as `nomic-embed-text`, `mxbai-embed-large`, or `bge-m3`. Pulled with `ollama pull <model>` and served on the same `/v1` namespace as chat. The existing `ListModels` already discovers them, but because `Encode` was a stub, a tenant who picked one of these models in the Go layer could not actually run an embedding call. ### What this PR includes - `conf/models/ollama.json`: add `"embedding": "embeddings"` under `url_suffix` so the driver can build the URL from config. - `internal/entity/models/ollama.go`: replace the `Encode` stub with a real implementation. Adds a small local response type that matches the OpenAI-compatible shape. No factory change. No interface change. ### How the driver works - Validate the model name. The API key is optional for local Ollama, so the Authorization header is only set when both `apiConfig` and `ApiKey` are non-nil and non-empty, the same pattern the recently merged CheckConnection PR (#14614) uses. - Resolve the region with a default fallback. Return a clear "missing base URL" error when the user has not configured the local access address yet. - Use a per-call `context.WithTimeout(30s)` and `http.NewRequestWithContext`, the same pattern the merged Aliyun Encode (#14647) uses. - Send `{model, input: [texts]}` in one request. - Parse `data[].embedding` and copy each slice into a `[][]float64` indexed by `data[].index`, so the output order matches the input order. - Handle both `float64` and `float32` element types. - Empty input returns `[][]float64{}` with no HTTP call. 
- Length mismatch between input and result, out-of-range index, and any missing slot all return clear errors instead of silent zero vectors. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? - `go build ./internal/entity/models/...` in a clean go 1.25 image returns exit 0. - The full method set on `OllamaModel` still matches the `ModelDriver` interface. - Pattern parity with the merged Aliyun Encode (#14647) and the existing SiliconFlow Encode. Closes #14662	2026-05-11 12:50:15 +08:00
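The request-building steps shared by the local-runner Encode commits (a 30s per-call deadline, `http.NewRequestWithContext`, a `{model, input}` body, and an Authorization header only when a key is present) could be sketched like this. The function name, the `*string` key parameter, and the example URL are assumptions for illustration and do not come from the RAGFlow source.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// embedRequest mirrors the OpenAI-compatible {model, input: [texts]} body.
type embedRequest struct {
	Model string   `json:"model"`
	Input []string `json:"input"`
}

// newEmbedRequest builds a POST request bound to a 30-second deadline.
// The caller must invoke the returned cancel function once the call is done.
func newEmbedRequest(url, model string, texts []string, apiKey *string) (*http.Request, context.CancelFunc, error) {
	body, err := json.Marshal(embedRequest{Model: model, Input: texts})
	if err != nil {
		return nil, nil, err
	}
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		cancel()
		return nil, nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	// The key is optional for local runners, so the header is set only
	// when a non-empty key was actually configured.
	if apiKey != nil && *apiKey != "" {
		req.Header.Set("Authorization", "Bearer "+*apiKey)
	}
	return req, cancel, nil
}

func main() {
	// The URL below is an assumed local address, used only for the demo.
	req, cancel, err := newEmbedRequest("http://localhost:11434/v1/embeddings", "nomic-embed-text", []string{"hello"}, nil)
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	defer cancel()
	fmt.Println(req.Method, req.URL.Path, req.Header.Get("Authorization") == "")
}
```

Returning the cancel function keeps the deadline tied to one call, so a slow server cannot hold the caller beyond the timeout.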
Panda Dev	fa53b93dd5	Go: implement Encode (embeddings) in vLLM driver (#14688 ) ### What problem does this PR solve? The vLLM Go driver shipped with a stub `Encode` method that returned `not implemented`, even though vLLM is one of the most common production-grade self-hosted inference servers and exposes an OpenAI-compatible embeddings endpoint at `/v1/embeddings`. Users who self-host `BAAI/bge-m3`, `Qwen3-Embedding-`, `NV-Embed-v2`, or similar models on vLLM could not run an embedding call through the Go layer. The existing `ListModels` already discovers the loaded models, but the embedding path failed because `Encode` was a stub. ### What this PR includes - `conf/models/vllm.json`: add `"embedding": "embeddings"` under `url_suffix` so the driver can build the URL from config. - `internal/entity/models/vllm.go`: replace the `Encode` stub with a real implementation. Adds a small local response type that matches the OpenAI-compatible shape. No factory change. No interface change. ### How the driver works - Validate the model name. The API key is optional for self-hosted vLLM, so the Authorization header is only set when both `apiConfig` and `ApiKey` are non-nil and non-empty, the same pattern the recently merged CheckConnection PR (#14614) uses. - Resolve the region with a default fallback. Return a clear "missing base URL" error when the user has not configured the local access address yet. - Use a per-call `context.WithTimeout(30s)` and `http.NewRequestWithContext`, the same pattern the merged Aliyun Encode (#14647) and in-flight Ollama Encode (#14664) use. - Send `{model, input: [texts]}` in one request. - Parse `data[].embedding` and copy each slice into a `[][]float64` indexed by `data[*].index`, so the output order matches the input order. - Handle both `float64` and `float32` element types. - Empty input returns `[][]float64{}` with no HTTP call. 
- Length mismatch between input and result, out-of-range index, and any missing slot all return clear errors instead of silent zero vectors. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? - `go build ./internal/entity/models/...` in a clean go 1.25 image returns exit 0. - The full method set on `VllmModel` still matches the `ModelDriver` interface. - Pattern parity with the merged Aliyun Encode (#14647), the in-flight Ollama Encode (#14664), and the existing SiliconFlow Encode. Closes #14687	2026-05-11 12:09:17 +08:00
BitToby	bfb4a0eea2	Go: implement Encode (embeddings) in Gitee AI driver (#14698 ) ### What problem does this PR solve? The Gitee AI Go driver in `internal/entity/models/gitee.go` shipped with a stub `Encode` method that returned `gitee, no such method`, even though `conf/models/gitee.json` already wires the `embedding` URL suffix. The conf also listed no embedding models, so the picker had nothing to select. This blocked any tenant who wanted to use Gitee AI for chat, rerank (already working, see #14656), and embeddings from a single provider. This PR fills the gap, mirroring the just-merged Aliyun `Encode` (#14647): - `internal/entity/models/gitee.go`: replace the `Encode` stub with a real implementation. Validates inputs, resolves the region with a default fallback, POSTs the standard OpenAI-compatible `{"model", "input": [...]}` body to `BaseURL[region] + URLSuffix.Embedding`, parses `data[].embedding` indexed by `data[].index` so output order matches input order, handles both `float64` and `float32` element types, and uses a 30s per-call context deadline matching the merged `Rerank`. - `conf/models/gitee.json`: add `BAAI/bge-m3` so the embedding picker has something to select. No factory change. No interface change. No URL suffix change. Verified with `go build`, `go vet`, and `gofmt -l` : all clean. Closes #14697 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-11 11:56:46 +08:00
Joseff	827cceccba	Fix(Go): correct Name() and region URL fallback in Aliyun driver (#14673 ) ### What problem does this PR solve? Two bugs in the Aliyun Go driver: 1. `Name()` returns `"siliconflow"` — a copy-paste bug from when the driver was created. `Name()` is used in error messages and log output, so every Aliyun error incorrectly attributed itself to SiliconFlow. 2. Silent empty URL for unknown regions in `ChatWithMessages`, `ChatStreamlyWithSender`, and `ListModels` — all three methods construct the request URL as `z.BaseURL[region]` without checking whether the key exists. For an unrecognised region this returns `""`, producing a malformed URL like `"/chat/completions"` that the HTTP transport rejects with a confusing error. `Encode` and `Rerank` (already merged) correctly fall back to `"default"` and return a clear error. This PR applies the same pattern to the remaining three methods. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-11 11:26:24 +08:00
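The region fallback described in the Aliyun fix (try the requested region, fall back to `"default"`, and return a clear error rather than an empty URL) might be sketched as below. `resolveBaseURL` is a hypothetical helper name and the example URL is a placeholder; the driver's actual field layout is not shown in the commit message.

```go
package main

import "fmt"

// resolveBaseURL returns the base URL for region, falling back to the
// "default" entry, and fails with a clear error when neither is configured.
func resolveBaseURL(baseURL map[string]string, region string) (string, error) {
	if u, ok := baseURL[region]; ok && u != "" {
		return u, nil
	}
	if u, ok := baseURL["default"]; ok && u != "" {
		return u, nil
	}
	return "", fmt.Errorf("no base URL configured for region %q", region)
}

func main() {
	// Placeholder configuration used only for this demo.
	urls := map[string]string{"default": "https://example.invalid/v1"}
	u, err := resolveBaseURL(urls, "unknown-region")
	fmt.Println(u, err)

	_, err = resolveBaseURL(map[string]string{}, "unknown-region")
	fmt.Println(err)
}
```

Checking the map lookup explicitly avoids the silent `""` result that produced malformed URLs such as `"/chat/completions"`.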
Carmen Fernández Ruiz	f852a7524e	fix(go): wire Google CheckConnection to ListModels (#14660 ) ### What problem does this PR solve? Closes #14703 `GoogleModel.CheckConnection` currently returns a hardcoded `no such method` error even though the Google Go driver already supports `ListModels`. This makes provider connection checks fail regardless of whether the configured API key can list Google models. This PR makes `CheckConnection` call `ListModels`, adds a small API-key guard for nil, empty, and whitespace-only keys, and keeps `ListModels` useful by following paginated Google model responses. ### What stays unchanged * Google model listing still uses the Google GenAI SDK with `genai.BackendGeminiAPI`. * Model names still come from `models.Items[].Name`. `Balance`, `Encode`, chat, streaming, provider config, and factory wiring are unchanged. ### Tests and validation Added focused unit coverage for: * `CheckConnection` delegating to `ListModels` and returning its error * nil, missing, empty, and whitespace-only API key validation * model-name passthrough from the list-models adapter * paginated model listing, empty-result preservation, and next-page error propagation Validated current PR head `17ceef43515ba8c46c254dd349b9085bf26dcbea` locally with Go 1.25.0: * `go test ./internal/entity/models -run 'TestGoogleModel\|TestCollectGoogleModelNames' -count=1 -v` - PASS * `go test ./internal/entity/models -count=1` - PASS * `go test -race ./internal/entity/models -count=1` - PASS * `gofmt -w internal/entity/models/google.go internal/entity/models/google_test.go` - PASS, no diff * `git diff --check` - PASS ### Type of change * [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-11 11:25:17 +08:00
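The delegation described in the Google CheckConnection commit (guard the API key, then treat a successful model listing as a successful connection check) could look roughly like this. The function names and the injected `listModels` callback are assumptions made for the sketch; the real driver calls the Google GenAI SDK.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateAPIKey rejects nil, empty, and whitespace-only keys.
func validateAPIKey(key *string) error {
	if key == nil || strings.TrimSpace(*key) == "" {
		return errors.New("api key is required")
	}
	return nil
}

// checkConnection validates the key and then delegates to listModels,
// returning whatever error the listing produces.
func checkConnection(key *string, listModels func() ([]string, error)) error {
	if err := validateAPIKey(key); err != nil {
		return err
	}
	_, err := listModels()
	return err
}

func main() {
	blank := "   "
	fmt.Println(checkConnection(&blank, func() ([]string, error) { return nil, nil }))

	key := "example-key"
	fmt.Println(checkConnection(&key, func() ([]string, error) {
		return nil, errors.New("upstream rejected the key")
	}))
}
```

Passing the listing function as a parameter also makes the delegation easy to cover with unit tests, which is the kind of focused coverage the commit describes.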
Joseff	f4f8bed9f7	Go: implement Encode (embeddings) in Google Gemini driver (#14682 ) ### What problem does this PR solve? - Implements the `Encode` method in the Google Gemini driver, which was previously a stub returning `not implemented` - Uses the `google.golang.org/genai` SDK's `EmbedContent` API, which routes to the `batchEmbedContents` endpoint internally — all texts are sent in a single request - Adds `text-embedding-004` (max 2048 tokens) to `conf/models/google.json` - Response values are `[]float32` from the SDK and are cast to `[]float64` to satisfy the `ModelDriver` interface ## Files changed - `internal/entity/models/google.go` — full `Encode` implementation - `conf/models/google.json` — adds `text-embedding-004` embedding model ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-11 11:24:21 +08:00
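The `[]float32` to `[]float64` conversion mentioned in the Gemini Encode commit is a simple element-wise widening. A minimal sketch, with `toFloat64` as a hypothetical helper name:

```go
package main

import "fmt"

// toFloat64 widens each float32 element to float64, preserving order.
func toFloat64(values []float32) []float64 {
	out := make([]float64, len(values))
	for i, v := range values {
		out[i] = float64(v)
	}
	return out
}

func main() {
	fmt.Println(toFloat64([]float32{0.5, -2, 0.25}))
}
```

The widening is exact for every `float32` value, so no precision is lost, although decimal literals that are not exactly representable in `float32` will print with more digits after conversion.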
BitToby	39a1773f7f	Go: implement ListModels in Volcengine driver (#14702 ) ### What problem does this PR solve? The VolcEngine Go driver in `internal/entity/models/volcengine.go` shipped with a `ListModels` stub that returned `volcengine, no such method`. `conf/models/volcengine.json` also did not declare a `models` URL suffix, so the model picker had nothing to call even if the method body were filled in. A tenant who configured Volcengine (Doubao / Ark) as a provider could not see the list of available endpoints from the RAGFlow UI. Several other Go drivers already implement `ListModels` against the OpenAI-compatible `/models` endpoint (deepseek, gitee, nvidia, openai, siliconflow), so the interface and pattern are well-established. This PR fills the gap. ### What this PR includes * `conf/models/volcengine.json`: declare the `models` URL suffix alongside the existing `chat`, `files`, and `embedding` entries. The Ark v3 API exposes `https://ark.cn-beijing.volces.com/api/v3/models`, so the suffix is just `models`. * `internal/entity/models/volcengine.go`: replace the `ListModels` stub with a real implementation. Reuses the package-level `DSModelList` / `DSModel` types that DeepSeek, Gitee, and SiliconFlow already use to parse the OpenAI-compatible models response shape. No factory change. No interface change. ### How the driver works * Resolves the region with a default fallback, the same way the other VolcEngine methods in this driver already do. * Builds the URL from `BaseURL[region] + URLSuffix.Models`, with `strings.TrimSuffix` on the base to keep the join robust. * Issues a `GET` with optional `Authorization: Bearer <api_key>` (the header is omitted when no key is configured, mirroring the existing NVIDIA `ListModels`). * Reads the response body once, surfaces a non-200 with the upstream status line plus body, and parses the JSON via the shared `DSModelList` type. * Returns the model id list in input order. 
When the response includes an `owned_by` field, the entry is rendered as `id@owned_by`, matching the convention used by the other Go drivers. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? * `go build ./internal/entity/models/...` returns exit 0. * `go vet ./internal/entity/models/...` is clean. * `gofmt -l internal/entity/models/volcengine.go` is clean. * The full method set on `VolcEngine` still matches the `ModelDriver` interface. * Endpoint reachability check: `GET https://ark.cn-beijing.volces.com/api/v3/models` returns `401 Unauthorized` without an API key, confirming the path exists and accepts Bearer authentication. * Pattern parity with DeepSeek, Gitee, NVIDIA, and SiliconFlow `ListModels`. Fixes #14701 Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-11 10:59:18 +08:00
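Two details from the Volcengine ListModels commit, the `strings.TrimSuffix` URL join and the `id@owned_by` rendering, can be sketched together. The names `modelEntry`, `joinURL`, and `modelNames` are hypothetical, and trimming a trailing `/` is an assumption about which suffix the driver removes.

```go
package main

import (
	"fmt"
	"strings"
)

// modelEntry is a hypothetical view of one item in an OpenAI-compatible
// models response.
type modelEntry struct {
	ID      string `json:"id"`
	OwnedBy string `json:"owned_by"`
}

// joinURL joins base and suffix with exactly one slash between them.
func joinURL(base, suffix string) string {
	return strings.TrimSuffix(base, "/") + "/" + suffix
}

// modelNames renders each entry as id@owned_by when owned_by is present,
// and as the bare id otherwise, preserving response order.
func modelNames(entries []modelEntry) []string {
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		if e.OwnedBy != "" {
			names = append(names, e.ID+"@"+e.OwnedBy)
		} else {
			names = append(names, e.ID)
		}
	}
	return names
}

func main() {
	fmt.Println(joinURL("https://ark.cn-beijing.volces.com/api/v3/", "models"))
	// The entries below are made-up examples, not real model identifiers.
	fmt.Println(modelNames([]modelEntry{{ID: "model-a", OwnedBy: "owner"}, {ID: "model-b"}}))
}
```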
Panda Dev	6bfe0f9a10	Go: implement Encode (embeddings) in OpenAI driver (#14630) ### What problem does this PR solve? The OpenAI Go driver landed in #14605 with chat, list models, and check connection. Encode was left as a stub that returns `not implemented`. `conf/models/openai.json` already lists three embedding models out of the box: - text-embedding-ada-002 - text-embedding-3-small - text-embedding-3-large So a tenant who picked one of these in the Go layer could not actually run an embedding call. This PR fills the gap. ### What this PR includes - `conf/models/openai.json`: add `"embedding": "embeddings"` under `url_suffix` so the driver can build the URL from config. This matches the `URLSuffix.Embedding` field used by other drivers (siliconflow, zhipu-ai). - `internal/entity/models/openai.go`: replace the Encode stub with a real implementation that POSTs to `/v1/embeddings`. Adds a small local response type `openaiEmbeddingResponse`. No factory change. No interface change. ### How the implementation works - Validate `apiConfig` and the API key, validate the model name. Use the existing `baseURLForRegion` helper so an unknown region fails fast with a clear error. - Wrap the request with `context.WithTimeout(nonStreamCallTimeout)` so the call has a clear deadline. Same pattern as `ChatWithMessages` and `ListModels` already use in this file. - Send all input texts in one request. The OpenAI API accepts the `input` field as an array. - Parse `data[].embedding` and copy each slice into a `[][]float64` indexed by `data[].index` so the output order matches the input order even if the API returns items in a different order. - Handle both `float64` and `float32` element types, the way the SiliconFlow driver does. - An empty input slice returns `[][]float64{}` with no HTTP call. - Non-200 responses propagate the upstream status line and body. - A final pass checks that every input slot got a vector. 
If any slot is still nil, return a clear error so the caller does not silently use a zero vector. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? - `go build ./internal/entity/models/...` in a clean go 1.25 image (the go.mod minimum) returns exit 0. - The full method set on `OpenAIModel` still matches the `ModelDriver` interface. - Pattern parity with the existing SiliconFlow Encode implementation (`internal/entity/models/siliconflow.go`). Closes #14629 --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-10 10:31:37 +08:00
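The index-based ordering and the final nil-slot pass described in this entry might look roughly like the sketch below. The `embeddingItem` type and the `orderByIndex` function are hypothetical stand-ins, not the driver's actual `openaiEmbeddingResponse` handling, and the range check is an added defensive assumption.

```go
package main

import "fmt"

// embeddingItem is a hypothetical stand-in for one data[] entry of the response.
type embeddingItem struct {
	Index     int
	Embedding []float64
}

// orderByIndex copies each vector into the slot named by its index, then
// verifies that every input slot received a vector.
func orderByIndex(items []embeddingItem, inputCount int) ([][]float64, error) {
	out := make([][]float64, inputCount)
	for _, it := range items {
		if it.Index < 0 || it.Index >= inputCount {
			return nil, fmt.Errorf("embedding index %d out of range [0,%d)", it.Index, inputCount)
		}
		vec := make([]float64, len(it.Embedding))
		copy(vec, it.Embedding)
		out[it.Index] = vec
	}
	// Final pass: a nil slot means the API skipped an input.
	for i, v := range out {
		if v == nil {
			return nil, fmt.Errorf("no embedding returned for input %d", i)
		}
	}
	return out, nil
}

func main() {
	// Items arrive out of order on purpose.
	items := []embeddingItem{
		{Index: 1, Embedding: []float64{0.3, 0.4}},
		{Index: 0, Embedding: []float64{0.1, 0.2}},
	}
	vectors, err := orderByIndex(items, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(vectors)
}
```

Returning an error for a missing slot, rather than a zero vector, keeps a partial upstream response from silently degrading retrieval quality.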
Jin Hai	048ec2fc5c	Go: fix siliconflow rerank issue (#14743 ) ### What problem does this PR solve? As title. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-09 20:45:53 +08:00
Jin Hai	779cd83862	Go: fix Baidu rerank issue (#14742 ) ### What problem does this PR solve? top_n is missing ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-09 20:05:57 +08:00
Haruko386	7931b693dc	Go: implement provider: Baidu (#14741 ) ### What problem does this PR solve? This PR completes the Baidu Qianfan provider integration in RAGFlow. The following functionalities are now supported: - [x] Chat / Think Chat / Stream Chat / Stream Think Chat - [x] Embedding - [x] Rerank - [x] Model listing - [x] Provider connection checking - [ ] Balance ----- Verified examples from the CLI: ```plaintext RAGFlow(user)> embed text 'what is rag' 'who are you' with 'embedding-3@test@zhipu-ai' dimension 16; +-----------+-------+ \| dimension \| index \| +-----------+-------+ \| 16 \| 0 \| \| 16 \| 1 \| +-----------+-------+ RAGFlow(user)> rerank query 'what is rag' document 'rag is retrieval augment generation' 'rag need llm' 'famous rag project includes ragflow' with 'qwen3-reranker-4b@test@baidu' top 2; +-------+---------------------+ \| index \| relevance_score \| +-------+---------------------+ \| 0 \| 0.974821150302887 \| \| 1 \| 0.14223189651966095 \| \| 2 \| 0.08632347732782364 \| +-------+---------------------+ RAGFlow(user)> think chat with 'deepseek-v3.2@test@baidu' message 'who r u' Thinking: Hmm, the user is asking for a simple introduction. This is straightforward – no need for overcomplication. I should give a clear, friendly response that covers my basic identity as an AI assistant, my purpose, and my capabilities. Keeping it concise but informative is key here. Mentioning my creator Anthropic adds credibility, and ending with an offer to help invites further interaction. No need for technical details unless the user asks later. Answer: Hello! I'm an AI assistant created by Anthropic, designed to help with a wide variety of tasks. You can think of me as a helpful digital companion—I can answer questions, assist with writing, help solve problems, provide explanations, and engage in conversation on many topics. I'm here to help with whatever you need! How can I assist you today? 
Time: 8.103902 RAGFlow(user)> stream think chat with 'deepseek-v3.2@test@baidu' message 'who r u' Thinking: mm, the user is asking "who r u" with casual spelling. This is a straightforward identity question. should give a clear, friendly introduction without overcomplicating it. Can start with my core function as an AI assistant, mention my creator, and briefly state my key capabilities. response should be welcoming and invite further interaction since this seems like an introductory question. Keeping it concise but covering the essentials: who I am, what I do, and how I can help. Answer: ! I am DeepSeek, an AI assistant created by DeepSeek Company. I'm designed to help answer questions, provide information, assist with various tasks, and engage in conversations on a wide range of topics. I'm here to assist you with whatever you need - whether it's answering questions, helping with analysis, writing, coding, or just having a friendly chat!Is there anything specific I can help you with today? 😊 Time: 7.219703 RAGFlow(user)> list supported models from 'baidu' 'test' +--------------------------------------+ \| model_name \| +--------------------------------------+ \| ernie-3.5-8k-preview \| \| ernie-4.0-8k \| \| ernie-4.0-turbo-8k-latest \| \| ernie-4.0-turbo-8k-preview \| \| ernie-4.0-8k-preview \| \| ernie-speed-pro-128k \| \| ernie-char-fiction-8k \| \| ernie-3.5-8k \| \| ernie-3.5-128k \| \| ernie-lite-pro-128k \| \| ernie-novel-8k \| \| ernie-4.0-turbo-8k \| \| ernie-4.0-turbo-128k \| \| ernie-4.0-8k-latest \| \| irag-1.0 \| \| ........... \| \| glm-5.1 \| \| ernie-image-turbo \| \| deepseek-v4-pro \| \| deepseek-v4-flash \| \| ernie-5.1 \| +--------------------------------------+ RAGFlow(user)> check instance 'test' from 'baidu' SUCCESS ``` Additionally, this PR fixes an incorrect error message typo: Before: ```go fmt.Errorf("API requestssss failed with status %d: %s : %s", ...) ``` After: ```go fmt.Errorf("API request failed with status %d: %s", ...) 
``` This PR mainly improves provider compatibility, API completeness, and runtime stability. ### Type of change * [x] Bug Fix (non-breaking change which fixes an issue) * [x] New Feature (non-breaking change which adds functionality) * [x] Refactoring	2026-05-09 19:21:13 +08:00
Jin Hai	17d71e5d79	Go CLI: embed and rerank (#14735) ### What problem does this PR solve? ``` RAGFlow(user)> embed text 'what is rag' 'who are you' with 'embedding-3@test@zhipu-ai' dimension 16; +-----------+-------+ | dimension | index | +-----------+-------+ | 16 | 0 | | 16 | 1 | +-----------+-------+ RAGFlow(user)> rerank query 'what is rag' document 'rag is retrieval augment generation' 'rag need llm' 'famous rag project includes ragflow' with 'rerank@test@zhipu-ai' top 2; +-------+-----------------+ | index | relevance_score | +-------+-----------------+ | 0 | 1 | | 2 | 0.99999976 | +-------+-----------------+ ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-09 17:41:54 +08:00
akie	c11650bb4c	Fix IDOR: Add permission checks to file ancestry endpoints (#14725) Close #14292 ## Issue File ancestry endpoints return folder metadata without validating tenant permissions, allowing any authenticated user to query arbitrary `file_id` values across tenant boundaries. ## Affected Endpoints - `GET /v1/file/parent_folder?file_id={file_id}` - `GET /v1/file/all_parent_folder?file_id={file_id}` - `GET /api/v1/files/{id}/ancestors` ## Root Cause These endpoints skip the permission check that other file operations (Delete, Download, Move) perform. ## Expected Permission Check All file operations should follow this 3-step validation: - Check file.tenant_id - Check if user_id belongs to this tenant (via user_tenant join table) - Check KB permission type (team permission) Code reference: This is implemented in `checkFileTeamPermission()` and used by Delete/Download/Move, but missing from GetParentFolder/GetAllParentFolders. ## Reproduction ```bash # User B (tenant: BBB) accessing User A's file (tenant: AAA) curl -H "Authorization: Bearer USER_B_TOKEN" \ "http://localhost:9384/v1/file/parent_folder?file_id=AAA_FILE_123" # Result: Returns User A's folder metadata ❌ # Expected: "No authorization." ✅ ``` ## Fix Pass `userID` from handler to service and call `checkFileTeamPermission()`, the same as the Download/Delete/Move handlers. --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-09 16:03:23 +08:00
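The 3-step validation listed in this entry can be illustrated with a small in-memory sketch. Everything here is hypothetical: the `fileRecord` and `tenantMembers` types, the `"owner"`/`"member"` roles, the `"me"`/`"team"` permission values, and the `checkFileAccess` name are assumptions made for the example and are not the signature or logic of the real `checkFileTeamPermission()`.

```go
package main

import (
	"errors"
	"fmt"
)

var errNoAuthorization = errors.New("no authorization")

// fileRecord is a hypothetical, trimmed-down file row.
type fileRecord struct {
	ID           string
	TenantID     string
	KBPermission string // assumed values: "me" or "team"
}

// tenantMembers maps tenantID -> userID -> role ("owner" or "member"),
// standing in for the user_tenant join table.
type tenantMembers map[string]map[string]string

// checkFileAccess applies the three checks in order.
func checkFileAccess(f fileRecord, userID string, members tenantMembers) error {
	// Step 1: the file must belong to a tenant.
	if f.TenantID == "" {
		return errNoAuthorization
	}
	// Step 2: the user must belong to that tenant.
	role, ok := members[f.TenantID][userID]
	if !ok {
		return errNoAuthorization
	}
	// Step 3: non-owners need the knowledge base to be shared with the team.
	if role != "owner" && f.KBPermission != "team" {
		return errNoAuthorization
	}
	return nil
}

func main() {
	members := tenantMembers{"AAA": {"userA": "owner"}}
	f := fileRecord{ID: "AAA_FILE_123", TenantID: "AAA", KBPermission: "me"}
	fmt.Println(checkFileAccess(f, "userA", members)) // owner of tenant AAA
	fmt.Println(checkFileAccess(f, "userB", members)) // user from another tenant
}
```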
Haruko386	ee0de58204	Go: implement provider: HuggingFace (#14722 ) ### What problem does this PR solve? Implement `HuggingFace` provider ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-09 13:36:03 +08:00
Jin Hai	b6abce50b1	Go: Admin list ingestion tasks (#14695) ### What problem does this PR solve? ``` RAGFlow(admin)> list tasks; +-------------+------------------+----------------------------------+-------------+-----------+----------------------------------+----------+----------------------+-------------+-----------+---------+ | chunk_count | digest | document_id | duration | from_page | id | priority | progress | retry_count | task_type | to_page | +-------------+------------------+----------------------------------+-------------+-----------+----------------------------------+----------+----------------------+-------------+-----------+---------+ | 16 | 8a0016a0dc3cbdbb | f6aa38bb4ad111f1ba6338a74640adcc | 1511.156966 | 0 | f91e4f104ad111f1aaaf38a74640adcc | 0 | 1 | 1 | | 12 | +-------------+------------------+----------------------------------+-------------+-----------+----------------------------------+----------+----------------------+-------------+-----------+---------+ ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-09 10:03:23 +08:00
Jin Hai	5e96c5cae6	Fix go cli: search on datasets (#14692 ) ### What problem does this PR solve? As title ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-08 20:25:14 +08:00
Joseff	2ad854c586	Go: implement Rerank in Aliyun driver (#14676 ) ### What problem does this PR solve? The Aliyun Go driver has a stub `Rerank` method that always returns `"Aliyun, Rerank not implemented"`. DashScope exposes an OpenAI-compatible rerank endpoint (`compatible-mode/v1/rerank`) and hosts dedicated bilingual rerankers (`gte-rerank-v2`, `gte-rerank`) that are a natural pairing with the embedding models already in `aliyun.json`. Without this, Aliyun users cannot use reranking within RAGFlow. Closes #14675 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-08 20:21:04 +08:00
Jin Hai	ce2ec86b5e	Go: fix CLI logout command (#14672 ) ### What problem does this PR solve? ``` RAGFlow(user)> logout; SUCCESS ``` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-08 16:47:25 +08:00
Haruko386	94f82acd03	Fix(Go): prevent global state pollution in local model connection check (#14669 ) ### What problem does this PR solve? 1. Fix Global State Pollution in Local Providers (Critical Bug): - Resolved a severe concurrency and architecture issue in `model_service.go`. Previously, `ListSupportedModels` would permanently overwrite the global provider singleton with a localized URL instance (`driver.NewInstance`). This caused cross-request contamination in multi-tenant environments. - Fixed `CheckProviderConnection` for local models (LM Studio, vLLM, Ollama). It now properly creates a localized driver copy and injects the `base_url` before testing the connection, entirely eliminating the false-positive `missing base URL` error without polluting the global state. 2. Implement `VolcEngine` Embeddings: - Fully implemented the `Encode` method for the `volcengine` provider, enabling text embedding capabilities for VolcEngine models. 3. Enhance Region Validation in `SiliconFlow`: - Added a strict empty string check (`*apiConfig.Region != ""`) alongside the existing `nil` check when parsing regions. This ensures that if an empty string is passed, the system safely falls back to the `"default"` region, preventing malformed URL requests and `unsupported protocol scheme` errors. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2026-05-08 15:54:27 +08:00
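The region check described in item 3 of this entry (a `nil` check plus an empty-string check, falling back to `"default"`) can be sketched as below. The `resolveBaseURL` helper and the map layout are hypothetical; returning an error for an unconfigured region is a choice made for this example, not something the entry specifies.

```go
package main

import "fmt"

// resolveBaseURL picks the base URL for a region. A nil or empty region
// falls back to "default" so no request is built with an empty URL.
func resolveBaseURL(baseURLs map[string]string, region *string) (string, error) {
	key := "default"
	if region != nil && *region != "" {
		key = *region
	}
	u, ok := baseURLs[key]
	if !ok || u == "" {
		return "", fmt.Errorf("no base URL configured for region %q", key)
	}
	return u, nil
}

func main() {
	// Placeholder URLs, for illustration only.
	urls := map[string]string{
		"default": "https://api.example.invalid/v1",
		"intl":    "https://intl.example.invalid/v1",
	}
	empty := ""
	u, err := resolveBaseURL(urls, &empty)
	if err != nil {
		panic(err)
	}
	fmt.Println(u)
}
```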
Jin Hai	ee5ae6f1a4	Go CLI: fix register user (#14665 ) ### What problem does this PR solve? 1. Update API URL 2. Add password encryption ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-08 15:53:06 +08:00
Panda Dev	a82ae4a991	Go: implement Encode (embeddings) in Aliyun driver (#14647) ### What problem does this PR solve? The Aliyun Go driver shipped with a stub `Encode` method that returned `no such method`, even though `conf/models/aliyun.json` already wires the OpenAI-compatible embeddings URL suffix at `compatible-mode/v1/embeddings`. The same config also did not list any embedding models, so the picker had nothing to select. So an Aliyun tenant who wanted to use Tongyi text-embedding-v3 or v4 in the Go layer could not, even though the upstream endpoint is public and uses the standard `POST /v1/embeddings` shape that the SiliconFlow and ZhipuAI drivers already support. This PR fills the gap. ### What this PR includes - `conf/models/aliyun.json`: add `text-embedding-v4` and `text-embedding-v3` to the `models` array. - `internal/entity/models/aliyun.go`: replace the `Encode` stub with a real implementation. Adds a small local response type that matches the OpenAI-compatible shape. No factory change. No interface change. ### How the driver works - Validate `apiConfig` and the API key, validate the model name, resolve the region with a default fallback, build the URL from `BaseURL[region] + URLSuffix.Embedding`. - Send all input texts in one request as the `input` array, the same OpenAI-compatible shape the SiliconFlow `Encode` uses. - Parse `data[].embedding` and copy each slice into a `[][]float64` indexed by `data[].index` so the output order matches the input order even if the API returns items in a different order. - Handle both `float64` and `float32` element types. - Empty input returns `[][]float64{}` with no HTTP call. - Non-200 responses propagate the upstream status line and body. - A final pass checks every input slot got a vector and returns a clear error if any slot is still nil. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? 
- `go build ./internal/entity/models/...` in a clean go 1.25 image returns exit 0. - The full method set on `AliyunModel` still matches the `ModelDriver` interface. - Pattern parity with the existing SiliconFlow Encode implementation. Closes #14646 --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-08 13:58:25 +08:00
Haruko386	d13a240dc0	Go: implement remaining interface for OpenRouter (#14657) ### What problem does this PR solve? 1. Implement the `rerank`, `embedding`, `balance`, and `checkConnection` methods for `OpenRouter`. 2. Delete the `chat` method in `internal/entity/models/volcengine.go`. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-05-08 13:56:45 +08:00
Jin Hai	731c887ba0	Fix cli login (#14658) ### What problem does this PR solve? CLI login failed after the API was updated. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-08 13:56:19 +08:00
Panda Dev	d8d49df35e	Go: implement Rerank in Gitee AI driver (#14656) ### What problem does this PR solve? The Gitee AI Go driver shipped with a stub `Rerank` method that returned `Rerank not implemented`, even though `conf/models/gitee.json` already wires the rerank URL suffix at `"rerank": "rerank"`. The same config did not list any rerank model, so the picker had nothing to select. So a Gitee tenant could not use BAAI/bge-reranker-v2-m3 as a reranker through the Go layer today, even though the infrastructure was one config entry and one method body away. ### What this PR includes - `conf/models/gitee.json`: add `BAAI/bge-reranker-v2-m3` to the `models` array. - `internal/entity/models/gitee.go`: replace the `Rerank` stub with a real implementation. Adds two small local types that match the OpenAI-compatible `/rerank` shape already used by the SiliconFlow and ZhipuAI drivers. No factory change. No interface change. ### How the driver works - Validate `apiConfig` and the API key, validate the model name, resolve the region with a default fallback, build the URL from `BaseURL[region] + URLSuffix.Rerank`. - Use a per-call `context.WithTimeout(30s)` and `http.NewRequestWithContext`, matching the pattern the recently merged Aliyun Encode and the OpenAI driver already use. - Send `{model, query, documents, top_n, return_documents:false}` in the body. - Parse `results[].relevance_score` and copy each score into the output slice indexed by `results[].index`, so the output order matches the input order even if the API returns items in a different order. - Empty input returns `[]float64{}` with no HTTP call. - An out-of-range result index returns a clear error rather than silently skipping the entry. - Non-200 responses propagate the upstream status line and body. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? 
- `go build ./internal/entity/models/...` in a clean go 1.25 image returns exit 0. - The full method set on `GiteeModel` still matches the `ModelDriver` interface. - Pattern parity with the existing SiliconFlow Rerank and the recently merged ZhipuAI Rerank (#14608). Closes #14655	2026-05-08 13:08:22 +08:00
Panda Dev	c7ddc8c039	fix(go): implement ListModels and CheckConnection in NVIDIA driver (#14636) ### What problem does this PR solve? The NVIDIA Go driver added in #14623 has a real chat path, but `ListModels` and `CheckConnection` are stubs that always return `no such method`. So: - The model picker cannot auto-populate available NVIDIA NIM model ids. Users have to type the full id by hand (e.g. `abacusai/dracarys-llama-3.1-70b-instruct`). - The "Check connection" button always fails for NVIDIA, even when the base URL is reachable and the API key is accepted. NVIDIA NIM is OpenAI-compatible. `/v1/models` works with the same Bearer token used for chat. The `conf/models/nvidia.json` file already wires the `models` url_suffix, so no config change is needed. ### What this PR includes - `internal/entity/models/nvidia.go`: - `ListModels` now calls `GET ${BaseURL}/${URLSuffix.Models}`, parses `response.data[*].id`, and returns the list. Same shape as the moonshot, xai, and openai drivers. - `CheckConnection` now calls `ListModels` and returns its error. Same pattern xai, moonshot, deepseek, aliyun, and gitee already use. `Balance`, `Encode`, and `Rerank` are still stubs in this PR and can be added in follow-ups. No JSON change. No factory change. No interface change. ### How the implementation works - Region resolution falls back to `default` when the supplied region is unknown, so a stray region value does not break a valid request. - The Authorization header is only set when `apiConfig` and `ApiKey` are non-nil and non-empty. This avoids a nil-pointer dereference and lets self-hosted NIM deployments without a key still work. - Non-200 responses propagate the upstream status line and body so the user sees a real error message. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### How was this tested? - `go build ./internal/entity/models/...` in a clean go 1.25 image (the go.mod minimum) returns exit 0. 
- The full method set on `NvidiaModel` still matches the `ModelDriver` interface. - Pattern parity with the existing xai, moonshot, deepseek, aliyun, gitee, and openai drivers. Closes #14635	2026-05-08 12:04:28 +08:00
Panda Dev	e729eced45	Go: implement Balance in DeepSeek driver (#14632) Closes #14631 ### What problem does this PR solve? The DeepSeek Go driver shipped with a stub `Balance` method that returned `no such method`, even though DeepSeek exposes a public `GET /user/balance` endpoint that works with the same Bearer token used for chat. So the "Balance" panel in the model provider UI always shows an error for DeepSeek tenants, while it already works for Moonshot and Gitee. This PR fills the gap. ### What this PR includes - `conf/models/deepseek.json`: add `"balance": "user/balance"` under `url_suffix` so the driver can build the URL from config the same way the other endpoints do. - `internal/entity/models/deepseek.go`: replace the `Balance` stub with a real implementation. Adds a small local response type `deepseekBalanceResponse` that matches the upstream shape. No factory change. No interface change. ### How the driver works - Validate `apiConfig` and the API key, resolve the region (with a `default` fallback), and build the URL from `BaseURL[region] + URLSuffix.Balance`. - GET the URL with `Authorization: Bearer <api_key>`. - Parse the upstream response: ```json { "is_available": true, "balance_infos": [ {"currency": "USD", "total_balance": "10.00", ...}, {"currency": "CNY", "total_balance": "70.00", ...} ] } ``` `total_balance` is a string in the upstream API, so the driver parses it with `strconv.ParseFloat`. - Return the first balance entry as `{"balance": <float>, "currency": <string>}`, the same shape the Moonshot driver returns. The UI can render it with no provider-specific code. ### Edge cases - Missing or empty API key returns a clear local error before any HTTP call. - Empty `balance_infos` returns a clear "no balance info in response" error rather than a zero-value silent success. - Non-numeric `total_balance` returns a clear parse error. 
- Non-200 responses propagate the upstream status line and body so the user can see why the call failed. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? - `go build ./internal/entity/models/...` in a clean go 1.25 image (the go.mod minimum) returns exit 0. - The full method set on `DeepSeekModel` still matches the `ModelDriver` interface. - Pattern parity with the existing Moonshot and Gitee Balance implementations.	2026-05-08 12:03:39 +08:00
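The parsing step and the edge cases listed in this entry can be sketched as below, using the JSON shape quoted in the message. The `balanceResponse` type and the `parseBalance` function are hypothetical names for this example, not the driver's `deepseekBalanceResponse`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strconv"
)

// balanceResponse is a hypothetical mirror of the quoted upstream shape.
type balanceResponse struct {
	IsAvailable  bool `json:"is_available"`
	BalanceInfos []struct {
		Currency     string `json:"currency"`
		TotalBalance string `json:"total_balance"` // a string upstream
	} `json:"balance_infos"`
}

// parseBalance returns the first entry as {"balance": float, "currency": string}.
func parseBalance(body []byte) (map[string]any, error) {
	var resp balanceResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode balance response: %w", err)
	}
	if len(resp.BalanceInfos) == 0 {
		return nil, errors.New("no balance info in response")
	}
	first := resp.BalanceInfos[0]
	amount, err := strconv.ParseFloat(first.TotalBalance, 64)
	if err != nil {
		return nil, fmt.Errorf("parse total_balance %q: %w", first.TotalBalance, err)
	}
	return map[string]any{"balance": amount, "currency": first.Currency}, nil
}

func main() {
	body := []byte(`{"is_available":true,"balance_infos":[{"currency":"USD","total_balance":"10.00"},{"currency":"CNY","total_balance":"70.00"}]}`)
	result, err := parseBalance(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(result["balance"], result["currency"])
}
```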
Haruko386	a377512110	Go: implement provider: OpenRouter (#14652 ) ### What problem does this PR solve? 1. Implement `OpenRouter` Provider: Fully support OpenRouter AI models (e.g., `gemma`, `minimax`). Includes robust handling of Server-Sent Events (SSE) streams, error event interception, and proper parsing of both `reasoning_content` and standard `content`. 2. Fix BaseURL Resolution Bug: Fixed a critical edge case in region configuration parsing. Added a strict empty string check (`*apiConfig.Region != ""`) alongside the `nil` check. This ensures that if the UI passes an empty string, the system correctly falls back to the `"default"` region, preventing `unsupported protocol scheme ""` errors during HTTP requests. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2026-05-08 12:02:37 +08:00
Panda Dev	a86e0ca0ca	Go: implement Balance in SiliconFlow driver (#14643) ### What problem does this PR solve? The SiliconFlow Go driver shipped with a stub `Balance` method that returned `no such method`, even though SiliconFlow exposes a public `GET /v1/user/info` endpoint that returns the account balance per currency. So the "Balance" panel in the model provider UI always shows an error for SiliconFlow tenants, while it already works for Moonshot and Gitee. This PR fills the gap. ### What this PR includes - `conf/models/siliconflow.json`: add `"balance": "user/info"` under `url_suffix` so the driver builds the URL from config. - `internal/entity/models/siliconflow.go`: replace the `Balance` stub with a real implementation. Adds a small local response type that matches the upstream shape. No factory change. No interface change. ### How the driver works - Validate `apiConfig` and the API key, resolve the region with a default fallback, and build the URL from `BaseURL[region] + URLSuffix.Balance`. - GET the URL with `Authorization: Bearer <api_key>`. - Parse the upstream response. SiliconFlow returns balance fields as strings, so the driver parses them with `strconv.ParseFloat`. It prefers `totalBalance` over `balance` when both are present. - Return `{"balance": <float>, "currency": "CNY"}`, the same shape the Moonshot driver returns. The UI can render it with no provider-specific code. ### Edge cases - Missing or empty API key returns a clear local error before any HTTP call. - An unknown region falls back to the default base URL. - Empty `balance` and `totalBalance` returns a clear "no balance info in response" error rather than a zero-value silent success. - Non-numeric balance string returns a clear parse error. - Non-200 responses propagate the upstream status line and body. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? 
- `go build ./internal/entity/models/...` in a clean go 1.25 image returns exit 0. - The full method set on `SiliconflowModel` still matches the `ModelDriver` interface. - Pattern parity with the existing Moonshot and Gitee Balance implementations. Closes #14642	2026-05-08 12:01:10 +08:00
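The preference for `totalBalance` over `balance`, together with the empty and non-numeric edge cases in this entry, reduces to a small selection step. The `pickBalance` helper below is a hypothetical sketch that takes the two string fields directly, since the full upstream response layout is not shown in the entry.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// pickBalance prefers totalBalance, falls back to balance, and rejects
// empty or non-numeric values instead of returning a silent zero.
func pickBalance(totalBalance, balance string) (float64, error) {
	raw := totalBalance
	if raw == "" {
		raw = balance
	}
	if raw == "" {
		return 0, errors.New("no balance info in response")
	}
	value, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		return 0, fmt.Errorf("parse balance %q: %w", raw, err)
	}
	return value, nil
}

func main() {
	// Sample values invented for illustration.
	value, err := pickBalance("12.50", "3.00")
	if err != nil {
		panic(err)
	}
	fmt.Println(map[string]any{"balance": value, "currency": "CNY"})
}
```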
Panda Dev	2fd8cdc3cc	fix(go): wire CheckConnection to ListModels in ollama, lm-studio, and vllm (#14614 ) ### What problem does this PR solve? Three Go drivers had `CheckConnection` returning a hardcoded `no such method` error, even though each one already has a working `ListModels` that hits the configured base URL with the configured API key. So the "Check connection" button in the model provider UI always failed for these three providers, even when the underlying setup was fine. Affected drivers: - `internal/entity/models/ollama.go` - `internal/entity/models/lmstudio.go` - `internal/entity/models/vllm.go` This is a real user-facing gap because Ollama and LM Studio are two of the most popular local LLM runners, and vLLM is widely used for self-hosted deployments. ### What this PR includes For each of the three drivers, replace the stub with a small implementation that calls `ListModels` and returns its error: ```go func (o OllamaModel) CheckConnection(apiConfig APIConfig) error { _, err := o.ListModels(apiConfig) return err } ``` This is the exact pattern that xai, moonshot, deepseek, aliyun, and gitee already use for the same method. No JSON change. No factory change. No interface change. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### How was this tested? - `go build ./internal/entity/models/...` in a clean go 1.25 image (the go.mod minimum) returns exit 0. - The full ModelDriver interface still resolves on each driver (NewInstance, Name, ChatWithMessages, ChatStreamlyWithSender, Encode, Rerank, ListModels, Balance, CheckConnection). - Pattern parity with the existing xai, moonshot, deepseek, aliyun, and gitee CheckConnection methods. Closes #14609	2026-05-08 12:00:10 +08:00
Panda Dev	bb10b83e61	Go: implement Rerank in ZhipuAI driver (#14608 ) ### What problem does this PR solve? The ZhipuAI Go driver had a stub Rerank method that returned "not implemented", even though conf/models/zhipu-ai.json already ships glm-rerank as a rerank model and the rerank URL suffix is already wired in url_suffix: ```json "url_suffix": { ... "rerank": "rerank" }, "models": [ {"name": "glm-rerank", "model_types": ["rerank"]}, ... ] ``` So the config was ready but the driver was not. A tenant who picked glm-rerank in the Go layer could not actually run a rerank call. This PR fills the gap so the listed model works end to end. ### What this PR includes - `internal/entity/models/zhipu-ai.go`: real implementation of `ZhipuAIModel.Rerank`, plus two small local types (`zhipuRerankRequest`, `zhipuRerankResponse`) that mirror the standard OpenAI-compatible rerank shape used by SiliconFlow. No factory change. No JSON change. No interface change. ### How the driver works - POST to `${BaseURL}/${URLSuffix.Rerank}` (resolves to `https://open.bigmodel.cn/api/paas/v4/rerank` with the default config), reusing the existing httpClient on the driver. - Validate apiConfig and the API key, validate the model name, and resolve the region. Return a clear local error before any HTTP call when something is missing. - Send `{model, query, documents, top_n, return_documents: false}` in the body, the same shape the SiliconFlow driver already uses. - Walk `results[].relevance_score` and copy each score into the output slice indexed by `results[].index`, so the output order matches the input order even if the API returns results in a different order. - Empty `texts` input returns an empty `[]float64` with no HTTP call. - Non-200 responses propagate the upstream status line and body. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? 
- `go build ./internal/entity/models/...` in a clean go 1.25 image (the go.mod minimum) returns exit 0. - The full method set on `ZhipuAIModel` still matches the `ModelDriver` interface (NewInstance, Name, ChatWithMessages, ChatStreamlyWithSender, Encode, ListModels, Balance, CheckConnection, Rerank). - Pattern parity with the existing SiliconFlow Rerank implementation (`internal/entity/models/siliconflow.go`). Closes #14607	2026-05-07 17:56:30 +08:00
Jin Hai	94324afee9	Go: fix auth issue in hybrid mode (#14611) ### What problem does this PR solve? Because the logic for getting and setting the secret key was updated, the Go server needs to be updated as well. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-07 17:14:22 +08:00
Haruko386	078ea3bf4a	Go: implement provider: Nvidia (#14623 ) ### What problem does this PR solve? 1. Implement `Nvidia` Provider: Fully support NVIDIA NIM APIs with robust parameter handling (including the `thinking` parameter) and safe URL merging in `NewInstance`. 2. Fix Misleading CLI Errors: Corrected a bug in `common_command.go` where failed chat requests inaccurately reported `failed to list instance models`. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2026-05-07 14:17:57 +08:00
Panda Dev	b8b741555f	Go: implement provider: OpenAI (#14605 ) ### What problem does this PR solve? Add a Go driver for OpenAI (GPT models). The config file conf/models/openai.json has been in the repo for a while with the full GPT-5 model list, but internal/entity/models/factory.go had no case for "openai". So any tenant that configured OpenAI as a model provider in the Go layer fell through to the default branch and got the dummy driver. Chat, list models, and check connection all returned dummy responses instead of reaching the API. OpenAI is the most commonly requested provider and the JSON config already ships with the repo, so this gap is high impact even though the JSON has been there for some time. ### What this PR includes - New file internal/entity/models/openai.go with an OpenAIModel that implements the ModelDriver interface. - factory.go: route the "openai" provider name to NewOpenAIModel. - conf/models/openai.json: add "models": "models" under url_suffix so ListModels can hit /v1/models with no hardcoded fallback. ### How the driver works - OpenAI exposes the canonical OpenAI-compatible API at https://api.openai.com/v1. - ChatWithMessages and ChatStreamlyWithSender post to /chat/completions in the same shape the moonshot, vllm, and xai drivers use. - ListModels and CheckConnection call /models to list available ids and confirm the API key works. - reasoning_content is passed through for the o-series and other reasoning models, in both the non-stream and stream paths. - Encode (embeddings) is left as "not implemented" for now, the same way the other recent provider drivers do it. Rerank and Balance are not part of OpenAI's public API surface in this layer and return a clear "not implemented" or "no such method" error. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? - go build ./internal/entity/models/... in a clean go 1.25 image (the go.mod minimum) returns exit 0 with no errors. 
- Method set of OpenAIModel matches the ModelDriver interface: NewInstance, Name, ChatWithMessages, ChatStreamlyWithSender, Encode, Rerank, ListModels, Balance, CheckConnection. - Pattern parity with the merged moonshot (#14433), volcengine (#14460), minimax (#14478), vllm (#14532), xai (#14550), and lm-studio (#14586) PRs. Closes #14604	2026-05-07 13:09:51 +08:00
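The fall-through described in #14605 can be illustrated with a minimal sketch. It is not the code in `internal/entity/models/factory.go`: the `Driver` interface, the two driver types, and `newDriver` are hypothetical stand-ins, and the real `ModelDriver` interface has the larger method set listed in the commit message.

```go
package main

import "fmt"

// Driver is a hypothetical, trimmed-down stand-in for the ModelDriver
// interface; only Name is kept to show the routing.
type Driver interface {
	Name() string
}

// dummyDriver plays the role of the fallback that answers without
// ever calling a remote API.
type dummyDriver struct{}

func (dummyDriver) Name() string { return "dummy" }

// openAIDriver stands in for the driver that the PR wires up.
type openAIDriver struct{}

func (openAIDriver) Name() string { return "openai" }

// newDriver routes a provider name to a driver. A provider without its
// own case silently lands in the default branch, which is the gap the
// PR describes for "openai" before the case was added.
func newDriver(provider string) Driver {
	switch provider {
	case "openai":
		return openAIDriver{}
	default:
		return dummyDriver{}
	}
}

func main() {
	fmt.Println(newDriver("openai").Name())
	fmt.Println(newDriver("not-wired-up").Name())
}
```

Because the default branch returns a working object rather than an error, a missing case is easy to overlook; returning an explicit "unknown provider" error is one alternative design, though the source does not say whether the project considered it.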
Haruko386	dd7a0ce1d3	Go: implement provider: lm-studio (#14586) ### What problem does this PR solve? Implement the `lm-studio` provider. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring	2026-05-06 19:23:11 +08:00
Jack Storment	c2ad672c09	Go: implement provider: xAI (#14550 ) Closes #14552 ### What problem does this PR solve? Add a Go driver for xAI (Grok models). The config file conf/models/xai.json has been in the repo since the early Go provider work, but internal/entity/models/factory.go had no case for "xai". So any xAI request fell through to the dummy driver and never reached the API. This PR adds the missing driver and wires it up. ### What this PR includes - New file internal/entity/models/xai.go with an XAIModel that implements the ModelDriver interface. - factory.go: route the "xai" provider name to NewXAIModel. ### How the driver works - xAI exposes an OpenAI-compatible API at https://api.x.ai/v1. - ChatWithMessages and ChatStreamlyWithSender post to /chat/completions in the same shape the moonshot and deepseek drivers use. - ListModels and CheckConnection call /models to confirm the API key works and to list available model ids. - reasoning_content is passed through for grok-3-mini and other xAI reasoning models, both in the non-stream and stream paths. - Encode, Rerank, and Balance are not part of the public xAI API at the moment, so they return a clear "not implemented" or "no such method" error. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? - go build ./internal/entity/models/... in a clean go 1.25 image (the go.mod minimum) returns exit 0 with no errors. - Method set of XAIModel matches the ModelDriver interface: NewInstance, Name, ChatWithMessages, ChatStreamlyWithSender, Encode, Rerank, ListModels, Balance, CheckConnection. - Pattern parity with the merged moonshot (#14433), volcengine (#14460), minimax (#14478), and vllm (#14532) PRs. --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-06 12:16:37 +08:00
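The `reasoning_content` pass-through mentioned in #14550 for the non-stream path might look roughly like the sketch below. It assumes the OpenAI-compatible response shape (`choices[0].message`), and the `chatResponse` type and `parseChat` helper are hypothetical names, not taken from `xai.go`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// chatResponse models only the fields this sketch needs from an
// OpenAI-compatible /chat/completions response (assumed shape).
type chatResponse struct {
	Choices []struct {
		Message struct {
			Content          string `json:"content"`
			ReasoningContent string `json:"reasoning_content"`
		} `json:"message"`
	} `json:"choices"`
}

// parseChat returns the answer and, when the model supplies it, the
// reasoning text. A missing reasoning_content decodes to "".
func parseChat(body []byte) (string, string, error) {
	var resp chatResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", "", err
	}
	if len(resp.Choices) == 0 {
		return "", "", errors.New("no choices in response")
	}
	msg := resp.Choices[0].Message
	return msg.Content, msg.ReasoningContent, nil
}

func main() {
	// Hand-written sample payload, not a captured API response.
	body := []byte(`{"choices":[{"message":{"content":"4","reasoning_content":"2+2 is 4"}}]}`)
	answer, reasoning, err := parseChat(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("Thinking:", reasoning)
	fmt.Println("Answer:", answer)
}
```

Keeping the reasoning text in a separate return value lets the caller decide whether to print it, much as the CLI transcripts in this log show separate `Thinking:` and `Answer:` sections.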
Haruko386	cd54c08e84	Go: implement provider: Ollama (#14580) ### What problem does this PR solve? Implement the `Ollama` provider. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-06 12:03:58 +08:00
qinling0210	7335916868	Use GetChatModel, remove duplicate functions in model_service.go (#14546) ### What problem does this PR solve? Use GetChatModel, remove duplicate functions in model_service.go ### Type of change - [x] Refactoring Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-06 11:33:32 +08:00
Jin Hai	aa57b5bd8b	Go: move logger to common module (#14545) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-06 10:41:58 +08:00
Jin Hai	3a51c27a75	Go: CLI chat with text, image, video (#14573 ) ### What problem does this PR solve? ``` RAGFlow(user)> chat with 'glm-4.6v-flash@test@zhipu-ai' message 'What are the pics talk about?' image 'https://cdn.bigmodel.cn/static/logo/register.png' 'https://cdn.bigmodel.cn/static/logo/api-key.png' Answer: The first picture shows a login/register modal with options for phone number login, account login, and WeChat QR code login, along with a prompt for new users to get a 20 million tokens experience package. The second picture displays the API keys management page of a platform, including a warning about API key security and a table listing existing API keys with details like creation time and usage history. Time: 31.600545 RAGFlow(user)> chat with 'glm-4.6v-flash@test@zhipu-ai' message 'What are the video talk about?' video 'https://cdn.bigmodel.cn/agent-demos/lark/113123.mov' Answer: Based on the sequence of frames provided, the video is a demonstration of a web search and navigation process. 1. The video starts with a blank Google search page. 2. The user types "智谱" (which is the Chinese name for the company Zhipu AI) into the search box. 3. The search is initiated and the page shows "About 0 results". 4. The search results load, showing information about Zhipu AI, including its website. 5. The user clicks on the main website link (www.zhipuai.cn). 6. The video ends by showing the homepage of Zhipu AI's website, titled "Z.ai GLM Large Model Open Platform". In summary, the video is about searching for the company "智谱" (Zhipu AI) on Google and then navigating to its official website. Time: 76.582520 ``` ### Type of change - [x] New Feature (non-breaking change which adds functionality) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-05 18:14:39 +08:00
qinling0210	12af73f2ca	Support stream for multimodal chat (#14537) ### What problem does this PR solve? Support stream for multimodal chat ### Type of change - [x] Refactoring	2026-04-30 19:33:57 +08:00
Haruko386	93f3b90121	Go: implement provider: Vllm (#14532 ) ### What problem does this PR solve? Implement the vLLM model provider for RAGFlow to fully support local and self-hosted open-source models (e.g., Qwen, GLM, Llama) via the vLLM framework, and fix several critical bugs related to model instance management and API requests. Key changes and fixes: 1. Added Standard vLLM Provider (`vllm.go`, `vllm.json`): - Implemented `VllmModel` driver strictly adhering to the OpenAI API specification. - Removed hardcoded and dangerous routing logic (e.g., forcing `AsyncChat` for Qwen/GLM prefixes), ensuring standard `/v1/chat/completions` compatibility. - Refactored `ListModels` to use safe JSON parsing (resolving nil pointer panics) and standard `GET` requests without bodies. - Added `APIConfig.Region` fallback logic to prevent empty `base_url` fetching when checking models. 2. Fixed `ChatToModelStreamWithSender` Bug (`model_service.go`): - Resolved the `model is disabled` error when streaming chat with local database-saved models. - Added the missing `if modelInfo.Status == "active"` block to correctly invoke `NewInstance` and inject the dynamic `base_url` into the provider driver before starting the SSE stream. 3. Fixed `ListSupportedModels` Bug (`model_service.go`): - Added dynamic `NewInstance` injection for `base_url`. Previously, the list models function used the static JSON config without injecting user-configured dynamic URLs from the database, resulting in an `unsupported protocol scheme ""` error. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2026-04-30 16:30:14 +08:00
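The `unsupported protocol scheme ""` error fixed in #14532 is what Go's HTTP client reports when a request URL has no scheme, for example when an empty `base_url` is joined with a path. The sketch below shows one way to prefer a user-configured URL, fall back to a static one, and reject an unusable base before any request is sent. The `endpoint` helper and its parameters are hypothetical, not the logic in `vllm.go` or `model_service.go`.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// endpoint picks the dynamic (user-configured) base URL when present,
// falls back to the static one, validates it, and appends the suffix
// with exactly one slash between the two parts.
func endpoint(dynamicBase, staticBase, suffix string) (string, error) {
	base := dynamicBase
	if base == "" {
		base = staticBase
	}
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	// An empty or scheme-less base is what would otherwise surface
	// later as an "unsupported protocol scheme" error.
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("base URL %q has no scheme or host", base)
	}
	return strings.TrimRight(base, "/") + "/" + strings.TrimLeft(suffix, "/"), nil
}

func main() {
	// The host and port here are illustrative values only.
	full, err := endpoint("http://localhost:8000/v1/", "", "/models")
	fmt.Println(full, err)

	_, err = endpoint("", "", "models")
	fmt.Println(err)
}
```

Failing early with a descriptive message points the user at the missing configuration rather than at the HTTP layer.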


152 Commits