ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-05-28 03:33:05 +08:00

Author	SHA1	Message	Date
Jin Hai	9b3850339b	Go: add development guide document (#14785 ) ### What problem does this PR solve? As the title suggests. ### Type of change - [x] Documentation Update Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-11 17:20:41 +08:00
tmimmanuel	663fc1d42c	fix(opensearch): implement doc-meta dispatch surface on OSConnection (#14577 ) ### What problem does this PR solve? Fixes #14570. On OpenSearch backends (`DOC_ENGINE=opensearch`) every document-metadata write failed with `'OSConnection' object has no attribute 'create_doc_meta_idx'`, so both `PATCH /api/v1/datasets/{ds}/documents/{doc}` with `meta_fields` and `POST /api/v1/datasets/{ds}/metadata/update` were unusable while every other document operation (retrieval, parsing, name update, chunk management) worked correctly on the same OpenSearch cluster. The bug runs deeper than the missing method name in the error message suggests. `DocMetadataService` also reached into `settings.docStoreConn.es.*` directly for the index refresh, the scripted partial update, and the count call, which means that even after adding `create_doc_meta_idx` to `OSConnection` the very next call in the same metadata flow would still raise `AttributeError` because `OSConnection` exposes `self.os` rather than `self.es`. Fixing only the reported symptom would have moved the failure one line down without restoring the feature. This PR adds a uniform document-metadata dispatch surface to both connection classes so they present the same abstract API, and routes the service layer through that surface via `getattr` guards instead of poking at backend-specific attributes. The four new methods on `OSConnection` and `ESConnectionBase` are `create_doc_meta_idx`, `refresh_idx`, `count_idx`, and `replace_meta_fields`. `OSConnection.create_doc_meta_idx` reuses the existing `conf/doc_meta_es_mapping.json` schema in the OpenSearch `body=` form because OpenSearch and Elasticsearch share the same index-creation payload, and `replace_meta_fields` emits a full scripted assignment (`ctx._source.meta_fields = params.meta_fields`) on both backends so removed keys actually disappear instead of being preserved by deep-merge semantics. 
The `getattr`-guarded dispatch in `DocMetadataService` keeps the existing fall-through paths intact for Infinity and OceanBase, which continue to rely on their search-based count fallback and on the delete-then-insert metadata replacement they used before, so this change is strictly additive for those two backends. Verification: `pytest test/unit_test/rag/utils/test_opensearch_doc_meta.py` runs 16 new unit tests that pass locally and pin the `OSConnection` dispatch surface, the `create_doc_meta_idx` short-circuit when the index already exists, the mapping-file payload routing, the `IndicesClient.create` failure path, the `refresh_idx` and `count_idx` success and error sentinels, and the full-assignment script emitted by `replace_meta_fields`. The test module stubs `common.settings` and `rag.nlp` at import time so the suite runs without the heavy backend SDKs that the rest of the repository pulls in transitively. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: tmimmanuel <tmimmanuel@users.noreply.github.com>	2026-05-11 17:04:28 +08:00
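The `getattr`-guarded dispatch described in the entry above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `FakeOSConnection`, `FakeLegacyConnection`, `update_meta`, and the `(index, doc_id, meta_fields)` signature are all hypothetical. Only the method name `replace_meta_fields` and the delete-then-insert fallback come from the commit message.

```python
class FakeOSConnection:
    """Hypothetical backend that implements the dispatch surface."""

    def __init__(self):
        self.docs = {}

    def replace_meta_fields(self, index, doc_id, meta_fields):
        # Full assignment: keys absent from meta_fields disappear.
        self.docs[(index, doc_id)] = dict(meta_fields)
        return True


class FakeLegacyConnection:
    """Hypothetical backend without the surface (delete-then-insert only)."""

    def __init__(self):
        self.docs = {}

    def delete(self, index, doc_id):
        self.docs.pop((index, doc_id), None)

    def insert(self, index, doc_id, meta_fields):
        self.docs[(index, doc_id)] = dict(meta_fields)


def update_meta(conn, index, doc_id, meta_fields):
    """Use the uniform method when the backend has it, else fall through."""
    replace = getattr(conn, "replace_meta_fields", None)
    if callable(replace):
        return replace(index, doc_id, meta_fields)
    # Fall-through path for backends that never gained the method.
    conn.delete(index, doc_id)
    conn.insert(index, doc_id, meta_fields)
    return True
```

The service layer only asks whether the method exists, so it never touches a backend-specific attribute such as `.es` or `.os`.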
box4wangjing	292b0b8bce	chore: fix some comments to improve readability (#14756 ) ### What problem does this PR solve? fix some comments to improve readability ### Type of change - [x] Documentation Update --------- Signed-off-by: box4wangjing <box4wangjing@outlook.com>	2026-05-11 16:48:48 +08:00
Octopus	c58906b69e	fix: OCR.detect() returns truthy None-tuple causing NoneType subscript crash (#13951 ) Fixes #13851 ## Problem `OCR.detect()` in `deepdoc/vision/ocr.py` returns `None, None, time_dict` (a truthy 3-tuple) when the text detector fails or receives a `None` image. However, the caller in `pdf_parser.py:__ocr()` checks:
```python
bxs = self.ocr.detect(np.array(img), device_id)
if not bxs:  # False! (None, None, time_dict) is a non-empty tuple → truthy
    self.boxes.append([])
    return
bxs = [(line[0], line[1][0]) for line in bxs]  # iterates (None, None, time_dict)
# line = None → None[0] → TypeError: 'NoneType' object is not subscriptable
```
This causes the `NoneType object is not subscriptable` error that appears after "OCR started" in the chunking pipeline when using PDF + General parser. ## Solution Simplified `OCR.detect()` to return `None` (falsy) instead of `None, None, time_dict` on failure. The `time_dict` was unused by the only caller of this method. The early-return guard `if not bxs:` in `pdf_parser.py` then correctly catches it. ## Testing - The method's only caller (`pdf_parser.py:__ocr`) already has an `if not bxs:` guard that handles the `None` return correctly. - No other callers of `OCR.detect()` exist in the codebase. ## Summary by CodeRabbit * Refactor * Modified OCR detection function return behavior to streamline output. The function now returns detection results only, without timing metadata. Error cases now return `None` instead of empty tuple values.	2026-05-11 16:19:28 +08:00
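The truthiness pitfall behind the entry above is easy to reproduce in isolation. The sketch below uses hypothetical stand-ins (`detect_old`, `detect_new`, `guard_catches_failure`) rather than the real `OCR.detect()`; it only shows why a tuple of `None` values slips past an `if not ...` guard.

```python
def detect_old(img):
    """Pre-fix shape (hypothetical): failure returns a 3-tuple."""
    if img is None:
        return None, None, {"elapsed": 0.0}
    return [([0, 0, 1, 1], ("text", 0.9))]


def detect_new(img):
    """Post-fix shape (hypothetical): failure returns a bare None."""
    if img is None:
        return None
    return [([0, 0, 1, 1], ("text", 0.9))]


def guard_catches_failure(result):
    """Mirrors the caller's `if not bxs:` early-return check."""
    return not result


# A non-empty tuple is truthy even when every element is None,
# so the old failure value is not caught by the guard.
old_caught = guard_catches_failure(detect_old(None))
new_caught = guard_catches_failure(detect_new(None))
```

Here `old_caught` is `False` and `new_caught` is `True`, which matches the behaviour the commit message describes.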
Nie WeiYang	1e80be77a2	fix(web): fix incomplete Docx preview in citation reference (#14122 ) This PR fixes a UI issue where the .docx document preview was displayed incompletely when clicking on a citation/reference link during a knowledge base conversation. ### What problem does this PR solve? The Issue: In the chat interface, when a user clicks the source citation at the end of an answer, the DocPreviewer opens. However, for .docx files, if the content exceeded the window height, it was truncated and unscrollable, preventing users from reading the full referenced text. Changes: web/src/components/document-preview/doc-preview.tsx: Added the overflow-auto Tailwind class to the DocPreviewer root container to ensure scrollbars appear automatically when content overflows. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: nie.weiyang <nie.weiyang@embedway.com>	2026-05-11 16:17:48 +08:00
as-ondewo	6fb8c31c22	Fix: Document parse status set to DONE before chunks are retrievable (#13352 ) ### What problem does this PR solve? The document parse status was set to DONE before the document chunks were actually retrievable from Elasticsearch/OpenSearch, because the code did not wait for the index refresh. As a result, the API could report a parse status of DONE while an attempt to retrieve chunks still returned none. Since the index refreshes every 1 second, this was quite likely to happen when waiting for document parsing by polling with a short interval and then immediately trying to retrieve chunks once the status was DONE. I fixed this bug and added a test case that would have caught it. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-11 16:04:08 +08:00
Sank	592dba1489	Refact: Added a private helper _visibility_and_status_filter (#13627 ) ### What problem does this PR solve? Added a private helper _visibility_and_status_filter(joined_tenant_ids, user_id) that returns the Peewee condition: visible to user (team or own) and status is VALID. ### Type of change - [x] Refactoring --------- Co-authored-by: Serobabov Aleksandr <40SerobabovAS@region.cbr.ru> Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-05-11 15:21:41 +08:00
tmimmanuel	6ce014c23b	fix: offload blocking DB/Redis calls to thread pool for high-concurrency support (#13825 ) (#13941 ) ### What problem does this PR solve? Addresses event-loop blocking under high concurrency reported in #13825. When multiple requests hit the API simultaneously, synchronous DB/Redis calls block the async event loop, preventing Quart from handling other requests and causing cascading 502/504 timeouts. This PR wraps all remaining blocking DB/Redis calls in `canvas_app.py`, `chat_api.py`, `session.py`, and `canvas_service.py` with `await thread_pool_exec()` - Offload all synchronous `Service.`, `REDIS_CONN.`, and `APIToken.query` calls to the thread pool - Convert sync endpoint handlers (`list_chats`, `get_chat`, `templates`, `sessions`, etc.) to `async def` - Convert sync helper functions (`_ensure_owned_chat`, `_validate_llm_id`, `_validate_dataset_ids`, etc.) to async - no duplicate sync/async pairs - Wrap `CanvasReplicaService` Redis IO calls (`bootstrap`, `replace_for_set`, `commit_after_run`) - Use `asyncio.gather()` for concurrent file uploads and chat response building Note: This fixes the code-level event-loop blocking, which is a prerequisite for handling concurrent requests. For the full "30 concurrent requests without 502/504" goal described in the issue, users should also tune deployment config: - `WS=4` or higher (HTTP worker processes, default 1) - `MAX_CONCURRENT_CHATS=50` (default 10) - `SANDBOX_EXECUTOR_MANAGER_POOL_SIZE` for workflow-heavy workloads ### Performance verification Reviewer asked for a before-vs-after comparison ([comment](https://github.com/infiniflow/ragflow/pull/13941#issuecomment-4393667231)). 
I built a self-contained microbenchmark that reproduces the exact failure mode this PR targets: an async handler that performs blocking DB/Redis-style calls (50 ms each, 3 per request, 30 concurrent requests) is run twice: once with the pre-PR pattern (sync call directly inside the async handler) and once with the post-PR pattern (`await thread_pool_exec(...)`). The benchmark imports nothing from RAGFlow except `thread_pool_exec` itself, so it is hermetic and reproducible (`THREAD_POOL_MAX_WORKERS=128`, Python 3.13.12).

Throughput: wall-clock for 30 concurrent requests (lower is better)

| flavour | wall(s) | p50(s) | p95(s) | max(s) |
|---|---:|---:|---:|---:|
| before | 4.986 | 0.158 | 0.207 | 0.269 |
| after | 0.248 | 0.181 | 0.230 | 0.231 |

The pre-PR handler serializes the entire load on the event-loop thread, so 30 × 3 × 50 ms ≈ 4.5 s shows up as the wall time. The post-PR handler parallelizes the blocking work across the thread pool and finishes the same load in 248 ms, a ~20× speedup on this workload.

Event-loop responsiveness: latency of an unrelated probe coroutine while the 30 slow requests are running (lower is better)

| flavour | samples | probe p50 (ms) | probe p95 (ms) | probe max (ms) |
|---|---:|---:|---:|---:|
| before | 1 | 5442.26 | 5442.26 | 5442.26 |
| after | 28 | 0.88 | 11.53 | 98.02 |

This is the metric that maps directly to "the API still answers other requests while one is busy". A 5 ms-interval probe was scheduled while the 30 slow handlers ran. With the pre-PR code the event loop was frozen for the entire duration of the blocking work, so only one probe sample was ever picked up and it waited 5,442 ms. After the PR, 28 probe samples landed with p50 0.88 ms / p95 11.53 ms, meaning unrelated requests are no longer starved by the slow ones. That is the regression mode behind the cascading 502/504s reported in #13825. 
<details> <summary>Raw benchmark output</summary>

```
config: 30 concurrent requests, 3 blocking calls of 50ms each per request, THREAD_POOL_MAX_WORKERS=128

=== Throughput (lower wall is better) ===
flavour  wall(s)  p50(s)  p95(s)  max(s)
before   4.986    0.158   0.207   0.269
after    0.248    0.181   0.230   0.231

=== Event-loop responsiveness (lower probe latency is better) ===
flavour  samples  probe p50(ms)  probe p95(ms)  probe max(ms)
before   1        5442.26        5442.26        5442.26
after    28       0.88           11.53          98.02
```

</details> The benchmark script is included as a comment on the PR for reproducibility. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Performance Improvement Closes [#13825](https://github.com/infiniflow/ragflow/issues/13825) --------- Co-authored-by: tmimmanuel <tmimmanuel@users.noreply.github.com> Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2026-05-11 15:08:55 +08:00
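The before/after pattern from the entry above can be reproduced on a small scale with the standard library alone. This is a rough sketch, not the benchmark script from the PR: it substitutes `asyncio.to_thread` for RAGFlow's `thread_pool_exec`, and `blocking_call`, `handler_blocking`, `handler_offloaded`, `run`, and `compare` are hypothetical names.

```python
import asyncio
import time


def blocking_call(delay=0.05):
    """Stand-in for a synchronous DB/Redis call."""
    time.sleep(delay)
    return "ok"


async def handler_blocking():
    # Pre-PR pattern: the sync call runs on the event-loop thread.
    return blocking_call()


async def handler_offloaded():
    # Post-PR pattern (assumed equivalent): run the call in a worker thread.
    return await asyncio.to_thread(blocking_call)


async def run(handler, n=10):
    """Fire n handlers concurrently and return (results, wall seconds)."""
    start = time.perf_counter()
    results = await asyncio.gather(*(handler() for _ in range(n)))
    return results, time.perf_counter() - start


def compare(n=10):
    """Return wall-clock times for the blocking and offloaded variants."""
    _, before = asyncio.run(run(handler_blocking, n))
    _, after = asyncio.run(run(handler_offloaded, n))
    return before, after
```

Calling `compare()` returns two wall-clock times. The blocking variant cannot finish faster than `n * delay`, because every sleep runs back to back on the loop thread, while the offloaded variant overlaps the sleeps across worker threads. Exact numbers depend on the machine and the default executor size.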
Paul Y Hui	a0efc453f3	Fix: safe argument guard and remove redundant redis call (#14060 ) ### What problem does this PR solve? - Moved if not all([email, new_pwd, new_pwd2]) guard to the top, before any decryption that could crash on None value - Removed the redundant REDIS_CONN.get() call — one call is sufficient ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2026-05-11 15:02:24 +08:00
Jin Hai	c55e23e7e2	Go: refactor embedding interface (#14757 ) ### What problem does this PR solve? Provide embedding index according to the input text ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-11 14:45:30 +08:00
Ricardo-M-L	5ef7f50eef	fix: use context manager for ThreadPoolExecutor in file_service.py (#14144 ) ## Summary - Wrap 2 `ThreadPoolExecutor` instances in `file_service.py` with `with` statement - Ensures threads are properly shut down after all futures complete ## Problem `parse_docs()` (line 532) and the file processing method (line 694) create `ThreadPoolExecutor` instances that are never shut down. In a long-running server process, this leaks thread resources on every invocation — threads remain alive consuming memory even after all submitted work is complete. ## Fix Replace bare `ThreadPoolExecutor()` with `with ThreadPoolExecutor() as exe:` context manager, which calls `executor.shutdown(wait=True)` on exit. ## Test plan - [x] Verified both call sites use `with` statement after fix - [x] No remaining bare `ThreadPoolExecutor` in `file_service.py` - [x] `document_service.py:1066` is a module-level executor (different pattern, not changed in this PR) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2026-05-11 14:02:45 +08:00
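The fix in the entry above relies on `ThreadPoolExecutor` being usable as a context manager, which shuts the pool down on exit. A minimal sketch of the pattern, with hypothetical helper names (`parse_one`, `parse_all`) standing in for the real `file_service.py` code:

```python
from concurrent.futures import ThreadPoolExecutor


def parse_one(name):
    """Hypothetical per-file work item."""
    return name.upper()


def parse_all(names):
    """Run parse_one over names; the pool is shut down when the block exits."""
    with ThreadPoolExecutor(max_workers=4) as exe:
        futures = [exe.submit(parse_one, n) for n in names]
        results = [f.result() for f in futures]
    # Leaving the `with` block calls exe.shutdown(wait=True).
    return results, exe
```

The executor is returned here only so the shutdown can be observed: submitting to it after the block raises `RuntimeError`.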
buua436	a03b95f8c4	Fix: shared dataset chunk index lookup (#14764 ) ### What problem does this PR solve? shared dataset chunk index lookup ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-11 13:50:08 +08:00
buua436	024c8cb0b5	Fix: dataset search rerank id type (#14759 ) ### What problem does this PR solve? issue: https://github.com/infiniflow/ragflow/issues/14748 change: dataset search rerank id type ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-11 13:48:05 +08:00
jony376	46897d6fa4	Fix: bind memory message `user_id` to authenticated user for JWT auth (#14745 ) ### Related issues Closes #14744 ### What problem does this PR solve? The Memory REST endpoint `POST /api/v1/messages` previously persisted whatever `user_id` the client sent in the JSON body. Memory rows were therefore attributed to an arbitrary string, even when the caller authenticated as a normal workspace user via JWT (browser/session-style bearer token decoded into an access token). That broke attribution and audit semantics for shared memories (team visibility): any authorized writer could spoof another subject id. The Python SDK already sends an optional `user_id` for integrations using API keys (`APIToken`) to tag an external subject distinct from the tenant owner user. ### Solution - Record `g.auth_via_api_token` in `_load_user` (`api/apps/__init__.py`): set `True` only when authentication resolves via `APIToken`, otherwise `False` after JWT-based login succeeds. - In `POST /messages` (`memory_api.add_message`): if the request was authenticated with an API key, keep accepting optional `user_id` from the body (default empty string). For JWT-authenticated users, always set stored `user_id` to `current_user.id` and ignore the client field. - Guard reads of `g` with `RuntimeError` handling so isolated imports or tests without a Quart application context do not fail when resolving `user_id`. - Document on `RAGFlow.add_message` that `user_id` is only meaningful for API-key authentication. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): ### Testing - `python -m py_compile` on modified modules (`api/apps/__init__.py`, `api/apps/restful_apis/memory_api.py`). 
- Recommended: run web/SDK memory message tests (`test_add_message`, `test_message_routes_unit`) against a full environment with `quart` and configured services. ### Notes for reviewers - Behavior change only for callers using JWT-style authorization on `POST /messages`; API-key callers keep prior optional `user_id` semantics. Co-authored-by: jony376 <jony376@gmail.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-11 13:26:05 +08:00
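The attribution rule in the entry above reduces to one branch on how the request was authenticated. The helper below is a hypothetical sketch, not the code in `memory_api.add_message`: `resolve_user_id` and its parameters are illustrative names.

```python
def resolve_user_id(auth_via_api_token, body, current_user_id):
    """Pick the user_id to store with a memory message.

    API-key callers may tag an external subject through the request body
    (default: empty string). JWT callers are always bound to their own id,
    and whatever the client sent is ignored.
    """
    if auth_via_api_token:
        return body.get("user_id", "")
    return current_user_id
```

With this shape a JWT-authenticated caller cannot attribute a message to another subject, while API-key integrations keep the optional `user_id` field.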
Achieve3318	16354f4e14	fix(dify): guard retrieval argument error behavior (#14169 ) ## What problem does this PR solve? The Dify-compatible `/dify/retrieval` endpoint recently gained stricter parsing and validation for its request payload, including: - Normalized `retrieval_setting.top_k` and `retrieval_setting.score_threshold` types. - Clear separation between malformed arguments vs missing required fields. Previously, there was no unit test explicitly guarding the exact error code and message contract for these cases. ## What does this PR change? - Add guard-style unit test in `test_dify_retrieval_routes_unit.py`: - `test_retrieval_argument_error_messages`: - Sends a request with malformed numeric options: - `retrieval_setting = {"top_k": "not-int", "score_threshold": "not-float"}` - Asserts `code == RetCode.ARGUMENT_ERROR` and message contains `"invalid or malformed arguments:"`. - Sends a request with required fields missing: - Empty payload (`{}`) - Asserts `code == RetCode.ARGUMENT_ERROR` and message contains `"required arguments are missing:"`. This test encodes the intended behavior of the Dify retrieval API so future refactors cannot silently regress error handling. ## Type of change - [x] Tests (add coverage and guardrails for existing behavior) Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2026-05-11 13:17:42 +08:00
FPlust	0734fd793a	fix: scope pending_cell_images by sheet in excel parser (#14120 ) ### What problem does this PR solve? `pending_cell_images` should be scoped by sheet. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-11 13:17:14 +08:00
Wang Qi	3838770e7a	GraphRAG feature - Part 1 - add spacy to extract entity and relation (#14670 ) ### What problem does this PR solve? GraphRAG feature - Part 1 - add spacy to extract entity and relation <img width="1621" height="1288" alt="image" src="https://github.com/user-attachments/assets/aadeddad-94da-46c6-adad-9c3784181f61" /> ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-11 12:59:59 +08:00
web-dev0521	cc207b5b05	Refactor: tidy up ThreadPoolExecutor lifecycle in file_service and task executor (#14668 ) ## Summary - Wrap the `ThreadPoolExecutor` instances in `FileService.parse_docs` and `FileService.get_files` with `with ... as exe:` blocks for deterministic cleanup - Replace the `concurrent.futures.ThreadPoolExecutor` in `do_handle_task` with `asyncio.create_task(asyncio.to_thread(build_TOC, ...))`, preserving the existing parallelism with chunk insertion while leveraging the surrounding async context - Drop the now-unused `import concurrent` and the `executor.shutdown(wait=False)` call in the `finally` block Closes #14622. No behavioral change, no public API change. Net diff: ~19 insertions / 25 deletions across two files. ## Test plan - [ ] `uv run ruff check api/db/services/file_service.py rag/svr/task_executor.py` passes - [ ] Upload a multi-file batch through the chat/file endpoint and confirm `FileService.parse_docs` still returns combined parsed text - [ ] Trigger `FileService.get_files` via the chat reference flow with a mix of image and non-image files; verify both `raw=True` and `raw=False` paths return correctly - [ ] Run a `naive`-parser document task with `toc_extraction: true` and confirm the TOC chunk is generated and inserted exactly as before - [ ] Run a `naive`-parser document task with `toc_extraction: false` and confirm the path with `toc_thread = None` is unaffected - [ ] Cancel a running task to exercise the `finally` block and confirm cleanup still works without the executor shutdown call --------- Co-authored-by: web-dev0521 <jasonpette1783@gmail.com> Co-authored-by: Wang Qi <wangq8@outlook.com>	2026-05-11 12:59:00 +08:00
Joseff	13e6554901	Fix(Go): make OpenRouter Encode fail loudly on malformed responses (#14717 ) ### What problem does this PR solve? The OpenRouter `Encode` method silently swallowed malformed responses. If a `data[]` item from the API was missing a field (`index`, `embedding`, or unexpected shape), the loop did `continue` instead of returning an error — leaving `nil` entries in the result slice. Callers got back partial results with no indication anything went wrong, which then crashes downstream consumers when they try to use a `nil` vector. There were three concrete gaps: - No count-mismatch check between `data` length and input texts (only checked for empty) - No duplicate-index detection (a duplicate would silently overwrite) - Parse failures on individual items returned partial slices instead of erroring This PR replaces `map[string]interface{}` parsing with a typed `openrouterEmbeddingResponse` struct and applies the same 3-layer validation used in the other drivers (count mismatch → out-of-range index → duplicate index), so any malformed response produces a clear error instead of corrupted data. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-11 12:57:11 +08:00
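The three-layer validation named in the entry above (count mismatch, out-of-range index, duplicate index) can be sketched as follows. The PR itself is Go. This Python rendering is an assumption-laden illustration: `order_embeddings` is a hypothetical function, and `items` stands for the parsed `data[]` array.

```python
def order_embeddings(items, n_inputs):
    """Place each embedding at its declared index, failing loudly on bad data."""
    # Layer 1: the response must contain one item per input text.
    if len(items) != n_inputs:
        raise ValueError(f"expected {n_inputs} embeddings, got {len(items)}")
    out = [None] * n_inputs
    for item in items:
        idx = item.get("index")
        vec = item.get("embedding")
        if not isinstance(idx, int) or not isinstance(vec, list):
            raise ValueError(f"malformed item: {item!r}")
        # Layer 2: the index must address a real input slot.
        if not 0 <= idx < n_inputs:
            raise ValueError(f"index {idx} out of range")
        # Layer 3: no slot may be filled twice.
        if out[idx] is not None:
            raise ValueError(f"duplicate index {idx}")
        out[idx] = list(vec)
    return out
```

Because the counts match and every index is unique and in range, no slot can remain empty, so callers never see a missing vector.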
Panda Dev	530edbac99	Go: implement Encode (embeddings) in LM Studio driver (#14694 ) ### What problem does this PR solve? The LM Studio Go driver shipped with a stub `Encode` method that returned `no such method`, even though LM Studio is one of the most common local LLM runners on macOS and Windows and exposes an OpenAI-compatible embeddings endpoint at `/v1/embeddings`. LM Studio users routinely load local embedding models such as `nomic-ai/nomic-embed-text-v1.5`, `mixedbread-ai/mxbai-embed-large-v1`, or `BAAI/bge-m3`. They run on the same `/v1` namespace as chat. The existing `ListModels` already discovers them, but because `Encode` was a stub, a tenant who picked one of these models in the Go layer could not actually run an embedding call. This finishes the local-LLM trio: Ollama Encode (#14664) and vLLM Encode (#14688) are already in flight, both using the same OpenAI-compatible `/embeddings` shape. ### What this PR includes - `conf/models/lmstudio.json`: add `"embedding": "embeddings"` under `url_suffix` so the driver can build the URL from config. - `internal/entity/models/lmstudio.go`: replace the `Encode` stub with a real implementation. Adds a small local response type that matches the OpenAI-compatible shape. No factory change. No interface change. ### How the driver works - Validate the model name. The API key is optional for local LM Studio, so the Authorization header is only set when both `apiConfig` and `ApiKey` are non-nil and non-empty, the same pattern the recently merged CheckConnection PR (#14614) uses. - Resolve the region with a default fallback. Return a clear "missing base URL" error when the user has not configured the local access address yet. - Use a per-call `context.WithTimeout(30s)` and `http.NewRequestWithContext`, the same pattern the merged Aliyun Encode (#14647) and the in-flight Ollama Encode (#14664) and vLLM Encode (#14688) use. - Send `{model, input: [texts]}` in one request. 
- Parse `data[].embedding` and copy each slice into a `[][]float64` indexed by `data[].index`, so the output order matches the input order. - Handle both `float64` and `float32` element types. - Empty input returns `[][]float64{}` with no HTTP call. - Length mismatch between input and result, out-of-range index, and any missing slot all return clear errors instead of silent zero vectors. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? - `go build ./internal/entity/models/...` in a clean go 1.25 image returns exit 0. - The full method set on `LmStudioModel` still matches the `ModelDriver` interface. - Pattern parity with the merged Aliyun Encode (#14647), the in-flight Ollama Encode (#14664) and vLLM Encode (#14688), and the existing SiliconFlow Encode. Closes #14693	2026-05-11 12:55:57 +08:00
Joseff	0580c137fa	Perf(Go): batch SiliconFlow Encode requests with 32-item chunking (#14719 ) ### What problem does this PR solve? The SiliconFlow `Encode` method sent one HTTP request per text, which is wasteful and slow when indexing many documents (e.g., 100 docs = 100 round-trips). SiliconFlow's `/v1/embeddings` is OpenAI-compatible and accepts an array of strings in `input` (officially documented at https://docs.siliconflow.cn/en/api-reference/embeddings/create-embeddings, with a documented max array size of 32). This PR batches the requests up to that limit, reducing 100 docs to ~4 round-trips, and replaces `map[string]interface{}` parsing with a typed struct using the same 3-layer validation (count mismatch, out-of-range index, duplicate index) used in the other drivers. ### Type of change - [x] Performance Improvement	2026-05-11 12:55:27 +08:00
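The batching arithmetic in the entry above (100 texts at up to 32 per request gives 4 round-trips) can be sketched in a few lines. The PR is Go. This Python version is illustrative only, and `chunked`, `encode_all`, and the `encode_batch` callback are hypothetical names.

```python
def chunked(texts, size=32):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


def encode_all(texts, encode_batch, size=32):
    """Encode texts in batches, preserving input order.

    `encode_batch` is assumed to take a list of strings and return one
    vector per string, in the same order.
    """
    vectors = []
    for batch in chunked(texts, size):
        vectors.extend(encode_batch(batch))
    return vectors
```

Each call to `encode_batch` would correspond to one HTTP request in a real driver, so the number of calls is the number of round-trips.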
BitToby	4b96362092	Go: implement Encode (embeddings) in NVIDIA driver (#14700 ) ### What problem does this PR solve? The NVIDIA Go driver in `internal/entity/models/nvidia.go` shipped with a stub `Encode` method that returned `no such method`. `conf/models/nvidia.json` already lists `nvidia/llama-3.2-nemoretriever-1b-vlm-embed-v1` as an embedding model, but the conf had no `embedding` URL suffix, so the picker had nothing wired even if `Encode` worked. A tenant who wanted to use NVIDIA NIM for chat (already working) and embeddings from a single provider could not, even though the upstream endpoint is public at `https://integrate.api.nvidia.com/v1/embeddings` and uses an OpenAI-compatible request body extended with the NVIDIA-specific `input_type` and `truncate` fields. Several other Go drivers already implement `Encode` (siliconflow, zhipu-ai, aliyun), so the interface and the pattern are well-established. This PR fills the gap. ### What this PR includes * `conf/models/nvidia.json`: declare the `embedding` URL suffix alongside the existing `chat` and `models` entries. The embedding model entry was already present, so no model addition is needed. * `internal/entity/models/nvidia.go`: replace the `Encode` stub with a real implementation. Adds a small local response type that matches the OpenAI-compatible shape NVIDIA NIM returns. No factory change. No interface change. ### How the driver works * Validates `apiConfig` and the API key, validates the model name, resolves the region with a default fallback (matching the pattern the merged `ListModels` and `CheckConnection` paths in this driver already use), and builds the URL from `BaseURL[region] + URLSuffix.Embedding`. * Sends all input texts in one request as the `input` array, with the NVIDIA-specific `input_type: "query"`, `encoding_format: "float"`, and `truncate: "END"` fields, mirroring the Python `NvidiaEmbed` reference. 
* Parses `data[].embedding` and copies each slice into `[][]float64` indexed by `data[].index` so the output order matches the input order even if the API returns items in a different order. * Handles both `float64` and `float32` element types. * Empty input returns `[][]float64{}` with no HTTP call. * Non-200 responses propagate the upstream status line and body. * A final pass checks every input slot got a vector and returns a clear error if any slot is still nil. * Per-call 30s context deadline so a slow call cannot block forever. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? * `go build ./internal/entity/models/...` returns exit 0. * `go vet ./internal/entity/models/...` is clean. * `gofmt -l internal/entity/models/nvidia.go` is clean. * The full method set on `NvidiaModel` still matches the `ModelDriver` interface. * Pattern parity with the just-merged Aliyun `Encode` (#14647). Closes #14699	2026-05-11 12:50:50 +08:00
Jack Storment	8ff623fbc4	Go: implement Encode (embeddings) in Ollama driver (#14664 ) ### What problem does this PR solve? The Ollama Go driver shipped with a stub `Encode` method that returned `no such method`, even though Ollama is one of the most common local LLM runners and exposes an OpenAI-compatible embeddings endpoint at `/v1/embeddings`. Ollama users routinely run local embedding models such as `nomic-embed-text`, `mxbai-embed-large`, or `bge-m3`, pulled with `ollama pull <model>` and served on the same `/v1` namespace as chat. The existing `ListModels` already discovers them, but because `Encode` was a stub, a tenant who picked one of these models in the Go layer could not actually run an embedding call. ### What this PR includes - `conf/models/ollama.json`: add `"embedding": "embeddings"` under `url_suffix` so the driver can build the URL from config. - `internal/entity/models/ollama.go`: replace the `Encode` stub with a real implementation. Adds a small local response type that matches the OpenAI-compatible shape. No factory change. No interface change. ### How the driver works - Validate the model name. The API key is optional for local Ollama, so the Authorization header is only set when both `apiConfig` and `ApiKey` are non-nil and non-empty, the same pattern the recently merged CheckConnection PR (#14614) uses. - Resolve the region with a default fallback. Return a clear "missing base URL" error when the user has not configured the local access address yet. - Use a per-call `context.WithTimeout(30s)` and `http.NewRequestWithContext`, the same pattern the merged Aliyun Encode (#14647) uses. - Send `{model, input: [texts]}` in one request. - Parse `data[].embedding` and copy each slice into a `[][]float64` indexed by `data[].index`, so the output order matches the input order. - Handle both `float64` and `float32` element types. - Empty input returns `[][]float64{}` with no HTTP call. 
- Length mismatch between input and result, out-of-range index, and any missing slot all return clear errors instead of silent zero vectors. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? - `go build ./internal/entity/models/...` in a clean go 1.25 image returns exit 0. - The full method set on `OllamaModel` still matches the `ModelDriver` interface. - Pattern parity with the merged Aliyun Encode (#14647) and the existing SiliconFlow Encode. Closes #14662	2026-05-11 12:50:15 +08:00
hyl64	77ce88dfcc	fix(prompt): reserve system budget in message_fit_in (#14164 ) ## Summary This PR fixes the `message_fit_in()` truncation bug reported in #13607. Changes: - fix the user-message truncation branch to reserve room for the system prompt token budget - guard the zero-token edge case to avoid dividing by zero in the truncation ratio check - add focused regression tests covering both the user-dominant truncation path and the zero-token boundary case ## Validation ```bash pytest -q --noconftest test/unit_test/rag/prompts/test_generator_message_fit_in.py ``` Result: `2 passed` Closes #13607	2026-05-11 12:44:27 +08:00
07heco	e46989832e	fix: complete robustness fixes for rerank module addressing all review comments (#14265 ) ## Summary This PR fully addresses all CodeRabbit review feedback and enhances the robustness of the reranking module with 100% backward compatibility. ## Key Fixes 1. Fixed JinaRerank hardcoded base_url to support subclass endpoint overrides 2. Corrected GPUStackRerank exception handling to use proper requests exceptions and preserve stack traces 3. Added 30s timeout to all API calls to prevent service hanging 4. Added empty input validation for all rerank providers 5. Replaced direct dict key access with .get() to eliminate KeyError crashes 6. Fixed _normalize_rank edge case for empty arrays 7. Implemented missing functionality for Ai302Rerank 8. Standardized type hints and fixed typo issues ## Compatibility - No breaking changes to any existing functionality - All rerank providers work as originally intended - Fully compatible with existing configurations and workflows ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2026-05-11 12:40:41 +08:00
Panda Dev	fa53b93dd5	Go: implement Encode (embeddings) in vLLM driver (#14688 ) ### What problem does this PR solve? The vLLM Go driver shipped with a stub `Encode` method that returned `not implemented`, even though vLLM is one of the most common production-grade self-hosted inference servers and exposes an OpenAI-compatible embeddings endpoint at `/v1/embeddings`. Users who self-host `BAAI/bge-m3`, `Qwen3-Embedding-`, `NV-Embed-v2`, or similar models on vLLM could not run an embedding call through the Go layer. The existing `ListModels` already discovers the loaded models, but the embedding path failed because `Encode` was a stub. ### What this PR includes - `conf/models/vllm.json`: add `"embedding": "embeddings"` under `url_suffix` so the driver can build the URL from config. - `internal/entity/models/vllm.go`: replace the `Encode` stub with a real implementation. Adds a small local response type that matches the OpenAI-compatible shape. No factory change. No interface change. ### How the driver works - Validate the model name. The API key is optional for self-hosted vLLM, so the Authorization header is only set when both `apiConfig` and `ApiKey` are non-nil and non-empty, the same pattern the recently merged CheckConnection PR (#14614) uses. - Resolve the region with a default fallback. Return a clear "missing base URL" error when the user has not configured the local access address yet. - Use a per-call `context.WithTimeout(30s)` and `http.NewRequestWithContext`, the same pattern the merged Aliyun Encode (#14647) and in-flight Ollama Encode (#14664) use. - Send `{model, input: [texts]}` in one request. - Parse `data[].embedding` and copy each slice into a `[][]float64` indexed by `data[*].index`, so the output order matches the input order. - Handle both `float64` and `float32` element types. - Empty input returns `[][]float64{}` with no HTTP call. 
- Length mismatch between input and result, out-of-range index, and any missing slot all return clear errors instead of silent zero vectors. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? - `go build ./internal/entity/models/...` in a clean Go 1.25 image returns exit 0. - The full method set on `VllmModel` still matches the `ModelDriver` interface. - Pattern parity with the merged Aliyun Encode (#14647), the in-flight Ollama Encode (#14664), and the existing SiliconFlow Encode. Closes #14687	2026-05-11 12:09:17 +08:00
Qinsanz	d6660cf156	fix(keyword_extraction): accept Chinese commas/semicolons/newlines as keyword delimiters (#14540 ) ## What Widen the keyword delimiter in `rag/svr/task_executor.py`: both `build_chunks` (LLM `keyword_extraction` cache parsing) and `run_dataflow` (chunk-level `keywords` ingestion) now split on `, ， ; ；、 \r \n` instead of only ASCII comma. ## Why `rag/prompts/keyword_prompt.md` instructs the LLM: > The keywords are delimited by ENGLISH COMMA. In practice, Chinese-leaning models (Qwen / Tongyi-Qianwen, GLM, etc.) frequently ignore this instruction when the source content is Chinese and emit Chinese commas (`，`) instead. Result: `cached.split(",")` sees the full LLM output as a single keyword. Repro: `auto_keywords>=4` + Chinese docs + `qwen-plus@Tongyi-Qianwen`. We observed entries in `important_kwd` like `"功能介绍，配置说明，参数详解，问题排查"` — one bucket instead of four. ## Impact - Silent data-quality bug; no exception thrown. - BM25 `important_kwd^30` boost effectively stops firing — the indexed term is the whole list, never matches user query tokens. - Any downstream aggregating `important_kwd` (tagging, analytics, candidate-keyword review UIs) sees garbage. ## Compatibility - Pure widening of the splitter; ASCII-comma-only outputs continue to work identically. - No schema / API change. ## Test plan Manually verified against `qwen-plus@Tongyi-Qianwen` with `auto_keywords=10` on Chinese .txt files: - Before: `important_kwd` contains one element per chunk that is the full LLM string with `，`-separated phrases inside. - After: `important_kwd` contains N elements, one per phrase, as the LLM intended.	2026-05-11 12:05:24 +08:00
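The widened splitter described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code in `rag/svr/task_executor.py`: the helper name `split_keywords` and the exact regular expression are assumptions; only the delimiter set (ASCII and Chinese commas and semicolons, the Chinese enumeration comma, and line breaks) comes from the commit message.

```python
import re

# Delimiters named in the commit: "," "，" ";" "；" "、" and CR/LF.
# The compiled pattern itself is an assumption made for this sketch.
_KEYWORD_DELIMITERS = re.compile(r"[,，;；、\r\n]+")


def split_keywords(raw: str) -> list[str]:
    """Split an LLM keyword string into trimmed, non-empty keywords.

    `split_keywords` is a hypothetical helper, not a RAGFlow function.
    """
    return [kw.strip() for kw in _KEYWORD_DELIMITERS.split(raw) if kw.strip()]


if __name__ == "__main__":
    # The failing case from the commit: Chinese commas only.
    print(split_keywords("功能介绍，配置说明，参数详解，问题排查"))
    # ASCII-comma output keeps working, which is the compatibility claim.
    print(split_keywords("rag, retrieval, chunking"))
```

Because the character class only adds delimiters, any string that previously split correctly on `,` alone still yields the same keywords; the only behavioral difference in this sketch is that empty fragments are dropped.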
BitToby	bfb4a0eea2	Go: implement Encode (embeddings) in Gitee AI driver (#14698 ) ### What problem does this PR solve? The Gitee AI Go driver in `internal/entity/models/gitee.go` shipped with a stub `Encode` method that returned `gitee, no such method`, even though `conf/models/gitee.json` already wires the `embedding` URL suffix. The conf also listed no embedding models, so the picker had nothing to select. This blocked any tenant who wanted to use Gitee AI for chat, rerank (already working, see #14656), and embeddings from a single provider. This PR fills the gap, mirroring the just-merged Aliyun `Encode` (#14647): - `internal/entity/models/gitee.go`: replace the `Encode` stub with a real implementation. Validates inputs, resolves the region with a default fallback, POSTs the standard OpenAI-compatible `{"model", "input": [...]}` body to `BaseURL[region] + URLSuffix.Embedding`, parses `data[].embedding` indexed by `data[].index` so output order matches input order, handles both `float64` and `float32` element types, and uses a 30s per-call context deadline matching the merged `Rerank`. - `conf/models/gitee.json`: add `BAAI/bge-m3` so the embedding picker has something to select. No factory change. No interface change. No URL suffix change. Verified with `go build`, `go vet`, and `gofmt -l` : all clean. Closes #14697 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-11 11:56:46 +08:00
VincentLambert	b83e2ae5a2	fix: handle missing parent chunk in retrieval_by_children (#14556 ) ### What problem does this PR solve? `retrieval_by_children()` in `rag/nlp/search.py` crashes with a `TypeError: 'NoneType' object is not subscriptable` when a parent ("mom") chunk referenced by child chunks is missing from the index. This happens when the index is in an inconsistent state — for example after a partial re-index, a document deletion that didn't clean up all children, or a race condition during ingestion. `dataStore.get()` returns `None` for the missing parent, and the subsequent access to `chunk["content_with_weight"]` raises a `TypeError`. Stack trace: ``` TypeError: 'NoneType' object is not subscriptable File "rag/nlp/search.py", line 792, in retrieval_by_children "content_with_weight": chunk["content_with_weight"], ``` ### Type of change - [x] Bug Fix ### Fix When `dataStore.get()` returns `None` for a parent chunk, fall back to using the child chunks directly and continue processing the remaining parents. This preserves retrieval results for all other chunks rather than aborting the entire query with an exception. ```python chunk = self.dataStore.get(id, idx_nms[0], [ck["kb_id"] for ck in cks]) if chunk is None: chunks.extend(cks) continue ``` --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-11 11:55:44 +08:00
Sp1kyss	e6cb9faace	fix: close two security analyzer bypass paths in sandbox executor (#14690 ) ## Summary Two bypass vectors in the sandbox code security analyzer allowed malicious code to pass the safety check undetected and reach the Docker executor. ### 1. JavaScript: template-literal bypass of `require()` block The `SecureJavaScriptAnalyzer` regex patterns used `['"]` to match module names, covering only single and double quotes. An attacker could use ES6 template literals to bypass all three `require` checks: ```javascript const cp = require(`child_process`); async function main() { return cp.execSync('cat /etc/passwd').toString(); } ``` The same bypass applied to `fs` and `worker_threads`. Fix: Updated all three `require` patterns from `['"]` to ``['"`]`` to also match backtick template literals. ### 2. Python: `builtins` not blocked + attribute-call blind spot in `visit_Call` `visit_Call` only checked `ast.Name` nodes, so attribute-style calls like `module.func()` were invisible to the analyzer. Additionally, `builtins` was absent from `DANGEROUS_IMPORTS`. Combined, this allowed: ```python import builtins def main(): builtins.exec('import os; os.system("id")') ``` Neither the import nor the exec call triggered any flag. Fix: Added `builtins` to `DANGEROUS_IMPORTS` and added an `ast.Attribute` branch to `visit_Call` so that `module.dangerous_func()` style calls are caught alongside bare `dangerous_func()` calls. ## Tests Added four regression tests covering each new bypass vector: - `test_javascript_child_process_template_literal_is_rejected` - `test_javascript_fs_template_literal_is_rejected` - `test_python_builtins_import_is_rejected` - `test_python_attribute_eval_call_is_rejected` --------- Co-authored-by: bounty-hunter <bounty@hunter.local>	2026-05-11 11:46:27 +08:00
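To make the Python half of the fix above concrete, here is a small standalone sketch of an `ast.NodeVisitor` that flags both bare calls and attribute-style calls. It is not the sandbox analyzer itself: the class name `CallChecker`, the `analyze` helper, and the contents of the two sets are illustrative assumptions; only the names `DANGEROUS_IMPORTS`, `visit_Call`, `ast.Name`, and `ast.Attribute` come from the commit message.

```python
import ast

# Illustrative subsets only; the real analyzer's lists are not shown in the log.
DANGEROUS_IMPORTS = {"os", "subprocess", "builtins"}
DANGEROUS_CALLS = {"eval", "exec", "compile", "__import__"}


class CallChecker(ast.NodeVisitor):
    """Hypothetical checker that collects findings instead of raising."""

    def __init__(self) -> None:
        self.findings: list[str] = []

    def visit_Import(self, node: ast.Import) -> None:
        for alias in node.names:
            if alias.name.split(".")[0] in DANGEROUS_IMPORTS:
                self.findings.append(f"import {alias.name}")
        self.generic_visit(node)

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        # node.module is None for relative imports such as "from . import x".
        if node.module and node.module.split(".")[0] in DANGEROUS_IMPORTS:
            self.findings.append(f"import {node.module}")
        self.generic_visit(node)

    def visit_Call(self, node: ast.Call) -> None:
        func = node.func
        if isinstance(func, ast.Name) and func.id in DANGEROUS_CALLS:
            # Bare call: exec(...)
            self.findings.append(f"call {func.id}")
        elif isinstance(func, ast.Attribute) and func.attr in DANGEROUS_CALLS:
            # Attribute call: builtins.exec(...), the blind spot described above.
            self.findings.append(f"call .{func.attr}")
        self.generic_visit(node)


def analyze(source: str) -> list[str]:
    """Parse the source and return all findings in visit order."""
    checker = CallChecker()
    checker.visit(ast.parse(source))
    return checker.findings


if __name__ == "__main__":
    sample = "import builtins\ndef main():\n    builtins.exec('print(1)')\n"
    print(analyze(sample))
```

Matching on `func.attr` alone is deliberately coarse: it also flags a harmless method that happens to be named `exec` on an unrelated object. Whether the real analyzer accepts that trade-off is not stated in the commit.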
Joseff	827cceccba	Fix(Go): correct Name() and region URL fallback in Aliyun driver (#14673 ) ### What problem does this PR solve? Two bugs in the Aliyun Go driver: 1. `Name()` returns `"siliconflow"` — a copy-paste bug from when the driver was created. `Name()` is used in error messages and log output, so every Aliyun error incorrectly attributed itself to SiliconFlow. 2. Silent empty URL for unknown regions in `ChatWithMessages`, `ChatStreamlyWithSender`, and `ListModels` — all three methods construct the request URL as `z.BaseURL[region]` without checking whether the key exists. For an unrecognised region this returns `""`, producing a malformed URL like `"/chat/completions"` that the HTTP transport rejects with a confusing error. `Encode` and `Rerank` (already merged) correctly fall back to `"default"` and return a clear error. This PR applies the same pattern to the remaining three methods. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-11 11:26:24 +08:00
Carmen Fernández Ruiz	f852a7524e	fix(go): wire Google CheckConnection to ListModels (#14660 ) ### What problem does this PR solve? Closes #14703 `GoogleModel.CheckConnection` currently returns a hardcoded `no such method` error even though the Google Go driver already supports `ListModels`. This makes provider connection checks fail regardless of whether the configured API key can list Google models. This PR makes `CheckConnection` call `ListModels`, adds a small API-key guard for nil, empty, and whitespace-only keys, and keeps `ListModels` useful by following paginated Google model responses. ### What stays unchanged * Google model listing still uses the Google GenAI SDK with `genai.BackendGeminiAPI`. * Model names still come from `models.Items[].Name`. `Balance`, `Encode`, chat, streaming, provider config, and factory wiring are unchanged. ### Tests and validation Added focused unit coverage for: * `CheckConnection` delegating to `ListModels` and returning its error * nil, missing, empty, and whitespace-only API key validation * model-name passthrough from the list-models adapter * paginated model listing, empty-result preservation, and next-page error propagation Validated current PR head `17ceef43515ba8c46c254dd349b9085bf26dcbea` locally with Go 1.25.0: * `go test ./internal/entity/models -run 'TestGoogleModel\|TestCollectGoogleModelNames' -count=1 -v` - PASS * `go test ./internal/entity/models -count=1` - PASS * `go test -race ./internal/entity/models -count=1` - PASS * `gofmt -w internal/entity/models/google.go internal/entity/models/google_test.go` - PASS, no diff * `git diff --check` - PASS ### Type of change * [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-11 11:25:17 +08:00
Joseff	f4f8bed9f7	Go: implement Encode (embeddings) in Google Gemini driver (#14682 ) ### What problem does this PR solve? - Implements the `Encode` method in the Google Gemini driver, which was previously a stub returning `not implemented` - Uses the `google.golang.org/genai` SDK's `EmbedContent` API, which routes to the `batchEmbedContents` endpoint internally — all texts are sent in a single request - Adds `text-embedding-004` (max 2048 tokens) to `conf/models/google.json` - Response values are `[]float32` from the SDK and are cast to `[]float64` to satisfy the `ModelDriver` interface ## Files changed - `internal/entity/models/google.go` — full `Encode` implementation - `conf/models/google.json` — adds `text-embedding-004` embedding model ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-11 11:24:21 +08:00
Ricardo-M-L	13922209e6	fix(llm): add timeout to HTTP requests in LLM integration layer (#14313 ) ### What problem does this PR solve? Multiple `requests.post()` calls across the LLM integration layer lack a `timeout` parameter. Without a timeout, a single unresponsive upstream service can block the calling thread indefinitely, eventually exhausting the thread pool and degrading the entire system. This is a well-known issue: Python's `requests` library defaults to `timeout=None` (infinite wait), and [the library docs explicitly recommend](https://requests.readthedocs.io/en/latest/user/advanced/#timeouts) always setting a timeout. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### Change Added `timeout` to all `requests.post()` calls missing it:

| File | Calls fixed | Timeout |
|------|-------------|---------|
| `rag/llm/rerank_model.py` | 9 | 30s |
| `rag/llm/embedding_model.py` | 8 | 30s |
| `rag/llm/cv_model.py` | 3 | 60s |
| `rag/llm/tts_model.py` | 2 | 60s |
| `rag/llm/sequence2txt_model.py` | 2 | 60s |

Embedding/rerank calls use 30s (lightweight API calls). Vision, TTS, and audio transcription use 60s (heavier workloads with file uploads). Note: other files in the codebase (e.g. `check_minio_alive`, `check_ragflow_server_alive`) already use `timeout=10`, so this PR brings the LLM layer in line with existing practice. Signed-off-by: Ricardo-M-L <Sibyl_Hartmanbnb@webname.com> Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2026-05-11 11:19:07 +08:00
Paras Sondhi	51b73850e1	feat: make sandbox Dockerfile mirrors optional with ARG (#14553 ) ### What problem does this PR solve? Resolves #14447. (Note: This supersedes stalled PR #14448 and implements the requested CodeRabbitAI fixes). Currently, the Dockerfiles inside `agent/sandbox/sandbox_base_image` (both Python and Node.js) have hardcoded Chinese package mirrors. This forces the mirrors on all users globally, which causes build network timeouts for contributors outside of China. This PR introduces an enhancement to fix the issue by: 1. Implementing the `NEED_MIRROR` build argument in the sandbox Dockerfiles. 2. Replacing static `ENV` instructions with conditional shell logic inside `RUN` blocks to dynamically set the package registries. 3. Allowing the build to cleanly fall back to default global registries (`pypi.org` and `npmjs.org`) when `--build-arg NEED_MIRROR=0` is passed. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-11 11:01:43 +08:00
BitToby	39a1773f7f	Go: implement ListModels in Volcengine driver (#14702 ) ### What problem does this PR solve? The VolcEngine Go driver in `internal/entity/models/volcengine.go` shipped with a `ListModels` stub that returned `volcengine, no such method`. `conf/models/volcengine.json` also did not declare a `models` URL suffix, so the model picker had nothing to call even if the method body were filled in. A tenant who configured Volcengine (Doubao / Ark) as a provider could not see the list of available endpoints from the RAGFlow UI. Several other Go drivers already implement `ListModels` against the OpenAI-compatible `/models` endpoint (deepseek, gitee, nvidia, openai, siliconflow), so the interface and pattern are well-established. This PR fills the gap. ### What this PR includes * `conf/models/volcengine.json`: declare the `models` URL suffix alongside the existing `chat`, `files`, and `embedding` entries. The Ark v3 API exposes `https://ark.cn-beijing.volces.com/api/v3/models`, so the suffix is just `models`. * `internal/entity/models/volcengine.go`: replace the `ListModels` stub with a real implementation. Reuses the package-level `DSModelList` / `DSModel` types that DeepSeek, Gitee, and SiliconFlow already use to parse the OpenAI-compatible models response shape. No factory change. No interface change. ### How the driver works * Resolves the region with a default fallback, the same way the other VolcEngine methods in this driver already do. * Builds the URL from `BaseURL[region] + URLSuffix.Models`, with `strings.TrimSuffix` on the base to keep the join robust. * Issues a `GET` with optional `Authorization: Bearer <api_key>` (the header is omitted when no key is configured, mirroring the existing NVIDIA `ListModels`). * Reads the response body once, surfaces a non-200 with the upstream status line plus body, and parses the JSON via the shared `DSModelList` type. * Returns the model id list in input order. 
When the response includes an `owned_by` field, the entry is rendered as `id@owned_by`, matching the convention used by the other Go drivers. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? * `go build ./internal/entity/models/...` returns exit 0. * `go vet ./internal/entity/models/...` is clean. * `gofmt -l internal/entity/models/volcengine.go` is clean. * The full method set on `VolcEngine` still matches the `ModelDriver` interface. * Endpoint reachability check: `GET https://ark.cn-beijing.volces.com/api/v3/models` returns `401 Unauthorized` without an API key, confirming the path exists and accepts Bearer authentication. * Pattern parity with DeepSeek, Gitee, NVIDIA, and SiliconFlow `ListModels`. Fixes #14701 Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-11 10:59:18 +08:00
VincentLambert	08bb53bbb1	Feat: add BedrockCV for vision/image2text inference via LiteLLM (#14705 ) ## Summary - `CvModel["Bedrock"]` was absent from `rag/llm/cv_model.py`, causing `model_instance()` to return `None` when a Bedrock model was used as a PDF parser — even after correct model resolution. - This PR adds `BedrockCV`, enabling Bedrock vision models (e.g. `amazon.nova-pro-v1:0`, `anthropic.claude-3-5-sonnet`) to be used as PDF parsers. ## What problem does this PR solve? When a Bedrock model is selected as the PDF parser in a knowledge base, ingestion failed with: ``` 'LiteLLMBase' object has no attribute 'describe_with_prompt' ``` The root cause: `LiteLLMBase` (the Bedrock chat implementation) was the only registered handler for the Bedrock factory. It does not implement `describe_with_prompt`. `CvModel` had no Bedrock entry, so `model_instance()` returned `None` for `image2text` requests. ## Type of change - [x] New Feature (non-breaking change which adds functionality) ## Changes `rag/llm/cv_model.py` Adds `BedrockCV(Base)` with `_FACTORY_NAME = "Bedrock"`: - Uses `litellm.completion` with the `bedrock/` prefix (consistent with `LiteLLMBase`) - Parses AWS credentials from the JSON key assembled by `add_llm` (`auth_mode`, `bedrock_ak`, `bedrock_sk`, `bedrock_region`, `aws_role_arn`) - Supports three auth modes: `access_key_secret`, `iam_role` (via STS `assume_role`), and default credential chain (IRSA, instance profile) - Implements `describe_with_prompt` and `describe` ## Test plan - [ ] Configure a Bedrock vision model (e.g. `amazon.nova-pro-v1:0`) with valid AWS credentials - [ ] Select it as PDF parser in a knowledge base - [ ] Verify ingestion of a PDF document completes without errors - [ ] Verify `CvModel["Bedrock"]` resolves to `BedrockCV` 🤖 Generated with [Claude Code](https://claude.ai/claude-code) --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-05-11 10:29:58 +08:00
Ahmad Intisar	3c4d1da98f	Feature/table parser column roles (#13710 ) ### What problem does this PR solve? The table file parser (CSV/Excel) currently treats all columns identically — every column is both vectorized (embedded in chunk text) and stored as filterable metadata. There's no way for users to control which columns should be searchable by semantic meaning versus which should only be filterable attributes. For example, when ingesting a news articles CSV with columns like title, content, country, category, source, etc., the embedding includes metadata fields like country: Brazil and source: Reuters in the chunk text, which dilutes the semantic quality of the embedding without adding retrieval value. The RDBMS connector (MySQL/PostgreSQL) already supports content_columns / metadata_columns, but this capability was missing for file-based table ingestion. This PR adds column-level control (vectorize / metadata / both) for the table file parser, following RAGFlow's existing patterns. Backward compatible: Datasets without table_column_roles or with table_column_mode: auto behave exactly as before (all columns = both). ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-05-11 10:06:04 +08:00
Igor Ilinskii	889aba6a32	fix base_url handling in HuggingfaceRerank (#14555 ) ### What problem does this PR solve? `HuggingfaceRerank.post()` unconditionally prepends `http://` to `base_url`, which already contains a protocol. This creates invalid URLs like `http://http://127.0.0.1:8080/rerank`, breaking all requests. The fix normalizes URL handling to match the rest of the codebase, removing the redundant `http://`. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### Related Issues - #7318 - #7796 --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>	2026-05-11 10:04:40 +08:00
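The double-scheme bug above is easy to reproduce in isolation. The sketch below shows one defensive way to build the endpoint; it is an assumption about a reasonable normalization, not the change made in `HuggingfaceRerank` (the commit only says the redundant `http://` was removed). The helper names `normalize_base_url` and `rerank_endpoint` are hypothetical.

```python
def normalize_base_url(base_url: str, default_scheme: str = "http") -> str:
    """Return base_url without a trailing slash, adding a scheme only if absent.

    Hypothetical helper for illustration; not part of RAGFlow.
    """
    url = base_url.strip().rstrip("/")
    if "://" not in url:
        # Only bare "host:port" values get a scheme prepended.
        url = f"{default_scheme}://{url}"
    return url


def rerank_endpoint(base_url: str) -> str:
    """Build the rerank URL from a configured base URL."""
    return normalize_base_url(base_url) + "/rerank"


if __name__ == "__main__":
    # The buggy behavior prepended the scheme unconditionally:
    print("http://" + "http://127.0.0.1:8080" + "/rerank")
    # The normalized form leaves an existing scheme alone:
    print(rerank_endpoint("http://127.0.0.1:8080"))
```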
Tim Wang	ed01ac9994	Fix: resolve template strings in tool component parameters (#14601 ) ## Summary - Tool-type components (Email, Invoke, etc.) fail to resolve template strings that mix variable references with literal text in their parameters. - This adds template string resolution to `get_input()` in `ComponentBase`, reusing existing `get_input_elements_from_text()` and `string_format()` methods. ## Problem `get_input()` in `ComponentBase` handles two cases: 1. Pure reference (`{Component:ID@field}`) — resolved via `is_reff()` + `get_variable_value()` 2. Literal value — passed through as-is But template strings like `{UserFillUp:X@name}@duke.edu` or `Question from {Agent:Y@topic}` fall through to the literal branch because `is_reff()` returns `False` (it expects the entire string to be a single reference). The unresolved template is passed directly to the tool. This affects all tool components (Email, Invoke, etc.) that need mixed reference + text parameters — for example, constructing email addresses or subjects dynamically. ## Fix ```python # In get_input(), between is_reff check and literal fallback: elif isinstance(v, str) and re.search(self.variable_ref_patt, v): elements = self.get_input_elements_from_text(v) kv = {k: e.get('value', '') for k, e in elements.items()} self.set_input_value(var, self.string_format(v, kv)) ``` This reuses `get_input_elements_from_text()` and `string_format()` which are already used by `Message` components for the same purpose. The fix only activates when the string contains at least one variable reference pattern but is not a pure reference. 
## Test plan - [x] Pure references (`{Component:ID@field}`) still resolve correctly via `is_reff()` path - [x] Literal values without references pass through unchanged - [x] Template strings like `{ref}@duke.edu` resolve the reference and keep the literal suffix - [x] Template strings like `Question from {ref}` resolve correctly - [x] Multiple references in one string (`{ref1} and {ref2}`) both resolve - [x] Message components unaffected (they use their own template resolution in `_run`) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: wanghualoong <wanghualoong@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-05-11 10:01:41 +08:00
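The three-way branching described in the commit above (pure reference, mixed template, literal) can be illustrated with a standalone sketch. Everything here is an assumption made for illustration: the regular expression is a simplified stand-in for `variable_ref_patt`, and `resolve` with a plain dictionary replaces the real `is_reff()`, `get_input_elements_from_text()`, and `string_format()` machinery in `ComponentBase`.

```python
import re

# Simplified stand-in for the component reference syntax {Component:ID@field}.
# The real pattern used by ComponentBase is not shown in the log.
REF = re.compile(r"\{([A-Za-z]\w*:\w+@\w+)\}")


def resolve(value, variables: dict):
    """Resolve a parameter value against a mapping of reference -> value.

    Hypothetical helper mirroring the three cases from the commit message.
    """
    if not isinstance(value, str):
        return value
    whole = REF.fullmatch(value)
    if whole:
        # Case 1: pure reference, return the referenced value unchanged.
        return variables.get(whole.group(1), "")
    if REF.search(value):
        # Case 2: template string, substitute each reference and keep literals.
        return REF.sub(lambda m: str(variables.get(m.group(1), "")), value)
    # Case 3: plain literal, pass through.
    return value


if __name__ == "__main__":
    variables = {"UserFillUp:X@name": "alice", "Agent:Y@topic": "RAG"}
    print(resolve("{UserFillUp:X@name}@duke.edu", variables))
    print(resolve("Question from {Agent:Y@topic}", variables))
```

Checking the pure-reference case first keeps non-string values (lists, dicts) intact, whereas the template branch necessarily stringifies each substituted value.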
Mehmet Karakose	7ec87f7cb7	fix(auth): fall back to session-based auth in _load_user (#14569 ) ## Summary Closes #13663. OAuth / OIDC callbacks call `login_user(user)` which writes `_user_id` into the session cookie, but `_load_user()` in `api/apps/__init__.py` only ever looked at the `Authorization` header. The SPA's response interceptor wipes the Authorization value from `localStorage` on the first 401 it sees — meaning that during the post-redirect window after an OAuth login, a single transient 401 sends every subsequent request back to the login page even though `login_user()` had already established a perfectly good server-side session. The reporter's analysis traces this all the way through the redirect → `navigate('/')` → first request → empty header → 401 → `removeAll()` → infinite-redirect-to-login chain. ## What changed - New `_load_user_from_session()` helper that reads `session["_user_id"]`, looks up the user in `UserService` (with the same `StatusEnum.VALID` and `access_token` checks already used elsewhere), and assigns `g.user`. - Every `return None` path in `_load_user()` now routes through that helper before giving up: - missing `Authorization` header - malformed `bearer ` prefix - empty / too-short JWT payload - JWT signature failure - JWT-resolved user not found / has no `access_token` - `APIToken.query()` fallback exhausted The JWT and API-token paths still take precedence — the session is only consulted when those can't authenticate the request. So existing local-login and SDK callers see no behaviour change; only OAuth / OIDC users that hit the original race now stay logged in. The Bearer-prefix issue called out in #13663 (lines 103-110) is already handled in the current code, so this PR only addresses the second half of the report. 
## Test plan - [ ] Configure OIDC under `oauth` in `service_conf.yaml` - [ ] Click the OIDC login button, complete auth at the IdP - [ ] Confirm that navigating between pages no longer bounces back to `/login` - [ ] Confirm local email/password login still issues + accepts JWTs - [ ] Confirm SDK/API key callers still authenticate via `Authorization: Bearer <api-token>` --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2026-05-11 09:59:52 +08:00
很拉风的James	6cb4bc2947	Fix: Radio.Group cloneElement crashes on non-element children (#14407 ) ### What problem does this PR solve? `Radio.Group` in `web/src/components/ui/radio.tsx` injects the parent's `disabled` prop into each child via `React.cloneElement` with `as React.ReactElement` and no validation. This throws at runtime when a consumer passes strings, numbers, `null`, `false`, or other non-element nodes, while the cast hides the unsafe access from TypeScript. Use `React.isValidElement<RadioProps>(child)` as a type guard before calling `cloneElement`. Non-element children pass through unchanged, and `child.props` access becomes type-checked without an `as` cast. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-05-11 09:54:42 +08:00
Panda Dev	6bfe0f9a10	Go: implement Encode (embeddings) in OpenAI driver (#14630 ) ### What problem does this PR solve? The OpenAI Go driver landed in #14605 with chat, list models, and check connection. Encode was left as a stub that returns `not implemented`. `conf/models/openai.json` already lists three embedding models out of the box: - text-embedding-ada-002 - text-embedding-3-small - text-embedding-3-large So a tenant who picked one of these in the Go layer could not actually run an embedding call. This PR fills the gap. ### What this PR includes - `conf/models/openai.json`: add `"embedding": "embeddings"` under `url_suffix` so the driver can build the URL from config. This matches the `URLSuffix.Embedding` field used by other drivers (siliconflow, zhipu-ai). - `internal/entity/models/openai.go`: replace the Encode stub with a real implementation that POSTs to `/v1/embeddings`. Adds a small local response type `openaiEmbeddingResponse`. No factory change. No interface change. ### How the implementation works - Validate `apiConfig` and the API key, validate the model name. Use the existing `baseURLForRegion` helper so an unknown region fails fast with a clear error. - Wrap the request with `context.WithTimeout(nonStreamCallTimeout)` so the call has a clear deadline. Same pattern as `ChatWithMessages` and `ListModels` already use in this file. - Send all input texts in one request. The OpenAI API accepts the `input` field as an array. - Parse `data[].embedding` and copy each slice into a `[][]float64` indexed by `data[].index` so the output order matches the input order even if the API returns items in a different order. - Handle both `float64` and `float32` element types, the way the SiliconFlow driver does. - An empty input slice returns `[][]float64{}` with no HTTP call. - Non-200 responses propagate the upstream status line and body. - A final pass checks that every input slot got a vector. 
If any slot is still nil, return a clear error so the caller does not silently use a zero vector. ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### How was this tested? - `go build ./internal/entity/models/...` in a clean Go 1.25 image (the go.mod minimum) returns exit 0. - The full method set on `OpenAIModel` still matches the `ModelDriver` interface. - Pattern parity with the existing SiliconFlow Encode implementation (`internal/entity/models/siliconflow.go`). Closes #14629 --------- Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2026-05-10 10:31:37 +08:00
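The index-ordered parsing that the Encode commits in this log describe (empty input short-circuit, placement by `data[].index`, and a final missing-slot check) is language-neutral, so it can be sketched in a few lines of Python. This is an illustration of the logic only, not a port of the Go drivers: the function name `order_embeddings` and the use of `ValueError` are assumptions, and the length and range checks follow the vLLM description rather than the OpenAI one.

```python
def order_embeddings(texts: list[str], data: list[dict]) -> list[list[float]]:
    """Place each returned embedding at the slot given by its "index" field.

    `data` mimics the OpenAI-compatible response shape:
    [{"index": 0, "embedding": [...]}, ...]. Hypothetical helper.
    """
    if not texts:
        # Empty input: nothing to embed, so no request would be made either.
        return []
    if len(data) != len(texts):
        raise ValueError(f"expected {len(texts)} embeddings, got {len(data)}")

    slots: list[list[float] | None] = [None] * len(texts)
    for item in data:
        idx = item["index"]
        if not isinstance(idx, int) or not 0 <= idx < len(texts):
            raise ValueError(f"embedding index out of range: {idx!r}")
        # Coerce every element to float, analogous to widening float32 values.
        slots[idx] = [float(x) for x in item["embedding"]]

    # Final pass: a duplicate index leaves another slot empty, so report it
    # instead of handing the caller a silent zero vector.
    missing = [i for i, vec in enumerate(slots) if vec is None]
    if missing:
        raise ValueError(f"no embedding returned for input slots {missing}")
    return slots  # type: ignore[return-value]


if __name__ == "__main__":
    response = [
        {"index": 1, "embedding": [0.3, 0.4]},
        {"index": 0, "embedding": [0.1, 0.2]},
    ]
    print(order_embeddings(["first", "second"], response))
```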
Jin Hai	048ec2fc5c	Go: fix siliconflow rerank issue (#14743 ) ### What problem does this PR solve? As title. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-09 20:45:53 +08:00
Jin Hai	779cd83862	Go: fix Baidu rerank issue (#14742 ) ### What problem does this PR solve? top_n is missing ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-05-09 20:05:57 +08:00
Hunnyboy1217	782084780e	feat(connectors): ETag-based bypass for incremental S3 ingestion (#14628 ) (#14677 ) ### What problem does this PR solve? S3-family connector syncs currently re-download every in-window object just so we can compute `xxhash128(blob)` and compare against `Document.content_hash`. Anything that bumps `LastModified` without changing bytes (`aws s3 cp` touches, bucket re-encryption, etc.) pays full bandwidth and re-parses files that didn't actually change. #14628 covers the broader incremental-ingestion redesign; this PR is the first slice. The fix is a pre-listing short-circuit. `BlobStorageConnector` (S3 / R2 / GCS / OCI / S3-compat) now implements a new `FingerprintConnector` interface: `list_keys()` paginates `list_objects_v2` and yields `KeyRecord(key, fingerprint)` where `fingerprint = xxhash128(ETag)`. The orchestrator joins those against the connector's existing `{doc_id: content_hash}` map and only calls `get_value(key)` when the fingerprint differs. Unchanged keys are skipped entirely — no `GetObject`, no re-parse. No DDL. xxhash128(ETag) is 32 hex chars and reuses the existing `Document.content_hash` column per @yingfeng's suggestion; the connector decides at listing time whether to populate it. Local uploads and connectors that don't opt in fall through to the existing post-download `xxhash128(blob)` path with no behavior change. This is PR-1 of a 4-PR series — full design lives on #14628. Subsequent PRs extend tier 1 to local FS / WebDAV / Dropbox / Seafile / RDBMS (PR-2), wire up tier 2 cursor connectors with `SyncLogs.next_checkpoint` (PR-3), and unify deletion via `KeyRecord(deleted=True)` reconciliation (PR-4). Holding those back keeps this PR additive and reviewable on its own. 
#### Files touched - `common/data_source/models.py` — new `KeyRecord`; optional `fingerprint` on `Document` - `common/data_source/interfaces.py` — `IncrementalCapability` enum, `FingerprintConnector` ABC - `common/data_source/blob_connector.py` — `BlobStorageConnector` implements `FingerprintConnector`; per-object download factored into `_build_document_from_obj()` so `_yield_blob_objects`, `list_keys`, `get_value` all share it - `rag/svr/sync_data_source.py` — `_BlobLikeBase._fingerprint_filtered_generator` does the bypass loop; `_run_task_logic` plumbs `doc.fingerprint` into the upload dict - `api/db/services/document_service.py` — `list_id_content_hash_map_by_kb_and_source_type()` helper - `api/db/services/connector_service.py` + `file_service.py` — fingerprint flows through `duplicate_and_parse → upload_document` and lands in `content_hash` - `test/unit_test/common/test_blob_connector_fingerprint.py` — 14 tests covering ETag normalization (single-part, multipart, quoted, empty), `list_keys()` not calling `GetObject`, `get_value()` materializing with fingerprint, deterministic/stable fingerprints, and the bypass loop asserting `GetObject` is not called on a match #### Worth flagging for review Old `_BlobLikeBase._generate` called `poll_source(start, now)` with a `LastModified` window when `poll_range_start` was set. New code uses `_fingerprint_filtered_generator` (full bucket listing + fingerprint compare) outside of explicit `reindex=1`. Strictly better for unchanged-bucket cases since it skips `GetObject`, but it does mean every sync now does a full `list_objects_v2` paginate. Should still be cheap for most buckets — flagging in case anyone has a very large bucket where the time-window filter was meaningful. On migration: existing rows have `content_hash = xxhash128(blob)` from the old code. The first sync after this lands sees ETag-derived fingerprints that don't match, re-fetches every object once, and writes the new fingerprint. 
From the second sync onward the bypass works as expected. "Slow day one, fast every day after." A `fingerprint_backfill: trust` opt-out is sketched in the design doc but not in this PR. #### Test plan - [x] `uv run ruff check` — clean on all 8 touched files - [x] `uv run pytest test/unit_test/common/test_blob_connector_fingerprint.py -v` — 14 passed - [x] Broader unit-test suite — no regressions in anything I touched - [ ] Manual smoke against a real S3 bucket — configure a connector, run sync twice, expect the second sync to log `bypassed=N, fetched=0` and no `GetObject` calls in CloudTrail / bucket access logs - [ ] Manual smoke with `reindex=1` — confirm the full re-download path still works ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-05-09 20:03:56 +08:00
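The bypass loop described in commit 782084780e can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `fingerprint_from_etag`, `fingerprint_filtered`, and `fake_get_value` are hypothetical names, the known-hash map is keyed by object key rather than `doc_id` for simplicity, the quote-stripping normalization is an assumption (the commit message does not spell out the normalization rules), and `hashlib.blake2b` with a 16-byte digest stands in for `xxhash128` so the sketch needs only the standard library while still producing 32 hex characters.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class KeyRecord:
    """Listing entry: object key plus a fingerprint derived from its ETag."""
    key: str
    fingerprint: str


def fingerprint_from_etag(etag: str) -> str:
    # ASSUMPTION: normalization here only strips whitespace and surrounding quotes.
    normalized = etag.strip().strip('"')
    # Stand-in for xxhash128(ETag): a 16-byte blake2b digest is also 32 hex chars.
    return hashlib.blake2b(normalized.encode("utf-8"), digest_size=16).hexdigest()


def fingerprint_filtered(
    records: Iterable[KeyRecord],
    known_hashes: Dict[str, str],
    get_value: Callable[[str], bytes],
) -> Iterator[Tuple[str, bytes]]:
    """Yield (key, content) only for keys whose fingerprint differs from the stored one."""
    for record in records:
        if known_hashes.get(record.key) == record.fingerprint:
            continue  # unchanged: no download, no re-parse
        yield record.key, get_value(record.key)


# Demo with an in-memory listing; fake_get_value records which keys were "downloaded".
listing = [
    KeyRecord("a.txt", fingerprint_from_etag('"abc"')),
    KeyRecord("b.txt", fingerprint_from_etag('"def"')),
]
known = {"a.txt": fingerprint_from_etag("abc")}  # a.txt was already ingested
fetched = []


def fake_get_value(key: str) -> bytes:
    fetched.append(key)
    return b"payload:" + key.encode("utf-8")


changed = list(fingerprint_filtered(listing, known, fake_get_value))
```

In this demo only `b.txt` is fetched, because the stored fingerprint for `a.txt` matches the one computed from its listed ETag. Note that a key with no stored fingerprint, or one stored under the older `xxhash128(blob)` scheme, will not match and is fetched once, which mirrors the "slow day one" migration behavior described above.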
Haruko386	7931b693dc	Go: implement provider: Baidu (#14741 ) ### What problem does this PR solve? This PR completes the Baidu Qianfan provider integration in RAGFlow. The following functionalities are now supported: - [x] Chat / Think Chat / Stream Chat / Stream Think Chat - [x] Embedding - [x] Rerank - [x] Model listing - [x] Provider connection checking - [ ] Balance ----- Verified examples from the CLI: ```plaintext RAGFlow(user)> embed text 'what is rag' 'who are you' with 'embedding-3@test@zhipu-ai' dimension 16; +-----------+-------+ \| dimension \| index \| +-----------+-------+ \| 16 \| 0 \| \| 16 \| 1 \| +-----------+-------+ RAGFlow(user)> rerank query 'what is rag' document 'rag is retrieval augment generation' 'rag need llm' 'famous rag project includes ragflow' with 'qwen3-reranker-4b@test@baidu' top 2; +-------+---------------------+ \| index \| relevance_score \| +-------+---------------------+ \| 0 \| 0.974821150302887 \| \| 1 \| 0.14223189651966095 \| \| 2 \| 0.08632347732782364 \| +-------+---------------------+ RAGFlow(user)> think chat with 'deepseek-v3.2@test@baidu' message 'who r u' Thinking: Hmm, the user is asking for a simple introduction. This is straightforward – no need for overcomplication. I should give a clear, friendly response that covers my basic identity as an AI assistant, my purpose, and my capabilities. Keeping it concise but informative is key here. Mentioning my creator Anthropic adds credibility, and ending with an offer to help invites further interaction. No need for technical details unless the user asks later. Answer: Hello! I'm an AI assistant created by Anthropic, designed to help with a wide variety of tasks. You can think of me as a helpful digital companion—I can answer questions, assist with writing, help solve problems, provide explanations, and engage in conversation on many topics. I'm here to help with whatever you need! How can I assist you today? 
Time: 8.103902 RAGFlow(user)> stream think chat with 'deepseek-v3.2@test@baidu' message 'who r u' Thinking: mm, the user is asking "who r u" with casual spelling. This is a straightforward identity question. should give a clear, friendly introduction without overcomplicating it. Can start with my core function as an AI assistant, mention my creator, and briefly state my key capabilities. response should be welcoming and invite further interaction since this seems like an introductory question. Keeping it concise but covering the essentials: who I am, what I do, and how I can help. Answer: ! I am DeepSeek, an AI assistant created by DeepSeek Company. I'm designed to help answer questions, provide information, assist with various tasks, and engage in conversations on a wide range of topics. I'm here to assist you with whatever you need - whether it's answering questions, helping with analysis, writing, coding, or just having a friendly chat!Is there anything specific I can help you with today? 😊 Time: 7.219703 RAGFlow(user)> list supported models from 'baidu' 'test' +--------------------------------------+ \| model_name \| +--------------------------------------+ \| ernie-3.5-8k-preview \| \| ernie-4.0-8k \| \| ernie-4.0-turbo-8k-latest \| \| ernie-4.0-turbo-8k-preview \| \| ernie-4.0-8k-preview \| \| ernie-speed-pro-128k \| \| ernie-char-fiction-8k \| \| ernie-3.5-8k \| \| ernie-3.5-128k \| \| ernie-lite-pro-128k \| \| ernie-novel-8k \| \| ernie-4.0-turbo-8k \| \| ernie-4.0-turbo-128k \| \| ernie-4.0-8k-latest \| \| irag-1.0 \| \| ........... \| \| glm-5.1 \| \| ernie-image-turbo \| \| deepseek-v4-pro \| \| deepseek-v4-flash \| \| ernie-5.1 \| +--------------------------------------+ RAGFlow(user)> check instance 'test' from 'baidu' SUCCESS ``` Additionally, this PR fixes an incorrect error message typo: Before: ```go fmt.Errorf("API requestssss failed with status %d: %s : %s", ...) ``` After: ```go fmt.Errorf("API request failed with status %d: %s", ...) 
``` This PR mainly improves provider compatibility, API completeness, and runtime stability. ### Type of change * [x] Bug Fix (non-breaking change which fixes an issue) * [x] New Feature (non-breaking change which adds functionality) * [x] Refactoring	2026-05-09 19:21:13 +08:00
Liu An	57b24be6d6	Docs: Update version references to v0.25.2 in READMEs and docs (#14731 ) ### What problem does this PR solve? - Update version tags in README files (including translations) from v0.25.1 to v0.25.2 - Modify Docker image references and documentation to reflect new version - Update version badges and image descriptions - Maintain consistency across all language variants of README files ### Type of change - [x] Documentation Update v0.25.2	2026-05-09 19:06:05 +08:00
writinwaters	a3de873617	Docs: Updated release date (#14740 ) ### What problem does this PR solve? Updated v0.25.2 release date. ### Type of change - [x] Documentation Update	2026-05-09 18:49:33 +08:00
euvre	f4b8f53b6d	Fix: restore embedding model switching for datasets with existing chunks (#14732 ) ### What problem does this PR solve? ## Problem During the REST API refactoring (#13690), the `/api/v2/kb/check_embedding` endpoint was removed and never migrated to the new RESTful structure. The frontend was pointed to the `/api/v1/datasets/{id}/embedding` endpoint (which is `run_embedding` — a completely different function). Additionally, a hard guard was introduced that rejects any `embd_id` change when `chunk_num > 0`, making it impossible to switch embedding models on datasets with existing chunks. ## Root Cause 1. Missing endpoint: The old `check_embedding` logic (sample random chunks, re-embed with the new model, compare cosine similarity) was not carried over to the new REST API service layer. 2. Wrong frontend URL: `checkEmbedding` in `api.ts` pointed to `/datasets/{id}/embedding` (`run_embedding`) instead of a dedicated check endpoint. 3. Overly restrictive guard: `dataset_api_service.py` line 310 blocked all `embd_id` updates when `chunk_num > 0`. This check did not exist in the pre-refactor code — it was incorrectly introduced during the refactor. 
## Changes ### Backend - `api/apps/services/dataset_api_service.py` - Remove the `chunk_num > 0` hard guard on `embd_id` updates - Add `check_embedding()` service function: samples random chunks, re-embeds them with the candidate model, computes cosine similarity, returns compatibility result (avg ≥ 0.9 = compatible) - Add `import re` for the `_clean()` helper - `api/apps/restful_apis/dataset_api.py` - Add `POST /datasets/<dataset_id>/embedding/check` endpoint following the new REST API conventions - Clean up unused top-level imports (`random`, `re`, `numpy`) ### Frontend - `web/src/utils/api.ts` - Fix `checkEmbedding` URL from `/datasets/${datasetId}/embedding` → `/datasets/${datasetId}/embedding/check` ### Tests - `test/testcases/test_http_api/test_dataset_management/test_update_dataset.py` - Update `test_embedding_model_with_existing_chunks` to assert success (`code == 0`) instead of expecting the old `102` error - `test/testcases/test_web_api/test_dataset_management/test_dataset_sdk_routes_unit.py` - Update `test_update_route_branch_matrix_unit` to assert `RetCode.SUCCESS` when updating `embd_id` on a chunked dataset, replacing the old `chunk_num` error assertion ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Signed-off-by: noob <yixiao121314@outlook.com>	2026-05-09 18:48:57 +08:00
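The compatibility check described in commit f4b8f53b6d (sample chunks, re-embed with the candidate model, compare by cosine similarity, treat an average of at least 0.9 as compatible) might look like the sketch below. It is an illustration under stated assumptions, not the actual `check_embedding()` from `dataset_api_service.py`: the chunk layout (`text` and `vector` fields), the function names, the `sample_size` default, and the returned dictionary shape are all hypothetical.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 for mismatched dimensions or zero vectors."""
    if len(a) != len(b):
        return 0.0
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def check_embedding_compat(
    chunks: List[Dict],
    embed: Callable[[str], Sequence[float]],
    sample_size: int = 5,        # ASSUMPTION: sample size is not given in the commit message
    threshold: float = 0.9,      # from the commit message: avg >= 0.9 means compatible
    rng: Optional[random.Random] = None,
) -> Dict:
    """Re-embed a random sample of chunks and compare against their stored vectors."""
    rng = rng or random.Random()
    sample = rng.sample(chunks, min(sample_size, len(chunks)))
    if not sample:
        return {"compatible": False, "avg_similarity": 0.0, "sampled": 0}
    sims = [cosine_similarity(c["vector"], embed(c["text"])) for c in sample]
    avg = sum(sims) / len(sims)
    return {"compatible": avg >= threshold, "avg_similarity": avg, "sampled": len(sims)}


# Demo: two stored chunks and two toy "models" (plain functions, not real embedders).
stored = [
    {"text": "a", "vector": [1.0, 0.0]},
    {"text": "b", "vector": [0.0, 1.0]},
]


def same_model(text: str) -> List[float]:
    return {"a": [1.0, 0.0], "b": [0.0, 1.0]}[text]


def other_model(text: str) -> List[float]:
    return [1.0, 1.0]


result_same = check_embedding_compat(stored, same_model)
result_other = check_embedding_compat(stored, other_model)
```

Here `same_model` reproduces the stored vectors and passes the check, while `other_model` averages well below the 0.9 threshold and is reported as incompatible. Treating a dimension mismatch as similarity 0.0 is a design choice of this sketch; a real service might prefer to reject the candidate model with an explicit error.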


6151 Commits