ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-05-23 09:28:06 +08:00

Author	SHA1	Message	Date
Idriss Sbaaoui	2508c46c8f	Playwright : add new test for configuration tab in datasets (#13365 ) ### What problem does this PR solve? This PR adds new tests for the full configuration tab in datasets. ### Type of change - [x] Other (please describe): new tests	2026-03-04 19:10:06 +08:00
少卿	54ae5b4a27	Fix Dify external retrieval by providing metadata.document_id (#13337 ) ### What problem does this PR solve? ## Summary Dify’s external retrieval expects `records[].metadata.document_id` to be a non-empty string. RAGFlow currently only sets `metadata.doc_id`, which causes Dify validation to fail. This PR adds `metadata.document_id` (mapped from `doc_id`) in the Dify-compatible retrieval response. ## Changes - Add `meta["document_id"] = c["doc_id"]` in `api/apps/sdk/dify_retrieval.py` ## Testing - Not run (logic-only change). ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-04 13:23:37 +08:00
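The mapping described in #13337 can be sketched as below. This is a minimal illustration, not the code in `api/apps/sdk/dify_retrieval.py`: the helper name `to_dify_record` and the `content` field are assumptions; only the `doc_id` to `document_id` copy comes from the PR description.

```python
def to_dify_record(chunk: dict) -> dict:
    """Build one Dify-style record from a retrieved chunk (hypothetical helper)."""
    meta = {"doc_id": chunk["doc_id"]}
    # Dify validates that metadata.document_id is a non-empty string,
    # so mirror doc_id under that key as well.
    meta["document_id"] = chunk["doc_id"]
    return {
        "content": chunk.get("content", ""),  # assumed field name
        "metadata": meta,
    }
```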
statxc	839b603768	feat: Add PDF parser selection to Agent Begin and Await Response comp… (#13325 ) ### Issue: #12756 ### What problem does this PR solve? When users upload files through Agent's Begin or Await Response components, the parsing is hardcoded to "Plain Text", ignoring all other available parsers (DeepDOC, TCADP, Docling, MinerU, PaddleOCR). This PR adds a PDF parser dropdown to these components so users can select the appropriate parser for their file inputs. ### Changes Backend - `agent/component/fillup.py` - Added `layout_recognize` param to `UserFillUpParam`, forwarded to `FileService.get_files()` - `agent/component/begin.py` - Same forwarding in `Begin._invoke()` - `agent/canvas.py` - Extract Begin's `layout_recognize` for `sys.files` parsing, added param to `get_files_async()` / `get_files()` - `api/db/services/file_service.py` - Added `layout_recognize` param to `parse()` and `get_files()`, replacing hardcoded `"Plain Text"` - `rag/app/naive.py` - Added `"plain text"` and `"tcadp parser"` aliases to PARSERS dict to match dropdown values after `.lower()` Frontend - `web/src/pages/agent/form/begin-form/index.tsx` - Show `LayoutRecognizeFormField` dropdown when file inputs exist - `web/src/pages/agent/form/begin-form/schema.ts` - Added `layout_recognize` to Zod schema - `web/src/pages/agent/form/user-fill-up-form/index.tsx` - Same dropdown for Await Response component ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-04 11:09:33 +08:00
Magicbook1108	4f09b3e2a4	Fix: pipeline canvas category (#13319 ) ### What problem does this PR solve? Fix: pipeline canvas category ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-02 20:27:36 +08:00
Magicbook1108	5fc3bd38b0	Feat: Support siliconflow.com (#13308 ) ### What problem does this PR solve? Feat: Support siliconflow.com ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-03-02 15:37:42 +08:00
Ahmad Intisar	184388879d	feat: Add `disable_password_login` configuration to support SSO-only authentication (#13151 ) ### What problem does this PR solve? Enterprise deployments that use an external Identity Provider (e.g., Microsoft Entra ID, Okta, Keycloak) need the ability to enforce SSO-only authentication by hiding the email/password login form. Currently, the login page always shows the password form alongside OAuth buttons, with no way to disable it. This PR adds a `disable_password_login` configuration option under the existing `authentication` section in `service_conf.yaml`. When set to `true`, the login page only displays configured OAuth/SSO buttons and hides the email/password form, "Remember me" checkbox, and "Sign up" link. The flag can be set via: - `service_conf.yaml` (`authentication.disable_password_login: true`) - Environment variable (`DISABLE_PASSWORD_LOGIN=true`) Default behavior is unchanged (`false`). ### Behavior \| `disable_password_login` \| OAuth configured \| Result \| \|---\|---\|---\| \| `false` (default) \| No \| Standard email/password form \| \| `false` \| Yes \| Email/password form + SSO buttons below \| \| `true` \| Yes \| SSO buttons only (no form, no sign up link) \| \| `true` \| No \| Empty card (admin should configure OAuth first) \| ### Type of change - [x] New Feature (non-breaking change which adds functionality) ### Files changed (5) 1. `docker/service_conf.yaml.template` — added `disable_password_login: false` under authentication 2. `common/settings.py` — added `DISABLE_PASSWORD_LOGIN` global variable and loader in `init_settings()` 3. `common/config_utils.py` — fixed `TypeError` in `show_configs()` when authentication section contains non-dict values (e.g., booleans) 4. `api/apps/system_app.py` — exposed `disablePasswordLogin` flag in `/config` endpoint 5. 
`web/src/pages/login/index.tsx` — conditionally render password form based on config flag; OAuth buttons always render when channels exist --------- Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>	2026-03-02 14:06:03 +08:00
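The behavior table in #13151 can be restated as a small decision function. This is a sketch of the documented outcomes only; the function name `login_elements` and the element labels are hypothetical, and the real rendering lives in `web/src/pages/login/index.tsx`.

```python
def login_elements(disable_password_login: bool, oauth_configured: bool) -> list[str]:
    """Return which parts of the login card would be shown (hypothetical labels)."""
    elements = []
    if not disable_password_login:
        # Default behavior: the email/password form stays visible.
        elements.append("password_form")
    if oauth_configured:
        # OAuth/SSO buttons render whenever channels exist.
        elements.append("sso_buttons")
    # An empty list corresponds to the "empty card" row of the table.
    return elements
```

With `disable_password_login=True` and no OAuth channel configured the result is empty, which matches the table's note that an admin should configure OAuth first.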
Attili-sys	21bc1ab7ec	Feature rtl support (#13118 ) ### What problem does this PR solve? This PR adds comprehensive Right-to-Left (RTL) language support, primarily targeting Arabic and other RTL scripts (Hebrew, Persian, Urdu, etc.). Previously, RTL content had multiple rendering issues: - Incorrect sentence splitting for Arabic punctuation in citation logic - Misaligned text in chat messages and markdown components - Improper positioning of blockquotes and “think” sections - Incorrect table alignment - Citation placement ambiguity in RTL prompts - UI layout inconsistencies when mixing LTR and RTL text This PR introduces backend and frontend improvements to properly detect, render, and style RTL content while preserving existing LTR behavior. #### Backend - Updated sentence boundary regex in `rag/nlp/search.py` to include Arabic punctuation: - `،` (comma) - `؛` (semicolon) - `؟` (question mark) - `۔` (Arabic full stop) - Ensures citation insertion works correctly in RTL sentences. - Updated citation prompt instructions to clarify citation placement rules for RTL languages. #### Frontend - Introduced a new utility: `text-direction.ts` - Detects text direction based on Unicode ranges. - Supports Arabic, Hebrew, Syriac, Thaana, and related scripts. - Provides `getDirAttribute()` for automatic `dir` assignment. - Applied dynamic `dir` attributes across: - Markdown rendering - Chat messages - Search results - Tables - Hover cards and reference popovers - Added proper RTL styling in LESS: - Text alignment adjustments - Blockquote border flipping - Section indentation correction - Table direction switching - Use of `<bdi>` for figure labels to prevent bidirectional conflicts #### DevOps / Environment - Added Windows backend launch script with retry handling. - Updated dependency metadata. - Adjusted development-only React debugging behavior. 
--- ### Type of change - [x] Bug Fix (non-breaking change which fixes RTL rendering and citation issues) - [x] New Feature (non-breaking change which adds RTL detection and dynamic direction handling) --------- Co-authored-by: 6ba3i <isbaaoui09@gmail.com> Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local> Co-authored-by: Ahmad Intisar <168020872+ahmadintisar@users.noreply.github.com> Co-authored-by: Liu An <asiro@qq.com>	2026-03-02 13:03:44 +08:00
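The two ideas in #13118, direction detection by Unicode range and sentence splitting on Arabic punctuation, can be sketched in Python as follows. This is an assumption-laden illustration rather than the PR's code: the real utility is the TypeScript file `text-direction.ts`, the "first strongly directional character wins" rule is a simplification chosen here, and the names `get_dir` and `split_sentences` are hypothetical.

```python
import re

# Assumed ranges: Hebrew, Arabic, Syriac, Arabic Supplement, Thaana.
_RTL_CHAR = re.compile(r"[\u0590-\u05FF\u0600-\u06FF\u0700-\u074F\u0750-\u077F\u0780-\u07BF]")
# Simplification: only ASCII letters count as strongly left-to-right.
_LTR_CHAR = re.compile(r"[A-Za-z]")

# Sentence end: Latin punctuation plus the Arabic comma (U+060C), semicolon (U+061B),
# question mark (U+061F), and full stop (U+06D4) listed in the PR description.
_SENTENCE_END = re.compile(r"(?<=[.!?\u060C\u061B\u061F\u06D4])\s+")


def get_dir(text: str) -> str:
    """Return "rtl" if the first strongly directional character is RTL, else "ltr"."""
    for ch in text:
        if _RTL_CHAR.match(ch):
            return "rtl"
        if _LTR_CHAR.match(ch):
            return "ltr"
    return "ltr"  # digits, spaces, or empty input default to LTR


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows a sentence-ending mark."""
    return [part for part in _SENTENCE_END.split(text) if part]
```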
Idriss Sbaaoui	9d78d3ddb1	Tests: fix failing http in CI (#13301 ) ### What problem does this PR solve? test_doc_sdk_routes_unit had two flaky/incorrect branch assumptions: 1. parse/stop_parsing production logic gates on doc.run, but tests used progress, causing branch mismatch and unintended fallthrough into mutation/DB paths. 2. stop_parsing invalid-state test asserted an outdated message fragment, making the contract brittle. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-03-02 10:44:33 +08:00
Magicbook1108	1027916bfe	Fix: inconsistent state handling for multi-user single-canvas access (#13267 ) ### What problem does this PR solve? <img width="700" alt="image" src="https://github.com/user-attachments/assets/1db7412e-4554-44bc-84ba-16421949aacc" /> ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>	2026-02-28 15:09:21 +08:00
天海蒼灆	983150b936	Fix (api): fix the document parsing status check logic (#12504 ) ### What problem does this PR solve? When a parsing task is terminated partway through, its progress may be neither 0 nor 1, which makes it impossible to call the parse interface again. - Change the document parsing progress check to a task status check that uses `TaskStatus.RUNNING.value`. - Update the condition for stopping document parsing to check whether the task is running instead. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-28 14:38:55 +08:00
Jin Hai	54094771a3	Fix streaming chat on web API (#13275 ) ### What problem does this PR solve? This pull request makes a small but important fix to how streaming requests are handled in the `completion` endpoint of `conversation_app.py`. The main change ensures that the `stream` argument is not passed twice, which could cause errors. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2026-02-28 12:16:38 +08:00
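The class of error that #13275 describes, one keyword argument reaching a function twice, can be reproduced in a few lines. The functions `chat` and `completion` below are hypothetical stand-ins and do not mirror the actual code in `conversation_app.py`; the sketch only shows why popping `stream` from the request dict before unpacking it avoids a `TypeError`.

```python
def chat(messages, stream=True, **kwargs):
    """Stand-in for the downstream chat call (hypothetical signature)."""
    return {"stream": stream, "extra": kwargs}


def completion(req: dict):
    """Forward a request dict without passing `stream` twice."""
    req = dict(req)                    # work on a copy of the caller's dict
    messages = req.pop("messages", [])
    # pop() removes the key, so **req below no longer contains "stream".
    # Using req.get("stream") here instead would raise:
    #   TypeError: chat() got multiple values for keyword argument 'stream'
    stream = req.pop("stream", True)
    return chat(messages, stream=stream, **req)
```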
Yongteng Lei	0110151e12	Fix: document remove race condition (#13242 ) ### What problem does this PR solve? Fix document remove race condition. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-28 11:23:24 +08:00
as-ondewo	194e076e26	Fix: init superuser can create duplicate users (#13221 ) ### What problem does this PR solve? This PR fixes 2 bugs related to RAGFlow's init superuser functionality. #### Bug 1 When the RAGFlow server was started with the `--init-superuser` option, it would always create a new admin user even if one already existed, resulting in duplicate users. To fix this, I added an additional check before creating the superuser and added a unique constraint to the email column of the database to mitigate potential TOCTOU race conditions. Since existing databases could contain duplicate emails, I added email de-duplication to the database migration. #### Bug 2 When the RAGFlow server was started with the `--init-superuser` option but without configured default LLM and embedding models, it would fail to start because the `init_superuser` function would always make a test request to the models even if they were not set. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-27 19:55:51 +08:00
qinling0210	8b6d363a98	Use pagination in _search_metadata (#13238 ) ### What problem does this PR solve? Fix [#13210](https://github.com/infiniflow/ragflow/issues/13210) Remove limit in _search_metadata, use pagination in _search_metadata. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-27 11:24:49 +08:00
Magicbook1108	c03c537bf8	Feat: optimize gmail/google-drive (#13230 ) ### What problem does this PR solve? Feat: optimize gmail/google-drive Now: <img width="700" alt="image" src="https://github.com/user-attachments/assets/0c4b6044-7209-4c4f-ac0c-32070b79daf7" /> <img width="700" alt="image" src="https://github.com/user-attachments/assets/406f93d8-9b0f-4f5a-b8bb-3936990f558c" /> ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-02-26 19:19:40 +08:00
PandaMan	d43aebe701	Fix/13142 auto metadata (#13217 ) ### What problem does this PR solve? Close #13142 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-26 10:25:48 +08:00
He Wang	394ff16b66	fix: OceanBase metadata not returned in document list API (#13209 ) ### What problem does this PR solve? Fix #13144. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-25 15:29:17 +08:00
Phives	4ceb668d40	feat(api/utils): Harden file_utils for robustness and edge cases (#12915 ) ## Summary Improves robustness and edge-case handling in `api.utils.file_utils` to avoid crashes, DoS/OOM risks, and timeouts when processing user-provided filenames, paths, and file blobs. ## Changes ### Resource limits & timeouts - `MAX_BLOB_SIZE_THUMBNAIL` (50 MiB) and `MAX_BLOB_SIZE_PDF` (100 MiB) to reject oversized inputs before thumbnail/PDF processing. - `GHOSTSCRIPT_TIMEOUT_SEC` (120 s) for `repair_pdf_with_ghostscript` subprocess to avoid hangs on malicious or broken PDFs. ### `filename_type` - Handles `None`, empty string, non-string (e.g. int/list), and path-only input via new `_normalize_filename_for_type()`. - Uses basename for type detection (e.g. `a/b/c.pdf` → PDF). - Enforces `FILE_NAME_LEN_LIMIT`; invalid input returns `FileType.OTHER`. ### `thumbnail_img` - Rejects `None`/empty/oversized blob and invalid filename; returns `None` instead of raising. - Wraps PDF, image, and PPT handling in try/except so corrupt or malformed files return `None`. - Ensures PDF has pages and PPT has slides before use. - Normalizes PIL image mode (RGBA/P/LA → RGB) for safe PNG export. ### `repair_pdf_with_ghostscript` - Handles `None`/empty input; skips repair when input size exceeds limit. - Uses `subprocess.run(..., timeout=GHOSTSCRIPT_TIMEOUT_SEC)` and catches `TimeoutExpired`. - Returns original bytes when Ghostscript output is empty. ### `read_potential_broken_pdf` - `None` → `b""`; non–sequence-like (no `len`) → `b""`; empty → return as-is. - Oversized blob returned as-is (no repair) to avoid DoS. ### `sanitize_path` - Explicit `None` and non-string check; strips whitespace before normalizing. ## Testing - `test/unit_test/utils/test_api_file_utils.py` added with 36 unit tests covering the above behavior (filename_type, sanitize_path, read_potential_broken_pdf, thumbnail_img, thumbnail, repair_pdf_with_ghostscript, constants). - All tests pass. 
--------- Co-authored-by: Gittensor Miner <miner@gittensor.io>	2026-02-25 14:34:47 +08:00
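The `filename_type` hardening listed in #12915 can be sketched as below. This is an illustration under stated assumptions, not the code in `api.utils.file_utils`: the value 255 for `FILE_NAME_LEN_LIMIT`, the string return values, and the function name `detect_type` are all hypothetical, and only the PDF case is shown.

```python
import os

FILE_NAME_LEN_LIMIT = 255  # assumed value; the real constant is defined in the project


def detect_type(filename) -> str:
    """Return "pdf" or "other" without raising on odd input (hypothetical helper)."""
    # None, ints, lists and other non-strings are treated as invalid input.
    if not isinstance(filename, str):
        return "other"
    # Use only the basename, so "a/b/c.pdf" is detected from "c.pdf".
    name = os.path.basename(filename.strip())
    # Empty names (including path-only input such as "a/b/") and
    # over-long names are rejected.
    if not name or len(name) > FILE_NAME_LEN_LIMIT:
        return "other"
    ext = os.path.splitext(name)[1].lower()
    return "pdf" if ext == ".pdf" else "other"
```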
Yongteng Lei	2bf2abfdbc	Fix: authorization bypass (IDOR) in /v1/document/web_crawl (#13203 ) ### What problem does this PR solve? Fix authorization bypass (IDOR) in `/v1/document/web_crawl` allows Cross-Tenant Dataset Modification. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-25 12:59:41 +08:00
Yongteng Lei	72b89304c1	Fix: LFI vulnerability in document parsing API (#13196 ) ### What problem does this PR solve? Fix LFI vulnerability in document parsing API. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-25 09:47:39 +08:00
PandaMan	f4cbdc3a3b	fix(api): MinIO health check use dynamic scheme and verify (Closes #13159 and #13158 ) (#13197 ) ## Summary Fixes MinIO SSL/TLS support in two places: the MinIO client connection and the health check used by the Admin/Service Health dashboard. Both now respect the `secure` and `verify` settings from the MinIO configuration. Closes #13158 Closes #13159 --- ## Problem #13158 – MinIO client: The client in `rag/utils/minio_conn.py` was hardcoded with `secure=False`, so RAGFlow could not connect to MinIO over HTTPS even when `secure: true` was set in config. There was also no way to disable certificate verification for self-signed certs. #13159 – MinIO health check: In `api/utils/health_utils.py`, the MinIO liveness check always used `http://` for the health URL. When MinIO was configured with SSL, the health check failed and the dashboard showed "timeout" even though MinIO was reachable over HTTPS. --- ## Solution ### MinIO client (`rag/utils/minio_conn.py`) - Read `MINIO.secure` (default `false`) and pass it into the `Minio()` constructor so HTTPS is used when configured. - Add `_build_minio_http_client()` that reads `MINIO.verify` (default `true`). When `verify` is false, return an `urllib3.PoolManager` with `cert_reqs=ssl.CERT_NONE` and pass it as `http_client` to `Minio()` so self-signed certificates are accepted. - Support string values for `secure` and `verify` (e.g. `"true"`, `"false"`). ### MinIO health check (`api/utils/health_utils.py`) - Add `_minio_scheme_and_verify()` to derive URL scheme (http/https) and the `verify` flag from `MINIO.secure` and `MINIO.verify`. - Update `check_minio_alive()` to use the correct scheme, pass `verify` into `requests.get(..., verify=verify)`, and use `timeout=10`. ### Config template (`docker/service_conf.yaml.template`) - Add commented optional MinIO keys `secure` and `verify` (and env vars `MINIO_SECURE`, `MINIO_VERIFY`) so deployers know they can enable HTTPS and optional cert verification. 
### Tests - `test/unit_test/utils/test_health_utils_minio.py` – Tests for `_minio_scheme_and_verify()` and `check_minio_alive()` (scheme, verify, status codes, timeout, errors). - `test/unit_test/utils/test_minio_conn_ssl.py` – Tests for `_build_minio_http_client()` (verify true/false/missing, string values, `CERT_NONE` when verify is false). --- ## Testing - Unit tests added/updated as above; run with the project's test runner. - Manually: configure MinIO with HTTPS and `secure: true` (and optionally `verify: false` for self-signed); confirm client operations work and the Service Health dashboard shows MinIO as alive instead of timeout.	2026-02-25 09:47:12 +08:00
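The scheme and `verify` derivation described in #13197 can be sketched as follows. This is not the project's `_minio_scheme_and_verify()`: the function names, the accepted truthy strings, and the `/minio/health/live` path used for the URL are assumptions made for illustration. Only the defaults (`secure` false, `verify` true) and the support for string values come from the PR description.

```python
def _as_bool(value, default: bool) -> bool:
    """Interpret booleans and strings such as "true"/"false" (assumed spellings)."""
    if value is None:
        return default
    if isinstance(value, str):
        return value.strip().lower() in ("1", "true", "yes", "on")
    return bool(value)


def minio_scheme_and_verify(conf: dict) -> tuple[str, bool]:
    """Derive the URL scheme and TLS verification flag from a MinIO config dict."""
    secure = _as_bool(conf.get("secure"), False)  # default: plain HTTP
    verify = _as_bool(conf.get("verify"), True)   # default: verify certificates
    return ("https" if secure else "http"), verify


def health_url(host: str, conf: dict) -> str:
    """Build a liveness URL; the path is assumed here, not taken from the PR."""
    scheme, _ = minio_scheme_and_verify(conf)
    return f"{scheme}://{host}/minio/health/live"
```

A caller would then pass the second element of the tuple as the `verify` argument of its HTTP request, as the PR describes for `check_minio_alive()`.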
Yongteng Lei	c292d617ca	Fix: stored XSS via HTML File upload and inline Rendering in file get (#13202 ) ### What problem does this PR solve? Fix stored XSS via HTML file upload and inline rendering in /v1/file/get/<id> ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-25 09:46:48 +08:00
as-ondewo	0a7c520579	Fix: empty response from OpenAI chat completion endpoint (#13166 ) ### What problem does this PR solve? When using a chat assistant that has a hardcoded `empty_response`, that response was not returned correctly in streaming mode when no information is found in the knowledge base. In this case only one response with `"content": null` was yielded. If `"references": true`, then the `empty_response` is still put into the `final_content` so there is technically some content returned, but when `"references": false` no content at all is returned. I update the OpenAI chat completion endpoint to yield an additional response with the `empty_response` in the content. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-24 19:18:12 +08:00
Magicbook1108	5de92e57d3	Fix: 'None None' in log (#13192 ) ### What problem does this PR solve? Fix: 'None None' in log ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-24 19:15:20 +08:00
Magicbook1108	46dec98f52	Fix: Chat/Agent embedded page (#13199 ) ### What problem does this PR solve? Fix: Chat/Agent embedded page #13190 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-24 19:14:24 +08:00
tuandang-diag	d89ad8b79d	fix: handle null response in LLM and improve JSON parsing in agent (#13187 ) Fixes AttributeError in _remove_reasoning_content() when LLM returns None, and improves JSON parsing regex for markdown code fences in agent_with_tools.py	2026-02-24 13:15:09 +08:00
as-ondewo	91d1a81937	fix: error during admin tenant creation when using Postgres (#13164 ) ### What problem does this PR solve? This fixes the bug described in #13130. When starting RAGFlow with Postgres the admin tenant create failed because the rerank model was not set. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-24 10:57:31 +08:00
Levi	6d6c54db19	fix(metadata): handle unhashable list values in metadata split (#13116 ) ### What problem does this PR solve? This PR fixes missing metadata on documents synced from the Moodle connector, especially for Book modules. Background: - Moodle Book metadata includes fields like `chapters`, which is a `list[dict]`. - During metadata normalization in `DocMetadataService._split_combined_values`, list deduplication used `dict.fromkeys(...)`. - `dict.fromkeys(...)` fails for unhashable values (like `dict`), causing metadata update to fail. - Result: documents were imported, but metadata was not saved for affected module types (notably Books). What this PR changes: - Replaces hash-based list deduplication with `dedupe_list(...)`, which safely handles unhashable list items while preserving order. - This allows Book metadata (and other complex list metadata) to be persisted correctly. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [ ] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): Contribution during my time at RAGcon GmbH.	2026-02-12 19:48:51 +08:00
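The failure mode in #13116 and an order-preserving alternative can be sketched as below. The PR names a helper `dedupe_list(...)` but does not show its body, so this implementation is an assumption about one way to write it, not the project's code.

```python
def dedupe_list(items: list) -> list:
    """Remove duplicates while preserving order, tolerating unhashable items."""
    seen_hashable = set()
    seen_unhashable = []  # linear scan fallback for dicts, lists, and similar
    result = []
    for item in items:
        try:
            if item in seen_hashable:
                continue
            seen_hashable.add(item)
        except TypeError:
            # Unhashable values such as dict raise TypeError on set lookup;
            # dict.fromkeys(...) fails the same way.
            if item in seen_unhashable:
                continue
            seen_unhashable.append(item)
        result.append(item)
    return result
```

The fallback is quadratic in the number of unhashable items, which is usually acceptable for short metadata lists such as a Book module's `chapters`.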
Lynn	6e7bcf58bc	Refactor: split message apis to gateway and service (#13126 ) ### What problem does this PR solve? Split message apis to gateway and service ### Type of change - [x] Refactoring	2026-02-12 14:43:52 +08:00
Lynn	30d5fc1a07	Refactor: split memory API into gateway and service layers (#13111 ) ### What problem does this PR solve? Decouple the memory API into a gateway layer (for routing/param parse) and a service layer (for business logic). ### Type of change - [x] Refactoring	2026-02-12 10:11:50 +08:00
Magicbook1108	109441628b	Fix: upload image files (#13071 ) ### What problem does this PR solve? Fix: upload image files ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-11 09:47:33 +08:00
akie	6f785e06a4	Fix issue #13084 (#13088 ) When match_expressions contains coroutine objects (from GraphRAG's Dealer.get_vector()), the code cannot identify this type because it only checks for MatchTextExpr, MatchDenseExpr, or FusionExpr. As a result: score_func remains initialized as an empty string "" This empty string is appended to the output list The output list is passed to Infinity SDK's table_instance.output() method Infinity's SQL parser (via sqlglot) fails to parse the empty string, throwing a ParseError	2026-02-10 17:04:45 +08:00
Kevin Hu	9bc16d8df2	Fix: agent files issue, (#13067 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-09 19:52:52 +08:00
6ba3i	fabbfcab90	Fix: failing p3 test for SDK/HTTP APIs (#13062 ) ### What problem does this PR solve? Adjust highlight parsing, add row-count SQL override, tweak retrieval thresholding, and update tests with engine-aware skips/utilities. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-09 14:56:10 +08:00
Kevin Hu	e51a40fdfc	Fix: launch an agent. (#13039 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-09 10:08:36 +08:00
Magicbook1108	301ed76aa4	Fix: task cancel (#13034 ) ### What problem does this PR solve? Fix: task cancel #11745 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-06 14:48:24 +08:00
Kevin Hu	1262533b74	Feat: support verify to set llm key and boost bigrams. (#12980 ) #12863 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-02-05 19:19:09 +08:00
Magicbook1108	0a08fc7b07	Fix: example code in session.py (#13004 ) ### What problem does this PR solve? Fix: example code in session.py #12950 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) --------- Co-authored-by: Levi <stupse-tipp0j@icloud.com> Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com> Co-authored-by: Liu An <asiro@qq.com>	2026-02-05 15:56:58 +08:00
Levi	803b480f9c	feat: Add optional document metadata in OpenAI-compatible response references (#12950 ) ### What problem does this PR solve? This PR adds an opt‑in way to include document‑level metadata in OpenAI‑compatible reference chunks. Until now, metadata could be used for filtering but wasn’t returned in responses. The change enables clients to show richer citations (author/year/source, etc.) while keeping payload size and privacy under control via an explicit request flag and optional field allowlist. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [x] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): Contribution during my time at RAGcon GmbH.	2026-02-05 09:54:33 +08:00
BitToby	4d4b5a978d	feat: enable multi-file upload for chat and agent workflows (#12977 ) ### Closes: #12921 ### What problem does this PR solve? Previously, multi-file upload was not working correctly across the application: - Chat: UI displayed "Upload max 5 files" but only the first file was actually uploaded - Agent conversational mode: Frontend sent multiple files but backend only processed one - Agent task-mode file inputs: Explicitly limited to single file only This PR enables proper multi-file upload support for both chat and agent workflows, allowing users to upload and process multiple files (up to 5) as the UI originally suggested. Changes: - `web/src/pages/next-chats/hooks/use-upload-file.ts`: Process all files instead of only `files[0]` - `api/apps/canvas_app.py`: Handle multiple files via `files.getlist("file")` - `web/src/pages/agent/debug-content/uploader.tsx`: Allow up to 5 files with `multiple={true}` - `agent/component/begin.py` & `fillup.py`: Support file arrays while maintaining backward compatibility ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2026-02-04 18:03:21 +08:00
Magicbook1108	a37d287fad	Fix: pdf chunking / table rotation (#12981 ) ### What problem does this PR solve? Fix: PDF chunking issue for single-page documents Refactor: Change the default refresh frequency to 5 Fix: Add a 0-degree threshold; require other rotation angles to exceed it by at least 0.2 Fix: Put connector name tips to correct place Fix: incorrect example response in delete datasets. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] Refactoring	2026-02-04 17:00:25 +08:00
qinling0210	205ae769bb	Fix "metadata table not exists" (#12949 ) ### What problem does this PR solve? Fix the "metadata table not exists" error when updating metadata. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-03 17:28:10 +08:00
Lynn	32f9a87b2e	Fix: default admin tenant (#12964 ) ### What problem does this PR solve? Add tenant for default admin, and allow login to ragflow server as default admin. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-02-03 15:37:36 +08:00
zhanglei	7cbe8b5b53	feat: Add a custom header to the SDK for chatting with the agent. (#12430 ) ### What problem does this PR solve? As title. ### Type of change - [x] New Feature (non-breaking change which adds functionality) Co-authored-by: Liu An <asiro@qq.com>	2026-02-03 11:01:18 +08:00
eviaaaaa	2e5a18602b	refactor: optimize agent list payload and improve multimodal detection logic (#12942 ) ## Description This PR focuses on API performance optimization and refining the model capability detection logic in the Agent/Canvas module. ### 1. Performance Optimization (Backend) - Changes: Removed `cls.model.dsl` from query fields in `UserCanvasService.get_by_tenant_ids`. - Reasoning: The `dsl` object is large and unnecessary for the Agent list view. Excluding it reduces the payload size of the `/v1/canvas/list` API, leading to faster serialization and reduced network latency. - Consistency: Full DSL data remains accessible via the individual `/v1/canvas/get/<id>` endpoint used in the detail view. ### 2. Multimodal Detection Refinement (Frontend) - Changes: Replaced `model_type === LlmModelType.Image2text` with `tags?.includes('IMAGE2TEXT')`. - Reasoning: In RAGFlow, `model_type` defines the primary role of a model (e.g., `chat`). However, many advanced Chat models are also vision-capable. Since `model_type` is a single-value field, it cannot represent these multiple capabilities. - Solution: Utilizing the `tags` field (which supports multiple attributes) to check for `IMAGE2TEXT` ensures that models like `gpt-5.2-pro` correctly display multimodal input options. ## Type of Change - [x] Bug fix (logic correction for multimodal detection) - [x] Optimization (performance improvement for list API) ## Main Changes - `api/db/services/canvas_service.py`: Optimized DB query by excluding heavy DSL fields. - `web/src/pages/agent/form/agent-form/index.tsx`: Enhanced capability detection using the tags system. ## Verification - [x] Verified Agent list loads faster with reduced response payload. - [x] Confirmed that `chat` models with the `IMAGE2TEXT` tag now correctly enable the multimodal input UI.	2026-02-02 17:35:54 +08:00
Liu An	1b587013d8	Fix: remove unused imports and f-string formatting (#12935 ) ### What problem does this PR solve? - Remove unused imports (Mock, patch, MagicMock, json, os, RAGFLOW_COLUMNS, VECTOR_FIELD_PATTERN) from multiple files - Replace f-string formatting with regular strings for console output messages in cli.py - Clean up unnecessary imports that were no longer being used in the codebase ### Type of change - [x] Refactoring	2026-02-02 12:11:39 +08:00
NTLx	c4c3f744c0	feat: add Peewee ORM support for OceanBase as primary database (#12769 ) (#12926 ) ## Summary This PR adds Peewee ORM support for OceanBase as the primary database in RAGFlow, as requested in issue #12769. ## Changes ### Core Implementation 1. RetryingPooledOceanBaseDatabase Class - Inherits from `PooledMySQLDatabase` (OceanBase is MySQL-compatible) - Implements retry mechanism for connection issues - Handles MySQL-specific error codes (2013, 2006 for connection loss) - Provides connection pool management 2. PooledDatabase Enum - Added `OCEANBASE = RetryingPooledOceanBaseDatabase` 3. DatabaseLock Enum - Added `OCEANBASE = MysqlDatabaseLock` - OceanBase uses MySQL-style locking 4. TextFieldType Enum - Added `OCEANBASE = "LONGTEXT"` - OceanBase uses same text field type as MySQL 5. DatabaseMigrator Enum - Added `OCEANBASE = MySQLMigrator` - OceanBase uses MySQL migration tools ### Usage ```bash # Set environment variable to use OceanBase export DB_TYPE=oceanbase # Configure connection (in docker/.env or environment) OCEANBASE_HOST=localhost OCEANBASE_PORT=2881 OCEANBASE_USER=root OCEANBASE_PASSWORD=password OCEANBASE_DATABASE=ragflow ``` ### Technical Details - Location: `api/db/db_models.py` - Dependencies: No new dependencies (uses existing Peewee MySQL support) - Code Size: ~90 lines - Difficulty: Simple ### Testing - Added comprehensive unit tests in `tests/unit/test_oceanbase_peewee.py` - Tests cover: - OceanBase database class existence and inheritance - Enum values for PooledDatabase, DatabaseLock, TextFieldType - Initialization with custom retry settings - Environment variable configuration ### Acceptance Criteria ✅ Can switch to OceanBase database via `DB_TYPE=oceanbase` environment variable ✅ All database operations work normally in OceanBase environment ✅ OceanBase uses MySQL compatibility mode (no additional dependencies) ### Background This is part of the RAGFlow + OceanBase Hackathon to allow users to choose OceanBase as RAGFlow's 
primary database, leveraging OceanBase's high availability and scalability. --- ## Related Issues - Primary: https://github.com/infiniflow/ragflow/issues/12769 - Context: https://github.com/oceanbase/seekdb/issues/123 (OceanBase Developer Challenge) --- Closes infiniflow/ragflow#12769	2026-01-31 15:45:20 +08:00
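The retry behavior that #12926 attributes to `RetryingPooledOceanBaseDatabase` can be sketched without Peewee. Everything here is hypothetical except the two error codes (2006 and 2013, connection loss) named in the PR: the exception class stands in for a driver error, and the exponential backoff and parameter names are choices made for this example.

```python
import time

RETRYABLE_CODES = {2006, 2013}  # connection-loss codes named in the PR


class FakeOperationalError(Exception):
    """Stand-in for a database driver error carrying a numeric code."""

    def __init__(self, code: int, message: str = ""):
        super().__init__(code, message)
        self.code = code


def execute_with_retry(run, max_retries: int = 3, retry_delay: float = 0.0):
    """Call run(); retry on connection-loss codes, re-raise everything else."""
    for attempt in range(max_retries + 1):
        try:
            return run()
        except FakeOperationalError as exc:
            if exc.code not in RETRYABLE_CODES or attempt == max_retries:
                raise
            # Assumed policy: exponential backoff between attempts.
            time.sleep(retry_delay * (2 ** attempt))
```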
Carve_	23bdf25a1f	feature:Add OceanBase Storage Support for Table Parser (#12923 ) ### What problem does this PR solve? close #12770 This PR adds OceanBase as a storage backend for the Table Parser. It enables dynamic table schema storage via JSON and implements OceanBase SQL execution for text-to-SQL retrieval. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): ### Changes - Table Parser stores row data into `chunk_data` when doc engine is OceanBase. (table.py) - OceanBase table schema adds `chunk_data` JSON column and migrates if needed. - Implemented OceanBase `sql()` to execute text-to-SQL results. (ob_conn.py) - Add `DOC_ENGINE_OCEANBASE` flag for engine detection (setting.py) ### Test 1. Set `DOC_ENGINE=oceanbase` (e.g. in `docker/.env`) <img width="1290" height="783" alt="doc_engine_ob" src="https://github.com/user-attachments/assets/7d1c609f-7bf2-4b2e-b4cc-4243e72ad4f1" /> 2. Upload an Excel file to Knowledge Base.(for test, we use as below) <img width="786" height="930" alt="excel" src="https://github.com/user-attachments/assets/bedf82f2-cd00-426b-8f4d-6978a151231a" /> 3. Choose Table as parsing method. <img width="2550" height="1134" alt="parse_excel" src="https://github.com/user-attachments/assets/aba11769-02be-4905-97e1-e24485e24cd0" /> 4.Ask a natural language query in chat. <img width="2550" height="1134" alt="query" src="https://github.com/user-attachments/assets/26a910a6-e503-4ac7-b66a-f5754bbb0e91" />	2026-01-31 15:11:54 +08:00
Carve_	ee23b9eb63	feature:Add OceanBase Support to Text-to-SQL Agent (#12919 ) ### What problem does this PR solve? Close #12768. This PR adds OceanBase support to RAGFlow’s Text-to-SQL (ExeSQL) component. OceanBase is integrated via MySQL compatibility mode, and the UI `db_type` options are updated accordingly. ### Type of change - [ ] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality) - [ ] Documentation Update - [ ] Refactoring - [ ] Performance Improvement - [ ] Other (please describe): ### Changes Backend - Add `oceanbase` `db_type` validation and connection logic in `exesql.py` and reuse existing MySQL compatibility mode Frontend - Add OceanBase option to the ExeSQL `db_type` selector ### How to test 1. Configure OceanBase connection in ExeSQL node (host/port/user/password/database) 2. Input: “Show 10 rows from test table” 3. Generated SQL: `SELECT * FROM test LIMIT 10;` 4. Query executes successfully and results are returned ### Screenshots - ExeSQL db_type includes OceanBase <img width="649" height="1015" alt="2" src="https://github.com/user-attachments/assets/e0a5f7b9-e282-402a-8639-64c1aef8fce6" /> - ExeSQL test OceanBase connection <img width="2247" height="1140" alt="test_ob" src="https://github.com/user-attachments/assets/f16ebd93-b48e-4d18-b53f-8496581e755d" /> - Query results from OceanBase shown in UI <img width="2550" height="1351" alt="1" src="https://github.com/user-attachments/assets/b44163dc-baab-420d-b31e-b644bdcb77a9" />	2026-01-31 15:03:40 +08:00
qinling0210	212d6f3660	Fix metadata in get_list() (#12906 ) ### What problem does this PR solve? test_update_document.py failed as metadata is not included in the response of get_list(), fix the issue. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2026-01-30 14:06:49 +08:00


1385 Commits