Commit Graph

1385 Commits

Author SHA1 Message Date
2508c46c8f Playwright : add new test for configuration tab in datasets (#13365)
### What problem does this PR solve?

This PR adds new tests for the full configuration tab in datasets.

### Type of change

- [x] Other (please describe): new tests
2026-03-04 19:10:06 +08:00
54ae5b4a27 Fix Dify external retrieval by providing metadata.document_id (#13337)
### What problem does this PR solve?

## Summary
Dify’s external retrieval expects `records[].metadata.document_id` to
be a non-empty string.
RAGFlow currently only sets `metadata.doc_id`, which causes Dify
validation to fail.

This PR adds `metadata.document_id` (mapped from `doc_id`) in the
Dify-compatible retrieval response.

## Changes
- Add `meta["document_id"] = c["doc_id"]` in
`api/apps/sdk/dify_retrieval.py`

## Testing
- Not run (logic-only change).
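
The mapping under Changes can be sketched roughly as follows. The chunk shape, the `content` field, and the helper name `to_dify_record` are assumptions for illustration; only the `meta["document_id"] = c["doc_id"]` assignment comes from this PR.

```python
def to_dify_record(c: dict) -> dict:
    """Build one Dify-style record from a retrieved chunk (hypothetical helper)."""
    meta = {"doc_id": c["doc_id"]}
    # Dify validates records[].metadata.document_id as a non-empty string,
    # so mirror doc_id under that key as well.
    meta["document_id"] = c["doc_id"]
    return {"content": c.get("content", ""), "metadata": meta}


record = to_dify_record({"doc_id": "abc123", "content": "hello"})
```

Keeping `doc_id` alongside `document_id` means existing consumers of the old key are unaffected.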

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-04 13:23:37 +08:00
839b603768 feat: Add PDF parser selection to Agent Begin and Await Response comp… (#13325)
### Issue: #12756

### What problem does this PR solve?

When users upload files through Agent's Begin or Await Response
components, the parsing is hardcoded to "Plain Text", ignoring all other
available parsers (DeepDOC, TCADP, Docling, MinerU, PaddleOCR). This PR
adds a PDF parser dropdown to these components so users can select the
appropriate parser for their file inputs.


### Changes

**Backend**
- `agent/component/fillup.py` - Added `layout_recognize` param to
`UserFillUpParam`, forwarded to `FileService.get_files()`
- `agent/component/begin.py` - Same forwarding in `Begin._invoke()`
- `agent/canvas.py` - Extract Begin's `layout_recognize` for `sys.files`
parsing, added param to `get_files_async()` / `get_files()`
- `api/db/services/file_service.py` - Added `layout_recognize` param to
`parse()` and `get_files()`, replacing hardcoded `"Plain Text"`
- `rag/app/naive.py` - Added `"plain text"` and `"tcadp parser"` aliases
to PARSERS dict to match dropdown values after `.lower()`

**Frontend**
- `web/src/pages/agent/form/begin-form/index.tsx` - Show
`LayoutRecognizeFormField` dropdown when file inputs exist
- `web/src/pages/agent/form/begin-form/schema.ts` - Added
`layout_recognize` to Zod schema
- `web/src/pages/agent/form/user-fill-up-form/index.tsx` - Same dropdown
for Await Response component


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-04 11:09:33 +08:00
4f09b3e2a4 Fix: pipeline canvas category (#13319)
### What problem does this PR solve?

Fix: pipeline canvas category

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-02 20:27:36 +08:00
5fc3bd38b0 Feat: Support siliconflow.com (#13308)
### What problem does this PR solve?

Feat: Support siliconflow.com

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-02 15:37:42 +08:00
184388879d feat: Add disable_password_login configuration to support SSO-only authentication (#13151)
### What problem does this PR solve?

Enterprise deployments that use an external Identity Provider (e.g.,
Microsoft Entra ID, Okta, Keycloak) need the ability to enforce SSO-only
authentication by hiding the email/password login form. Currently, the
login page always shows the password form alongside OAuth buttons, with
no way to disable it.

This PR adds a `disable_password_login` configuration option under the
existing `authentication` section in `service_conf.yaml`. When set to
`true`, the login page only displays configured OAuth/SSO buttons and
hides the email/password form, "Remember me" checkbox, and "Sign up"
link.

The flag can be set via:
- `service_conf.yaml` (`authentication.disable_password_login: true`)
- Environment variable (`DISABLE_PASSWORD_LOGIN=true`)

Default behavior is unchanged (`false`).

### Behavior

| `disable_password_login` | OAuth configured | Result |
|---|---|---|
| `false` (default) | No | Standard email/password form |
| `false` | Yes | Email/password form + SSO buttons below |
| `true` | Yes | **SSO buttons only** (no form, no sign up link) |
| `true` | No | Empty card (admin should configure OAuth first) |
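
The behavior table can be read as two independent conditions. The sketch below is an illustration only, written in Python rather than the TSX the login page actually uses; the function name and the element labels are hypothetical.

```python
def login_page_elements(disable_password_login: bool, oauth_configured: bool) -> list[str]:
    """Return which login elements would render (labels are illustrative)."""
    elements = []
    # The password form (with "Remember me" and "Sign up") is hidden by the flag.
    if not disable_password_login:
        elements.append("password_form")
    # SSO buttons render whenever OAuth channels exist, regardless of the flag.
    if oauth_configured:
        elements.append("sso_buttons")
    return elements
```

With the flag on and no OAuth channel configured, the list is empty, which corresponds to the "Empty card" row.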

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

### Files changed (5)

1. `docker/service_conf.yaml.template` — added `disable_password_login:
false` under authentication
2. `common/settings.py` — added `DISABLE_PASSWORD_LOGIN` global variable
and loader in `init_settings()`
3. `common/config_utils.py` — fixed `TypeError` in `show_configs()` when
authentication section contains non-dict values (e.g., booleans)
4. `api/apps/system_app.py` — exposed `disablePasswordLogin` flag in
`/config` endpoint
5. `web/src/pages/login/index.tsx` — conditionally render password form
based on config flag; OAuth buttons always render when channels exist

---------

Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>
2026-03-02 14:06:03 +08:00
21bc1ab7ec Feature rtl support (#13118)
### What problem does this PR solve?

This PR adds comprehensive **Right-to-Left (RTL) language support**,
primarily targeting Arabic and other RTL scripts (Hebrew, Persian, Urdu,
etc.).

Previously, RTL content had multiple rendering issues:

- Incorrect sentence splitting for Arabic punctuation in citation logic
- Misaligned text in chat messages and markdown components  
- Improper positioning of blockquotes and “think” sections  
- Incorrect table alignment  
- Citation placement ambiguity in RTL prompts  
- UI layout inconsistencies when mixing LTR and RTL text  

This PR introduces backend and frontend improvements to properly detect,
render, and style RTL content while preserving existing LTR behavior.

#### Backend
- Updated sentence boundary regex in `rag/nlp/search.py` to include
Arabic punctuation:
  - `،` (comma)
  - `؛` (semicolon)
  - `؟` (question mark)
  - `۔` (Arabic full stop)
- Ensures citation insertion works correctly in RTL sentences.
- Updated citation prompt instructions to clarify citation placement
rules for RTL languages.
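
To illustrate the sentence-boundary change, here is a minimal sketch that splits on Latin punctuation plus the four Arabic marks listed above. The pattern and the function name are assumptions, not the actual regex in `rag/nlp/search.py`; the Arabic marks are written as escapes to avoid bidirectional rendering issues in the source.

```python
import re

# Latin marks plus Arabic comma (U+060C), semicolon (U+061B),
# question mark (U+061F) and full stop (U+06D4). Assumed pattern.
SENTENCE_BOUNDARY = re.compile("([.!?;,\u060C\u061B\u061F\u06D4])")


def split_sentences(text: str) -> list[str]:
    parts = SENTENCE_BOUNDARY.split(text)
    sentences = []
    # With a capturing group, re.split keeps delimiters at odd indexes;
    # re-attach each delimiter to the sentence that precedes it.
    for i in range(0, len(parts), 2):
        delimiter = parts[i + 1] if i + 1 < len(parts) else ""
        sentence = (parts[i] + delimiter).strip()
        if sentence:
            sentences.append(sentence)
    return sentences
```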

#### Frontend
- Introduced a new utility: `text-direction.ts`
  - Detects text direction based on Unicode ranges.
  - Supports Arabic, Hebrew, Syriac, Thaana, and related scripts.
  - Provides `getDirAttribute()` for automatic `dir` assignment.

- Applied dynamic `dir` attributes across:
  - Markdown rendering
  - Chat messages
  - Search results
  - Tables
  - Hover cards and reference popovers

- Added proper RTL styling in LESS:
  - Text alignment adjustments
  - Blockquote border flipping
  - Section indentation correction
  - Table direction switching
  - Use of `<bdi>` for figure labels to prevent bidirectional conflicts
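
The direction detection idea can be sketched in Python as below. The real utility is TypeScript (`text-direction.ts`); the first-strong-character heuristic, the `"ltr"` fallback, and the name `detect_dir` are assumptions made for this illustration. The ranges are the standard Unicode blocks for the scripts named above.

```python
RTL_RANGES = (
    (0x0590, 0x05FF),  # Hebrew
    (0x0600, 0x06FF),  # Arabic
    (0x0700, 0x074F),  # Syriac
    (0x0780, 0x07BF),  # Thaana
)


def detect_dir(text: str) -> str:
    """Return "rtl" or "ltr" based on the first strongly directional character."""
    for ch in text:
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in RTL_RANGES):
            return "rtl"
        if ch.isalpha():
            # Any other letter is treated as left-to-right in this sketch.
            return "ltr"
    # Digits, punctuation, or empty input: fall back to left-to-right.
    return "ltr"
```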

#### DevOps / Environment
- Added Windows backend launch script with retry handling.
- Updated dependency metadata.
- Adjusted development-only React debugging behavior.

---

### Type of change

- [x] Bug Fix (non-breaking change which fixes RTL rendering and
citation issues)
- [x] New Feature (non-breaking change which adds RTL detection and
dynamic direction handling)

---------

Co-authored-by: 6ba3i <isbaaoui09@gmail.com>
Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>
Co-authored-by: Ahmad Intisar <168020872+ahmadintisar@users.noreply.github.com>
Co-authored-by: Liu An <asiro@qq.com>
2026-03-02 13:03:44 +08:00
9d78d3ddb1 Tests: fix failing http in CI (#13301)
### What problem does this PR solve?
test_doc_sdk_routes_unit had two flaky/incorrect branch assumptions:

1. parse/stop_parsing production logic gates on doc.run, but tests used
progress, causing branch mismatch and unintended fallthrough into
mutation/DB paths.
2. stop_parsing invalid-state test asserted an outdated message
fragment, making the contract brittle.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-02 10:44:33 +08:00
1027916bfe Fix: inconsistent state handling for multi-user single-canvas access (#13267)
### What problem does this PR solve?

<img width="700" alt="image"
src="https://github.com/user-attachments/assets/1db7412e-4554-44bc-84ba-16421949aacc"
/>

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-02-28 15:09:21 +08:00
983150b936 Fix (api): fix the document parsing status check logic (#12504)
### What problem does this PR solve?
When the original code terminates a parsing task partway through, the
progress may be neither 0 nor 1, which makes it impossible to call the
parse interface again.

- Change the document parsing progress check to a task status check that
uses `TaskStatus.RUNNING.value`.
- Update the condition for stopping document parsing to check whether
the task is running instead.
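
A minimal sketch of why gating on task status is more robust than gating on progress. The enum values, the `Doc` class, and both helper names are assumptions for illustration; only `TaskStatus.RUNNING.value` is named in this PR.

```python
from dataclasses import dataclass
from enum import Enum


class TaskStatus(Enum):
    # Values are illustrative, not necessarily those used by RAGFlow.
    UNSTART = "0"
    RUNNING = "1"
    CANCEL = "2"
    DONE = "3"


@dataclass
class Doc:
    run: str         # task status value
    progress: float  # 0.0 to 1.0


def can_start_parsing(doc: Doc) -> bool:
    # Only a running task blocks a new parse, whatever the progress is.
    return doc.run != TaskStatus.RUNNING.value


def can_stop_parsing(doc: Doc) -> bool:
    return doc.run == TaskStatus.RUNNING.value


# A task cancelled halfway: progress is neither 0 nor 1.
cancelled = Doc(run=TaskStatus.CANCEL.value, progress=0.42)
```

Under a progress-based check, `cancelled` could never be parsed again; under the status-based check it can.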


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-28 14:38:55 +08:00
54094771a3 Fix streaming chat on web API (#13275)
### What problem does this PR solve?

This pull request makes a small but important fix to how streaming
requests are handled in the `completion` endpoint of
`conversation_app.py`. The main change ensures that the `stream`
argument is not passed twice, which could cause errors.
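
The class of error can be reproduced in isolation. The sketch below uses hypothetical function names and does not mirror `conversation_app.py`; it only shows how a key passed both explicitly and through `**kwargs` fails, and how popping it first avoids that.

```python
def run_completion(messages, stream=True, **kwargs):
    """Stand-in for the downstream call (hypothetical)."""
    return {"stream": stream, "extra": kwargs}


def handle_request(req: dict) -> dict:
    req = dict(req)  # do not mutate the caller's payload
    # Pop "stream" so it is not also forwarded inside **req.
    stream = req.pop("stream", True)
    messages = req.pop("messages", [])
    return run_completion(messages, stream=stream, **req)


def buggy_call() -> str:
    try:
        # "stream" arrives twice: explicitly and via the unpacked dict.
        run_completion([], stream=True, **{"stream": False})
    except TypeError as exc:
        return str(exc)
    return ""
```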

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-02-28 12:16:38 +08:00
0110151e12 Fix: document remove race condition (#13242)
### What problem does this PR solve?

Fix document remove race condition.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-28 11:23:24 +08:00
194e076e26 Fix: init superuser can create duplicate users (#13221)
### What problem does this PR solve?

This PR fixes 2 bugs related to RAGFlow's init superuser functionality.

#### Bug 1

When the RAGFlow server was started with the `--init-superuser` option
it would always create a new admin user even if it already exists
resulting in duplicate users.

To fix this, I added an additional check before creating the superuser and
added the *unique* constraint to the email column of the database, to
mitigate potential TOCTOU race conditions. Since existing databases
could contain duplicate emails I added email de-duplication to the
database migration.
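
The combination of a pre-check and a unique constraint can be sketched with SQLite. The table layout and the function name are assumptions for illustration, not RAGFlow's schema; the point is that the constraint still rejects a duplicate if another process inserts between the check and the insert.

```python
import sqlite3


def ensure_superuser(conn: sqlite3.Connection, email: str) -> bool:
    """Return True if the user was created, False if it already existed."""
    # Application-level check: covers the common case.
    if conn.execute("SELECT 1 FROM users WHERE email = ?", (email,)).fetchone():
        return False
    try:
        conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # Database-level guard: covers the TOCTOU window after the check.
        conn.rollback()
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)")
first = ensure_superuser(conn, "admin@example.com")
second = ensure_superuser(conn, "admin@example.com")
```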

#### Bug 2

When the RAGFlow server was started with the `--init-superuser` option
but without configured default LLM and embedding models it would fail to
start because the `init_superuser` function would always make a test
request to the models even if they were not set.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-27 19:55:51 +08:00
8b6d363a98 Use pagination in _search_metadata (#13238)
### What problem does this PR solve?

Fix [#13210](https://github.com/infiniflow/ragflow/issues/13210)

Remove limit in _search_metadata, use pagination in _search_metadata.
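
Replacing a fixed limit with pagination generally looks like the loop below. This is a generic sketch: `fetch_page` is a hypothetical callback standing in for the search call, and the page size is arbitrary.

```python
def search_all(fetch_page, page_size: int = 1000) -> list:
    """Collect every result by requesting pages until a short page is returned."""
    results = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        results.extend(page)
        # A page shorter than page_size means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size
    return results


data = list(range(2500))


def fetch(offset: int, limit: int) -> list:
    return data[offset:offset + limit]
```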

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-27 11:24:49 +08:00
c03c537bf8 Feat: optimize gmail/google-drive (#13230)
### What problem does this PR solve?

Feat: optimize gmail/google-drive

Now:
<img width="700" alt="image"
src="https://github.com/user-attachments/assets/0c4b6044-7209-4c4f-ac0c-32070b79daf7"
/>
<img width="700" alt="image"
src="https://github.com/user-attachments/assets/406f93d8-9b0f-4f5a-b8bb-3936990f558c"
/>


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-02-26 19:19:40 +08:00
d43aebe701 Fix/13142 auto metadata (#13217)
### What problem does this PR solve?

Close #13142

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-26 10:25:48 +08:00
394ff16b66 fix: OceanBase metadata not returned in document list API (#13209)
### What problem does this PR solve?

Fix #13144.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-25 15:29:17 +08:00
4ceb668d40 feat(api/utils): Harden file_utils for robustness and edge cases (#12915)
## Summary
Improves robustness and edge-case handling in `api.utils.file_utils` to
avoid crashes, DoS/OOM risks, and timeouts when processing user-provided
filenames, paths, and file blobs.

## Changes

### Resource limits & timeouts
- **`MAX_BLOB_SIZE_THUMBNAIL`** (50 MiB) and **`MAX_BLOB_SIZE_PDF`**
(100 MiB) to reject oversized inputs before thumbnail/PDF processing.
- **`GHOSTSCRIPT_TIMEOUT_SEC`** (120 s) for
`repair_pdf_with_ghostscript` subprocess to avoid hangs on malicious or
broken PDFs.

### `filename_type`
- Handles `None`, empty string, non-string (e.g. int/list), and
path-only input via new **`_normalize_filename_for_type()`**.
- Uses basename for type detection (e.g. `a/b/c.pdf` → PDF).
- Enforces **`FILE_NAME_LEN_LIMIT`**; invalid input returns
`FileType.OTHER`.

### `thumbnail_img`
- Rejects `None`/empty/oversized blob and invalid filename; returns
`None` instead of raising.
- Wraps PDF, image, and PPT handling in try/except so corrupt or
malformed files return `None`.
- Ensures PDF has pages and PPT has slides before use.
- Normalizes PIL image mode (RGBA/P/LA → RGB) for safe PNG export.

### `repair_pdf_with_ghostscript`
- Handles `None`/empty input; skips repair when input size exceeds
limit.
- Uses `subprocess.run(..., timeout=GHOSTSCRIPT_TIMEOUT_SEC)` and
catches `TimeoutExpired`.
- Returns original bytes when Ghostscript output is empty.
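
The timeout handling can be sketched as follows. The function name, the return values for empty input and for timeouts, and the use of stdin/stdout are assumptions; the actual Ghostscript invocation is not shown in this PR description, so the command is left as a parameter.

```python
import subprocess
import sys

GHOSTSCRIPT_TIMEOUT_SEC = 120


def run_repair(cmd: list[str], blob: bytes, timeout: float = GHOSTSCRIPT_TIMEOUT_SEC) -> bytes:
    """Pipe blob through cmd; fall back to the original bytes on any failure."""
    if not blob:
        return b""
    try:
        proc = subprocess.run(cmd, input=blob, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before the exception propagates.
        return blob
    if proc.returncode != 0 or not proc.stdout:
        # Empty or failed output: keep the original bytes.
        return blob
    return proc.stdout


# Stand-in command that upper-cases stdin, used here instead of Ghostscript.
echo_upper = [sys.executable, "-c",
              "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())"]
repaired = run_repair(echo_upper, b"pdf")
```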

### `read_potential_broken_pdf`
- `None` → `b""`; non–sequence-like (no `len`) → `b""`; empty → return
as-is.
- Oversized blob returned as-is (no repair) to avoid DoS.

### `sanitize_path`
- Explicit `None` and non-string check; strips whitespace before
normalizing.

## Testing
- **`test/unit_test/utils/test_api_file_utils.py`** added with 36 unit
tests covering the above behavior (filename_type, sanitize_path,
read_potential_broken_pdf, thumbnail_img, thumbnail,
repair_pdf_with_ghostscript, constants).
- All tests pass.

---------

Co-authored-by: Gittensor Miner <miner@gittensor.io>
2026-02-25 14:34:47 +08:00
2bf2abfdbc Fix: authorization bypass (IDOR) in /v1/document/web_crawl (#13203)
### What problem does this PR solve?

Fix an authorization bypass (IDOR) in `/v1/document/web_crawl` that
allows cross-tenant dataset modification.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-25 12:59:41 +08:00
72b89304c1 Fix: LFI vulnerability in document parsing API (#13196)
### What problem does this PR solve?

Fix LFI vulnerability in document parsing API.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-25 09:47:39 +08:00
f4cbdc3a3b fix(api): MinIO health check use dynamic scheme and verify (Closes #13159 and #13158) (#13197)
## Summary

Fixes MinIO SSL/TLS support in two places: the MinIO **client**
connection and the **health check** used by the Admin/Service Health
dashboard. Both now respect the `secure` and `verify` settings from the
MinIO configuration.

Closes #13158
Closes #13159

---

## Problem

**#13158 – MinIO client:** The client in `rag/utils/minio_conn.py` was
hardcoded with `secure=False`, so RAGFlow could not connect to MinIO
over HTTPS even when `secure: true` was set in config. There was also no
way to disable certificate verification for self-signed certs.

**#13159 – MinIO health check:** In `api/utils/health_utils.py`, the
MinIO liveness check always used `http://` for the health URL. When
MinIO was configured with SSL, the health check failed and the dashboard
showed "timeout" even though MinIO was reachable over HTTPS.

---

## Solution

### MinIO client (`rag/utils/minio_conn.py`)

- Read `MINIO.secure` (default `false`) and pass it into the `Minio()`
constructor so HTTPS is used when configured.
- Add `_build_minio_http_client()` that reads `MINIO.verify` (default
`true`). When `verify` is false, return an `urllib3.PoolManager` with
`cert_reqs=ssl.CERT_NONE` and pass it as `http_client` to `Minio()` so
self-signed certificates are accepted.
- Support string values for `secure` and `verify` (e.g. `"true"`,
`"false"`).

### MinIO health check (`api/utils/health_utils.py`)

- Add `_minio_scheme_and_verify()` to derive URL scheme (http/https) and
the `verify` flag from `MINIO.secure` and `MINIO.verify`.
- Update `check_minio_alive()` to use the correct scheme, pass `verify`
into `requests.get(..., verify=verify)`, and use `timeout=10`.
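
A minimal sketch of deriving the scheme and `verify` flag from config values that may be booleans or strings. The function names and the set of accepted truthy strings are assumptions; the defaults (`secure` false, `verify` true) follow the description above, and `/minio/health/live` is MinIO's standard liveness path.

```python
def _to_bool(value, default: bool) -> bool:
    if value is None:
        return default
    if isinstance(value, bool):
        return value
    # Accept string forms such as "true" / "false" (accepted set is assumed).
    return str(value).strip().lower() in ("1", "true", "yes", "on")


def scheme_and_verify(minio_conf: dict) -> tuple[str, bool]:
    secure = _to_bool(minio_conf.get("secure"), False)
    verify = _to_bool(minio_conf.get("verify"), True)
    return ("https" if secure else "http"), verify


def health_url(minio_conf: dict) -> str:
    scheme, _ = scheme_and_verify(minio_conf)
    return f"{scheme}://{minio_conf['host']}/minio/health/live"
```

The returned `verify` value is what would be handed to `requests.get(..., verify=verify, timeout=10)`.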

### Config template (`docker/service_conf.yaml.template`)

- Add commented optional MinIO keys `secure` and `verify` (and env vars
`MINIO_SECURE`, `MINIO_VERIFY`) so deployers know they can enable HTTPS
and optional cert verification.

### Tests

- **`test/unit_test/utils/test_health_utils_minio.py`** – Tests for
`_minio_scheme_and_verify()` and `check_minio_alive()` (scheme, verify,
status codes, timeout, errors).
- **`test/unit_test/utils/test_minio_conn_ssl.py`** – Tests for
`_build_minio_http_client()` (verify true/false/missing, string values,
`CERT_NONE` when verify is false).

---

## Testing

- Unit tests added/updated as above; run with the project's test runner.
- Manually: configure MinIO with HTTPS and `secure: true` (and
optionally `verify: false` for self-signed); confirm client operations
work and the Service Health dashboard shows MinIO as alive instead of
timeout.
2026-02-25 09:47:12 +08:00
c292d617ca Fix: stored XSS via HTML File upload and inline Rendering in file get (#13202)
### What problem does this PR solve?

Fix stored XSS via HTML file upload and inline rendering in
/v1/file/get/<id>

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-25 09:46:48 +08:00
0a7c520579 Fix: empty response from OpenAI chat completion endpoint (#13166)
### What problem does this PR solve?

When using a chat assistant that has a hardcoded `empty_response`, that
response was not returned correctly in streaming mode when no
information is found in the knowledge base. In this case only one
response with `"content": null` was yielded. If `"references": true`,
then the `empty_response` is still put into the `final_content` so there
is technically some content returned, but when `"references": false` no
content at all is returned.

I updated the OpenAI chat completion endpoint to yield an additional
response with the `empty_response` in the content.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-24 19:18:12 +08:00
5de92e57d3 Fix: 'None None' in log (#13192)
### What problem does this PR solve?

Fix: 'None None' in log

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-24 19:15:20 +08:00
46dec98f52 Fix: Chat/Agent embedded page (#13199)
### What problem does this PR solve?

Fix: Chat/Agent embedded page #13190

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-24 19:14:24 +08:00
d89ad8b79d fix: handle null response in LLM and improve JSON parsing in agent (#13187)
Fixes AttributeError in _remove_reasoning_content() when LLM returns
None, and improves JSON parsing regex for markdown code fences in
agent_with_tools.py
2026-02-24 13:15:09 +08:00
91d1a81937 fix: error during admin tenant creation when using Postgres (#13164)
### What problem does this PR solve?

This fixes the bug described in #13130. When starting RAGFlow with
Postgres, admin tenant creation failed because the rerank model was not
set.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-24 10:57:31 +08:00
6d6c54db19 fix(metadata): handle unhashable list values in metadata split (#13116)
### What problem does this PR solve?

This PR fixes missing metadata on documents synced from the Moodle
connector, especially for **Book** modules.

Background:
- Moodle Book metadata includes fields like `chapters`, which is a
`list[dict]`.
- During metadata normalization in
`DocMetadataService._split_combined_values`, list deduplication used
`dict.fromkeys(...)`.
- `dict.fromkeys(...)` fails for unhashable values (like `dict`),
causing metadata update to fail.
- Result: documents were imported, but metadata was not saved for
affected module types (notably Books).

What this PR changes:
- Replaces hash-based list deduplication with `dedupe_list(...)`, which
safely handles unhashable list items while preserving order.
- This allows Book metadata (and other complex list metadata) to be
persisted correctly.
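
An order-preserving dedupe that tolerates unhashable items can be sketched as below. This is an illustration under a hypothetical name, not necessarily how `dedupe_list(...)` is implemented in this PR.

```python
def dedupe_preserving_order(items: list) -> list:
    seen_hashable = set()
    seen_unhashable = []
    result = []
    for item in items:
        try:
            # Fast path: set membership works for hashable values.
            if item in seen_hashable:
                continue
            seen_hashable.add(item)
        except TypeError:
            # dicts and lists are unhashable: fall back to equality comparison.
            if item in seen_unhashable:
                continue
            seen_unhashable.append(item)
        result.append(item)
    return result


def fromkeys_fails(items: list) -> bool:
    try:
        dict.fromkeys(items)
    except TypeError:
        return True
    return False
```

The fallback path is quadratic in the number of unhashable items, which is acceptable for short metadata lists such as `chapters`.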

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

Contribution during my time at RAGcon GmbH.
2026-02-12 19:48:51 +08:00
6e7bcf58bc Refactor: split message apis to gateway and service (#13126)
### What problem does this PR solve?

Split message apis to gateway and service

### Type of change

- [x] Refactoring
2026-02-12 14:43:52 +08:00
30d5fc1a07 Refactor: split memory API into gateway and service layers (#13111)
### What problem does this PR solve?

Decouple the memory API into a gateway layer (for routing/param parse)
and a service layer (for business logic).

### Type of change

- [x] Refactoring
2026-02-12 10:11:50 +08:00
109441628b Fix: upload image files (#13071)
### What problem does this PR solve?

Fix: upload image files

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-11 09:47:33 +08:00
6f785e06a4 Fix issue #13084 (#13088)
When match_expressions contains coroutine objects (from GraphRAG's
Dealer.get_vector()), the code cannot identify this type because it only
checks for MatchTextExpr, MatchDenseExpr, or FusionExpr.

As a result:

- score_func remains initialized as an empty string ""
- This empty string is appended to the output list
- The output list is passed to Infinity SDK's table_instance.output()
method
- Infinity's SQL parser (via sqlglot) fails to parse the empty string,
throwing a ParseError
2026-02-10 17:04:45 +08:00
9bc16d8df2 Fix: agent files issue, (#13067)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-09 19:52:52 +08:00
fabbfcab90 Fix: failing p3 test for SDK/HTTP APIs (#13062)
### What problem does this PR solve?

Adjust highlight parsing, add row-count SQL override, tweak retrieval
thresholding, and update tests with engine-aware skips/utilities.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-09 14:56:10 +08:00
e51a40fdfc Fix: launch an agent. (#13039)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-09 10:08:36 +08:00
301ed76aa4 Fix: task cancel (#13034)
### What problem does this PR solve?

Fix: task cancel #11745 
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-06 14:48:24 +08:00
1262533b74 Feat: support verify to set llm key and boost bigrams. (#12980)
#12863

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-02-05 19:19:09 +08:00
0a08fc7b07 Fix: example code in session.py (#13004)
### What problem does this PR solve?

Fix: example code in session.py #12950

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Levi <stupse-tipp0j@icloud.com>
Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
Co-authored-by: Liu An <asiro@qq.com>
2026-02-05 15:56:58 +08:00
803b480f9c feat: Add optional document metadata in OpenAI-compatible response references (#12950)
### What problem does this PR solve?

This PR adds an opt‑in way to include document‑level metadata in
OpenAI‑compatible reference chunks. Until now, metadata could be used
for filtering but wasn’t returned in responses. The change enables
clients to show richer citations (author/year/source, etc.) while
keeping payload size and privacy under control via an explicit request
flag and optional field allowlist.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

Contribution during my time at RAGcon GmbH.
2026-02-05 09:54:33 +08:00
4d4b5a978d feat: enable multi-file upload for chat and agent workflows (#12977)
### Closes: #12921 

### What problem does this PR solve?

Previously, multi-file upload was not working correctly across the
application:

- **Chat**: UI displayed "Upload max 5 files" but only the first file
was actually uploaded
- **Agent conversational mode**: Frontend sent multiple files but
backend only processed one
- **Agent task-mode file inputs**: Explicitly limited to single file
only

This PR enables proper multi-file upload support for both chat and agent
workflows, allowing users to upload and process multiple files (up to 5)
as the UI originally suggested.

**Changes:**
- `web/src/pages/next-chats/hooks/use-upload-file.ts`: Process all files
instead of only `files[0]`
- `api/apps/canvas_app.py`: Handle multiple files via
`files.getlist("file")`
- `web/src/pages/agent/debug-content/uploader.tsx`: Allow up to 5 files
with `multiple={true}`
- `agent/component/begin.py` & `fillup.py`: Support file arrays while
maintaining backward compatibility
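
Backward-compatible handling of a single file versus a file array can be sketched as a small normalization step. The function name and the choice to raise on more than five files are assumptions; the limit of 5 comes from the description above.

```python
MAX_FILES = 5


def normalize_files(value) -> list:
    """Accept None, a single file object, or a list of files; always return a list."""
    if value is None:
        return []
    # Older callers pass one file; newer callers pass an array.
    files = list(value) if isinstance(value, (list, tuple)) else [value]
    if len(files) > MAX_FILES:
        raise ValueError(f"at most {MAX_FILES} files are allowed, got {len(files)}")
    return files
```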

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-02-04 18:03:21 +08:00
a37d287fad Fix: pdf chunking / table rotation (#12981)
### What problem does this PR solve?

- Fix: PDF chunking issue for single-page documents
- Refactor: Change the default refresh frequency to 5
- Fix: Add a 0-degree threshold; require other rotation angles to exceed
it by at least 0.2
- Fix: Put connector name tips in the correct place
- Fix: Incorrect example response in delete datasets

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2026-02-04 17:00:25 +08:00
205ae769bb Fix "metadata table not exists" (#12949)
### What problem does this PR solve?

Fix "metadata table not exists" when updating metadata.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-03 17:28:10 +08:00
32f9a87b2e Fix: default admin tenant (#12964)
### What problem does this PR solve?

Add tenant for default admin, and allow login to ragflow server as
default admin.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-02-03 15:37:36 +08:00
7cbe8b5b53 feat: Add a custom header to the SDK for chatting with the agent. (#12430)
### What problem does this PR solve?

As title.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: Liu An <asiro@qq.com>
2026-02-03 11:01:18 +08:00
2e5a18602b refactor: optimize agent list payload and improve multimodal detection logic (#12942)
## Description
This PR focuses on API performance optimization and refining the model
capability detection logic in the Agent/Canvas module.

### 1. Performance Optimization (Backend)
- **Changes**: Removed `cls.model.dsl` from query fields in
`UserCanvasService.get_by_tenant_ids`.
- **Reasoning**: The `dsl` object is large and unnecessary for the Agent
list view. Excluding it reduces the payload size of the
`/v1/canvas/list` API, leading to faster serialization and reduced
network latency.
- **Consistency**: Full DSL data remains accessible via the individual
`/v1/canvas/get/<id>` endpoint used in the detail view.

### 2. Multimodal Detection Refinement (Frontend)
- **Changes**: Replaced `model_type === LlmModelType.Image2text` with
`tags?.includes('IMAGE2TEXT')`.
- **Reasoning**: In RAGFlow, `model_type` defines the primary role of a
model (e.g., `chat`). However, many advanced Chat models are also
vision-capable. Since `model_type` is a single-value field, it cannot
represent these multiple capabilities.
- **Solution**: Utilizing the `tags` field (which supports multiple
attributes) to check for `IMAGE2TEXT` ensures that models like
`gpt-5.2-pro` correctly display multimodal input options.



## Type of Change
- [x] Bug fix (logic correction for multimodal detection)
- [x] Optimization (performance improvement for list API)

## Main Changes
- `api/db/services/canvas_service.py`: Optimized DB query by excluding
heavy DSL fields.
- `web/src/pages/agent/form/agent-form/index.tsx`: Enhanced capability
detection using the tags system.

## Verification
- [x] Verified Agent list loads faster with reduced response payload.
- [x] Confirmed that `chat` models with the `IMAGE2TEXT` tag now
correctly enable the multimodal input UI.
2026-02-02 17:35:54 +08:00
1b587013d8 Fix: remove unused imports and f-string formatting (#12935)
### What problem does this PR solve?

- Remove unused imports (Mock, patch, MagicMock, json, os,
RAGFLOW_COLUMNS, VECTOR_FIELD_PATTERN) from multiple files
- Replace f-string formatting with regular strings for console output
messages in cli.py
- Clean up unnecessary imports that were no longer being used in the
codebase

### Type of change

- [x] Refactoring
2026-02-02 12:11:39 +08:00
c4c3f744c0 feat: add Peewee ORM support for OceanBase as primary database (#12769) (#12926)
## Summary

This PR adds Peewee ORM support for OceanBase as the primary database in
RAGFlow, as requested in issue #12769.

## Changes

### Core Implementation

1. **RetryingPooledOceanBaseDatabase Class**
   - Inherits from `PooledMySQLDatabase` (OceanBase is MySQL-compatible)
   - Implements retry mechanism for connection issues
   - Handles MySQL-specific error codes (2013, 2006 for connection loss)
   - Provides connection pool management

2. **PooledDatabase Enum**
   - Added `OCEANBASE = RetryingPooledOceanBaseDatabase`

3. **DatabaseLock Enum**
   - Added `OCEANBASE = MysqlDatabaseLock`
   - OceanBase uses MySQL-style locking

4. **TextFieldType Enum**
   - Added `OCEANBASE = "LONGTEXT"`
   - OceanBase uses same text field type as MySQL

5. **DatabaseMigrator Enum**
   - Added `OCEANBASE = MySQLMigrator`
   - OceanBase uses MySQL migration tools
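
The retry behavior for connection loss can be sketched without Peewee. Everything here is hypothetical (exception class, function name, retry parameters) except the two error codes, which are the MySQL client codes for "server has gone away" (2006) and "lost connection during query" (2013).

```python
import time

RETRYABLE_CODES = {2006, 2013}


class FakeOperationalError(Exception):
    """Stand-in for a driver error whose first argument is the error code."""


def execute_with_retry(run, max_retries: int = 3, retry_delay: float = 0.0):
    for attempt in range(max_retries + 1):
        try:
            return run()
        except FakeOperationalError as exc:
            code = exc.args[0] if exc.args else None
            # Re-raise errors that are not connection losses, or when retries run out.
            if code not in RETRYABLE_CODES or attempt == max_retries:
                raise
            time.sleep(retry_delay)


calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeOperationalError(2013, "Lost connection")
    return "ok"


result = execute_with_retry(flaky)
```

A real implementation would also reconnect before retrying; that step is omitted here.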

### Usage

```bash
# Set environment variable to use OceanBase
export DB_TYPE=oceanbase

# Configure connection (in docker/.env or environment)
OCEANBASE_HOST=localhost
OCEANBASE_PORT=2881
OCEANBASE_USER=root
OCEANBASE_PASSWORD=password
OCEANBASE_DATABASE=ragflow
```

### Technical Details

- **Location**: `api/db/db_models.py`
- **Dependencies**: No new dependencies (uses existing Peewee MySQL
support)
- **Code Size**: ~90 lines
- **Difficulty**: Simple

### Testing

- Added comprehensive unit tests in
`tests/unit/test_oceanbase_peewee.py`
- Tests cover:
  - OceanBase database class existence and inheritance
  - Enum values for PooledDatabase, DatabaseLock, TextFieldType
  - Initialization with custom retry settings
  - Environment variable configuration

### Acceptance Criteria

- Can switch to OceanBase database via `DB_TYPE=oceanbase` environment
variable
- All database operations work normally in OceanBase environment
- OceanBase uses MySQL compatibility mode (no additional dependencies)

### Background

This is part of the RAGFlow + OceanBase Hackathon to allow users to
choose OceanBase as RAGFlow's primary database, leveraging OceanBase's
high availability and scalability.

---

## Related Issues
- **Primary**: https://github.com/infiniflow/ragflow/issues/12769
- **Context**: https://github.com/oceanbase/seekdb/issues/123 (OceanBase
Developer Challenge)

---

Closes infiniflow/ragflow#12769
2026-01-31 15:45:20 +08:00
23bdf25a1f feature:Add OceanBase Storage Support for Table Parser (#12923)
### What problem does this PR solve?

close #12770 

This PR adds OceanBase as a storage backend for the Table Parser. It
enables dynamic table schema storage via JSON and implements OceanBase
SQL execution for text-to-SQL retrieval.


### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

### Changes
- Table Parser stores row data into `chunk_data` when doc engine is
OceanBase. (table.py)
- OceanBase table schema adds `chunk_data` JSON column and migrates if
needed.
- Implemented OceanBase `sql()` to execute text-to-SQL results.
(ob_conn.py)
- Add `DOC_ENGINE_OCEANBASE` flag for engine detection (setting.py)

### Test
1. Set `DOC_ENGINE=oceanbase` (e.g. in `docker/.env`)
<img width="1290" height="783" alt="doc_engine_ob"
src="https://github.com/user-attachments/assets/7d1c609f-7bf2-4b2e-b4cc-4243e72ad4f1"
/>

2. Upload an Excel file to the Knowledge Base (for testing, we use the file below).
<img width="786" height="930" alt="excel"
src="https://github.com/user-attachments/assets/bedf82f2-cd00-426b-8f4d-6978a151231a"
/>

3. Choose **Table** as parsing method.
<img width="2550" height="1134" alt="parse_excel"
src="https://github.com/user-attachments/assets/aba11769-02be-4905-97e1-e24485e24cd0"
/>

4. Ask a natural language query in chat.
<img width="2550" height="1134" alt="query"
src="https://github.com/user-attachments/assets/26a910a6-e503-4ac7-b66a-f5754bbb0e91"
/>
2026-01-31 15:11:54 +08:00
ee23b9eb63 feature:Add OceanBase Support to Text-to-SQL Agent (#12919)
### What problem does this PR solve?

Close #12768.

This PR adds OceanBase support to RAGFlow’s Text-to-SQL (ExeSQL)
component.
OceanBase is integrated via MySQL compatibility mode, and the UI
`db_type` options are updated accordingly.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

### Changes

**Backend**
- Add `oceanbase` `db_type` validation and connection logic in
`exesql.py` and reuse existing MySQL compatibility mode

**Frontend**
- Add OceanBase option to the ExeSQL `db_type` selector

### How to test
1. Configure OceanBase connection in ExeSQL node
(host/port/user/password/database)
2. Input: “Show 10 rows from test table”
3. Generated SQL: `SELECT * FROM test LIMIT 10;`
4. Query executes successfully and results are returned

### Screenshots
- ExeSQL db_type includes OceanBase
<img width="649" height="1015" alt="2"
src="https://github.com/user-attachments/assets/e0a5f7b9-e282-402a-8639-64c1aef8fce6"
/>

- ExeSQL test OceanBase connection
<img width="2247" height="1140" alt="test_ob"
src="https://github.com/user-attachments/assets/f16ebd93-b48e-4d18-b53f-8496581e755d"
/>



- Query results from OceanBase shown in UI
<img width="2550" height="1351" alt="1"
src="https://github.com/user-attachments/assets/b44163dc-baab-420d-b31e-b644bdcb77a9"
/>
2026-01-31 15:03:40 +08:00
212d6f3660 Fix metadata in get_list() (#12906)
### What problem does this PR solve?

test_update_document.py failed because metadata was not included in the
response of get_list(). This PR fixes the issue.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-01-30 14:06:49 +08:00