5460 Commits

Author SHA1 Message Date
7143954b48 Fix: chats_openai in non-stream condition (#13495)
### What problem does this PR solve?

Fix: chats_openai in non-stream condition #13453

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 13:44:17 +08:00
7c92f51133 Fix retrieval function when metadata_condition is specified in retrieval API (#13473)
### What problem does this PR solve?

Fix https://github.com/infiniflow/ragflow/issues/13388

The following command returns an empty result even though a document with the matching metadata exists:
```
curl --request POST \
     --url http://localhost:9222/api/v1/retrieval \
     --header 'Content-Type: application/json' \
     --header 'Authorization: Bearer ragflow-fO3mPFePfLgUYg8-9gjBVVXbvHqrvMPLGaW0P86PvAk' \
     --data '{
          "question": "any question",
          "dataset_ids": ["9bb4f0591b8811f18a4a84ba59049aa3"],
           "metadata_condition": {
            "logic": "and",
            "conditions": [
              {
                "name": "character",
                "comparison_operator": "is",
                "value": "刘备"
              }
            ]
          }
     }'
```

When metadata_condition is specified in the retrieval API, it is
converted to doc_ids, and doc_ids is passed to the retrieval function.
In the retrieval function, when doc_ids is explicitly provided, we should
bypass the similarity threshold.
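
The threshold bypass described above can be sketched as follows. This is a minimal illustration and not the code in this PR: `filter_chunks`, the chunk dictionaries, and the `similarity_threshold` default are hypothetical names and values chosen for the example.

```python
def filter_chunks(chunks, similarity_threshold=0.2, doc_ids=None):
    """Keep chunks above the threshold, unless doc_ids was given explicitly."""
    if doc_ids:
        # An explicit document filter was supplied (for example, derived from
        # metadata_condition): keep every chunk from those documents,
        # regardless of its similarity score.
        allowed = set(doc_ids)
        return [c for c in chunks if c["doc_id"] in allowed]
    # No explicit filter: apply the usual similarity threshold.
    return [c for c in chunks if c["similarity"] >= similarity_threshold]


# Hypothetical sample data: one low-scoring chunk, one high-scoring chunk.
sample_chunks = [
    {"doc_id": "doc-a", "similarity": 0.05},
    {"doc_id": "doc-b", "similarity": 0.90},
]
without_filter = filter_chunks(sample_chunks)
with_filter = filter_chunks(sample_chunks, doc_ids=["doc-a"])
```

Without the bypass, the low-scoring chunk from `doc-a` would be dropped even though the caller asked for that document by metadata.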


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 11:57:32 +08:00
292a1a8566 fix: detect and fallback garbled PDF text to OCR (#13366) (#13404)
## Problem

When PDF fonts lack ToUnicode/CMap mappings, pdfplumber (pdfminer)
cannot map CIDs to correct Unicode characters, outputting PUA characters
(U+E000~U+F8FF) or `(cid:xxx)` placeholders. The original code fully
trusted pdfplumber text without any garbled detection, causing garbled
output in the final parsed result.

Relates to #13366

## Solution

### 1. Garbled text detection functions
- `_is_garbled_char(ch)`: Detects PUA characters (BMP/Plane 15/16),
replacement character U+FFFD, control characters, and
unassigned/surrogate codepoints
- `_is_garbled_text(text, threshold)`: Calculates garbled ratio and
detects `(cid:xxx)` patterns
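
A rough sketch of what the two detection helpers might look like is shown below. It assumes that whitespace is excluded from the ratio and that any `(cid:xxx)` placeholder marks the text as garbled; both are assumptions for illustration, and the functions in the PR may differ in detail.

```python
import re
import unicodedata

CID_PATTERN = re.compile(r"\(cid:\d+\)")


def is_garbled_char(ch: str) -> bool:
    """Return True for characters that usually indicate a broken font mapping."""
    cp = ord(ch)
    # Private Use Areas: BMP, Plane 15, and Plane 16.
    if 0xE000 <= cp <= 0xF8FF or 0xF0000 <= cp <= 0xFFFFD or 0x100000 <= cp <= 0x10FFFD:
        return True
    # Unicode replacement character.
    if cp == 0xFFFD:
        return True
    category = unicodedata.category(ch)
    # Control characters, except common whitespace controls.
    if category == "Cc":
        return ch not in "\t\n\r"
    # Unassigned (Cn) and surrogate (Cs) code points.
    return category in ("Cn", "Cs")


def is_garbled_text(text: str, threshold: float = 0.5) -> bool:
    """Return True if text has CID placeholders or too many garbled characters."""
    if not text:
        return False
    if CID_PATTERN.search(text):
        return True
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return False
    garbled = sum(1 for ch in chars if is_garbled_char(ch))
    return garbled / len(chars) >= threshold
```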

### 2. Box-level fallback (in `__ocr()`)
When a text box has ≥50% garbled characters, discard pdfplumber text and
fall back to OCR recognition.

### 3. Page-level detection (in `__images__()`)
Sample characters from each page; if garbled rate ≥30%, clear all
pdfplumber characters for that page, forcing full OCR.

### 4. Layout recognizer CID filtering
Filter out `(cid:xxx)` patterns in `layout_recognizer.py` text
processing to prevent them from polluting layout analysis.
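
As an illustration of the CID filtering step, the placeholders can be removed with a regular expression. The helper name `strip_cid_placeholders` is hypothetical and is not taken from `layout_recognizer.py`.

```python
import re

CID_PATTERN = re.compile(r"\(cid:\d+\)")


def strip_cid_placeholders(text: str) -> str:
    """Remove `(cid:xxx)` placeholders left behind by unmapped glyphs."""
    return CID_PATTERN.sub("", text)


cleaned = strip_cid_placeholders("Total(cid:12)(cid:7) 42")
```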

## Testing
- 29 unit tests covering: normal CJK/English text, PUA characters, CID
patterns, mixed text, boundary thresholds, edge cases
- All 85 existing project unit tests pass without regression
2026-03-10 11:20:31 +08:00
7f6a9e8ee9 Update ext field type of heartbeat message (#13490)
### What problem does this PR solve?

As title

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-10 10:49:39 +08:00
02108772d8 refactor: Moves the LLM factory initialization logic to the dao package. (#13476)
### What problem does this PR solve?

refactor: Moves the LLM factory initialization logic to the `dao`
package.

Removes the `init_data` package and integrates the LLM factory
initialization functionality into the `dao` package.
Adds a `utility` package to provide general utility functions.
Updates `server_main.go` to use the new initialization path.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-03-10 10:35:55 +08:00
88a40b95a2 fix: include missing modules in ragflow-cli PyPI package (#13457)
## Problem

The `ragflow-cli` PyPI package (v0.24.0) is missing `http_client.py`,
`ragflow_client.py`, and `user.py`, causing import errors when installed
from PyPI.

## Root Cause

`pyproject.toml` only lists `ragflow_cli` and `parser` in
`[tool.setuptools] py-modules`.

## Fix

Add the three missing modules to `py-modules`.

Fixes #13456

Co-authored-by: atian8179 <atian8179@users.noreply.github.com>
2026-03-10 10:02:21 +08:00
4fe706876c Service list and minio status (#13480)
### What problem does this PR solve?

1. Prevent standard users from accessing the admin service
2. Get RAGFlow service status
3. Fix minio status fetching

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-10 09:56:43 +08:00
4f507c0058 Docs: Updated Switch chunk availability (#13482)
### What problem does this PR solve?

A quick editorial pass.

### Type of change

- [x] Documentation Update
2026-03-09 21:14:45 +08:00
7484298c82 Refa: convert download_img to async (#13477)
### What problem does this PR solve?

Convert download_img to async.

### Type of change

- [x] Refactoring
- [x] Performance Improvement
2026-03-09 19:00:17 +08:00
52bcd98d29 Add scheduled tasks (#13470)
### What problem does this PR solve?

1. RAGFlow server will send heartbeat periodically.
2. This PR includes:
- Scheduled task
- API server message sending
- Admin server API to receive the message.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-09 17:48:29 +08:00
c732a1c8e0 Refactor the go_binding to binding (#13469)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-09 15:52:31 +08:00
25ace613b0 feat: Added LLM factory initialization functionality and knowledge base related API interfaces (#13472)
### What problem does this PR solve?

feat: Added LLM factory initialization functionality and knowledge base
related API interfaces

refactor(dao): Refactored the TenantLLMDAO query method
feat(handler): Implemented knowledge base related API endpoints
feat(service): Added LLM API key setting functionality
feat(model): Extended the knowledge base model definition
feat(config): Added default user LLM configuration

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-09 15:52:14 +08:00
d0465ba909 refactor: improve paddle ocr logic (#13467)
### What problem does this PR solve?

improve paddle ocr logic

### Type of change
- [x] Refactoring
2026-03-09 14:16:57 +08:00
3ce236c4e3 Feat: add switch_chunks endpoint to manage chunk availability (#13435)
### What problem does this commit solve?

This commit introduces a new API endpoint
`/datasets/<dataset_id>/documents/<document_id>/chunks/switch` that
allows users to switch the availability status of specified chunks in a
document, in the same way as chunk_app.py.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-09 12:36:45 +08:00
32d31284cc Fix: upgrade pypdf to 6.7.5 and migrate from deprecated pypdf2 to fix CVE-2026-28804 and CVE-2023-36464 (#13454)
### What problem does this PR solve?

This PR addresses security vulnerabilities in PDF processing
dependencies identified by Trivy security scan:

1. CVE-2026-28804 (MEDIUM): pypdf 6.7.4 vulnerable to inefficient
decoding of ASCIIHexDecode streams
2. CVE-2023-36464 (MEDIUM): pypdf2 3.0.1 susceptible to infinite loop
when parsing malformed comments

Since pypdf2 is deprecated with no available fixes, this PR migrates all
pypdf2 usage to the actively maintained pypdf library (version 6.7.5),
which resolves both vulnerabilities.


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-09 12:06:00 +08:00
2634cfc06f Fix: undefined variable and wrong method name in agent components (#13462)
## Summary

This PR fixes two runtime bugs in agent components:

**Bug 1: `agent/component/invoke.py` — `NameError` in POST +
`clean_html` path**

The POST method's `clean_html` branch uses the variable `sections`
without ever defining it. Both the GET and PUT branches correctly call
`sections = HtmlParser()(None, response.content)` before referencing
`sections`, but this line was missing from the POST branch (copy-paste
omission). This causes a `NameError` whenever a user configures an
Invoke component with `method="post"` and `clean_html=True`.

**Bug 2: `agent/component/data_operations.py` — `AttributeError` in
`_recursive_eval`**

The `_recursive_eval` method recursively calls `self.recursive_eval()`
(without the leading underscore) instead of `self._recursive_eval()`.
Since the method is defined as `_recursive_eval`, this causes an
`AttributeError` at runtime when the `literal_eval` operation processes
nested dicts or lists.
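
The second bug is easiest to see in a stripped-down version of the method. The class below is a hypothetical stand-in rather than the real `DataOperations` component; it only shows the recursive call going through the underscored name.

```python
import ast


class DataOperationsSketch:
    """Hypothetical stand-in for the component that owns _recursive_eval."""

    def _recursive_eval(self, value):
        # Recurse through containers using the same (underscored) method name.
        # Calling self.recursive_eval() here would raise AttributeError.
        if isinstance(value, dict):
            return {k: self._recursive_eval(v) for k, v in value.items()}
        if isinstance(value, list):
            return [self._recursive_eval(v) for v in value]
        if isinstance(value, str):
            try:
                return ast.literal_eval(value)
            except (ValueError, SyntaxError):
                # Not a Python literal: keep the original string.
                return value
        return value


nested = {"a": "[1, 2]", "b": ["{'x': 1}", "plain text"]}
evaluated = DataOperationsSketch()._recursive_eval(nested)
```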

## Test plan

- [ ] Configure an Invoke node with `method=post` and `clean_html=True`,
verify HTML is parsed correctly without `NameError`
- [ ] Configure a DataOperations node with `operations=literal_eval` on
nested data, verify no `AttributeError`

---------

Signed-off-by: JiangNan <1394485448@qq.com>
2026-03-09 11:09:47 +08:00
610c1b507d Add more API of admin server of go (#13403)
### What problem does this PR solve?

Add APIs to admin server.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-09 10:44:53 +08:00
ab6ca75245 fix(agent): ensure database connections are properly closed in ExeSQL tool (#13427)
## Summary

Fix a database connection and cursor resource leak in the ExeSQL agent
tool.

When SQL execution raises an exception (for example syntax error or
missing table), the existing code path skips `cursor.close()` and
`db.close()`, causing database connections to accumulate over time.

This can eventually lead to connection exhaustion in long-running agent
workflows.

## Root Cause

The cleanup logic for database cursors and connections is placed after
the SQL execution loop without `try/finally` protection. If an exception
occurs during `cursor.execute()`, `fetchmany()`, or result processing,
the cleanup code is not reached and the connection remains open.

The same issue also exists in the IBM DB2 execution path where
`ibm_db.close(conn)` may be skipped when exceptions occur.

## Fix

- Wrap SQL execution logic in `try/finally` blocks to guarantee resource
cleanup.
- Ensure `cursor.close()` and `db.close()` are always executed.
- Add explicit `db.close()` when `db.cursor()` creation fails.
- Remove redundant close calls in early-return branches since `finally`
now handles cleanup.
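
The cleanup pattern can be sketched with the standard-library `sqlite3` module, used here only so the example is self-contained. `run_query` and its `connect` parameter are hypothetical and are not part of `agent/tools/exesql.py`.

```python
import sqlite3


def run_query(sql, connect=lambda: sqlite3.connect(":memory:")):
    """Run one statement and always release the cursor and the connection."""
    db = connect()
    try:
        cursor = db.cursor()
        try:
            cursor.execute(sql)
            return cursor.fetchall()
        finally:
            # Runs on success and on any exception raised by execute/fetch.
            cursor.close()
    finally:
        # Runs even if db.cursor() itself failed.
        db.close()


rows = run_query("SELECT 1")
```

A query against a missing table still raises `sqlite3.OperationalError` to the caller, but both `finally` blocks run first, so the connection does not leak.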

## Impact

- No change to normal execution behavior.
- Ensures database resources are always released when errors occur.
- Prevents connection leaks in long-running workflows.
- Only affects `agent/tools/exesql.py`.

## Testing

Manual test scenarios:

1. Valid SQL execution
2. SQL syntax error
3. Query against a non-existing table
4. Execution cancellation during query

In all scenarios the database cursor and connection are properly closed.

Code quality checks:

- `ruff check` passed
- No new warnings introduced
2026-03-09 10:36:02 +08:00
89e495e1bc Chore: update release workflow configuration (#13466)
### What problem does this PR solve?

update release workflow configuration

### Type of change

- [x] Update CI
2026-03-09 10:32:51 +08:00
c217b8f3d8 Feat: add DingTalk AI Table connector and integration for data synch… (#13413)
### What problem does this PR solve?

Add DingTalk AI Table connector and integration for data synchronization

Issue #13400

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: wangheyang <wangheyang@corp.netease.com>
2026-03-06 21:13:23 +08:00
094eae3cf5 refactor(ui): adjust dataset page styles (#13452)
### What problem does this PR solve?

- Adjust UI styles in **Dataset** pages.
- Adjust several shared components styles
- Modify files and directory structure in `src/layouts`

### Type of change

- [x] Refactoring
2026-03-06 21:13:14 +08:00
7166a7e50e Test: adjust test priority markers for API tests (#13450)
### What problem does this PR solve?

Changed test priority markers from p1/p2 to p3 in three test files:
- test_table_parser_dataset_chat.py: Adjusted priority for table parser
dataset chat test
- test_delete_chunks.py: Updated priority for chunk deletion test with
invalid IDs
- test_retrieval_chunks.py: Modified priority for chunks retrieval
pagination test

These changes demote the priority of specific test cases to p3,
indicating they are lower priority tests that can run later in the test
suite execution.

### Type of change

- [x] Test update
2026-03-06 20:17:39 +08:00
ae4645e01b Fix: Add folder upload #9743 (#13448)
### What problem does this PR solve?

Fix: Add folder upload  #9743

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 20:17:29 +08:00
82a616589b Feat: Add PublishConfirmDialog (#13447)
### What problem does this PR solve?

Feat: Add PublishConfirmDialog

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-03-06 20:17:21 +08:00
45cf24cd2f feat(memory): implement get_highlight for OceanBase memory (#13449)
### What problem does this PR solve?

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-06 20:17:11 +08:00
01a100bb29 Fix data models (#13444)
### What problem does this PR solve?

Since the database model was updated in the Python version, the Go server
also needs to be updated.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-06 20:05:10 +08:00
3ed91345aa fix(auth): return HTTP 401 for token-auth failures (#13420)
Follow-up to #12488 #13386

### What problem does this PR solve?

Previously, token authentication failures returned HTTP 200 with an
error code in the response body.

This PR updates `token_required` to raise `Unauthorized` and relies on
the global error handler to return a structured JSON response with HTTP
401 status.

The response body structure (`code`, `message`, `data`) remains
unchanged to preserve compatibility with the official SDK.

Frontend logic has been updated to handle HTTP 401 responses in addition
to checking `data.code`.
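
The flow described above can be sketched in plain Python without any web framework. Everything here is hypothetical scaffolding: the local `Unauthorized` class stands in for the framework exception, `dispatch` stands in for the global error handler, and `AUTH_ERROR_CODE` and `VALID_TOKENS` are placeholder values.

```python
import functools

AUTH_ERROR_CODE = 401  # placeholder; the actual body code is not stated in the PR
VALID_TOKENS = {"demo-token"}  # hypothetical token store


class Unauthorized(Exception):
    """Stand-in for the framework's 401 exception."""
    status_code = 401


def token_required(func):
    """Reject requests without a valid bearer token by raising Unauthorized."""
    @functools.wraps(func)
    def wrapper(headers, *args, **kwargs):
        scheme, _, token = headers.get("Authorization", "").partition(" ")
        if scheme != "Bearer" or token not in VALID_TOKENS:
            raise Unauthorized("Invalid or missing API token")
        return func(headers, *args, **kwargs)
    return wrapper


def dispatch(handler, headers):
    """Stand-in for the global error handler: map Unauthorized to HTTP 401."""
    try:
        return 200, handler(headers)
    except Unauthorized as exc:
        # The body keeps the same code/message/data structure as before.
        body = {"code": AUTH_ERROR_CODE, "message": str(exc), "data": None}
        return exc.status_code, body


@token_required
def list_datasets(headers):
    return {"code": 0, "message": "", "data": []}


bad_status, bad_body = dispatch(list_datasets, {"Authorization": "Bearer wrong"})
ok_status, ok_body = dispatch(list_datasets, {"Authorization": "Bearer demo-token"})
```

A client can then check the HTTP status first and fall back to `code` in the body, which matches the frontend change described above.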

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 18:18:14 +08:00
51be1f1442 Refa: empty ids means no-op operation (#13439)
### What problem does this PR solve?

Empty ids means no-op operation.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Documentation Update
- [x] Refactoring

---------

Co-authored-by: writinwaters <cai.keith@gmail.com>
2026-03-06 18:16:42 +08:00
7781c51a21 Revert aliyun registry to registry.cn-hangzhou.aliyuncs.com (#13445)
## Summary
- Revert aliyun registry from
`infiniflow-registry.cn-shanghai.cr.aliyuncs.com` back to
`registry.cn-hangzhou.aliyuncs.com`

## Test plan
- [ ] Verify the docker/.env file contains the correct registry URL

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 18:03:35 +08:00
826af383b4 Fix: paddle ocr missing outlines (#13441)
### What problem does this PR solve?

Fix: paddle ocr missing outlines #13422

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 17:19:51 +08:00
2504c3adde Fix docker file (#13438)
### What problem does this PR solve?

To copy infinity/resource into docker images

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-06 16:56:12 +08:00
81fd1811b8 Feat:Using Go to implement user registration logic (#13431)
### What problem does this PR solve?

Feat:Using Go to implement user registration logic

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-06 16:42:49 +08:00
37eb533fea Feat(memory): implement get_aggregation for OceanBase memory (#13428)
### What problem does this PR solve?

- Add aggregation_utils.aggregate_by_field for pure aggregation logic
- Wire OBConnection.get_aggregation to use it (unwrap tuple, pass
messages)
- Add unit tests for aggregate_by_field (no DB/heavy deps)
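
A pure aggregation helper of this kind might look like the sketch below. The signature and the `(value, count)` return shape are assumptions for illustration; the PR does not show the actual interface of `aggregate_by_field`.

```python
from collections import Counter


def aggregate_by_field(messages, field):
    """Count how often each value of `field` occurs across message dicts."""
    counts = Counter()
    for message in messages:
        value = message.get(field)
        if value is None:
            continue
        # Treat list-valued fields as multiple values.
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    # Most frequent first, as (value, count) pairs.
    return counts.most_common()


sample_messages = [
    {"agent_id": "a"},
    {"agent_id": "b"},
    {"agent_id": "a"},
    {},
]
aggregated = aggregate_by_field(sample_messages, "agent_id")
```

Because the helper only works on plain dictionaries, it can be unit tested without a database connection.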

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-06 12:51:22 +08:00
383986dc5f fix: re-chunk documents when data source content is updated (#12918)
Closes: #12889 

### What problem does this PR solve?

When syncing external data sources (e.g., Jira, Confluence, Google
Drive), updated documents were not being re-chunked. The raw content was
correctly updated in blob storage, but the vector database retained
stale chunks, causing search results to return outdated information.

**Root cause:** The task digest used for chunk reuse optimization was
calculated only from parser configuration fields (`parser_id`,
`parser_config`, `kb_id`, etc.), without any content-dependent fields.
When a document's content changed but the parser configuration remained
the same, the system incorrectly reused old chunks instead of
regenerating new ones.

**Example scenario:**
1. User syncs a Jira issue: "Meeting scheduled for Monday"
2. User updates the Jira issue to: "Meeting rescheduled to Friday"
3. User triggers sync again
4. Raw content panel shows updated text ✓
5. Chunk panel still shows old text "Monday" ✗

**Solution:**
1. Include `update_time` and `size` in the chunking config, so the task
digest changes when document content is updated
2. Track updated documents separately in `upload_document()` and return
them for processing
3. Process updated documents through the re-parsing pipeline to
regenerate chunks
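
The effect of step 1 can be illustrated with a small digest function. This sketch uses SHA-256 over a JSON dump as a stand-in; the hash function used by the project and the sample field values are assumptions.

```python
import hashlib
import json


def task_digest(chunking_config: dict) -> str:
    """Hash the chunking config; equal configs give equal digests."""
    payload = json.dumps(chunking_config, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Parser-only fields: identical before and after the document is edited.
parser_only = {"parser_id": "naive", "parser_config": {"chunk_token_num": 512}, "kb_id": "kb1"}

# With content-dependent fields added, an edit changes the digest.
before_edit = dict(parser_only, update_time=1772700000, size=28)
after_edit = dict(parser_only, update_time=1772786400, size=30)

stale_reuse = task_digest(parser_only) == task_digest(dict(parser_only))
rechunk_needed = task_digest(before_edit) != task_digest(after_edit)
```

With only parser fields in the config, the digest is unchanged after an edit and old chunks are reused; adding `update_time` and `size` makes the digest differ, so the document is re-chunked.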


[1.webm](https://github.com/user-attachments/assets/d21d4dcd-e189-4d39-8700-053bae0ca5a0)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 12:48:47 +08:00
0214257886 Fix: init func (#13430)
### What problem does this PR solve?

Fix update_cnt add error in init_data.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 11:42:31 +08:00
6849d35bf5 Feat: Optimize the style of the chat page. (#13429)
### What problem does this PR solve?

Feat: Optimize the style of the chat page.
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-03-06 11:42:25 +08:00
6023eb27ac feat: add Ragcon provider (#13425)
### What problem does this PR solve?

This PR extends the list of available providers. It adds a new provider,
"RAGcon", within the Ollama modal. It provides all model types except OCR
via OpenAI-compatible endpoints.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Jakob <16180662+hauberj@users.noreply.github.com>
2026-03-06 09:37:27 +08:00
c35b210c3a fix(security): upgrade requests to 2.32.5 in agent/sandbox to fix CVE-2024-47081 (#13424)
### What problem does this PR solve?

This PR remediates CVE-2024-47081 (MEDIUM severity) in the agent/sandbox
component by upgrading the requests library from version 2.32.3 to
2.32.5. The vulnerability allows .netrc credentials to leak via
malicious URLs.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 09:29:18 +08:00
aa57bcf92a fix: upgrade urllib3 to 2.6.3 to resolve CVE-2025-66418, CVE-2025-66471, CVE-2026-21441 (#13423)
### What problem does this PR solve?

This PR remediates three HIGH severity vulnerabilities in urllib3
affecting the admin client and Python SDK:
- **CVE-2025-66418**: Unbounded decompression chain leads to resource
exhaustion
- **CVE-2025-66471**: Streaming API improperly handles highly compressed
data
- **CVE-2026-21441**: Decompression-bomb safeguard bypass when following
HTTP redirects

Trivy security scan identified urllib3 v2.5.0 as vulnerable in both
`admin/client/uv.lock` and `sdk/python/uv.lock`. This PR updates urllib3
to v2.6.3 to eliminate these security risks.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 09:29:10 +08:00
ef4cbe72a3 refactor(ui): adjust global navigation bar style (#13419)
### What problem does this PR solve?

Renovate the global navigation bar and align its styles with the design.
(This may cause minor layout issues in sub-pages; they will be checked and
fixed soon.)

### Type of change

- [x] Refactoring
2026-03-05 20:47:29 +08:00
9e0e128ce5 Add checksum/values annotation to ragflow.yaml (#13409)
Add checksum annotation for values in ragflow.yaml

### What problem does this PR solve?

This PR is about this ticket: #13408

The RAGFlow Helm chart does not include Values.yaml in the list of
watched changes.
If you update Values.yaml for an existing deployment, Helm will not
detect the change and will not update the deployment.

This PR fixes that.

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)
2026-03-05 20:27:38 +08:00
963e31e9b5 Refact: Updated the doc structure. (#13414)
### What problem does this PR solve?

Updated the doc structure.

### Type of change


- [x] Documentation Update
2026-03-05 19:04:56 +08:00
d90d6026af Playwright : new chat multi model test (#13402)
### What problem does this PR solve?

New Playwright test for chat with multiple models and other chat
parameters.

### Type of change

- [x] Other (please describe): new test/ data-testid
2026-03-05 18:51:57 +08:00
d9785ea2ce Fix: Alibaba cloud OSS config issue (#13406)
### What problem does this PR solve?

Alibaba Cloud OSS config issue #13390.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-05 18:13:45 +08:00
8b534c895e Fix: UI Placeholder and Hint Optimization (#13416)
### What problem does this PR solve?

Fix: UI Placeholder and Hint Optimization

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-05 18:13:19 +08:00
35fc5edc93 feat: Adds the tenant model ID field to the interface definition. (#13274)
### What problem does this PR solve?

feat: Adds the tenant model ID field to the interface definition

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-05 17:27:34 +08:00
62cb292635 Feat/tenant model (#13072)
### What problem does this PR solve?

Add id for table tenant_llm and apply in LLMBundle.

### Type of change

- [x] Refactoring

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
Co-authored-by: Liu An <asiro@qq.com>
2026-03-05 17:27:17 +08:00
47540a4147 Feat: published agent version control (#13410)
### What problem does this PR solve?

Feat: published agent version control

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-03-05 17:26:39 +08:00
8c9b080499 fix: update axios to 1.13.5+ to remediate CVE-2026-25639 DoS vulnerability (#13380)
### What problem does this PR solve?

This PR remediates CVE-2026-25639, a HIGH severity Denial of Service
vulnerability in axios caused by __proto__ pollution in the mergeConfig
function. The vulnerability affects both the web frontend and the
sandbox nodejs environment.

Trivy security scan identified axios versions below 1.13.5 as
vulnerable. This PR updates axios to secure versions (1.13.6 in web,
1.13.5 in sandbox) to eliminate the security risk.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-05 17:26:04 +08:00
f13a1fb007 Refa: improve model verification ux (#13392)
### What problem does this PR solve?

Improve model verification UX. #13395 

### Type of change

- [x] Refactoring

---------

Co-authored-by: Liu An <asiro@qq.com>
2026-03-05 17:23:47 +08:00