Commit Graph

5471 Commits

Author SHA1 Message Date
3c80a0ae09 Fix: support vLLM's new reasoning field (#13493)
### What problem does this PR solve?

Support vLLM's new reasoning field

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 21:13:14 +08:00
yzy
07c9cf6cbe Fix: return structured JSON output for non-streaming agent API (#13389)
### What problem does this PR solve?

Previously, when an Agent component was configured with structured
output, the non-streaming /agents/{agent_id}/completions API never
returned the structured field in its response.

The root cause: the non-streaming code path only collected message
events to build full_content, then returned the workflow_finished
payload — which only contains the output of the last component in the
execution path (typically a Message component).
Any structured output set by upstream components (e.g., Agent or LLM)
was silently discarded.

This PR fixes the non-streaming handler to iterate node_finished events
and collect structured output from intermediate components.
If any component produced a non-empty structured value, it is included
in the final response under data.structured. The streaming path is
unaffected, as it already exposes node_finished events to the caller.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 19:22:04 +08:00
08f83ff331 Feat: Support get aggregated parsing status to dataset via the API (#13481)
### What problem does this PR solve?

Support getting aggregated parsing status to dataset via the API

Issue: #12810

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: heyang.why <heyang.why@alibaba-inc.com>
2026-03-10 18:05:45 +08:00
68a623154a Fix: bin directory cannot be copied to docker image introduced by #13444 (#13502)
### What problem does this PR solve?

bin directory cannot be copied to docker image introduced by

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 17:31:20 +08:00
f14b53c764 feat(admin): Implemented default administrator initialization and login functionality. (#13504)
### What problem does this PR solve?

feat(admin): Implemented default administrator initialization and login
functionality.

Added support for default administrator configuration, including super
user nickname, email, and password.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 17:30:21 +08:00
81461b4505 Fix: The number of deleted session prompts is displayed incorrectly. #13499 (#13500)
### What problem does this PR solve?

Fix: The number of deleted session prompts is displayed incorrectly.
#13499
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 16:01:31 +08:00
675810e0cf Refact: optimize confluence performance (#13497)
### What problem does this PR solve?

Refact: optimize confluence performance #13494

### Type of change

- [x] Refactoring
2026-03-10 15:02:24 +08:00
9ba43ae4ee Fix "Coordinate lower is less than upper" error with MinerU (#13483)
### What problem does this PR solve?

Fixes #6004 #7142 #11959

Unlike #9207 we actually normalize the coordinates here

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 15:02:01 +08:00
aaf900cf16 Feat: Display release status in agent version history. (#13479)
### What problem does this PR solve?
Feat: Display release status in agent version history.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: balibabu <assassin_cike@163.com>
2026-03-10 14:25:27 +08:00
249b78561b Fix missmatch docnm_kwd in raptor chunks (#13451)
### What problem does this PR solve?

issue #13393 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 14:24:33 +08:00
185ab0d4ef Fix delete_document_metadata (#13496)
### What problem does this PR solve?

Avoid getting doc in function delete_document_metadata as the doc might
have been removed.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 13:44:24 +08:00
7143954b48 Fix: chats_openai in none stream condition (#13495)
### What problem does this PR solve?

Fix: chats_openai in none stream condition #13453

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 13:44:17 +08:00
7c92f51133 Fix retrieval function when metadata_condtion is specified in retrieval API (#13473)
### What problem does this PR solve?

Fix https://github.com/infiniflow/ragflow/issues/13388

The following command returns empty when there is doc with the meta data
```
curl --request POST \
     --url http://localhost:9222/api/v1/retrieval \
     --header 'Content-Type: application/json' \
     --header 'Authorization: Bearer ragflow-fO3mPFePfLgUYg8-9gjBVVXbvHqrvMPLGaW0P86PvAk' \
     --data '{
          "question": "any question",
          "dataset_ids": ["9bb4f0591b8811f18a4a84ba59049aa3"],
           "metadata_condition": {
            "logic": "and",
            "conditions": [
              {
                "name": "character",
                "comparison_operator": "is",
                "value": "刘备"
              }
            ]
          }
     }'
```

When metadata_condtion is specified in the retrieval API, it is
converted to doc_ids and doc_ids is passed to retrieval function.
In retrieval funciton, when doc_ids is explicitly provided , we should
bypass threshold.


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 11:57:32 +08:00
292a1a8566 fix: detect and fallback garbled PDF text to OCR (#13366) (#13404)
## Problem

When PDF fonts lack ToUnicode/CMap mappings, pdfplumber (pdfminer)
cannot map CIDs to correct Unicode characters, outputting PUA characters
(U+E000~U+F8FF) or `(cid:xxx)` placeholders. The original code fully
trusted pdfplumber text without any garbled detection, causing garbled
output in the final parsed result.

Relates to #13366

## Solution

### 1. Garbled text detection functions
- `_is_garbled_char(ch)`: Detects PUA characters (BMP/Plane 15/16),
replacement character U+FFFD, control characters, and
unassigned/surrogate codepoints
- `_is_garbled_text(text, threshold)`: Calculates garbled ratio and
detects `(cid:xxx)` patterns

### 2. Box-level fallback (in `__ocr()`)
When a text box has ≥50% garbled characters, discard pdfplumber text and
fallback to OCR recognition.

### 3. Page-level detection (in `__images__()`)
Sample characters from each page; if garbled rate ≥30%, clear all
pdfplumber characters for that page, forcing full OCR.

### 4. Layout recognizer CID filtering
Filter out `(cid:xxx)` patterns in `layout_recognizer.py` text
processing to prevent them from polluting layout analysis.

## Testing
- 29 unit tests covering: normal CJK/English text, PUA characters, CID
patterns, mixed text, boundary thresholds, edge cases
- All 85 existing project unit tests pass without regression
2026-03-10 11:20:31 +08:00
7f6a9e8ee9 Update ext field type of heartbeat message (#13490)
### What problem does this PR solve?

As title

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-10 10:49:39 +08:00
02108772d8 refactor: Moves the LLM factory initialization logic to the dao package. (#13476)
### What problem does this PR solve?

refactor: Moves the LLM factory initialization logic to the `dao`
package.

Removes the `init_data` package and integrates the LLM factory
initialization functionality into the `dao` package.
Adds a `utility` package to provide general utility functions.
Updates `server_main.go` to use the new initialization path.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-03-10 10:35:55 +08:00
88a40b95a2 fix: include missing modules in ragflow-cli PyPI package (#13457)
## Problem

The `ragflow-cli` PyPI package (v0.24.0) is missing `http_client.py`,
`ragflow_client.py`, and `user.py`, causing import errors when installed
from PyPI.

## Root Cause

`pyproject.toml` only lists `ragflow_cli` and `parser` in
`[tool.setuptools] py-modules`.

## Fix

Add the three missing modules to `py-modules`.

Fixes #13456

Co-authored-by: atian8179 <atian8179@users.noreply.github.com>
2026-03-10 10:02:21 +08:00
4fe706876c Service list and minio status (#13480)
### What problem does this PR solve?

1. Resolve standard user can access admin service
2. Get RAGFlow service status
3. Fix minio status fetching

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-10 09:56:43 +08:00
4f507c0058 Docs: Updated Switch chunk availability (#13482)
### What problem does this PR solve?

A quick editorial pass.

### Type of change

- [x] Documentation Update
nightly
2026-03-09 21:14:45 +08:00
7484298c82 Refa: convert download_img to async (#13477)
### What problem does this PR solve?

Convert download_img to async.

### Type of change

- [x] Refactoring
- [x] Performance Improvement
2026-03-09 19:00:17 +08:00
52bcd98d29 Add scheduled tasks (#13470)
### What problem does this PR solve?

1. RAGFlow server will send heartbeat periodically.
2. This PR will including:
- Scheduled task
- API server message sending
- Admin server API to receive the message.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-09 17:48:29 +08:00
c732a1c8e0 Refactor the go_binding to binding (#13469)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-09 15:52:31 +08:00
25ace613b0 feat: Added LLM factory initialization functionality and knowledge base related API interfaces (#13472)
### What problem does this PR solve?

feat: Added LLM factory initialization functionality and knowledge base
related API interfaces

refactor(dao): Refactored the TenantLLMDAO query method
feat(handler): Implemented knowledge base related API endpoints
feat(service): Added LLM API key setting functionality
feat(model): Extended the knowledge base model definition
feat(config): Added default user LLM configuration

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-09 15:52:14 +08:00
d0465ba909 refactor: improve paddle ocr logic (#13467)
### What problem does this PR solve?

improve paddle ocr logic

### Type of change
- [x] Refactoring
2026-03-09 14:16:57 +08:00
3ce236c4e3 Feat: add switch_chunks endpoint to manage chunk availability (#13435)
### What problem does this commit solve?

This commit introduces a new API endpoint
`/datasets/<dataset_id>/documents/<document_id>/chunks/switch` that
allows users to switch the availability status of specified chunks in a
document as same as chunk_app.py

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-09 12:36:45 +08:00
32d31284cc Fix: upgrade pypdf to 6.7.5 and migrate from deprecated pypdf2 to fix CVE-2026-28804 and CVE-2023-36464 (#13454)
### What problem does this PR solve?

This PR addresses security vulnerabilities in PDF processing
dependencies identified by Trivy security scan:

1. CVE-2026-28804 (MEDIUM): pypdf 6.7.4 vulnerable to inefficient
decoding of ASCIIHexDecode streams
2. CVE-2023-36464 (MEDIUM): pypdf2 3.0.1 susceptible to infinite loop
when parsing malformed comments

Since pypdf2 is deprecated with no available fixes, this PR migrates all
pypdf2 usage to the actively maintained pypdf library (version 6.7.5),
which resolves
both vulnerabilities.


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-09 12:06:00 +08:00
2634cfc06f Fix: undefined variable and wrong method name in agent components (#13462)
## Summary

This PR fixes two runtime bugs in agent components:

**Bug 1: `agent/component/invoke.py` — `NameError` in POST +
`clean_html` path**

The POST method's `clean_html` branch uses the variable `sections`
without ever defining it. Both the GET and PUT branches correctly call
`sections = HtmlParser()(None, response.content)` before referencing
`sections`, but this line was missing from the POST branch (copy-paste
omission). This causes a `NameError` whenever a user configures an
Invoke component with `method="post"` and `clean_html=True`.

**Bug 2: `agent/component/data_operations.py` — `AttributeError` in
`_recursive_eval`**

The `_recursive_eval` method recursively calls `self.recursive_eval()`
(without the leading underscore) instead of `self._recursive_eval()`.
Since the method is defined as `_recursive_eval`, this causes an
`AttributeError` at runtime when the `literal_eval` operation processes
nested dicts or lists.

## Test plan

- [ ] Configure an Invoke node with `method=post` and `clean_html=True`,
verify HTML is parsed correctly without `NameError`
- [ ] Configure a DataOperations node with `operations=literal_eval` on
nested data, verify no `AttributeError`

---------

Signed-off-by: JiangNan <1394485448@qq.com>
2026-03-09 11:09:47 +08:00
610c1b507d Add more API of admin server of go (#13403)
### What problem does this PR solve?

Add APIs to admin server.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-09 10:44:53 +08:00
ab6ca75245 fix(agent): ensure database connections are properly closed in ExeSQL tool (#13427)
## Summary

Fix a database connection and cursor resource leak in the ExeSQL agent
tool.

When SQL execution raises an exception (for example syntax error or
missing table),
the existing code path skips `cursor.close()` and `db.close()`, causing
database
connections to accumulate over time.

This can eventually lead to connection exhaustion in long-running agent
workflows.

## Root Cause

The cleanup logic for database cursors and connections is placed after
the SQL
execution loop without `try/finally` protection. If an exception occurs
during
`cursor.execute()`, `fetchmany()`, or result processing, the cleanup
code is not
reached and the connection remains open.

The same issue also exists in the IBM DB2 execution path where
`ibm_db.close(conn)`
may be skipped when exceptions occur.

## Fix

- Wrap SQL execution logic in `try/finally` blocks to guarantee resource
cleanup.
- Ensure `cursor.close()` and `db.close()` are always executed.
- Add explicit `db.close()` when `db.cursor()` creation fails.
- Remove redundant close calls in early-return branches since `finally`
now handles cleanup.

## Impact

- No change to normal execution behavior.
- Ensures database resources are always released when errors occur.
- Prevents connection leaks in long-running workflows.
- Only affects `agent/tools/exesql.py`.

## Testing

Manual test scenarios:

1. Valid SQL execution
2. SQL syntax error
3. Query against a non-existing table
4. Execution cancellation during query

In all scenarios the database cursor and connection are properly closed.

Code quality checks:

- `ruff check` passed
- No new warnings introduced
2026-03-09 10:36:02 +08:00
89e495e1bc Chore: update release workflow configuration (#13466)
### What problem does this PR solve?

update release workflow configuration

### Type of change

- [x] Update CI
2026-03-09 10:32:51 +08:00
c217b8f3d8 Feat: add DingTalk AI Table connector and integration for data synch… (#13413)
### What problem does this PR solve?

Add DingTalk AI Table connector and integration for data synchronization

Issue #13400

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: wangheyang <wangheyang@corp.netease.com>
2026-03-06 21:13:23 +08:00
094eae3cf5 refactor(ui): adjust dataset page styles (#13452)
### What problem does this PR solve?

- Adjust UI styles in **Dataset** pages.
- Adjust several shared components styles
- Modify files and directory structure in `src/layouts`

### Type of change

- [x] Refactoring
2026-03-06 21:13:14 +08:00
7166a7e50e Test: adjust test priority markers for API tests (#13450)
### What problem does this PR solve?

Changed test priority markers from p1/p2 to p3 in three test files:
- test_table_parser_dataset_chat.py: Adjusted priority for table parser
dataset chat test
- test_delete_chunks.py: Updated priority for chunk deletion test with
invalid IDs
- test_retrieval_chunks.py: Modified priority for chunks retrieval
pagination test

These changes demote the priority of specific test cases to p3,
indicating they are lower priority tests that can run later in the test
suite execution.

### Type of change

- [x] Test update
2026-03-06 20:17:39 +08:00
ae4645e01b Fix: Add folder upload #9743 (#13448)
### What problem does this PR solve?

Fix: Add folder upload  #9743

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 20:17:29 +08:00
82a616589b Feat: Add PublishConfirmDialog (#13447)
### What problem does this PR solve?

Feat: Add PublishConfirmDialog

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-03-06 20:17:21 +08:00
45cf24cd2f feat(memory): implement get_highlight for OceanBase memory (#13449)
### What problem does this PR solve?

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-06 20:17:11 +08:00
01a100bb29 Fix data models (#13444)
### What problem does this PR solve?

Since database model is updated in python version, go server also need
to update

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-06 20:05:10 +08:00
3ed91345aa fix(auth): return HTTP 401 for token-auth failures (#13420)
Follow-up to #12488 #13386

### What problem does this PR solve?

Previously, token authentication failures returned HTTP 200 with an
error code in the response body.

This PR updates `token_required` to raise `Unauthorized` and relies on
the global error handler to return a structured JSON response with HTTP
401 status.

The response body structure (`code`, `message`, `data`) remains
unchanged to preserve compatibility with the official SDK.

Frontend logic has been updated to handle HTTP 401 responses in addition
to checking `data.code`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 18:18:14 +08:00
51be1f1442 Refa: empty ids means no-op operation (#13439)
### What problem does this PR solve?

Empty ids means no-op operation.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Documentation Update
- [x] Refactoring

---------

Co-authored-by: writinwaters <cai.keith@gmail.com>
2026-03-06 18:16:42 +08:00
7781c51a21 Revert aliyun registry to registry.cn-hangzhou.aliyuncs.com (#13445)
## Summary
- Revert aliyun registry from
`infiniflow-registry.cn-shanghai.cr.aliyuncs.com` back to
`registry.cn-hangzhou.aliyuncs.com`

## Test plan
- [ ] Verify the docker/.env file contains the correct registry URL

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-06 18:03:35 +08:00
826af383b4 Fix: paddle ocr missing outlines (#13441)
### What problem does this PR solve?

Fix: paddle ocr missing outlines #13422

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 17:19:51 +08:00
2504c3adde Fix docker file (#13438)
### What problem does this PR solve?

To copy infinity/resource into docker images

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-06 16:56:12 +08:00
81fd1811b8 Feat:Using Go to implement user registration logic (#13431)
### What problem does this PR solve?

Feat:Using Go to implement user registration logic

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-06 16:42:49 +08:00
37eb533fea Feat(memory): implement get_aggregation for OceanBase memory (#13428)
### What problem does this PR solve?

- Add aggregation_utils.aggregate_by_field for pure aggregation logic
- Wire OBConnection.get_aggregation to use it (unwrap tuple, pass
messages)
- Add unit tests for aggregate_by_field (no DB/heavy deps)

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-06 12:51:22 +08:00
383986dc5f fix: re-chunk documents when data source content is updated (#12918)
Closes: #12889 

### What problem does this PR solve?

When syncing external data sources (e.g., Jira, Confluence, Google
Drive), updated documents were not being re-chunked. The raw content was
correctly updated in blob storage, but the vector database retained
stale chunks, causing search results to return outdated information.

**Root cause:** The task digest used for chunk reuse optimization was
calculated only from parser configuration fields (`parser_id`,
`parser_config`, `kb_id`, etc.), without any content-dependent fields.
When a document's content changed but the parser configuration remained
the same, the system incorrectly reused old chunks instead of
regenerating new ones.

**Example scenario:**
1. User syncs a Jira issue: "Meeting scheduled for Monday"
2. User updates the Jira issue to: "Meeting rescheduled to Friday"
3. User triggers sync again
4. Raw content panel shows updated text ✓
5. Chunk panel still shows old text "Monday" ✗

**Solution:**
1. Include `update_time` and `size` in the chunking config, so the task
digest changes when document content is updated
2. Track updated documents separately in `upload_document()` and return
them for processing
3. Process updated documents through the re-parsing pipeline to
regenerate chunks


[1.webm](https://github.com/user-attachments/assets/d21d4dcd-e189-4d39-8700-053bae0ca5a0)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 12:48:47 +08:00
0214257886 Fix: init func (#13430)
### What problem does this PR solve?

Fix update_cnt add error in init_data.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 11:42:31 +08:00
6849d35bf5 Feat: Optimize the style of the chat page. (#13429)
### What problem does this PR solve?

Feat: Optimize the style of the chat page.
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-03-06 11:42:25 +08:00
6023eb27ac feat: add Ragcon provider (#13425)
### What problem does this PR solve?

This PR aims to extend the list of possible providers. Adds new Provider
"RAGcon" within the Ollama Modal. It provides all model types except OCR
via Openai-compatible endpoints.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Jakob <16180662+hauberj@users.noreply.github.com>
2026-03-06 09:37:27 +08:00
c35b210c3a fix(security): upgrade requests to 2.32.5 in agent/sandbox to fix CVE-2024-47081 (#13424)
### What problem does this PR solve?

This PR remediates CVE-2024-47081 (MEDIUM severity) in the agent/sandbox
component by upgrading the requests library from version 2.32.3 to
2.32.5. The vulnerability allows .netrc credentials to leak via
malicious URLs.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 09:29:18 +08:00
aa57bcf92a fix: upgrade urllib3 to 2.6.3 to resolve CVE-2025-66418, CVE-2025-66471, CVE-2026-21441 (#13423)
### What problem does this PR solve?

This PR remediates three HIGH severity vulnerabilities in urllib3
affecting the admin client and Python SDK:
- **CVE-2025-66418**: Unbounded decompression chain leads to resource
exhaustion
- **CVE-2025-66471**: Streaming API improperly handles highly compressed
data
- **CVE-2026-21441**: Decompression-bomb safeguard bypass when following
HTTP redirects
Trivy security scan identified urllib3 v2.5.0 as vulnerable in both
`admin/client/uv.lock` and `sdk/python/uv.lock`. This PR updates urllib3
to v2.6.3 to eliminate these security risks.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-06 09:29:10 +08:00