Commit Graph

5494 Commits

Author SHA1 Message Date
227c852e67 Fix typo: documnet_keyword -> document_keyword in Chunk class (#13531)
### What problem does this PR solve?
The Chunk class had a typo in the attribute name 'documnet_keyword',
which caused the document_name field to remain empty when retrieving
chunks via the SDK. This fix corrects the spelling to
'document_keyword'.

Changes:
- Line 36: Changed self.documnet_keyword to self.document_keyword
- Line 52: Updated backward compatibility code to use
self.document_keyword


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-12 15:23:55 +08:00
e78938c72c Update go admin server default port to 9383 (#13559)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-12 13:41:08 +08:00
31a8184f63 refactor(ui): update ui for user settings, etc. (#13532)
### What problem does this PR solve?

Update UI styles:
- **User settings**
- Component styles: 
   - `ui/button.tsx`
   - `ui/checkbox.tsx`
   - `avatar-upload.tsx`
   - `file-uploader.tsx`
   - `icon-font.tsx`

### Type of change

- [x] Refactoring
2026-03-12 13:33:36 +08:00
0da9c4618d feat(cli): Enhance CLI functionality and add administrator mode support (#13539)
### What problem does this PR solve?

feat(cli): Enhance CLI functionality and add administrator mode support

- Modify `parseActivateUser` in `parser.go` to support 'on'/'off' states
- Add administrator mode switching and host port settings functionality
to `cli.go`
- Implement user management API calls in `client.go`

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-12 13:33:13 +08:00
4bd5bb141d Fix: data-source-detail page style (#13507)
### What problem does this PR solve?

Fix: data-source-detail page style

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-12 13:32:39 +08:00
5cbdfc5f17 Fix Gitee embedding model URL error (#13553)
### What problem does this PR solve?

As title

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-12 13:13:06 +08:00
375a910bcf Fix: add deadlock retry (#13552)
### What problem does this PR solve?

 Add deadlock retry.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-12 12:39:01 +08:00
90afce192c Add license and fingerprint API hook (#13548)
### What problem does this PR solve?

For EE

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-12 11:52:39 +08:00
2fb1360d9d Add command line parameter and fix error message (#13526)
### What problem does this PR solve?

`./server_main -p 9380`

`./server_main -h`

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-12 09:50:57 +08:00
e1b632a7bb Feat: add delete all support for delete operations (#13530)
### What problem does this PR solve?

Add delete all support for delete operations.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update

---------

Co-authored-by: writinwaters <cai.keith@gmail.com>
2026-03-12 09:47:42 +08:00
d201a81db7 Add command history in ragflow cli (#13538)
### What problem does this PR solve?

In ragflow cli,  use Up/Down arrows to navigate command history,

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-11 19:14:18 +08:00
852393c114 Test: Lower priority of chat assistant and chunk list API tests (#13540)
### What problem does this PR solve?

Mark test cases as lower priority (p3) for:
- Creating chat assistants
- Deleting chat assistants
- Listing chat assistants
- Listing chunks within datasets

### Type of change

- [x] Update testcases
2026-03-11 19:00:18 +08:00
f75dc6a452 Docs: Fix normalization of case and some code blocks (#13520)
### What problem does this PR solve?

Standardize term capitalization in `deploy_local_llm.mdx` and improve
code block formatting.

### Type of change

- [x] Documentation Update
2026-03-11 17:51:13 +08:00
1cee8b1a7b fix: use context managers for file handles to prevent resource leaks (#13514)
## Summary
- Convert bare `open()` calls to `with` context managers or
`Path.read_text()`
- File handles leak if not properly closed, especially on exceptions
- Fixes in crypt.py, sequence2txt_model.py, term_weight.py,
deepdoc/vision/__init__.py

## Test plan
- [x] File operations work correctly with context managers
- [x] Resources properly cleaned up on exceptions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 16:47:06 +08:00
6afd13ff29 Feat/arabic language support (#13516)
### What problem does this PR solve?

This PR implements comprehensive Arabic language support for the RAGFlow
application. The changes include:
- Complete Arabic translation of all UI text elements in the web
interface
- RTL (right-to-left) layout support for Arabic content
- Localization updates for all supported languages (ar, bg, de, en, es,
fr, id, it, ja, pt-br, ru, vi, zh-traditional, zh)
- UI component adjustments to properly display Arabic text and support
RTL layout

The implementation ensures that Arabic-speaking users can fully interact
with the application in their native language with proper text rendering
and layout direction.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

<img width="2866" height="1617" alt="image"
src="https://github.com/user-attachments/assets/f2751b34-1b65-4867-b81d-a1068c17b9b7"
/>

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-11 15:06:07 +08:00
9ca2bac984 Feat: Implement user creation, deletion, and permission management functionality. (#13519)
### What problem does this PR solve?

Feat: Implement user creation, deletion, and permission management
functionality.

- Added the `ListByEmail` method to `user.go` to query users by email
address.

- Updated the user activation status handling logic in `handler.go`,
adding input validation.

- Added RSA password decryption functionality to `password.go`.

- Implemented complete user management functionality in `service.go`,
including user creation, deletion, password modification, activation
status, and permission management.

- Added input validation and error handling logic.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-11 14:04:00 +08:00
2028e895fd Add license and time record DAO (#13522)
### What problem does this PR solve?

1. Change go server default port to 9382
2. Compatible with EE data model.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-11 14:02:24 +08:00
1815f5950b Call get_flatted_meta_by_kbs in dify retrieval (#13509)
### What problem does this PR solve?

Fix https://github.com/infiniflow/ragflow/issues/13388

Call get_flatted_meta_by_kbs in dify retrieval. Remove get_meta_by_kbs.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-11 13:42:24 +08:00
2d2d3cdbcf Fix document metadata loading for paged listings (#13515)
## Summary
- scope normal document-list metadata lookups to the current page's
document IDs
- keep the `return_empty_metadata=True` path dataset-wide because it
needs full knowledge of docs that already have metadata
- add unit tests for both paged listing paths and the unchanged
empty-metadata behavior

## Why
`DocumentService.get_list()` and the normal `get_by_kb_id()` path were
calling `DocMetadataService.get_metadata_for_documents(None, kb_id)`,
which loads metadata for the entire dataset on every page request.

That becomes especially problematic on large datasets. The metadata scan
path paginates through the full metadata index without an explicit sort,
while the ES helper only switches to `search_after` beyond `10000`
results when a sort is present. In practice this can lead to unnecessary
full-dataset metadata work, slower document-list loading, and unreliable
`meta_fields` in list responses for large KBs.

This change keeps the existing empty-metadata filter behavior intact,
but scopes normal list responses to metadata for the current page only.
2026-03-11 13:42:16 +08:00
507ba4ea20 refactor(ui): update knowledge graph, chunk, metadata, agent log styles (#13518)
### What problem does this PR solve?

Update UI styles:
- **Dataset** > **Knowledge graph** tooltip
- **Dataset** > **Files** > **Manage metadata** modal
- **Dataset** > **Files** > **Modify Chunking Method** > **Auto
metadata** > **Manage generation settings** modal
- **Agent** > **Canvas (Ingestion pipeline)** > **Dataflow result**

### Type of change

- [x] Refactoring
2026-03-11 11:27:20 +08:00
2133fd76a8 Add auth middleware (#13506)
### What problem does this PR solve?

Use auth middle-ware to check authorization.

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-11 11:23:13 +08:00
d0ca388bec Refa: implement unified lazy image loading for Docx parsers (qa/manual) (#13329)
## Summary
This PR is the direct successor to the previous `docx` lazy-loading
implementation. It addresses the technical debt intentionally left out
in the last PR by fully migrating the `qa` and `manual` parsing
strategies to the new lazy-loading model.

Additionally, this PR comprehensively refactors the underlying `docx`
parsing pipeline to eliminate significant code redundancy and introduces
robust fallback mechanisms to handle completely corrupted image streams
safely.


## What's Changed

* **Centralized Abstraction (`docx_parser.py`)**: Moved the
`get_picture` extraction logic up to the `RAGFlowDocxParser` base class.
Previously, `naive`, `qa`, and `manual` parsers maintained separate,
redundant copies of this method. All downstream strategies now natively
gather raw blobs and return `LazyDocxImage` objects automatically.
* **Robust Corrupted Image Fallback (`docx_parser.py`)**: Handled edge
cases where `python-docx` encounters critically malformed magic headers.
Implemented an explicit `try-except` structure that safely intercepts
`UnrecognizedImageError` (and similar exceptions) and seamlessly falls
back to retrieving the raw binary via `getattr(related_part, "blob",
None)`, preventing parser crashes on damaged documents.

* **Legacy Code & Redundancy Purge**:
* Removed the duplicate `get_picture` methods from `naive.py`, `qa.py`,
and `manual.py`.
* Removed the standalone, immediate-decoding `concat_img` method in
`manual.py`. It has been completely replaced by the globally unified,
lazy-loading-compatible `rag.nlp.concat_img`.
* Cleaned up unused legacy imports (e.g., `PIL.Image`, docx exception
packages) across all updated strategy files.

## Scope
To keep this PR focused, I have restricted these changes strictly to the
unification of `docx` extraction logic and the lazy-load migration of
`qa` and `manual`.

## Validation & Testing
I've tested this to ensure no regressions and validated the fallback
logic:

* **Output Consistency**: Compared identical `.docx` inputs using `qa`
and `manual` strategies before and after this branch: chunk counts,
extracted text, table HTML, and attached images match perfectly.
* **Memory Footprint Drop**: Confirmed a noticeable drop in peak memory
usage when processing image-dense documents through the `qa` and
`manual` pipelines, bringing them up to parity with the `naive`
strategy's performance gains.

## Breaking Changes
* None.
2026-03-11 10:00:07 +08:00
d36e3c97d1 Feat: Add a user_id field to the message and retrieval operators. (#13508)
### What problem does this PR solve?

Feat: Add a user_id field to the message and retrieval operators.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-03-10 22:18:27 +08:00
3c80a0ae09 Fix: support vLLM's new reasoning field (#13493)
### What problem does this PR solve?

Support vLLM's new reasoning field

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 21:13:14 +08:00
yzy
07c9cf6cbe Fix: return structured JSON output for non-streaming agent API (#13389)
### What problem does this PR solve?

Previously, when an Agent component was configured with structured
output, the non-streaming /agents/{agent_id}/completions API never
returned the structured field in its response.

The root cause: the non-streaming code path only collected message
events to build full_content, then returned the workflow_finished
payload — which only contains the output of the last component in the
execution path (typically a Message component).
Any structured output set by upstream components (e.g., Agent or LLM)
was silently discarded.

This PR fixes the non-streaming handler to iterate node_finished events
and collect structured output from intermediate components.
If any component produced a non-empty structured value, it is included
in the final response under data.structured. The streaming path is
unaffected, as it already exposes node_finished events to the caller.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 19:22:04 +08:00
08f83ff331 Feat: Support get aggregated parsing status to dataset via the API (#13481)
### What problem does this PR solve?

Support getting aggregated parsing status to dataset via the API

Issue: #12810

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: heyang.why <heyang.why@alibaba-inc.com>
2026-03-10 18:05:45 +08:00
68a623154a Fix: bin directory cannot be copied to docker image introduced by #13444 (#13502)
### What problem does this PR solve?

bin directory cannot be copied to docker image introduced by

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 17:31:20 +08:00
f14b53c764 feat(admin): Implemented default administrator initialization and login functionality. (#13504)
### What problem does this PR solve?

feat(admin): Implemented default administrator initialization and login
functionality.

Added support for default administrator configuration, including super
user nickname, email, and password.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 17:30:21 +08:00
81461b4505 Fix: The number of deleted session prompts is displayed incorrectly. #13499 (#13500)
### What problem does this PR solve?

Fix: The number of deleted session prompts is displayed incorrectly.
#13499
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 16:01:31 +08:00
675810e0cf Refact: optimize confluence performance (#13497)
### What problem does this PR solve?

Refact: optimize confluence performance #13494

### Type of change

- [x] Refactoring
2026-03-10 15:02:24 +08:00
9ba43ae4ee Fix "Coordinate lower is less than upper" error with MinerU (#13483)
### What problem does this PR solve?

Fixes #6004 #7142 #11959

Unlike #9207 we actually normalize the coordinates here

### Type of change

- [X] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 15:02:01 +08:00
aaf900cf16 Feat: Display release status in agent version history. (#13479)
### What problem does this PR solve?
Feat: Display release status in agent version history.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: balibabu <assassin_cike@163.com>
2026-03-10 14:25:27 +08:00
249b78561b Fix missmatch docnm_kwd in raptor chunks (#13451)
### What problem does this PR solve?

issue #13393 

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 14:24:33 +08:00
185ab0d4ef Fix delete_document_metadata (#13496)
### What problem does this PR solve?

Avoid getting doc in function delete_document_metadata as the doc might
have been removed.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 13:44:24 +08:00
7143954b48 Fix: chats_openai in none stream condition (#13495)
### What problem does this PR solve?

Fix: chats_openai in none stream condition #13453

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 13:44:17 +08:00
7c92f51133 Fix retrieval function when metadata_condtion is specified in retrieval API (#13473)
### What problem does this PR solve?

Fix https://github.com/infiniflow/ragflow/issues/13388

The following command returns empty when there is doc with the meta data
```
curl --request POST \
     --url http://localhost:9222/api/v1/retrieval \
     --header 'Content-Type: application/json' \
     --header 'Authorization: Bearer ragflow-fO3mPFePfLgUYg8-9gjBVVXbvHqrvMPLGaW0P86PvAk' \
     --data '{
          "question": "any question",
          "dataset_ids": ["9bb4f0591b8811f18a4a84ba59049aa3"],
           "metadata_condition": {
            "logic": "and",
            "conditions": [
              {
                "name": "character",
                "comparison_operator": "is",
                "value": "刘备"
              }
            ]
          }
     }'
```

When metadata_condtion is specified in the retrieval API, it is
converted to doc_ids and doc_ids is passed to retrieval function.
In retrieval funciton, when doc_ids is explicitly provided , we should
bypass threshold.


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-10 11:57:32 +08:00
292a1a8566 fix: detect and fallback garbled PDF text to OCR (#13366) (#13404)
## Problem

When PDF fonts lack ToUnicode/CMap mappings, pdfplumber (pdfminer)
cannot map CIDs to correct Unicode characters, outputting PUA characters
(U+E000~U+F8FF) or `(cid:xxx)` placeholders. The original code fully
trusted pdfplumber text without any garbled detection, causing garbled
output in the final parsed result.

Relates to #13366

## Solution

### 1. Garbled text detection functions
- `_is_garbled_char(ch)`: Detects PUA characters (BMP/Plane 15/16),
replacement character U+FFFD, control characters, and
unassigned/surrogate codepoints
- `_is_garbled_text(text, threshold)`: Calculates garbled ratio and
detects `(cid:xxx)` patterns

### 2. Box-level fallback (in `__ocr()`)
When a text box has ≥50% garbled characters, discard pdfplumber text and
fallback to OCR recognition.

### 3. Page-level detection (in `__images__()`)
Sample characters from each page; if garbled rate ≥30%, clear all
pdfplumber characters for that page, forcing full OCR.

### 4. Layout recognizer CID filtering
Filter out `(cid:xxx)` patterns in `layout_recognizer.py` text
processing to prevent them from polluting layout analysis.

## Testing
- 29 unit tests covering: normal CJK/English text, PUA characters, CID
patterns, mixed text, boundary thresholds, edge cases
- All 85 existing project unit tests pass without regression
2026-03-10 11:20:31 +08:00
7f6a9e8ee9 Update ext field type of heartbeat message (#13490)
### What problem does this PR solve?

As title

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-10 10:49:39 +08:00
02108772d8 refactor: Moves the LLM factory initialization logic to the dao package. (#13476)
### What problem does this PR solve?

refactor: Moves the LLM factory initialization logic to the `dao`
package.

Removes the `init_data` package and integrates the LLM factory
initialization functionality into the `dao` package.
Adds a `utility` package to provide general utility functions.
Updates `server_main.go` to use the new initialization path.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-03-10 10:35:55 +08:00
88a40b95a2 fix: include missing modules in ragflow-cli PyPI package (#13457)
## Problem

The `ragflow-cli` PyPI package (v0.24.0) is missing `http_client.py`,
`ragflow_client.py`, and `user.py`, causing import errors when installed
from PyPI.

## Root Cause

`pyproject.toml` only lists `ragflow_cli` and `parser` in
`[tool.setuptools] py-modules`.

## Fix

Add the three missing modules to `py-modules`.

Fixes #13456

Co-authored-by: atian8179 <atian8179@users.noreply.github.com>
2026-03-10 10:02:21 +08:00
4fe706876c Service list and minio status (#13480)
### What problem does this PR solve?

1. Resolve standard user can access admin service
2. Get RAGFlow service status
3. Fix minio status fetching

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-10 09:56:43 +08:00
4f507c0058 Docs: Updated Switch chunk availability (#13482)
### What problem does this PR solve?

A quick editorial pass.

### Type of change

- [x] Documentation Update
2026-03-09 21:14:45 +08:00
7484298c82 Refa: convert download_img to async (#13477)
### What problem does this PR solve?

Convert download_img to async.

### Type of change

- [x] Refactoring
- [x] Performance Improvement
2026-03-09 19:00:17 +08:00
52bcd98d29 Add scheduled tasks (#13470)
### What problem does this PR solve?

1. RAGFlow server will send heartbeat periodically.
2. This PR will including:
- Scheduled task
- API server message sending
- Admin server API to receive the message.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-09 17:48:29 +08:00
c732a1c8e0 Refactor the go_binding to binding (#13469)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-09 15:52:31 +08:00
25ace613b0 feat: Added LLM factory initialization functionality and knowledge base related API interfaces (#13472)
### What problem does this PR solve?

feat: Added LLM factory initialization functionality and knowledge base
related API interfaces

refactor(dao): Refactored the TenantLLMDAO query method
feat(handler): Implemented knowledge base related API endpoints
feat(service): Added LLM API key setting functionality
feat(model): Extended the knowledge base model definition
feat(config): Added default user LLM configuration

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-09 15:52:14 +08:00
d0465ba909 refactor: improve paddle ocr logic (#13467)
### What problem does this PR solve?

improve paddle ocr logic

### Type of change
- [x] Refactoring
2026-03-09 14:16:57 +08:00
3ce236c4e3 Feat: add switch_chunks endpoint to manage chunk availability (#13435)
### What problem does this commit solve?

This commit introduces a new API endpoint
`/datasets/<dataset_id>/documents/<document_id>/chunks/switch` that
allows users to switch the availability status of specified chunks in a
document as same as chunk_app.py

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-09 12:36:45 +08:00
32d31284cc Fix: upgrade pypdf to 6.7.5 and migrate from deprecated pypdf2 to fix CVE-2026-28804 and CVE-2023-36464 (#13454)
### What problem does this PR solve?

This PR addresses security vulnerabilities in PDF processing
dependencies identified by Trivy security scan:

1. CVE-2026-28804 (MEDIUM): pypdf 6.7.4 vulnerable to inefficient
decoding of ASCIIHexDecode streams
2. CVE-2023-36464 (MEDIUM): pypdf2 3.0.1 susceptible to infinite loop
when parsing malformed comments

Since pypdf2 is deprecated with no available fixes, this PR migrates all
pypdf2 usage to the actively maintained pypdf library (version 6.7.5),
which resolves
both vulnerabilities.


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-09 12:06:00 +08:00
2634cfc06f Fix: undefined variable and wrong method name in agent components (#13462)
## Summary

This PR fixes two runtime bugs in agent components:

**Bug 1: `agent/component/invoke.py` — `NameError` in POST +
`clean_html` path**

The POST method's `clean_html` branch uses the variable `sections`
without ever defining it. Both the GET and PUT branches correctly call
`sections = HtmlParser()(None, response.content)` before referencing
`sections`, but this line was missing from the POST branch (copy-paste
omission). This causes a `NameError` whenever a user configures an
Invoke component with `method="post"` and `clean_html=True`.

**Bug 2: `agent/component/data_operations.py` — `AttributeError` in
`_recursive_eval`**

The `_recursive_eval` method recursively calls `self.recursive_eval()`
(without the leading underscore) instead of `self._recursive_eval()`.
Since the method is defined as `_recursive_eval`, this causes an
`AttributeError` at runtime when the `literal_eval` operation processes
nested dicts or lists.

## Test plan

- [ ] Configure an Invoke node with `method=post` and `clean_html=True`,
verify HTML is parsed correctly without `NameError`
- [ ] Configure a DataOperations node with `operations=literal_eval` on
nested data, verify no `AttributeError`

---------

Signed-off-by: JiangNan <1394485448@qq.com>
2026-03-09 11:09:47 +08:00