### What problem does this PR solve?
RAGFlow server isn't available when admin server isn't connected.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Feature (System Settings): Implemented system settings management
functionality
- Added a new SystemSettings model, including creation and update time
fields.
- Implemented SystemSettingsDAO, providing CRUD operations and
transaction support.
- Implemented management interfaces for variables, configurations, and
environment variables in the admin service.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
### What problem does this PR solve?
This PR fixes two security vulnerabilities in web dependencies
identified by Trivy:
1. CVE-2025-13465 (lodash): Prototype pollution vulnerability in _.unset
and _.omit functions
2. CVE-2026-0540 (dompurify): Cross-site scripting (XSS) vulnerability
**Changes:**
- Upgraded lodash from 4.17.21 to 4.17.23
- Upgraded dompurify from 3.3.1 to 3.3.2
- Added npm override to force monaco-editor's transitive dependency on
dompurify to use 3.3.2 (monaco-editor still depends on vulnerable 3.2.7)
Both upgrades are backward-compatible patch versions. Build verified
successfully with no breaking changes.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
- Add documentation for the `-p project_name` flag in the migration
script, covering all steps (stop, backup, restore, start)
- Add a note explaining how Docker volume name prefixes relate to the
Compose project name
- Update `docker-compose` to `docker compose` (Compose V2 syntax) for
consistency
- Fix `sh` to `bash` to match the script's shebang line
This is the documentation follow-up to #12187 which added `-p` project
name support to `docker/migration.sh`.
## Test plan
- [ ] Verify the documentation renders correctly on the docs site
- [ ] Confirm all example commands are accurate against the current
`migration.sh`
### What problem does this PR solve?
Implement: minio, s3, oss, azure_sas, azure_spn, gcs, opendal
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fixes#13285
When an LLM returns a transient error (e.g. overloaded) during parsing,
the task progress is set to -1. Previously, the progress could never be
updated again, leaving the document permanently stuck in FAIL status
even after the task successfully recovered and completed.
Three coordinated changes address this:
1. task_service.update_progress: relax the progress update guard to
accept prog >= 1 even when current progress is -1, so a task that
recovers from a transient failure can report completion.
2. document_service.get_unfinished_docs: include documents that are
marked FAIL (progress == -1) but still have at least one non-failed task
(task.progress >= 0) in the polling set, so their status can be
re-synced once a task recovers. Documents where all tasks have
permanently failed are excluded to avoid unnecessary polling.
3. document_service.update_progress: explicitly set document status to
RUNNING when not all tasks have finished, instead of preserving whatever
stale status (potentially FAIL) the document previously had.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: image pdf in ingestion pipeline #13550
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
This PR adds support for parsing PDFs through an external Docling
server, so RAGFlow can connect to remote `docling serve` deployments
instead of relying only on local in-process Docling.
It addresses the feature request in
[#13426](https://github.com/infiniflow/ragflow/issues/13426) and aligns
with the external-server usage pattern already used by MinerU.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What is changed?
- Add external Docling server support in `DoclingParser`:
- Use `DOCLING_SERVER_URL` to enable remote parsing mode.
- Try `POST /v1/convert/source` first, and fallback to
`/v1alpha/convert/source`.
- Keep existing local Docling behavior when `DOCLING_SERVER_URL` is not
set.
- Wire Docling env settings into parser invocation paths:
- `rag/app/naive.py`
- `rag/flow/parser/parser.py`
- Add Docling env hints in constants and update docs:
- `docs/guides/dataset/select_pdf_parser.md`
- `docs/guides/agent/agent_component_reference/parser.md`
- `docs/faq.mdx`
### Why this approach?
This keeps the change focused on one issue and one capability (external
Docling connectivity), without introducing unrelated provider-model
plumbing.
### Validation
- Static checks:
- `python -m py_compile` on changed Python files
- `python -m ruff check` on changed Python files
- Functional checks:
- Remote v1 endpoint path works
- v1alpha fallback works
- Local Docling path remains available when server URL is unset
### Related links
- Feature request: [Support external Docling server (issue
#13426)](https://github.com/infiniflow/ragflow/issues/13426)
- Compare view for this branch:
[main...feat/docling-server](https://github.com/infiniflow/ragflow/compare/main...spider-yamet:ragflow:feat/docling-server?expand=1)
##### Fixes [#13426](https://github.com/infiniflow/ragflow/issues/13426)
## Summary
Fix knowledge-base chat retrieval when no individual document IDs are
selected.
## Root Cause
`async_chat()` initialized `doc_ids` as an empty list when the request
did not explicitly select documents. That empty list was then forwarded
into retrieval as an active `doc_id` filter, effectively becoming
`doc_id IN []` and suppressing all chunk matches.
## Changes
- treat missing selected document IDs as `None` instead of `[]`
- keep explicit document filtering when IDs are actually provided
- add regression coverage for the shared chat retrieval path
## Validation
- `python3 -m py_compile api/db/services/dialog_service.py
test/unit_test/api/db/services/test_dialog_service_use_sql_source_columns.py`
- `.venv/bin/python -m pytest
test/unit_test/api/db/services/test_dialog_service_use_sql_source_columns.py`
- manually verified that chat completions again inject retrieved
knowledge into the prompt
---------
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
### What problem does this PR solve?
The Chunk class had a typo in the attribute name 'documnet_keyword',
which caused the document_name field to remain empty when retrieving
chunks via the SDK. This fix corrects the spelling to
'document_keyword'.
Changes:
- Line 36: Changed self.documnet_keyword to self.document_keyword
- Line 52: Updated backward compatibility code to use
self.document_keyword
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
feat(cli): Enhance CLI functionality and add administrator mode support
- Modify `parseActivateUser` in `parser.go` to support 'on'/'off' states
- Add administrator mode switching and host port settings functionality
to `cli.go`
- Implement user management API calls in `client.go`
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
As title
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
For EE
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
`./server_main -p 9380`
`./server_main -h`
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Add delete all support for delete operations.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
---------
Co-authored-by: writinwaters <cai.keith@gmail.com>
### What problem does this PR solve?
In ragflow cli, use Up/Down arrows to navigate command history,
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Mark test cases as lower priority (p3) for:
- Creating chat assistants
- Deleting chat assistants
- Listing chat assistants
- Listing chunks within datasets
### Type of change
- [x] Update testcases
### What problem does this PR solve?
Standardize term capitalization in `deploy_local_llm.mdx` and improve
code block formatting.
### Type of change
- [x] Documentation Update
## Summary
- Convert bare `open()` calls to `with` context managers or
`Path.read_text()`
- File handles leak if not properly closed, especially on exceptions
- Fixes in crypt.py, sequence2txt_model.py, term_weight.py,
deepdoc/vision/__init__.py
## Test plan
- [x] File operations work correctly with context managers
- [x] Resources properly cleaned up on exceptions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
### What problem does this PR solve?
This PR implements comprehensive Arabic language support for the RAGFlow
application. The changes include:
- Complete Arabic translation of all UI text elements in the web
interface
- RTL (right-to-left) layout support for Arabic content
- Localization updates for all supported languages (ar, bg, de, en, es,
fr, id, it, ja, pt-br, ru, vi, zh-traditional, zh)
- UI component adjustments to properly display Arabic text and support
RTL layout
The implementation ensures that Arabic-speaking users can fully interact
with the application in their native language with proper text rendering
and layout direction.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
<img width="2866" height="1617" alt="image"
src="https://github.com/user-attachments/assets/f2751b34-1b65-4867-b81d-a1068c17b9b7"
/>
---------
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
### What problem does this PR solve?
Feat: Implement user creation, deletion, and permission management
functionality.
- Added the `ListByEmail` method to `user.go` to query users by email
address.
- Updated the user activation status handling logic in `handler.go`,
adding input validation.
- Added RSA password decryption functionality to `password.go`.
- Implemented complete user management functionality in `service.go`,
including user creation, deletion, password modification, activation
status, and permission management.
- Added input validation and error handling logic.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
1. Change go server default port to 9382
2. Compatible with EE data model.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Fix https://github.com/infiniflow/ragflow/issues/13388
Call get_flatted_meta_by_kbs in dify retrieval. Remove get_meta_by_kbs.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
- scope normal document-list metadata lookups to the current page's
document IDs
- keep the `return_empty_metadata=True` path dataset-wide because it
needs full knowledge of docs that already have metadata
- add unit tests for both paged listing paths and the unchanged
empty-metadata behavior
## Why
`DocumentService.get_list()` and the normal `get_by_kb_id()` path were
calling `DocMetadataService.get_metadata_for_documents(None, kb_id)`,
which loads metadata for the entire dataset on every page request.
That becomes especially problematic on large datasets. The metadata scan
path paginates through the full metadata index without an explicit sort,
while the ES helper only switches to `search_after` beyond `10000`
results when a sort is present. In practice this can lead to unnecessary
full-dataset metadata work, slower document-list loading, and unreliable
`meta_fields` in list responses for large KBs.
This change keeps the existing empty-metadata filter behavior intact,
but scopes normal list responses to metadata for the current page only.
### What problem does this PR solve?
Use auth middle-ware to check authorization.
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
## Summary
This PR is the direct successor to the previous `docx` lazy-loading
implementation. It addresses the technical debt intentionally left out
in the last PR by fully migrating the `qa` and `manual` parsing
strategies to the new lazy-loading model.
Additionally, this PR comprehensively refactors the underlying `docx`
parsing pipeline to eliminate significant code redundancy and introduces
robust fallback mechanisms to handle completely corrupted image streams
safely.
## What's Changed
* **Centralized Abstraction (`docx_parser.py`)**: Moved the
`get_picture` extraction logic up to the `RAGFlowDocxParser` base class.
Previously, `naive`, `qa`, and `manual` parsers maintained separate,
redundant copies of this method. All downstream strategies now natively
gather raw blobs and return `LazyDocxImage` objects automatically.
* **Robust Corrupted Image Fallback (`docx_parser.py`)**: Handled edge
cases where `python-docx` encounters critically malformed magic headers.
Implemented an explicit `try-except` structure that safely intercepts
`UnrecognizedImageError` (and similar exceptions) and seamlessly falls
back to retrieving the raw binary via `getattr(related_part, "blob",
None)`, preventing parser crashes on damaged documents.
* **Legacy Code & Redundancy Purge**:
* Removed the duplicate `get_picture` methods from `naive.py`, `qa.py`,
and `manual.py`.
* Removed the standalone, immediate-decoding `concat_img` method in
`manual.py`. It has been completely replaced by the globally unified,
lazy-loading-compatible `rag.nlp.concat_img`.
* Cleaned up unused legacy imports (e.g., `PIL.Image`, docx exception
packages) across all updated strategy files.
## Scope
To keep this PR focused, I have restricted these changes strictly to the
unification of `docx` extraction logic and the lazy-load migration of
`qa` and `manual`.
## Validation & Testing
I've tested this to ensure no regressions and validated the fallback
logic:
* **Output Consistency**: Compared identical `.docx` inputs using `qa`
and `manual` strategies before and after this branch: chunk counts,
extracted text, table HTML, and attached images match perfectly.
* **Memory Footprint Drop**: Confirmed a noticeable drop in peak memory
usage when processing image-dense documents through the `qa` and
`manual` pipelines, bringing them up to parity with the `naive`
strategy's performance gains.
## Breaking Changes
* None.
### What problem does this PR solve?
Feat: Add a user_id field to the message and retrieval operators.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Previously, when an Agent component was configured with structured
output, the non-streaming /agents/{agent_id}/completions API never
returned the structured field in its response.
The root cause: the non-streaming code path only collected message
events to build full_content, then returned the workflow_finished
payload — which only contains the output of the last component in the
execution path (typically a Message component).
Any structured output set by upstream components (e.g., Agent or LLM)
was silently discarded.
This PR fixes the non-streaming handler to iterate node_finished events
and collect structured output from intermediate components.
If any component produced a non-empty structured value, it is included
in the final response under data.structured. The streaming path is
unaffected, as it already exposes node_finished events to the caller.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Support getting aggregated parsing status to dataset via the API
Issue: #12810
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Co-authored-by: heyang.why <heyang.why@alibaba-inc.com>
### What problem does this PR solve?
bin directory cannot be copied to docker image introduced by
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
feat(admin): Implemented default administrator initialization and login
functionality.
Added support for default administrator configuration, including super
user nickname, email, and password.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: The number of deleted session prompts is displayed incorrectly.
#13499
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fixes#6004#7142#11959
Unlike #9207 we actually normalize the coordinates here
### Type of change
- [X] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Display release status in agent version history.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: balibabu <assassin_cike@163.com>
### What problem does this PR solve?
Avoid getting doc in function delete_document_metadata as the doc might
have been removed.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: chats_openai in none stream condition #13453
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix https://github.com/infiniflow/ragflow/issues/13388
The following command returns empty when there is doc with the meta data
```
curl --request POST \
--url http://localhost:9222/api/v1/retrieval \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer ragflow-fO3mPFePfLgUYg8-9gjBVVXbvHqrvMPLGaW0P86PvAk' \
--data '{
"question": "any question",
"dataset_ids": ["9bb4f0591b8811f18a4a84ba59049aa3"],
"metadata_condition": {
"logic": "and",
"conditions": [
{
"name": "character",
"comparison_operator": "is",
"value": "刘备"
}
]
}
}'
```
When metadata_condtion is specified in the retrieval API, it is
converted to doc_ids and doc_ids is passed to retrieval function.
In retrieval funciton, when doc_ids is explicitly provided , we should
bypass threshold.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Problem
When PDF fonts lack ToUnicode/CMap mappings, pdfplumber (pdfminer)
cannot map CIDs to correct Unicode characters, outputting PUA characters
(U+E000~U+F8FF) or `(cid:xxx)` placeholders. The original code fully
trusted pdfplumber text without any garbled detection, causing garbled
output in the final parsed result.
Relates to #13366
## Solution
### 1. Garbled text detection functions
- `_is_garbled_char(ch)`: Detects PUA characters (BMP/Plane 15/16),
replacement character U+FFFD, control characters, and
unassigned/surrogate codepoints
- `_is_garbled_text(text, threshold)`: Calculates garbled ratio and
detects `(cid:xxx)` patterns
### 2. Box-level fallback (in `__ocr()`)
When a text box has ≥50% garbled characters, discard pdfplumber text and
fallback to OCR recognition.
### 3. Page-level detection (in `__images__()`)
Sample characters from each page; if garbled rate ≥30%, clear all
pdfplumber characters for that page, forcing full OCR.
### 4. Layout recognizer CID filtering
Filter out `(cid:xxx)` patterns in `layout_recognizer.py` text
processing to prevent them from polluting layout analysis.
## Testing
- 29 unit tests covering: normal CJK/English text, PUA characters, CID
patterns, mixed text, boundary thresholds, edge cases
- All 85 existing project unit tests pass without regression
### What problem does this PR solve?
As title
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
refactor: Moves the LLM factory initialization logic to the `dao`
package.
Removes the `init_data` package and integrates the LLM factory
initialization functionality into the `dao` package.
Adds a `utility` package to provide general utility functions.
Updates `server_main.go` to use the new initialization path.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: Jin Hai <haijin.chn@gmail.com>