### What problem does this PR solve?
Codecov’s coverage report shows that several RAGFlow code paths are
currently untested or under-tested. This makes it easier for regressions
to slip in during refactors and feature work.
This PR adds targeted automated tests to cover the files and branches
highlighted by Codecov, improving confidence in core behavior while
keeping runtime functionality unchanged.
### Type of change
- [x] Other (please describe): Test coverage improvement (adds/extends
unit and integration tests to address Codecov-reported gaps)
### What problem does this PR solve?
Removed failure modes checklist per your request. @JinHai-CN
### Type of change
- [x] Documentation Update
### What problem does this PR solve?
Fix: when the agent is embedded in a webpage, interrupting its operation
redirects to the login page. #12697
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
Fixes the initial enabled/disabled state of chat variable checkboxes by
correcting a helper function that previously always returned .
## Problem
in had two statements:
Because of the early , the function always returned , so all chat
variable checkboxes were initially disabled regardless of the field.
This also made the helper inconsistent with , which enables all fields
by default except .
## Fix
Update to use the same condition as :
This ensures:
- All chat variable checkboxes are enabled by default
- remains the only field disabled by default
- Behavior is consistent between the helper and the checkbox map
initialization in .
No API or backend changes are involved; this is a small, isolated
frontend bugfix.
### What problem does this PR solve?
Feat: optimize ingestion pipeline with preprocess
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
This PR adds a new guide: **"RAG failure modes checklist"**.
RAG systems often fail in ways that are not immediately visible from a
single metric like accuracy or latency. In practice, debugging
production RAG applications requires identifying recurring failure
patterns across retrieval, routing, evaluation, and deployment stages.
This guide introduces a structured, pattern-based checklist (P01–P12) to
help users interpret traces, evaluation results, and dataset behavior
within RAGFlow. The goal is to provide a practical way to classify
incidents (e.g., retrieval hallucination, chunking issues, index
staleness, routing misalignment) and reason about minimal structural
fixes rather than ad-hoc prompt changes.
The change is documentation-only and does not modify any code or
configuration.
Refs #13138
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Fix: Note component text area does not resize with component #13065
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
User experience enhancement for variable picker in prompt editor:
- Add case-insensitive string search for variables.
- Add basic keyboard navigation in variable picker:
- Hit <kbd>UpArrow</kbd> and <kbd>DownArrow</kbd> for navigating.
- Hit <kbd>Tab</kbd> or <kbd>Enter</kbd> for selecting focused item into
editor.
- Fix unexpectedly inserting invalid variable into editor by hitting
<kbd>Tab</kbd>.
_Note: you still need to pick variables inside secondary menu (agent
structured output, etc.) by using your pointing device. May finish these
later._
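The search behavior can be illustrated with a small sketch (in Python rather than the project's TypeScript; `filter_variables` is an illustrative name, not the actual helper):

```python
def filter_variables(variables, query):
    # Case-insensitive substring match over variable names, as described above
    q = query.lower()
    return [v for v in variables if q in v.lower()]

print(filter_variables(["userName", "UserId", "topic"], "USER"))
```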
### Type of change
- [x] Refactoring
### Actual behavior
When using OceanBase as storage, the `list_chunk` sorting is abnormal.
The following is the generated SQL statement:
SELECT id, content_with_weight, important_kwd, question_kwd, img_id,
available_int, position_int, doc_type_kwd, create_timestamp_flt,
create_time, array_to_string(page_num_int, ',') AS page_num_int_sort,
array_to_string(top_int, ',') AS top_int_sort FROM
rag_store_284250730805059584 WHERE doc_id = '' AND kb_id IN ('') ORDER
BY page_num_int_sort ASC, top_int_sort ASC, create_timestamp_flt DESC
LIMIT 0, 20
<img width="1610" height="740" alt="image"
src="https://github.com/user-attachments/assets/84e14c30-a97f-4e8f-8c8c-6ccac915d97d"
/>
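One plausible explanation (an assumption on my part, not stated in the report) is that `array_to_string` turns the integer arrays into text, and text sorts lexicographically, so e.g. page "10" orders before page "2":

```python
# Lexicographic vs numeric ordering of stringified page numbers
page_nums = ["2", "10", "1"]

string_order = sorted(page_nums)            # how a text sort key orders
numeric_order = sorted(page_nums, key=int)  # the intended chunk order

print(string_order, numeric_order)
```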
Co-authored-by: Aron.Yao <yaowei@yaoweideMacBook-Pro.local>
### What problem does this PR solve?
When users start RAGFlow with `docker compose -p <alias>`, Docker
creates volumes prefixed with the alias (e.g., `myproject_mysql_data`).
The migration script (`docker/migration.sh`) previously hardcoded the
`docker_` prefix in volume names, causing backup/restore to silently
skip all volumes for any non-default project name.
This PR adds a `-p <project_name>` option so the script correctly
targets volumes regardless of the Docker Compose project name used.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### Changes
- Add `-p <project_name>` flag (default: `docker`) for specifying Docker
Compose project name
- Build volume names dynamically: `${project_name}_${base_name}`
- Update help text with new option documentation and examples
- Show project-aware `docker compose` commands in error messages
- Fix deprecated `docker-compose` to `docker compose` in hints
- Use dynamic step count instead of hardcoded `4`
- Fully backward compatible — existing usage without `-p` works
unchanged
### Usage
```bash
# Existing usage (unchanged)
./migration.sh backup
./migration.sh restore my_backup
# New: custom project name
./migration.sh -p myproject backup
./migration.sh -p myproject restore my_backup
```
### What problem does this PR solve?
Fix authorization bypass (IDOR) in `/v1/document/web_crawl` allows
Cross-Tenant Dataset Modification.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
The RDBMS (MySQL/PostgreSQL) connector generates document filenames
using the first 100 characters of the content column
(`semantic_identifier`). When the content contains newline characters
(`\n`), the resulting filename includes those newlines — for example:
`Category: غير صحيح كليًا\nTitle: تفنيد حقائق....txt`
RAGFlow's `filename_type()` function uses `re.match(r".*\.txt$", filename)`
to detect file types, but `.*` does not match newline characters by
default in Python regex. This causes the match to fail, returning
`FileType.OTHER`, which triggers:
`raise RuntimeError("This type of file has not been supported yet!")`
As a result, all documents synced via the MySQL/PostgreSQL connector are
silently discarded. The sync logs report success (e.g., "399 docs
synchronized"), but zero documents actually appear in the dataset. This
is the root cause of issue #13001.
Root cause trace:
- `rdbms_connector.py` → `_row_to_document()` sets `semantic_identifier`
from raw content (may contain `\n`)
- `connector_service.py` → `duplicate_and_parse()` uses
`semantic_identifier` as the filename
- `file_service.py` → `upload_document()` calls `filename_type(filename)`
- `file_utils.py` → the `filename_type()` regex `.*\.txt$` fails on
newlines → returns `FileType.OTHER`
- `upload_document()` raises "This type of file has not been supported yet!"

Fix: Sanitize the `semantic_identifier` in `_row_to_document()` by
replacing newlines and carriage returns with spaces before truncating to
100 characters.
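The failure and the fix can be reproduced with a minimal sketch (the simplified `filename_type` below only mirrors the `.txt` branch of the real function):

```python
import re

def filename_type(filename):
    # Simplified stand-in for RAGFlow's detection: '.*' does not match '\n'
    return "TXT" if re.match(r".*\.txt$", filename) else "OTHER"

bad = "Category: A\nTitle: B.txt"
print(filename_type(bad))  # the embedded newline defeats the regex

# The fix described above: replace newlines/carriage returns with spaces
# before truncating to 100 characters
sanitized = bad.replace("\r", " ").replace("\n", " ")[:100]
print(filename_type(sanitized))
```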
Relates to: #13001, #12817
Type of change
Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>
### What problem does this PR solve?
Fix LFI vulnerability in document parsing API.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
Fixes MinIO SSL/TLS support in two places: the MinIO **client**
connection and the **health check** used by the Admin/Service Health
dashboard. Both now respect the `secure` and `verify` settings from the
MinIO configuration.
Closes #13158, closes #13159
---
## Problem
**#13158 – MinIO client:** The client in `rag/utils/minio_conn.py` was
hardcoded with `secure=False`, so RAGFlow could not connect to MinIO
over HTTPS even when `secure: true` was set in config. There was also no
way to disable certificate verification for self-signed certs.
**#13159 – MinIO health check:** In `api/utils/health_utils.py`, the
MinIO liveness check always used `http://` for the health URL. When
MinIO was configured with SSL, the health check failed and the dashboard
showed "timeout" even though MinIO was reachable over HTTPS.
---
## Solution
### MinIO client (`rag/utils/minio_conn.py`)
- Read `MINIO.secure` (default `false`) and pass it into the `Minio()`
constructor so HTTPS is used when configured.
- Add `_build_minio_http_client()` that reads `MINIO.verify` (default
`true`). When `verify` is false, return an `urllib3.PoolManager` with
`cert_reqs=ssl.CERT_NONE` and pass it as `http_client` to `Minio()` so
self-signed certificates are accepted.
- Support string values for `secure` and `verify` (e.g. `"true"`,
`"false"`).
### MinIO health check (`api/utils/health_utils.py`)
- Add `_minio_scheme_and_verify()` to derive URL scheme (http/https) and
the `verify` flag from `MINIO.secure` and `MINIO.verify`.
- Update `check_minio_alive()` to use the correct scheme, pass `verify`
into `requests.get(..., verify=verify)`, and use `timeout=10`.
### Config template (`docker/service_conf.yaml.template`)
- Add commented optional MinIO keys `secure` and `verify` (and env vars
`MINIO_SECURE`, `MINIO_VERIFY`) so deployers know they can enable HTTPS
and optional cert verification.
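A minimal sketch of the config handling described above (`as_bool` and `scheme_and_verify` are illustrative names; the real helpers live in `minio_conn.py` and `health_utils.py`):

```python
def as_bool(value, default=False):
    # Accept real booleans plus string values like "true"/"false" from config
    if isinstance(value, bool):
        return value
    if value is None:
        return default
    return str(value).strip().lower() in ("true", "1", "yes")

def scheme_and_verify(minio_conf):
    # Derive the health-check URL scheme and the requests 'verify' flag
    secure = as_bool(minio_conf.get("secure"), default=False)
    verify = as_bool(minio_conf.get("verify"), default=True)
    return ("https" if secure else "http"), verify

print(scheme_and_verify({"secure": "true", "verify": "false"}))
```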
### Tests
- **`test/unit_test/utils/test_health_utils_minio.py`** – Tests for
`_minio_scheme_and_verify()` and `check_minio_alive()` (scheme, verify,
status codes, timeout, errors).
- **`test/unit_test/utils/test_minio_conn_ssl.py`** – Tests for
`_build_minio_http_client()` (verify true/false/missing, string values,
`CERT_NONE` when verify is false).
---
## Testing
- Unit tests added/updated as above; run with the project's test runner.
- Manually: configure MinIO with HTTPS and `secure: true` (and
optionally `verify: false` for self-signed); confirm client operations
work and the Service Health dashboard shows MinIO as alive instead of
timeout.
### What problem does this PR solve?
Fix stored XSS via HTML file upload and inline rendering in
/v1/file/get/<id>
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Type of Change
- [x] Bug fix
## Description
Closes #13119
The current IMAP connector uses `split(',')` to parse email headers,
which crashes when a sender's display name contains a comma inside
quotes (e.g., `"Doe, John" <john@example.com>`).
This PR replaces the manual string splitting with Python's standard
`email.utils.getaddresses`. This correctly handles RFC 5322 quoted
strings and prevents the `RuntimeError: Expected a singular address`.
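The difference is easy to demonstrate (the header value is a made-up example):

```python
from email.utils import getaddresses

header = '"Doe, John" <john@example.com>, Jane <jane@example.com>'

# Naive splitting breaks the quoted display name into bogus pieces
naive = header.split(',')

# getaddresses() honors RFC 5322 quoting and yields clean (name, addr) pairs
addrs = getaddresses([header])
print(naive)
print(addrs)
```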
## Checklist
- [x] I have checked the code and it works as expected.
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
When using a chat assistant that has a hardcoded `empty_response`, that
response was not returned correctly in streaming mode when no
information is found in the knowledge base. In this case only one
response with `"content": null` was yielded. If `"references": true`,
then the `empty_response` is still put into the `final_content` so there
is technically some content returned, but when `"references": false` no
content at all is returned.
This PR updates the OpenAI chat completion endpoint to yield an
additional response with the `empty_response` in the content.
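The intended behavior can be sketched roughly as follows (names and message shapes are illustrative, not the actual endpoint code):

```python
def stream_answer(retrieved_chunks, empty_response):
    # Yield the normal streamed content; if the knowledge base returned
    # nothing, emit one extra message carrying the configured empty_response.
    got_content = False
    for chunk in retrieved_chunks:
        if chunk:
            got_content = True
            yield {"content": chunk}
    if not got_content and empty_response:
        yield {"content": empty_response}

print(list(stream_answer([], "Sorry, nothing found.")))
```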
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Fixes AttributeError in _remove_reasoning_content() when LLM returns
None, and improves JSON parsing regex for markdown code fences in
agent_with_tools.py
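A hedged sketch of both fixes (function names and regexes are illustrative, and the `<think>…</think>` reasoning format is an assumption; the real code is in `agent_with_tools.py`):

```python
import json
import re

def remove_reasoning_content(text):
    # Guard against None before calling string methods (the AttributeError fix)
    if text is None:
        return ""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

def parse_json_response(text):
    # Tolerate markdown code fences around the JSON payload
    m = re.search(r"```(?:json)?\s*(.*?)\s*```", text, flags=re.DOTALL)
    return json.loads(m.group(1) if m else text)

print(remove_reasoning_content(None))
print(parse_json_response('```json\n{"a": 1}\n```'))
```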
### What problem does this PR solve?
Fix: Metadata multi-select display error
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
- Use a negative lookbehind `(?<![a-zA-Z])` so `\]` and `\)` inside
commands (e.g. `\right]`, `\big)`) are not treated as block/inline
delimiters
- Use greedy matching to capture up to the last valid delimiter, fixing
truncated formulas (e.g. `C_{seq}(y|x) = \frac{1}{|y|} ...`)
- Add unit tests for `preprocessLaTeX`
Closes #13134
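A rough sketch of the lookbehind rule (in Python rather than the project's TypeScript, and simplified to a bare `]` closer for illustration):

```python
import re

# ']' counts as a math-block closer only when not preceded by a letter,
# so the ']' in commands like \right] or \big] is left alone.
closer = re.compile(r"(?<![a-zA-Z])\]")

print(bool(closer.search(r"x = \frac{1}{2} ]")))  # standalone closer
print(bool(closer.search(r"\right]")))            # inside a command
```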
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
- Fix duplicate YAML mapping keys in `helm/templates/env.yaml` that
cause deployment failures with strict YAML parsers
## Problem
The `range` loop in `env.yaml` iterates over all `.Values.env` keys and
emits them into a Secret. The exclusion filter skips host/port/user
keys, but does **not** skip password keys (`MYSQL_PASSWORD`,
`REDIS_PASSWORD`, `MINIO_PASSWORD`, `ELASTIC_PASSWORD`,
`OPENSEARCH_PASSWORD`). These same keys are then explicitly defined
again later in the template, producing duplicate YAML mapping keys.
Go's `yaml.v3` (used by Flux's helm-controller for post-rendering)
rejects duplicate keys per the YAML spec:
```
Helm install failed: yaml: unmarshal errors:
mapping key "MINIO_PASSWORD" already defined
mapping key "MYSQL_PASSWORD" already defined
mapping key "REDIS_PASSWORD" already defined
```
Plain `helm install` does not surface this because Helm's internal
parser (`yaml.v2`) silently accepts duplicate keys (last value wins).
## Fix
Add password keys to the exclusion filter on line 12 so they are only
emitted by their explicit definitions later in the template.
Note: `MINIO_ROOT_USER` is intentionally **not** excluded — it is only
emitted by the range loop and has no explicit definition elsewhere.
Excluding it causes MinIO to crash with `Missing credential environment
variable, "MINIO_ROOT_USER"`.
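The invariant can be expressed as a small sketch (plain Python standing in for the Go template logic; the env values are placeholders):

```python
env = {"TZ": "UTC", "MYSQL_PASSWORD": "***", "MINIO_ROOT_USER": "***"}
excluded = {"MYSQL_PASSWORD", "REDIS_PASSWORD", "MINIO_PASSWORD",
            "ELASTIC_PASSWORD", "OPENSEARCH_PASSWORD"}

# Keys emitted by the generic range loop (everything not excluded)
loop_keys = sorted(k for k in env if k not in excluded)
# Keys emitted only by their explicit definitions later in the template
explicit_keys = sorted(k for k in excluded if k in env)

print(loop_keys, explicit_keys)
```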
## Test plan
- [ ] Deploy with Flux helm-controller (uses yaml.v3) — no duplicate key
errors
- [ ] Verify all passwords are present in the rendered Secret
- [ ] Verify `MINIO_ROOT_USER` is present in the rendered Secret
- [ ] Test with `DOC_ENGINE=elasticsearch` (ELASTIC_PASSWORD)
- [ ] Test with `DOC_ENGINE=opensearch` (OPENSEARCH_PASSWORD)
Fixes #13135
### What problem does this PR solve?
This fixes the bug described in #13130. When starting RAGFlow with
Postgres, admin tenant creation failed because the rerank model was not
set.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Refact: switch from google-generativeai to google-genai #13132
Refact: comment out unused pywencai.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
The Docker Compose configuration was using `hub.icert.top` as the
registry for the OpenSearch image. That registry is not reachable in our
environment, which causes `podman pull` and `docker compose pull` to fail
with a connection refused error. As a result, the application cannot
start because the OpenSearch image cannot be downloaded.
This PR updates the image reference to use the official Docker Hub image
(`opensearchproject/opensearch:2.19.1`) instead of the `hub.icert.top`
mirror. After this change, the image pulls successfully and the services
start as expected.

### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Co-authored-by: Shynggys Samarkhanov <shynggys.samarkhanov@nixs.com>
### What problem does this PR solve?
RAGFlow supports 12 UI languages but does not include Bulgarian. This PR
adds Bulgarian (`bg` / `Български`) as the 13th supported language,
covering the full UI translation (2001 keys across all 26 sections) and
OCR/PDF parser language mapping.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### Changes
- **`web/src/constants/common.ts`** — Registered Bulgarian in all 5
language data structures (`LanguageList`, `LanguageMap`,
`LanguageAbbreviation` enum, `LanguageAbbreviationMap`,
`LanguageTranslationMap`)
- **`web/src/locales/config.ts`** — Added lazy-loading dynamic import
for the `bg` locale
- **`web/src/locales/bg.ts`** *(new)* — Full Bulgarian translation file
with all 26 sections, matching the English source (`en.ts`). All
interpolation placeholders, HTML tags, and technical terms are preserved
as-is
- **`deepdoc/parser/mineru_parser.py`** — Mapped `'Bulgarian'` to
`'cyrillic'` in `LANGUAGE_TO_MINERU_MAP` for OCR/PDF parser support
### How it works
The language selector automatically picks up the new entry. When a user
selects "Български", the translation bundle is lazy-loaded on demand.
The preference is persisted to the database and localStorage across
sessions.
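The on-demand loading idea can be sketched as follows (Python standing in for the dynamic `import()` in `config.ts`; loader contents are illustrative):

```python
# Each locale maps to a loader that is invoked only when first needed
_loaders = {
    "en": lambda: {"language": "English"},
    "bg": lambda: {"language": "Български"},
}
_cache = {}

def load_locale(code):
    # Bundle is built on first request, then reused from the cache
    if code not in _cache:
        _cache[code] = _loaders[code]()
    return _cache[code]

print(load_locale("bg")["language"])
```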
### What problem does this PR solve?
Refactor: i18n language pack for on-demand import
### Type of change
- [x] Refactoring
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
### Type of change
- [x] Refactoring
### What problem does this PR solve?
This PR fixes missing metadata on documents synced from the Moodle
connector, especially for **Book** modules.
Background:
- Moodle Book metadata includes fields like `chapters`, which is a
`list[dict]`.
- During metadata normalization in
`DocMetadataService._split_combined_values`, list deduplication used
`dict.fromkeys(...)`.
- `dict.fromkeys(...)` fails for unhashable values (like `dict`),
causing metadata update to fail.
- Result: documents were imported, but metadata was not saved for
affected module types (notably Books).
What this PR changes:
- Replaces hash-based list deduplication with `dedupe_list(...)`, which
safely handles unhashable list items while preserving order.
- This allows Book metadata (and other complex list metadata) to be
persisted correctly.
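A minimal sketch of such a dedupe (the real `dedupe_list` may differ in details):

```python
def dedupe_list(items):
    # Order-preserving dedup that works for unhashable items like dicts
    out = []
    for item in items:
        if item not in out:
            out.append(item)
    return out

chapters = [{"id": 1}, {"id": 2}, {"id": 1}]
print(dedupe_list(chapters))

# dict.fromkeys() raises TypeError here, which is what broke metadata saving
try:
    dict.fromkeys(chapters)
except TypeError as e:
    print("fromkeys failed:", e)
```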
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Contribution during my time at RAGcon GmbH.
### What problem does this PR solve?
Fix: replace session page icons and fix nested list search functionality
in filters
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
The `sync_data_source.py` module imports `WebDAVConnector` from
`common.data_source`, but `WebDAVConnector` was never registered in the
package's `__init__.py`. This causes an `ImportError` at startup,
crashing the data sync service:
`ImportError: cannot import name 'WebDAVConnector' from 'common.data_source'`
The `webdav_connector.py` file already exists in the
`common/data_source/` directory — it just wasn't exported. This PR adds
the import and registers it in `__all__`.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>
### What problem does this PR solve?
Fix the issue where the server-side parameter validation fails when the
id parameter is None in the asynchronous list_datasets method.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: Bugs fixed (#13109)
- chat pdf preview error
- data source add box error
- change routes: next-chat -> chat, next-search -> search, ...
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Renamed `test/unit/test_delete_query_construction.py` to
`test/unit_test/common/test_delete_query_construction.py` to align with
the project's directory structure and improve test categorization.
### Type of change
- [x] Refactoring