### What problem does this PR solve?
This PR adds comprehensive **Right-to-Left (RTL) language support**,
primarily targeting Arabic and other RTL scripts (Hebrew, Persian, Urdu,
etc.).
Previously, RTL content had multiple rendering issues:
- Incorrect sentence splitting for Arabic punctuation in citation logic
- Misaligned text in chat messages and markdown components
- Improper positioning of blockquotes and “think” sections
- Incorrect table alignment
- Citation placement ambiguity in RTL prompts
- UI layout inconsistencies when mixing LTR and RTL text
This PR introduces backend and frontend improvements to properly detect,
render, and style RTL content while preserving existing LTR behavior.
#### Backend
- Updated sentence boundary regex in `rag/nlp/search.py` to include
Arabic punctuation:
- `،` (comma)
- `؛` (semicolon)
- `؟` (question mark)
- `۔` (Arabic full stop)
- Ensures citation insertion works correctly in RTL sentences.
- Updated citation prompt instructions to clarify citation placement
rules for RTL languages.
#### Frontend
- Introduced a new utility: `text-direction.ts`
- Detects text direction based on Unicode ranges.
- Supports Arabic, Hebrew, Syriac, Thaana, and related scripts.
- Provides `getDirAttribute()` for automatic `dir` assignment.
- Applied dynamic `dir` attributes across:
- Markdown rendering
- Chat messages
- Search results
- Tables
- Hover cards and reference popovers
- Added proper RTL styling in LESS:
- Text alignment adjustments
- Blockquote border flipping
- Section indentation correction
- Table direction switching
- Use of `<bdi>` for figure labels to prevent bidirectional conflicts
#### DevOps / Environment
- Added Windows backend launch script with retry handling.
- Updated dependency metadata.
- Adjusted development-only React debugging behavior.
---
### Type of change
- [x] Bug Fix (non-breaking change which fixes RTL rendering and
citation issues)
- [x] New Feature (non-breaking change which adds RTL detection and
dynamic direction handling)
---------
Co-authored-by: 6ba3i <isbaaoui09@gmail.com>
Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>
Co-authored-by: Ahmad Intisar <168020872+ahmadintisar@users.noreply.github.com>
Co-authored-by: Liu An <asiro@qq.com>
### What problem does this PR solve?
This PR fixes 2 bugs related to RAGFlow's init superuser functionality.
#### Bug 1
When the RAGFlow server was started with the `--init-superuser` option
it would always create a new admin user even if it already exists
resulting in duplicate users.
To fix this, I added an additional check before create the superuser and
added the *unique* constraint to the email column of the database, to
mitigate potential TOCTOU race conditions. Since existing databases
could contain duplicate emails I added email de-duplication to the
database migration.
#### Bug 2
When the RAGFlow server was started with the `--init-superuser` option
but without configured default LLM and embedding models it would fail to
start because the `init_superuser` function would always make test
request to the models even if they were not set.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix [#13210](https://github.com/infiniflow/ragflow/issues/13210)
Remove limit in _search_metadata, use pagination in _search_metadata.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Fixes AttributeError in _remove_reasoning_content() when LLM returns
None, and improves JSON parsing regex for markdown code fences in
agent_with_tools.py
### What problem does this PR solve?
This fixes the bug described in #13130. When starting RAGFlow with
Postgres the admin tenant create failed because the rerank model was not
set.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
This PR fixes missing metadata on documents synced from the Moodle
connector, especially for **Book** modules.
Background:
- Moodle Book metadata includes fields like `chapters`, which is a
`list[dict]`.
- During metadata normalization in
`DocMetadataService._split_combined_values`, list deduplication used
`dict.fromkeys(...)`.
- `dict.fromkeys(...)` fails for unhashable values (like `dict`),
causing metadata update to fail.
- Result: documents were imported, but metadata was not saved for
affected module types (notably Books).
What this PR changes:
- Replaces hash-based list deduplication with `dedupe_list(...)`, which
safely handles unhashable list items while preserving order.
- This allows Book metadata (and other complex list metadata) to be
persisted correctly.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
Contribution during my time at RAGcon GmbH.
When match_expressions contains coroutine objects (from GraphRAG's
Dealer.get_vector()), the code cannot identify this type because it only
checks for MatchTextExpr, MatchDenseExpr, or FusionExpr.
As a result:
score_func remains initialized as an empty string ""
This empty string is appended to the output list
The output list is passed to Infinity SDK's table_instance.output()
method
Infinity's SQL parser (via sqlglot) fails to parse the empty string,
throwing a ParseError
### What problem does this PR solve?
Adjust highlight parsing, add row-count SQL override, tweak retrieval
thresholding, and update tests with engine-aware skips/utilities.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix "metadata table not exists" when updating a meta data.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
As title.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
Co-authored-by: Liu An <asiro@qq.com>
## Description
This PR focuses on API performance optimization and refining the model
capability detection logic in the Agent/Canvas module.
### 1. Performance Optimization (Backend)
- **Changes**: Removed `cls.model.dsl` from query fields in
`UserCanvasService.get_by_tenant_ids`.
- **Reasoning**: The `dsl` object is large and unnecessary for the Agent
list view. Excluding it reduces the payload size of the
`/v1/canvas/list` API, leading to faster serialization and reduced
network latency.
- **Consistency**: Full DSL data remains accessible via the individual
`/v1/canvas/get/<id>` endpoint used in the detail view.
### 2. Multimodal Detection Refinement (Frontend)
- **Changes**: Replaced `model_type === LlmModelType.Image2text` with
`tags?.includes('IMAGE2TEXT')`.
- **Reasoning**: In RAGFlow, `model_type` defines the primary role of a
model (e.g., `chat`). However, many advanced Chat models are also
vision-capable. Since `model_type` is a single-value field, it cannot
represent these multiple capabilities.
- **Solution**: Utilizing the `tags` field (which supports multiple
attributes) to check for `IMAGE2TEXT` ensures that models like
`gpt-5.2-pro` correctly display multimodal input options.
## Type of Change
- [x] Bug fix (logic correction for multimodal detection)
- [x] Optimization (performance improvement for list API)
## Main Changes
- `api/db/services/canvas_service.py`: Optimized DB query by excluding
heavy DSL fields.
- `web/src/pages/agent/form/agent-form/index.tsx`: Enhanced capability
detection using the tags system.
## Verification
- [x] Verified Agent list loads faster with reduced response payload.
- [x] Confirmed that `chat` models with the `IMAGE2TEXT` tag now
correctly enable the multimodal input UI.
### What problem does this PR solve?
- Remove unused imports (Mock, patch, MagicMock, json, os,
RAGFLOW_COLUMNS, VECTOR_FIELD_PATTERN) from multiple files
- Replace f-string formatting with regular strings for console output
messages in cli.py
- Clean up unnecessary imports that were no longer being used in the
codebase
### Type of change
- [x] Refactoring
## Summary
This PR adds Peewee ORM support for OceanBase as the primary database in
RAGFlow, as requested in issue #12769.
## Changes
### Core Implementation
1. **RetryingPooledOceanBaseDatabase Class**
- Inherits from `PooledMySQLDatabase` (OceanBase is MySQL-compatible)
- Implements retry mechanism for connection issues
- Handles MySQL-specific error codes (2013, 2006 for connection loss)
- Provides connection pool management
2. **PooledDatabase Enum**
- Added `OCEANBASE = RetryingPooledOceanBaseDatabase`
3. **DatabaseLock Enum**
- Added `OCEANBASE = MysqlDatabaseLock`
- OceanBase uses MySQL-style locking
4. **TextFieldType Enum**
- Added `OCEANBASE = "LONGTEXT"`
- OceanBase uses same text field type as MySQL
5. **DatabaseMigrator Enum**
- Added `OCEANBASE = MySQLMigrator`
- OceanBase uses MySQL migration tools
### Usage
```bash
# Set environment variable to use OceanBase
export DB_TYPE=oceanbase
# Configure connection (in docker/.env or environment)
OCEANBASE_HOST=localhost
OCEANBASE_PORT=2881
OCEANBASE_USER=root
OCEANBASE_PASSWORD=password
OCEANBASE_DATABASE=ragflow
```
### Technical Details
- **Location**: `api/db/db_models.py`
- **Dependencies**: No new dependencies (uses existing Peewee MySQL
support)
- **Code Size**: ~90 lines
- **Difficulty**: Simple
### Testing
- Added comprehensive unit tests in
`tests/unit/test_oceanbase_peewee.py`
- Tests cover:
- OceanBase database class existence and inheritance
- Enum values for PooledDatabase, DatabaseLock, TextFieldType
- Initialization with custom retry settings
- Environment variable configuration
### Acceptance Criteria
✅ Can switch to OceanBase database via `DB_TYPE=oceanbase` environment
variable
✅ All database operations work normally in OceanBase environment
✅ OceanBase uses MySQL compatibility mode (no additional dependencies)
### Background
This is part of the RAGFlow + OceanBase Hackathon to allow users to
choose OceanBase as RAGFlow's primary database, leveraging OceanBase's
high availability and scalability.
---
## Related Issues
- **Primary**: https://github.com/infiniflow/ragflow/issues/12769
- **Context**: https://github.com/oceanbase/seekdb/issues/123 (OceanBase
Developer Challenge)
---
Closesinfiniflow/ragflow#12769
### What problem does this PR solve?
close#12770
This PR adds OceanBase as a storage backend for the Table Parser. It
enables dynamic table schema storage via JSON and implements OceanBase
SQL execution for text-to-SQL retrieval.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### Changes
- Table Parser stores row data into `chunk_data` when doc engine is
OceanBase. (table.py)
- OceanBase table schema adds `chunk_data` JSON column and migrates if
needed.
- Implemented OceanBase `sql()` to execute text-to-SQL results.
(ob_conn.py)
- Add `DOC_ENGINE_OCEANBASE` flag for engine detection (setting.py)
### Test
1. Set `DOC_ENGINE=oceanbase` (e.g. in `docker/.env`)
<img width="1290" height="783" alt="doc_engine_ob"
src="https://github.com/user-attachments/assets/7d1c609f-7bf2-4b2e-b4cc-4243e72ad4f1"
/>
2. Upload an Excel file to Knowledge Base.(for test, we use as below)
<img width="786" height="930" alt="excel"
src="https://github.com/user-attachments/assets/bedf82f2-cd00-426b-8f4d-6978a151231a"
/>
3. Choose **Table** as parsing method.
<img width="2550" height="1134" alt="parse_excel"
src="https://github.com/user-attachments/assets/aba11769-02be-4905-97e1-e24485e24cd0"
/>
4.Ask a natural language query in chat.
<img width="2550" height="1134" alt="query"
src="https://github.com/user-attachments/assets/26a910a6-e503-4ac7-b66a-f5754bbb0e91"
/>
### What problem does this PR solve?
test_update_document.py failed as metadata is not included in the
response of get_list(), fix the issue.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Put document metadata in ES/Infinity.
Index name of meta data: ragflow_doc_meta_{tenant_id}
### Type of change
- [x] Refactoring
### What problem does this PR solve?
When deleting the knowledge base, the records in the Document and
Knowledgebase tables are immediately deleted
But there are still a large number of pending task messages in the Redis
queue (asynchronous queue) if you did not click on stopping tasks before
deleting knowledge base.
TaskService.get_task() uses a JOIN query to associate three tables (Task
← Document ← Knowledgebase)
Since Document/Knowledgebase have been deleted, the JOIN returns an
empty result, even though the Task records still exist
task-executor considers the task does not exist ("collect task xxx is
unknown"), can only skip and warn
log:2026-01-23 16:43:21,716 WARNING 1190179 collect task
110fbf70f5bd11f0945a23b0930487df is unknown
2026-01-23 16:43:21,818 WARNING 1190179 collect task
11146bc4f5bd11f0945a23b0930487df is unknown
2026-01-23 16:43:21,918 WARNING 1190179 collect task
111c3336f5bd11f0945a23b0930487df is unknown
2026-01-23 16:43:22,021 WARNING 1190179 collect task
112471b8f5bd11f0945a23b0930487df is unknown
2026-01-23 16:43:26,719 WARNING 1190179 collect task
112e855ef5bd11f0945a23b0930487df is unknown
2026-01-23 16:43:26,734 WARNING 1190179 collect task
1134380af5bd11f0945a23b0930487df is unknown
2026-01-23 16:43:26,834 WARNING 1190179 collect task
1138cb2cf5bd11f0945a23b0930487df is unknown
As a consequence, a large number of such tasks occupy the queue
processing capacity, causing new tasks to queue and wait
<img width="1910" height="947"
alt="9a00f2e0-9112-4dbb-b357-7f66b8eb5acf"
src="https://github.com/user-attachments/assets/0e1227c2-a2df-4ef3-ba8f-e04c3f6ef0e1"
/>
Solution
Add logic to stop all ongoing tasks before deleting the knowledge base
and Tasks
### Type of change
- Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Add tokenized content es field to query zh message.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
This PR adds missing web API tests (system, search, KB, LLM, plugin,
connector). It also addresses a contract mismatch that was causing test
failures: metadata updates did not persist new keys (update‑only
behavior).
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [x] Other (please describe): Test coverage expansion and test helper
instrumentation
### What problem does this PR solve?
1) Create dataset using table parser for infinity
2) Answer questions in chat using SQL
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
## Problem
When database connection is lost, the reconnection logic had a bug: if
the first reconnect attempt failed, the second attempt was not wrapped
in error handling, causing unhandled exceptions.
## Solution
Added proper try-except blocks around the second reconnect attempt in
both MySQL and PostgreSQL database classes to ensure errors are properly
logged and handled.
## Changes
- Fixed `_handle_connection_loss()` in `RetryingPooledMySQLDatabase`
- Fixed `_handle_connection_loss()` in
`RetryingPooledPostgresqlDatabase`
Fixes#12294
---
Contribution by Gittensor, see my contribution statistics at
https://gittensor.io/miners/details?githubId=158349177
Co-authored-by: SID <158349177+0xsid0703@users.noreply.github.com>
### What problem does this PR solve?
Fixes web API behavior mismatches that caused test failures by
normalizing error responses, tightening validations, correcting error
messages, and closing upload file handles.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Summary
Fixes#12520 - Deleted chunks should not appear in retrieval/reference
results.
## Changes
### Core Fix
- **api/apps/chunk_app.py**: Include \doc_id\ in delete condition to
properly scope the delete operation
### Improved Error Handling
- **api/db/services/document_service.py**: Better separation of concerns
with individual try-catch blocks and proper logging for each cleanup
operation
### Doc Store Updates
- **rag/utils/es_conn.py**: Updated delete query construction to support
compound conditions
- **rag/utils/opensearch_conn.py**: Same updates for OpenSearch
compatibility
### Tests
- **test/testcases/.../test_retrieval_chunks.py**: Added
\TestDeletedChunksNotRetrievable\ class with regression tests
- **test/unit/test_delete_query_construction.py**: Unit tests for delete
query construction
## Testing
- Added regression tests that verify deleted chunks are not returned by
retrieval API
- Tests cover single chunk deletion and batch deletion scenarios
### What problem does this PR solve?
Feat: Hash doc id to avoid duplicate name.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
## Description
Fixes connection error handling when langfuse service is unavailable.
The application now gracefully handles connection failures instead of
crashing.
## Changes
- Wrapped `langfuse.auth_check()` calls in try-except blocks in:
- `api/db/services/dialog_service.py`
- `api/db/services/tenant_llm_service.py`
## Problem
When langfuse service is unavailable or connection is refused,
`langfuse.auth_check()` throws `httpx.ConnectError: [Errno 111]
Connection refused`, causing the application to crash during document
parsing or dialog operations.
## Solution
Added try-except blocks around `langfuse.auth_check()` calls to catch
connection errors and gracefully skip langfuse tracing instead of
crashing. The application continues functioning normally even when
langfuse is unavailable.
## Related Issue
Fixes#12621
---
Contribution by Gittensor, see my contribution statistics at
https://gittensor.io/miners/details?githubId=158349177
### What problem does this PR solve?
This PR eliminates unnecessary debug print statements that were left in
hot paths of the codebase.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Add PaddleOCR as a new PDF parser.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
1. Fix redundant column adding
2. Refactor the code
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
change:
Enhance delta streaming in chat functions for improved reasoning and
content handling
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Feat: support context window for docx
#12303
Done:
- [x] naive.py
- [x] one.py
TODO:
- [ ] book.py
- [ ] manual.py
Fix: incorrect image position
Fix: incorrect chunk type tag
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add task status for raw message, and move extract message as a nested
property under raw message
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Refactor setting type
### Type of change
- [x] Refactoring
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
#12409
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Signed-off-by: Jin Hai <haijin.chn@gmail.com>