5071 Commits

Author SHA1 Message Date
38f0a92da9 Use RAGFlow CLI to replace RAGFlow Admin CLI (#12653)
### What problem does this PR solve?

```
$ python admin/client/ragflow_cli.py -t user -u aaa@aaa.com -p 9380

ragflow> list datasets;
ragflow> list default models;
ragflow> show version;

```


### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
nightly
2026-01-17 17:52:38 +08:00
067ddcbf23 Docs: Added configure memory (#12665)
### What problem does this PR solve?

As title.

### Type of change

- [x] Documentation Update
2026-01-17 17:49:19 +08:00
46305ef35e Add User API Token Management to Admin API and CLI (#12595)
## Summary

This PR extends the RAGFlow Admin API and CLI with comprehensive user
API token management capabilities. Administrators can now generate,
list, and delete API tokens for users through both the REST API and the
Admin CLI interface.

## Changes

### Backend API (`admin/server/`)

#### New Endpoints
- **POST `/api/v1/admin/users/<username>/new_token`** - Generate a new
API token for a user
- **GET `/api/v1/admin/users/<username>/token_list`** - List all API
tokens for a user
- **DELETE `/api/v1/admin/users/<username>/token/<token>`** - Delete a
specific API token for a user

#### Service Layer Updates (`services.py`)
- Added `get_user_api_key(username)` - Retrieves all API tokens for a
user
- Added `save_api_token(api_token)` - Saves a new API token to the
database
- Added `delete_api_token(username, token)` - Deletes an API token for a
user

### Admin CLI (`admin/client/`)

#### New Commands
- **`GENERATE TOKEN FOR USER <username>;`** - Generate a new API token
for the specified user
- **`LIST TOKENS OF <username>;`** - List all API tokens associated with
a user
- **`DROP TOKEN <token> OF <username>;`** - Delete a specific API token
for a user

### Testing

Added comprehensive test suite in `test/testcases/test_admin_api/`:
- **`test_generate_user_api_key.py`** - Tests for API token generation
- **`test_get_user_api_key.py`** - Tests for listing user API tokens
- **`test_delete_user_api_key.py`** - Tests for deleting API tokens
- **`conftest.py`** - Shared test fixtures and utilities

## Technical Details

### Token Generation
- Tokens are generated using `generate_confirmation_token()` utility
- Each token includes metadata: `tenant_id`, `token`, `beta`,
`create_time`, `create_date`
- Tokens are associated with user tenants automatically

### Security Considerations
- All endpoints require admin authentication (`@check_admin_auth`)
- Tokens are URL-encoded when passed in DELETE requests to handle
special characters
- Proper error handling for unauthorized access and missing resources

### API Response Format
All endpoints follow the standard RAGFlow response format:
```json
{
  "code": 0,
  "data": {...},
  "message": "Success message"
}
```

## Files Changed

- `admin/client/admin_client.py` - CLI token management commands
- `admin/server/routes.py` - New API endpoints
- `admin/server/services.py` - Token management service methods
- `docs/guides/admin/admin_cli.md` - CLI documentation updates
- `test/testcases/test_admin_api/conftest.py` - Test fixtures
- `test/testcases/test_admin_api/test_user_api_key_management/*` - Test
suites

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Alexander Strasser <alexander.strasser@ondewo.com>
Co-authored-by: Hetavi Shah <your.email@example.com>
2026-01-17 15:21:00 +08:00
bd9163904a fix(ob_conn): ignore duplicate errors when executing 'create_idx' (#12661)
### What problem does this PR solve?

Skip duplicate errors to avoid 'create_idx' failures caused by slow
metadata refresh or external modifications.


### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-01-16 20:46:37 +08:00
b6d7733058 Feat: metadata settings in KB. (#12662)
### What problem does this PR solve?

#11910

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-01-16 20:14:02 +08:00
4f036a881d Fix: Infinity keyword round-trip, highlight fallback, and KB update guards (#12660)
### What problem does this PR solve?

Fixes Infinity-specific API regressions: preserves ```important_kwd```
round‑trip for ```[""]```, restores required highlight key in retrieval
responses, and enforces Infinity guards for unsupported
```parser_id=tag``` and pagerank in ```/v1/kb/update```. Also removes a
slow/buggy pandas row-wise apply that was throwing ```ValueError``` and
causing flakiness.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-01-16 20:03:52 +08:00
59075a0b58 Fix : p3 level sdk test error for update chat (#12654)
### What problem does this PR solve?

fix for update chat failing

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-01-16 17:47:12 +08:00
30bd25716b Fix PDF Generator output variables not appearing in subsequent agent steps (#12619)
This commit fixes multiple issues preventing PDF Generator (Docs
Generator) output variables from being visible in the Output section and
available to downstream nodes.

### What problem does this PR solve?

Issues Fixed:
1. PDF Generator nodes initialized with empty object instead of proper
initial values
2. Output structure mismatch (had 'value' property that system doesn't
expect)
3. Missing 'download' output in form schema
4. Output list computed from static values instead of form state
5. Added null/undefined guard to transferOutputs function

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
Changes:
- web/src/pages/agent/constant/index.tsx: Fixed output structure in
initialPDFGeneratorValues
- web/src/pages/agent/hooks/use-add-node.ts: Initialize PDF Generator
with proper values
- web/src/pages/agent/form/pdf-generator-form/index.tsx: Fixed schema
and use form.watch
- web/src/pages/agent/form/components/output.tsx: Added null guard and
spacing
2026-01-16 16:50:53 +08:00
99dae3c64c Fix: In the agent loop, if the await response is selected as the variable, the operator cannot be selected. #12656 (#12657)
### What problem does this PR solve?

Fix: In the agent loop, if the await response is selected as the
variable, the operator cannot be selected. #12656

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-01-16 16:49:48 +08:00
045314a1aa Fix: duplicate content in chunk (#12655)
### What problem does this PR solve?

Fix: duplicate content in chunk #12336

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-01-16 15:32:04 +08:00
2b20d0b3bb Fix : Web API tests by normalizing errors, validation, and uploads (#12620)
### What problem does this PR solve?

Fixes web API behavior mismatches that caused test failures by
normalizing error responses, tightening validations, correcting error
messages, and closing upload file handles.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-01-16 11:09:22 +08:00
59f4c51222 fix(entrypoint): Preserve $ in passwords during template expansion (#12509)
### What problem does this PR solve?

Fix shell variable expansion to preserve $ in password defaults when 
env vars are unset. Fixes Azure RDS auto-rotated passwords (that contain
$) being
truncated during template processing.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-01-15 19:30:33 +08:00
8c1fbfb130 Fix:Some bugs (#12648)
### What problem does this PR solve?

Fix: Modified and optimized the metadata condition card component.
Fix: Use startOfDay and endOfDay to ensure the date range includes a
full day.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-01-15 19:28:22 +08:00
cec06bfb5d Fix: empty chunk issue. (#12638)
#12570

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-01-15 17:46:21 +08:00
2167e3a3c0 Docs: Added share memory (#12647)
### Type of change

- [x] Documentation Update
2026-01-15 17:21:36 +08:00
2ea8dddef6 fix(infinity): Use comma separator for important_kwd to preserve mult… (#12618)
## Problem

The \`important_kwd\` field in Infinity connector was using mismatched
separators:
- **Storage**: \`list2str(v)\` uses space as default separator
- **Reading**: \`v.split()\` splits by all whitespace

This causes multi-word keywords like \`\"Senior Fund Manager\"\` to be
incorrectly split into \`[\"Senior\", \"Fund\", \"Manager\"]\`.

## Solution

Use comma \`,\` as separator for both storing and reading, consistent
with:
1. The LLM output format in \`keyword_prompt.md\` (\"delimited by
ENGLISH COMMA\")
2. The \`cached.split(\",\")\` in \`task_executor.py\`

## Changes

- \`insert()\`: \`list2str(v)\` → \`list2str(v, \",\")\`
- \`update()\`: \`list2str(v)\` → \`list2str(v, \",\")\`
- \`get_fields()\`: \`v.split()\` → \`v.split(\",\") if v else []\`

## Impact

This bug affects:
- Python-level reranking weight calculation (\`important_kwd * 5\`)
- API response keyword display
- Search precision due to fragmented keywords
2026-01-15 15:32:40 +08:00
18867daba7 chore: bump pyobvector from 0.2.18 to 0.2.22 (#12640)
### What problem does this PR solve?

Update ob client

### Type of change

- [x] Other (please describe):dependency upgrade
2026-01-15 15:21:34 +08:00
d68176326d feat: add oceanbase mount to gitignore (#12642)
### What problem does this PR solve?

feat: add oceanbase mount to .gitignore

### Type of change

- [x] Refactoring
2026-01-15 15:20:40 +08:00
d531bd4f1a Fix: Editing the agent greeting causes the greeting to be continuously added to the message list. #12635 (#12636)
### What problem does this PR solve?

Fix: Editing the agent greeting causes the greeting to be continuously
added to the message list. #12635
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-01-15 14:55:19 +08:00
ac936005e6 fix: ensure deleted chunks are not returned in retrieval (#12520) (#12546)
## Summary
Fixes #12520 - Deleted chunks should not appear in retrieval/reference
results.

## Changes

### Core Fix
- **api/apps/chunk_app.py**: Include \doc_id\ in delete condition to
properly scope the delete operation

### Improved Error Handling
- **api/db/services/document_service.py**: Better separation of concerns
with individual try-catch blocks and proper logging for each cleanup
operation

### Doc Store Updates
- **rag/utils/es_conn.py**: Updated delete query construction to support
compound conditions
- **rag/utils/opensearch_conn.py**: Same updates for OpenSearch
compatibility

### Tests
- **test/testcases/.../test_retrieval_chunks.py**: Added
\TestDeletedChunksNotRetrievable\ class with regression tests
- **test/unit/test_delete_query_construction.py**: Unit tests for delete
query construction

## Testing
- Added regression tests that verify deleted chunks are not returned by
retrieval API
- Tests cover single chunk deletion and batch deletion scenarios
2026-01-15 14:45:55 +08:00
d8192f8f17 Fix: validate regex pattern in split_with_pattern to prevent crash (#12633)
### What problem does this PR solve?

Fix regex pattern validation in split_with_pattern (#12605)

- Add try-except block to validate user-provided regex patterns before
use
- Gracefully fallback to single chunk when invalid regex is provided
- Prevent server crash during DOCX parsing with malformed delimiters

## Problem

Parsing DOCX files with custom regex delimiters crashes with `re.error:
nothing to repeat at position 9` when users provide invalid regex
patterns.

Closes #12605 

## Solution

Validate and compile regex pattern before use. On invalid pattern, log
warning and return content as single chunk instead of crashing.

## Changes

- `rag/nlp/__init__.py`: Add regex validation in `split_with_pattern()`
function

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Contribution by Gittensor, see my contribution statistics at
https://gittensor.io/miners/details?githubId=42954461
2026-01-15 14:24:51 +08:00
eb35e2b89f Fix: async invocation isssue. (#12634)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-01-15 14:22:16 +08:00
97b983fd0b fix: add fallback parser list for empty parser_ids (#12632)
### What problem does this PR solve?

Fixes #12570 - The slicing method dropdown was empty when deploying
RAGFlow v0.23.1 from source code.

The issue occurred because `parser_ids` from the tenant info was empty
or undefined, causing `useSelectParserList` to return an empty array.
This PR adds a fallback to a default parser list when `parser_ids` is
empty, ensuring the dropdown always has options.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---
Contribution by Gittensor, see my contribution statistics at
https://gittensor.io/miners/details?githubId=94194147
2026-01-15 14:05:25 +08:00
b40a7b2e7d Feat: Hash doc id to avoid duplicate name. (#12573)
### What problem does this PR solve?

Feat: Hash doc id to avoid duplicate name. 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-01-15 14:02:15 +08:00
9a10558f80 Refa: async retrieval process. (#12629)
### Type of change

- [x] Refactoring
- [x] Performance Improvement
2026-01-15 12:28:49 +08:00
SID
f82628c40c Fix: langfuse connection error handling #12621 (#12626)
## Description

Fixes connection error handling when langfuse service is unavailable.
The application now gracefully handles connection failures instead of
crashing.

## Changes

- Wrapped `langfuse.auth_check()` calls in try-except blocks in:
  - `api/db/services/dialog_service.py`
  - `api/db/services/tenant_llm_service.py`

## Problem

When langfuse service is unavailable or connection is refused,
`langfuse.auth_check()` throws `httpx.ConnectError: [Errno 111]
Connection refused`, causing the application to crash during document
parsing or dialog operations.

## Solution

Added try-except blocks around `langfuse.auth_check()` calls to catch
connection errors and gracefully skip langfuse tracing instead of
crashing. The application continues functioning normally even when
langfuse is unavailable.

## Related Issue

Fixes #12621

---

Contribution by Gittensor, see my contribution statistics at
https://gittensor.io/miners/details?githubId=158349177
2026-01-15 11:23:15 +08:00
7af98328f5 Fix: the styles of the multi-select component and the filter pop-up. (#12628)
### What problem does this PR solve?

Fix: Fix the styles of the multi-select component and the filter pop-up.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-01-15 10:53:18 +08:00
678a4f959c Fix: skip internal bookmark references in DOCX parsing (#12604) (#12611)
### What problem does this PR solve?

Fixes #12604 - DOCX files containing hyperlinks to internal bookmarks
(e.g., `#_文档目录`) cause a `KeyError` during parsing:

```
KeyError: "There is no item named 'word/#_文档目录' in the archive"
```

This happens because python-docx incorrectly tries to read internal
bookmark references as files from the ZIP archive. Internal bookmarks
are relationship targets starting with `#` and are not actual files.

This PR extends the existing `load_from_xml_v2` workaround (which
already handles `NULL` targets) to also skip relationship targets
starting with `#`.

Related upstream issue:
https://github.com/python-openxml/python-docx/issues/902

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---
Contribution by Gittensor, see my contribution statistics at
https://gittensor.io/miners/details?githubId=94194147
2026-01-14 19:08:46 +08:00
15a8bb2e9c Fix: chunk list async issue. (#12615)
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-01-14 17:32:07 +08:00
b091ff2730 Fix enable_thinking parameter for Qwen3 models (#12603)
### Issue

When using Qwen3 models (`qwen3-32b`, `qwen3-max`) through the
Tongyi-Qianwen provider for non-streaming calls (e.g., knowledge graph
generation), the API fails with:

Closes #12424

```
parameter.enable_thinking must be set to false for non-streaming calls
```

### Root Cause

In `LiteLLMBase.async_chat()`, the `extra_body={"enable_thinking":
False}` was set in `kwargs` but never forwarded to
`_construct_completion_args()`.

### What problem does this PR solve?

Pass merged kwargs to `_construct_completion_args()` using
`**{**gen_conf, **kwargs}` to safely handle potential duplicate
parameters.

### Changes

- `rag/llm/chat_model.py`: Forward kwargs containing `extra_body` to
`_construct_completion_args()` in `async_chat()`


_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Contribution by Gittensor, see my contribution statistics at
https://gittensor.io/miners/details?githubId=42954461
2026-01-14 16:35:46 +08:00
5b22f94502 Feat: Benchmark CLI additions and documentation (#12536)
### What problem does this PR solve?

This PR adds a dedicated HTTP benchmark CLI for RAGFlow chat and
retrieval endpoints so we can measure latency/QPS.

### Type of change

- [x] Documentation Update
- [x] Other (please describe): Adds a CLI benchmarking tool for
chat/retrieval latency/QPS

---------

Co-authored-by: Liu An <asiro@qq.com>
2026-01-14 13:49:16 +08:00
a7671583b3 Feat: add CN regions for AWS (#12610)
### What problem does this PR solve?

Add CN regions for AWS.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-01-14 12:34:55 +08:00
d32fa02d97 Fix: Unable to copy category node. #12607 (#12609)
### What problem does this PR solve?

Fix: Unable to copy category node. #12607

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-01-14 11:45:31 +08:00
f72a35188d refactor: remove debug print statements (#12598)
### What problem does this PR solve?

This PR eliminates unnecessary debug print statements that were left in
hot paths of the codebase.

### Type of change

- [x] Refactoring
2026-01-14 10:05:34 +08:00
ea619dba3b Added to the HTTP API test suite (#12556)
### What problem does this PR solve?

This PR adds missing HTTP API test coverage for dataset
graph/GraphRAG/RAPTOR tasks, metadata summary, chat completions, agent
sessions/completions, and related questions. It also introduces minimal
HTTP test helpers to exercise these endpoints consistently with the
existing suite.

### Type of change

- [x]  Other (please describe): Test coverage (HTTP API tests)

---------

Co-authored-by: Liu An <asiro@qq.com>
2026-01-14 10:02:30 +08:00
36b0835740 Docs: Use memory (#12599)
### What problem does this PR solve?


### Type of change


- [x] Documentation Update
2026-01-14 09:40:31 +08:00
0795616b34 Align p3 HTTP/SDK tests with current backend behavior (#12563)
### What problem does this PR solve?

Updates pre-existing HTTP API and SDK tests to align with current
backend behavior (validation errors, 404s, and schema defaults). This
ensures p3 regression coverage is accurate without changing production
code.

### Type of change

- [x] Other (please describe): align p3 HTTP/SDK tests with current
backend behavior

---------

Co-authored-by: Liu An <asiro@qq.com>
2026-01-13 19:22:47 +08:00
941651a16f Fix: wrong input trace in Category component (#12590)
### What problem does this PR solve?

Wrong input trace in Category component

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-01-13 17:54:57 +08:00
360114ed42 fix(ob_conn): avoid reusing SQLAlchemy Column objects in DDL (#12588)
### What problem does this PR solve?

When there are multiple users, parsing a document for a new user can
trigger the reuse of column objects, leading to the error
`sqlalchemy.exc.ArgumentError: Column object 'id' already assigned to
Table xxx`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-01-13 17:39:20 +08:00
ffedb2c6d3 Feat: The MetadataFilterConditions component supports adding values ​​via search. (#12585)
### What problem does this PR solve?

Feat: The MetadataFilterConditions component supports adding values
​​via search.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-01-13 17:03:25 +08:00
947e63ca14 Fixed typos and added pptx preview for frontend (#12577)
### What problem does this PR solve?
Previously, we added support for previewing PPT and PPTX files in the
backend. Now, we are adding it to the frontend, so when the slides in
the chat interface are referenced, they will no longer be blank.
### Type of change

- Bug Fix (non-breaking change which fixes an issue)
2026-01-13 17:02:36 +08:00
34d74d9928 fix: add uv-aarch64-unknown-linux-gnu.tar.gz to deps image (#12516)
### What problem does this PR solve?

Add uv-aarch64-unknown-linux-gnu.tar.gz to support building ARM64 Docker
images.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Liu An <asiro@qq.com>
2026-01-13 15:37:32 +08:00
accae95126 Feat: Exported Agent JSON Should Include Conversation Variables Configuration #11796 (#12579)
### What problem does this PR solve?

Feat: Exported Agent JSON Should Include Conversation Variables
Configuration #11796

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-01-13 15:35:45 +08:00
68e5c86e9c Fix: image not displaying thumbnails when using pipeline (#12574)
### What problem does this PR solve?

Fix image not displaying thumbnails when using pipeline.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-01-13 12:54:13 +08:00
64c75d558e Fix: zip extraction vulnerabilities in MinerU and TCADP (#12527)
### What problem does this PR solve?

Fix zip extraction vulnerabilities:
   - Block symlink entries in zip files.
   - Reject encrypted zip entries.
   - Prevent absolute path attacks (including Windows paths).
   - Block path traversal attempts (../).
   - Stop zip slip exploits (directory escape).
   - Use streaming for memory-safe file handling.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-01-13 12:24:50 +08:00
41c84fd78f Add MIME types for PPT and PPTX files (#12562)
Otherwise, slide files cannot be opened in Chat module

### What problem does this PR solve?

Backend Reason (API): In the api/utils/web_utils.py file of the backend,
the CONTENT_TYPE_MAP dictionary is missing ppt and pptx.
MIME type mapping. This means that when the frontend requests a PPTX
file, the backend cannot correctly inform the browser that it is a PPTX
file, resulting in the file being displayed incorrectly.
Type identification error.

### Type of change

-  Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-01-13 12:17:49 +08:00
d76912ab15 Fix: Use uv pip install for Docling installation (#12567)
Fixes #12440
### What problem does this PR solve?
The current implementation uses `python3 -m pip` which can fail in
certain environments. This change leverages `uv pip install` instead,
which aligns with the project's existing tooling.

### Type of change
- Removed the ensurepip line (not needed since uv manages pip)
- Changed python3 to "$PY" for consistency with the rest of the script
- Changed python3 -m pip install to uv pip install

Co-authored-by: Gongzi <gongzi@192.168.0.100>
2026-01-13 11:48:42 +08:00
4fe3c24198 feat: PaddleOCR PDF parser supports thumnails and positions (#12565)
### What problem does this PR solve?

1. PaddleOCR PDF parser supports thumnails and positions.
2. Add FAQ documentation for PaddleOCR PDF parser.


### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-01-13 09:51:08 +08:00
44bada64c9 Feat: support tree structured deep-research policy. (#12559)
### What problem does this PR solve?

#12558
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-01-13 09:41:35 +08:00
867ec94258 revert white-space changes in docs (#12557)
### What problem does this PR solve?

Trailing white-spaces in commit 6814ace1aa
got automatically trimmed by code editor may causes documentation
typesetting broken.

Mostly for double spaces for soft line breaks.  

### Type of change

- [x] Documentation Update
2026-01-13 09:41:02 +08:00