Commit Graph

5895 Commits

Author SHA1 Message Date
f3b7d55a1e fix: handle Infinity table-not-exist error (3022) in update() methods (#14153)
### What problem does this PR solve?

## Summary

Closes #6102

When using Infinity as the document store engine (GPU version), calling
`update()` on a non-existent table throws an unhandled
`InfinityException` with error code 3022 (`TABLE_NOT_EXIST`). This
causes users to see a raw "3022" error when clicking on a parsed
document.

## Root Cause

The `update()` methods in both `rag/utils/infinity_conn.py` and
`memory/utils/infinity_conn.py` call `db_instance.get_table(table_name)`
without catching `InfinityException`. In contrast, other CRUD methods
(`insert`, `delete`, `search`) all handle this exception gracefully:

| Method   | Handles table-not-exist? | Behavior |
|----------|--------------------------|----------|
| `insert`  |  Yes | Auto-creates the table |
| `search`  |  Yes | Skips the table |
| `delete`  |  Yes | Returns 0 |
| `update`  |  **No** | Crashes with 3022 |

Additionally, `api/apps/document_app.py` worked around this with a
fragile string match (`"3022" in msg`) to detect the error.

## Changes

- **`rag/utils/infinity_conn.py`**: Catch `InfinityException` in
`update()`. When `TABLE_NOT_EXIST` is detected, log a warning and return
`False` — consistent with `delete()`.
- **`memory/utils/infinity_conn.py`**: Apply the same fix to its
`update()` method.
- **`api/apps/document_app.py`**: Remove the fragile `"3022"`
string-matching workaround. Table-not-exist is now handled by the `if
not ok` path with an improved error message.

### Type of change

- [x] Refactoring

---------

Signed-off-by: noob <yixiao121314@outlook.com>
2026-04-27 11:52:22 +08:00
33bb464ce3 fix: skip canvas SSE fetch in chat shared page to eliminate spurious 103 error (#14190)
## What does this PR do?

Fixes the `hint : 103 Only owner of canvas authorized for this
operation` error that appears when opening a **Chat** shared link
(`/chats/share?shared_id=...&from=chat`).

## Root Cause

The Chat shared page (`web/src/pages/next-chats/share/index.tsx`)
unconditionally calls `useFetchFlowSSE()`, which requests
`/api/canvas/getsse/{sharedId}`. This is an Agent Canvas endpoint that
validates canvas ownership. When sharing a **Chat** dialog (not an
Agent):

1. `sharedId` is a `dialog_id`, not a `canvas_id`
2. The API token's `tenant_id` doesn't match any canvas owner
3. The backend returns `code: 103, message: "Only owner of canvas
authorized for this operation."`
4. The global error interceptor in `request.ts` displays it as a
notification: `hint : 103 Only owner of canvas authorized for this
operation.`

## Changes

- **`web/src/hooks/use-agent-request.ts`**: Added an `enabled` parameter
to `useFetchFlowSSE` so callers can conditionally skip the query.
- **`web/src/pages/next-chats/share/index.tsx`**: Only enable
`useFetchFlowSSE` when `from === SharedFrom.Agent`. For Chat shares, the
hook is disabled, avoiding the unnecessary canvas API call entirely.

## Related Issue

Closes #14115

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: noob <yixiao121314@outlook.com>
2026-04-27 11:27:39 +08:00
3ad3241ae0 feat: persist RAPTOR layer metadata on summary chunks (#13286)
## Summary

RAPTOR's recursive clustering builds a `layers` list tracking
`(start_idx, end_idx)` boundaries per level, but currently discards this
information — only the flat `chunks` list is returned. This makes it
impossible to distinguish leaf-level summaries from top-level ones.

This PR:
- Returns `(chunks, layers)` tuple from `raptor.py`'s `__call__`
- Annotates each RAPTOR summary chunk with `raptor_layer_int` (1 = first
summary level, 2 = summary-of-summaries, etc.)
- Adds `raptor_layer_int` to `infinity_mapping.json` (Elasticsearch
handles it via existing `*_int` dynamic template)

### Why this matters

Downstream features need to know which RAPTOR layer a summary belongs
to:
- **Retrieving the top-level document summary** for entity extraction,
search snippets, or document comparison
- **Filtering by abstraction level** — users may want only high-level
summaries or only leaf-level cluster summaries
- **RAPTOR recall quality** — #10951 reports summaries not being
recalled for definition queries; layer metadata enables targeted
retrieval

### Changes

| File | Change | LOC |
|------|--------|-----|
| `rag/raptor.py` | Return `(chunks, layers)` tuple | ~3 |
| `rag/svr/task_executor.py` | Build `chunk_layer` mapping, set
`raptor_layer_int` | ~12 |
| `conf/infinity_mapping.json` | Add `raptor_layer_int` integer field |
~1 |

### Backward compatibility

- **Additive only** — no existing fields or behavior changed
- Existing RAPTOR chunks continue to work (they'll have
`raptor_layer_int = 0` by default)
- New RAPTOR chunks get layer metadata automatically

## Test plan

- [ ] Parse a document with RAPTOR enabled, verify `raptor_layer_int` is
set on indexed chunks
- [ ] Verify `raptor_layer_int` values increase with abstraction level
(layer 1 < layer 2 < ...)
- [ ] Verify existing RAPTOR deletion (`delete by raptor_kwd`) still
works
- [ ] Verify Infinity backend accepts the new field

Fixes #7488
Related: #4104, #11191, #10951

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: yuch85 <yuch85.1@gmail.com>
Co-authored-by: Wang Qi <wangq8@outlook.com>
2026-04-27 10:20:46 +08:00
a9e5724b46 Refa: unify document create flows under REST documents API (#14345)
### What problem does this PR solve?

unify document create flows under REST documents API

### Type of change

- [x] Refactoring
2026-04-27 10:18:16 +08:00
4dcc42e0e1 feat(api): add unified index API and dataset management endpoints (#14222)
### What problem does this PR solve?

## Summary

Refactor the dataset API layer into a clean service/REST separation
pattern, add a unified `/index` API for graph/raptor/mindmap operations,
and introduce several new dataset management endpoints with full test
coverage.

## Changes

### Service Layer (`dataset_api_service.py`)

- Added `trace_index(dataset_id, tenant_id, index_type)` — unified trace
function for all index types
- Added `run_index`, `delete_index` service functions
- Added `get_dataset`, `get_ingestion_summary`, `list_ingestion_logs`,
`get_ingestion_log`
- Added `run_embedding`, `list_tags`, `aggregate_tags`, `delete_tags`,
`rename_tag`
- Added `get_flattened_metadata`, `get_auto_metadata`,
`update_auto_metadata`

### REST API Layer (`dataset_api.py`)

**New unified routes:**

| Method | Route | Description |
|--------|-------|-------------|
| POST | `/datasets/<id>/index?type=graph\|raptor\|mindmap` | Run index
task |
| GET | `/datasets/<id>/index?type=graph\|raptor\|mindmap` | Trace index
task |
| DELETE | `/datasets/<id>/<index_type>` | Delete index |
| GET | `/datasets/<id>` | Get dataset details |
| GET | `/datasets/<id>/ingestions/summary` | Ingestion summary |
| GET | `/datasets/<id>/ingestions` | List ingestion logs |
| GET | `/datasets/<id>/ingestions/<log_id>` | Get single ingestion log
|
| POST | `/datasets/<id>/embedding` | Run embedding |
| GET | `/datasets/<id>/tags` | List tags |
| GET | `/datasets/tags/aggregation` | Aggregate tags across datasets |
| DELETE | `/datasets/<id>/tags` | Delete tags |
| PUT | `/datasets/<id>/tags` | Rename tag |
| GET | `/datasets/metadata/flattened` | Get flattened metadata |
| GET/PUT | `/datasets/<id>/metadata/config` | New metadata config path
|

**Removed routes (replaced by unified `/index`):**

- `POST /datasets/<id>/mindmap`
- `GET /datasets/<id>/mindmap`

**Preserved legacy routes (backward compatibility):**

- `/run_graphrag`, `/trace_graphrag`, `/run_raptor`, `/trace_raptor`
- `/auto_metadata` GET/PUT

### Test Suite

- Updated `common.py` helpers: added `trace_index`, removed
`run_mindmap`/`trace_mindmap`
- Added 7 new test files with 39 test cases total:

| Test File | Cases |
|-----------|-------|
| `test_get_dataset.py` | 4 |
| `test_ingestion_summary.py` | 2 |
| `test_ingestion_logs.py` | 5 |
| `test_index_api.py` | 14 |
| `test_embedding.py` | 2 |
| `test_tags.py` | 8 |
| `test_flattened_metadata.py` | 4 |

- Deleted `test_mindmap_tasks.py` (covered by unified index tests)

## Design Decisions

1. **Unified `/index?type=...`** — single endpoint replaces 3 separate
route pairs for graph/raptor/mindmap
2. **Backward compatibility** — old routes (`/run_graphrag`,
`/run_raptor`, `/auto_metadata`) preserved alongside new paths
3. **`_VALID_INDEX_TYPES = {"graph", "raptor", "mindmap"}`** — input
validation via constant set
4. **`_INDEX_TYPE_TO_TASK_ID_FIELD`** — maps index type to KB model task
ID field for clean dispatch

## Files Changed

- `api/apps/restful_apis/dataset_api.py`
- `api/apps/services/dataset_api_service.py`
- `sdk/python/ragflow_sdk/modules/dataset.py`
- `test/testcases/test_http_api/common.py`
- `test/testcases/test_http_api/test_dataset_management/` (7 new files)
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring

---------

Signed-off-by: noob <yixiao121314@outlook.com>
2026-04-27 09:38:01 +08:00
fb95136f39 Fix: validate URL scheme and resolved IP before crawling to prevent SSRF (#14090)
### What problem does this PR solve?

The POST /upload_info?url=<url> endpoint accepted a user-supplied URL
and passed it directly to AsyncWebCrawler without any validation. There
were no restrictions on URL scheme, destination hostname, or resolved IP
address. This allowed any authenticated user to instruct the server to
make outbound HTTP requests to internal infrastructure — including RFC
1918 private networks, loopback addresses, and cloud metadata services
such as http://169.254.169.254 — effectively using the server as a proxy
for internal network reconnaissance or credential theft.

This PR adds an SSRF guard (_validate_url_for_crawl) that runs before
any crawl is initiated. It enforces an allowlist of safe schemes
(http/https), resolves the hostname at validation time, and rejects any
URL whose resolved IP falls within a private or reserved network range.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-25 14:30:15 +08:00
78188ce9e9 Feat: add OpenDataLoader PDF parser backend (#14058) (#14097)
### What problem does this PR solve?

Closes #14058.

RAGFlow supports multiple PDF parsing backends (DeepDOC, MinerU,
Docling, TCADP, PaddleOCR). This PR adds **OpenDataLoader**
([opendataloader-project/opendataloader-pdf](https://github.com/opendataloader-project/opendataloader-pdf))
as a new optional backend, giving users a deterministic, local-first
alternative with competitive table extraction accuracy.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update

---

### Changes

#### Backend
- `deepdoc/parser/opendataloader_parser.py` — new `OpenDataLoaderParser`
class inheriting `RAGFlowPdfParser`. Implements `check_installation()`
(guards Python package + Java 11+ runtime), `parse_pdf()` with
JSON-first extraction (heading/paragraph/table/list/image/formula) and
Markdown fallback, position-tag generation compatible with the shared
`@@page\tx0\tx1\ty0\ty1##` format, and temp-dir lifecycle with cleanup.
- `rag/app/naive.py` — new `by_opendataloader()` wrapper, registered in
`PARSERS` dict, added to `chunk_token_num=0` override list.
- `rag/flow/parser/parser.py` — `"opendataloader"` branch in the
pipeline PDF handler + check validation list.

#### Infrastructure
- `docker/entrypoint.sh` — `ensure_opendataloader()` function: opt-in
via `USE_OPENDATALOADER=true`, skips gracefully if Java is not on PATH.

#### Frontend
- `web/src/components/layout-recognize-form-field.tsx` —
`OpenDataLoader` added to `ParseDocumentType` enum and parser dropdown.
Cascades automatically to the pipeline editor's Parser component.

#### Docs
- `docs/guides/dataset/select_pdf_parser.md` — added OpenDataLoader
entry and full env-var reference.

---

### Environment variables

| Variable | Default | Description |
|---|---|---|
| `USE_OPENDATALOADER` | `false` | Set `true` to install
`opendataloader-pdf` on container startup |
| `OPENDATALOADER_VERSION` | latest | Pin the PyPI release (e.g.
`==2.2.1`) |
| `OPENDATALOADER_HYBRID` | _(unset)_ | Enable hybrid AI mode (e.g.
`docling-fast`) |
| `OPENDATALOADER_IMAGE_OUTPUT` | _(unset)_ | `off` / `embedded` /
`external` |
| `OPENDATALOADER_OUTPUT_DIR` | _(tmp)_ | Persistent output dir; temp
dir used + cleaned if unset |
| `OPENDATALOADER_DELETE_OUTPUT` | `1` | `0` to retain intermediate
files for debugging |
| `OPENDATALOADER_SANITIZE` | _(unset)_ | `1` to filter prompt-injection
patterns from output |

---

### Dependencies

- **Runtime**: `opendataloader-pdf` (PyPI, Apache 2.0) — opt-in, not
added to `pyproject.toml` core deps. Installed by
`ensure_opendataloader()` at container startup when
`USE_OPENDATALOADER=true`.
- **System**: Java 11+ on PATH (JVM is the underlying engine). The
installer skips with a warning if `java` is not found.

---

### How to test

**Standalone parser:**
```bash
source .venv/bin/activate
uv pip install opendataloader-pdf
python3 -c "
import sys; sys.path.insert(0, '.')
from deepdoc.parser.opendataloader_parser import OpenDataLoaderParser
p = OpenDataLoaderParser()
print('available:', p.check_installation())
s, t = p.parse_pdf('path/to/test.pdf', parse_method='pipeline')
print(f'sections={len(s)} tables={len(t)}')
"

```
### Benchmark vs Docling
```
file                      parser            secs  sections  tables
----------------------------------------------------------------------
text-heavy.pdf            docling           45.29       148      10
text-heavy.pdf            opendataloader     3.14       559       0
table-heavy.pdf           docling           7.05        76       3
table-heavy.pdf           opendataloader     3.71        90       0
complex.pdf               docling            42.67       114       8
complex.pdf               opendataloader     3.51       180       0
```
2026-04-25 00:33:02 +08:00
e22cf333ed Fix: allow search id or _id (#14356)
### What problem does this PR solve?

Allow search id or _id when using es as doc_engine.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-24 21:38:19 +08:00
25089600d0 Feat: introduce minimum type check for pipeline (#14354)
### What problem does this PR solve?

Feat: introduce minimum type check for pipeline

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-04-24 21:12:50 +08:00
1c244df90d Go: add gitee and siliconflow as model provider (#14336)
### What problem does this PR solve?

As title

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-24 20:59:30 +08:00
e5cfe7fb8f Doc: Updated a 0.25-specific faq (#14365)
### What problem does this PR solve?

Updated a 0.25 faq.

### Type of change


- [x] Documentation Update
2026-04-24 20:57:32 +08:00
7fb6a12067 Update API document (#14364)
### What problem does this PR solve?

Update API document

### Type of change

- [ ] Documentation Update
2026-04-24 20:36:47 +08:00
3ccd58f28c Fix: The button styles in the PaddleOCR dialog are not applying correctly. (#14350)
### What problem does this PR solve?

Fix: The button styles in the PaddleOCR dialog are not applying
correctly.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Co-authored-by: Copilot <copilot@github.com>
2026-04-24 20:17:01 +08:00
1870c934c6 Refact: Updated rootAsHeadingTip (#14363)
### What problem does this PR solve?

Updated rootASHeadingTip.

### Type of change

- [x] Documentation Update
2026-04-24 20:08:44 +08:00
ca01c7a745 Fix blob sync: skip unsupported files before download (#14357)
### What problem does this PR solve?

Blob storage sync was downloading unsupported files first and rejecting
them later, which wasted bandwidth and made sync slower. This PR skips
unsupported extensions before download and applies `allow_images` in
blob sync. fixes #14338

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-24 19:22:32 +08:00
620088be2f fix: check isinstance before len in VariableAssigner _remove_first/_remove_last (#14281)
fix: check isinstance before len in VariableAssigner _remove_first/_remove_last
2026-04-24 19:09:44 +08:00
eeb89d604e feat: route docling parsing through native chunking endpoints (#14218)
Resolves #14211

**Background:** Currently, RAGFlow routes all Docling parsing through
the standard `/convert/source` endpoint. For large documents, this
returns massive, unchunked text that exceeds RAGFlow's internal
embedding model context limits, causing pipeline failures.

**Solution:**
This PR updates the `_parse_pdf_remote` ingestion logic in
`docling_parser.py` to prioritize `docling-serve`'s native chunking
endpoints (`/v1/chunk/source` and `/v1alpha/chunk/source`).
- By receiving pre-sliced chunk objects directly from Docling, RAGFlow
natively bypasses token limit overflows.
- Included a graceful fallback mechanism to the standard
`/convert/source` endpoints to maintain backwards compatibility for
users running older versions of the Docling server that return 404s on
the new routes.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-04-24 19:03:19 +08:00
beb2406b86 Fix: allow use image2text as chat model (#14331)
### What problem does this PR solve?

Allow image2text models (multimodal) to be used as chat models.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-24 17:58:25 +08:00
9ad752f497 Refa:migrate agent webhook routes to REST APIs (#14330)
### What problem does this PR solve?

migrate agent webhook routes to REST APIs

### Type of change
- [x] Refactoring
2026-04-24 17:55:53 +08:00
b8d831c1c3 Fix api user patch verb does not work (#14358)
### What problem does this PR solve?

Fix api user patch verb does not work

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
2026-04-24 17:27:41 +08:00
8a2f63e77d docs: fix API key guide typo (#14352)
Fixes a small typo in the RAGFlow API key guide: `This documents
provides` -> `This document provides`.
2026-04-24 16:59:25 +08:00
1473000135 Implement retrieval_test in GO (#14231)
### What problem does this PR solve?

Implement retrieval_test in GO

### Type of change

- [x] Refactoring
2026-04-24 15:30:14 +08:00
aadd9a333f Feat: deepseek v4 (#14346)
### What problem does this PR solve?

Feat: deepseek v4
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-04-24 13:07:59 +08:00
199fbceb72 Refactor user REST API (#14334)
### What problem does this PR solve?
Refactor user REST API

### Type of change
- [x] Refactoring
2026-04-24 10:25:15 +08:00
c41b5e8a5d fix: migrate Langfuse integration from start_generation to start_obse… (#14205)
The Langfuse Python SDK v3+ removed `start_generation()` method.
RagFlow's code called this non-existent method, causing AttributeError
when Langfuse tracing is enabled.

Replace all `start_generation()` calls with
`start_observation(as_type="generation")` which is the correct v4 SDK
API.

Affected files:
- api/db/services/llm_service.py (12 occurrences)
- api/db/services/dialog_service.py (1 occurrence)

Fixes #14204
Related to #9243

### What problem does this PR solve?

_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 10:03:57 +08:00
c74aece63c Feat: Agent api (#14157)
### What problem does this PR solve?

1. **List agents**  
   **Prev API**:  
   - `/v1/canvas/list GET`  
   - `/api/v1/agents GET`  
   **Current API**: `/api/v2/agents GET`

2. **Get canvas template**  
   **Prev API**: `/v1/canvas/templates GET`  
   **Current API**: `/api/v2/agents/templates GET`

3. **Delete an agent**  
   **Prev API**: 
    - `/v1/canvas/rm POST`  
    - `/api/v1/agents/<agent_id> DELETE`
   **Current API**: `/api/v2/agents/<agent_id> DELETE`

4. **Update an agent**  
   **Prev API**: 
    - `/api/v1/agents/<agent_id> PUT`   
    - `/v1/canvas/setting POST `
   **Current API**: `/api/v2/agents/<agent_id> PATCH`


5. **Create an agent**  
   **Prev API**: 
    - `/v1/canvas/set POST`  
    - `/api/v1/agents POST`
   **Current API**: `/api/v2/agents POST`


6. **Get an agent**  
   **Prev API**: 
    - `/v1/canvas/get/<canvas_id> GET `  
   **Current API**: `/api/v2/agents/<agent_id> GET`


7. **Reset an agent**  
   **Prev API**: 
    - `/v1/canvas/reset POST`  
   **Current API**: `/api/v2/agents/<agent_id>/reset POST`


8. **Upload a file to an agent**  
   **Prev API**: 
    - `/v1/canvas/upload/<canvas_id> POST`  
   **Current API**: `/api/v2/agents/<agent_id>/upload POST`


9. **Input form**  
   **Prev API**: 
    - `/v1/canvas/input_form GET`  
**Current API**:
`/api/v2/agents/<agent_id>/components/<component_id>/input-form GET`


10. **Debug an agent**  
   **Prev API**: 
    - `/v1/canvas/debug POST`  
**Current API**:
`/api/v2/agents/<agent_id>/components/<component_id>/debug POST`


11. **Trace an agent**  
   **Prev API**: 
    - `/v1/canvas/trace GET`  
   **Current API**: `/api/v2/agents/<agent_id>/logs/<message_id> GET`


12. **Get an agent version list**  
   **Prev API**: 
    - `/v1/canvas/getlistversion/<canvas_id>`  
   **Current API**: `/api/v2/agents/<agent_id>/versions GET`


13. **Get a version of agent**  
   **Prev API**: 
    - `/v1/canvas/getversion/<version_id>`  
**Current API**: `/api/v2/agents/<agent_id>/versions/<version_id> GET`


14. **Test db connection**  
   **Prev API**: 
    - `/v1/canvas/test_db_connect POST`  
   **Current API**: `/api/v2/agents/test_db_connection`


15. **Rerun the agent**  
   **Prev API**: 
    - `/v1/canvas/rerun POST`  
   **Current API**: `/api/v2/agents/rerun POST`


16. **Get prompts**  
   **Prev API**: 
    - `/v1/canvas/prompts GET`  
   **Current API**: `/api/v2/agents/prompts GET`

### Type of change
- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: chanx <1243304602@qq.com>
2026-04-24 10:02:22 +08:00
d84438fd53 fix azure blob put method param (#14329)
### What problem does this PR solve?

when use azure blob as the file container, when click parse file, it
calls:

```python
partial(settings.STORAGE_IMPL.put, tenant_id=task["tenant_id"])
```
So any storage backend used there must accept tenant_id as a kwarg. 
RAGFlowAzureSasBlob.put() did not, causing:
```
TypeError: ... got an unexpected keyword argument 'tenant_id'
```
Now it does, so parsing should proceed past this point.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-23 20:40:54 +08:00
d4fa57311c Refa: remove legacy MCP server web API (#14322)
### What problem does this PR solve?

remove legacy MCP server web API

### Type of change

- [x] Refactoring
2026-04-23 19:01:22 +08:00
75a5548b85 Feat: optimize title chunk (#14325)
### What problem does this PR solve?

Feat: optimize title chunk
1. Add a new button to enable "Use root chunk as H0 heading", so that
the first chunk is carried on to all remaining chunks.
2. Update resume agent template

### Type of change

- [x] New Feature (non-breaking change which adds functionality)


<img width="700" alt="img_v3_02111_63b04951-b3d7-4001-a08b-539db6d5298g"
src="https://github.com/user-attachments/assets/4179ac4d-90e7-4353-9b93-d649a455e634"
/>

<img width="700" alt="image"
src="https://github.com/user-attachments/assets/c0ba0f3c-05aa-4f2c-b418-e808ca1a2641"
/>
2026-04-23 18:55:55 +08:00
ba47c13eb5 Fix commit override from #14298 of api-key to api_key (#14328)
### What problem does this PR solve?

Fix commit override from
https://github.com/infiniflow/ragflow/pull/14298/ of `api-key` to `api_key`

### Type of change

- [x] Refactoring
2026-04-23 17:16:32 +08:00
4458763a93 API refactor: stats_api and plugin_api (#14324)
### What problem does this PR solve?

API refactor: stats_api and plugin_api

### Type of change

- [x] Refactoring
2026-04-23 17:16:04 +08:00
7817b0d779 Refa: migrate chunk APIs to RESTful routes (#14291)
### What problem does this PR solve?

migrate chunk APIs to RESTful routes

### Type of change
- [x] Refactoring
2026-04-23 14:17:23 +08:00
76b017ca32 Refact: system apis (#14298)
### What problem does this PR solve?
Refact: system apis

### Type of change

- [x] Refactoring
2026-04-23 14:09:42 +08:00
57f527eb02 Add missing timeout to ragflow server health check (#14311)
### What problem does this PR solve?

`check_ragflow_server_alive()` in `api/utils/health_utils.py` calls
`requests.get(url)` without a `timeout` parameter. Unlike
`check_minio_alive()` which correctly specifies `timeout=10`, this
health check can hang indefinitely if the server is unresponsive.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Changes

Added `timeout=10` to the `requests.get()` call, consistent with
`check_minio_alive()`.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-23 14:08:52 +08:00
8901c18cb8 Build(deps): Bump lxml from 6.0.2 to 6.1.0 in /sdk/python (#14318)
Bumps [lxml](https://github.com/lxml/lxml) from 6.0.2 to 6.1.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/lxml/lxml/blob/master/CHANGES.txt">lxml's
changelog</a>.</em></p>
<blockquote>
<h1>6.1.0 (2026-04-17)</h1>
<p>This release fixes a possible external entity injection (XXE)
vulnerability in
<code>iterparse()</code> and the <code>ETCompatXMLParser</code>.</p>
<h2>Features added</h2>
<ul>
<li>
<p>GH#486: The HTML ARIA accessibility attributes were added to the set
of safe attributes
in <code>lxml.html.defs</code>. This allows <code>lxml_html_clean</code>
to pass them through.
Patch by oomsveta.</p>
</li>
<li>
<p>The default chunk size for reading from file-likes in
<code>iterparse()</code> is now configurable
with a new <code>chunk_size</code> argument.</p>
</li>
</ul>
<h2>Bugs fixed</h2>
<ul>
<li>LP#2146291: The <code>resolve_entities</code> option was still set
to <code>True</code> for
<code>iterparse</code> and <code>ETCompatXMLParser</code>, allowing for
external entity injection (XXE)
when using these parsers without setting this option explicitly.
The default was now changed to <code>'internal'</code> only (as for the
normal XML and HTML parsers
since lxml 5.0).
Issue found by Sihao Qiu as CVE-2026-41066.</li>
</ul>
<h1>6.0.4 (2026-04-12)</h1>
<h2>Bugs fixed</h2>
<ul>
<li>LP#2148019: Spurious MemoryError during namespace cleanup.</li>
</ul>
<h1>6.0.3 (2026-04-09)</h1>
<h2>Bugs fixed</h2>
<ul>
<li>
<p>Several out of memory error cases now raise <code>MemoryError</code>
that were not handled before.</p>
</li>
<li>
<p>Slicing with large step values (outside of <code>+/-
sys.maxsize</code>) could trigger undefined C behaviour.</p>
</li>
<li>
<p>LP#2125399: Some failing tests were fixed or disabled in PyPy.</p>
</li>
<li>
<p>LP#2138421: Memory leak in error cases when setting the
<code>public_id</code> or <code>system_url</code> of a document.</p>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="43722f4402"><code>43722f4</code></a>
Update changelog.</li>
<li><a
href="87470409b1"><code>8747040</code></a>
Name version of option change in docstring.</li>
<li><a
href="6c36e6cef7"><code>6c36e6c</code></a>
Fix pypistats URL in download statistics script.</li>
<li><a
href="c7d76d6cb8"><code>c7d76d6</code></a>
Change security policy to point to Github security advisories.</li>
<li><a
href="378ccf82db"><code>378ccf8</code></a>
Update project income report.</li>
<li><a
href="315270b810"><code>315270b</code></a>
Docs: Reduce TOC depth of package pages and move module contents
first.</li>
<li><a
href="6dbba7f3c7"><code>6dbba7f</code></a>
Docs: Show current year in copyright line.</li>
<li><a
href="e4385bfa5d"><code>e4385bf</code></a>
Update project income report.</li>
<li><a
href="5bed1e1a22"><code>5bed1e1</code></a>
Validate file hashes in release download script.</li>
<li><a
href="c13ee10a42"><code>c13ee10</code></a>
Prepare release of 6.1.0.</li>
<li>Additional commits viewable in <a
href="https://github.com/lxml/lxml/compare/lxml-6.0.2...lxml-6.1.0">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=lxml&package-manager=uv&previous-version=6.0.2&new-version=6.1.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/infiniflow/ragflow/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-23 13:23:12 +08:00
224574831c Add REDIS zcard (#14316)
### What problem does this PR solve?

As description.

### Type of change

- [x] Refactoring
2026-04-23 12:51:55 +08:00
aa4526266f Refa: migrate MCP APIs to RESTful api (#14317)
### What problem does this PR solve?

migrate MCP APIs to RESTful api

### Type of change

- [x] Refactoring
2026-04-23 12:51:27 +08:00
dbf8c6ed90 Refactor: Doc metadata update (#14289)
### What problem does this PR solve?

Before migration
Web API: POST /v1/document/metadata/update

After migration, Restful API
PATCH /api/v2/datasets/<dataset_id>/documents/metadatas 

### Type of change

- [x] Refactoring
2026-04-23 12:04:34 +08:00
aae45b959b Refactor: API file2document (#14306)
Refactor: API file2document
2026-04-23 11:40:45 +08:00
e79b896637 Refactor: REST API langfuse api-key (#14315)
REST API langfuse api-key
2026-04-23 11:36:16 +08:00
f98597a19e Fix: Recall Test Page Metadata Not Displaying. (#14297)
### What problem does this PR solve?

Fix: Recall Test Page Metadata Not Displaying.
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-23 10:57:05 +08:00
2b029882d7 Go: add new provider minimax (#14296)
### What problem does this PR solve?

1. Add new provider minimax
2. Add new command: CHECK INSTANCE 'instance_name' FROM 'provider_name';
```
RAGFlow(user)> check instance 'test' from 'minimax';
SUCCESS
```

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-23 10:16:20 +08:00
387e2903d3 Fix: Some bugs (#14287)
### What problem does this PR solve?

Fix: Some bugs

- Pipeline runtime log files could not be viewed
- Corrected TOC terminology errors in the English translation

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-04-23 10:15:26 +08:00
ffa8738a78 Fix: Remove duplicate text output from the thought model on the chat page. (#14301)
### What problem does this PR solve?

Fix: Remove duplicate text output from the thought model on the chat
page.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-22 23:22:51 +08:00
01753b8f31 Refactor: API connectors (#14228)
### What problem does this PR solve?

Refactor /api/v1/connectors to be more RESTful.

### Type of change
- [x] Refactoring
2026-04-22 20:42:41 +08:00
c08cd8e090 Refactor: Migrate document metadata config update API (#14286)
### What problem does this PR solve?

Before migration
Web API: POST /v1/document/update_metadata_setting

After consolidation, Restful API
PUT
/api/v1/datasets/<dataset_id>/documents/<document_id>/metadata/config

### Type of change

- [x] Refactoring
2026-04-22 20:01:31 +08:00
d1c62fc19d Refact: Tenant api (#14288)
### What problem does this PR solve?

Refact: Tenant api

### Type of change

- [x] Refactoring
2026-04-22 20:00:32 +08:00
1434f8ade8 Doc: two PDF parser optimizers are supported as of v0.25.0. (#14261)
### What problem does this PR solve?

Multi-column layout detection is supported in v0.25.0

### Type of change


- [x] Documentation Update
2026-04-22 20:00:06 +08:00
b52c518ec9 Set image tag v0.25.0 (#14299)
### What problem does this PR solve?

AD

### Type of change
2026-04-22 19:12:21 +08:00
38e45a1117 Fix: serialize GraphRAG entity resolution merges to avoid graph mutation races (#14237)
### What problem does this PR solve?

This PR fixes the merge-phase crash reported in #14236 during GraphRAG
entity resolution.

The issue happens after candidate pair resolution completes, when
multiple merge coroutines mutate the same shared `networkx` graph
concurrently. In `_merge_graph_nodes`, the code iterates over
`graph.neighbors(node1)` and also awaits during edge/description
merging. That allows another coroutine to modify the graph adjacency
structure in between, which can trigger `RuntimeError: dictionary keys
changed during iteration` and can also lead to unsafe shared-graph
mutation.

This change keeps the PR scoped to that single issue by:
- serializing merge-time graph mutations with a dedicated merge lock
- snapshotting `graph.neighbors(node1)` with `list(...)` before
iteration

Together, these changes prevent concurrent mutation of the shared graph
during the merge phase and make the merge loop safe against live-view
invalidation.

Fixes #14236

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-22 16:42:53 +08:00