Commit Graph

5982 Commits

Author SHA1 Message Date
8aaf0942b1 Doc: Updated v0.25.1 release notes (#14519)
### What problem does this PR solve?

Updated v0.25.1 release notes.

### Type of change


- [x] Documentation Update
2026-04-30 11:59:53 +08:00
9075872435 Fix: Manual/Naive outline tuple unpack crash (#14518)
### What problem does this PR solve?

This fixes a crash in Manual and Naive parsing when PDF outlines include
page numbers as a third tuple value. It makes outline unpacking accept
extra values so parsing no longer fails. fixes #14411

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-30 11:55:02 +08:00
71952b6b58 Chore(deps): Bump go.opentelemetry.io/otel from 1.39.0 to 1.41.0 (#14516)
Bumps
[go.opentelemetry.io/otel](https://github.com/open-telemetry/opentelemetry-go)
from 1.39.0 to 1.41.0.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-30 11:43:24 +08:00
06c6da5d94 Fix: add document delete permission check (#14472)
### What problem does this PR solve?

add document delete permission check

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-30 11:01:09 +08:00
47129fdd08 Fix: optimize file batch delete (#14473)
### What problem does this PR solve?

optimize file batch delete

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-30 11:00:39 +08:00
811e9826d0 perf: avoid O(n²) array growth in embedding accumulation (#14369)
### What problem does this PR solve?

Both tokenizer (`rag/flow/tokenizer/tokenizer.py`) and
`BuiltinEmbed.encode`
(`rag/llm/embedding_model.py`) currently accumulate embedding batches
via
`np.concatenate` inside the per-batch loop. `np.concatenate` allocates a
new
array and copies all existing data on every call, so accumulating N
batches
is O(N²) in both time and peak memory.

Replacing the incremental concatenate with a list-of-batches + a single
`np.vstack` at the end gives O(N) total work.

For tokenizer the title-vector broadcast `np.concatenate([vts[0]] * N)`
is
also replaced by `np.tile`, which does the same job with a single
contiguous
allocation instead of building a Python list of references.

This is purely a CPU/memory optimisation — output shape and dtype are
unchanged. Measured impact grows with document size:
  -   1k chunks (batch 512, 2 iters):    ~negligible
  -  10k chunks (20 iters):              ~10× speedup on this stage
  - 100k chunks (195 iters):             ~100× speedup, and peak RAM
drops from O(N) extra to near-zero

### Type of change

- [x] Performance Improvement

Co-authored-by: yoan sapienza <Yoan Sapienza yoan.sapienza@orange.fr Yoan Sapienza zappy@macbookpro.home>
2026-04-30 11:00:10 +08:00
2548c28d65 feat: add FuturMix as model provider (#14419)
## Summary

Add [FuturMix](https://futurmix.ai) as a new model provider. FuturMix is
an OpenAI-compatible unified AI gateway that provides access to 22+
models (GPT, Claude, Gemini, DeepSeek, and more) through a single API
endpoint and key.

- **API Base**: `https://futurmix.ai/v1` (OpenAI-compatible)
- **Supported capabilities**: Chat, Embedding, Image2Text, TTS,
Speech2Text, Rerank

### Changes

| File | Change |
|------|--------|
| `rag/llm/__init__.py` | Add `FuturMix` to `SupportedLiteLLMProvider`
enum, `FACTORY_DEFAULT_BASE_URL`, and `LITELLM_PROVIDER_PREFIX` |
| `rag/llm/chat_model.py` | Add `FuturMixChat(Base)` — follows
Astraflow/Avian pattern |
| `rag/llm/embedding_model.py` | Add `FuturMixEmbed(OpenAIEmbed)` —
follows Astraflow pattern |
| `rag/llm/cv_model.py` | Add `FuturMixCV(GptV4)` — follows
SILICONFLOW/OpenRouter pattern |
| `rag/llm/tts_model.py` | Add `FuturMixTTS(OpenAITTS)` — follows
CometAPI/DeerAPI pattern |
| `rag/llm/sequence2txt_model.py` | Add `FuturMixSeq2txt(GPTSeq2txt)` —
follows StepFun pattern |
| `rag/llm/rerank_model.py` | Add `FuturMixRerank(OpenAI_APIRerank)` |
| `conf/llm_factories.json` | Add factory config with 8 chat, 2
embedding, 1 image2text, 2 TTS, 1 speech2text models |
| `docs/guides/models/supported_models.mdx` | Add FuturMix to supported
models table |

### Models included

- **Chat**: claude-sonnet-4-20250514, claude-3.5-haiku, gpt-4o,
gpt-4o-mini, gemini-2.5-flash, gemini-2.0-flash, deepseek-chat,
deepseek-reasoner
- **Embedding**: text-embedding-3-small, text-embedding-3-large
- **Image2Text**: gpt-4o
- **TTS**: tts-1, tts-1-hd
- **Speech2Text**: whisper-1

## Test plan

- [ ] Verify FuturMix appears in the model provider list in RAGFlow UI
- [ ] Configure FuturMix with API key and test chat completion
- [ ] Test embedding model with document indexing
- [ ] Test image2text with a sample image

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-30 10:59:37 +08:00
ce4c782fd7 Docs: Update version references to v0.25.1 in READMEs and docs (#14488)
### What problem does this PR solve?

- Update version tags in README files (including translations) from
v0.25.0 to v0.25.1
- Modify Docker image references and documentation to reflect new
version
- Update version badges and image descriptions
- Maintain consistency across all language variants of README files

### Type of change

- [x] Documentation Update
v0.25.1
2026-04-30 10:49:26 +08:00
7c0584a2b7 Fix: The GraphRAG icon is not displaying. (#14514)
### What problem does this PR solve?

Fix: The GraphRAG icon is not displaying.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-30 10:44:05 +08:00
0fa2bd539d Chore(deps): Bump google.golang.org/grpc from 1.66.2 to 1.79.3 (#14513)
Bumps [google.golang.org/grpc](https://github.com/grpc/grpc-go) from
1.66.2 to 1.79.3.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-30 10:35:03 +08:00
6dd38eca6a fix: file logs not displayed in dataset ingestion page (#14479)
### What problem does this PR solve?

## Summary

Fixed a bug where the **File Logs** tab in the dataset ingestion page
always showed "No logs" even after files were parsed successfully.

## Root Cause

Both the **File Logs** and **Dataset Logs** tabs on the frontend called
the same backend endpoint `/datasets/{dataset_id}/ingestions`. However,
the backend only queried `get_dataset_logs_by_kb_id`, which
hard-filtered records by `document_id == GRAPH_RAPTOR_FAKE_DOC_ID`
(dataset-level logs). As a result, real file-level logs were never
returned, causing the table to appear empty.

## Changes

### Backend

- **`api/apps/restful_apis/dataset_api.py`**
  - Added two new query parameters to `list_ingestion_logs`:
    - `log_type` — `"file"` or `"dataset"` (default: `"dataset"`)
    - `keywords` — search keyword for filtering by document / task name

- **`api/apps/services/dataset_api_service.py`**
- Updated `list_ingestion_logs` signature to accept `log_type` and
`keywords`.
  - Added conditional routing:
- When `log_type == "file"`, call
`PipelineOperationLogService.get_file_logs_by_kb_id`
- Otherwise, call
`PipelineOperationLogService.get_dataset_logs_by_kb_id`

- **`api/db/services/pipeline_operation_log_service.py`**
- Extended `get_dataset_logs_by_kb_id` with an optional `keywords`
parameter so dataset logs can also be searched.

### Frontend

- **`web/src/pages/dataset/dataset-overview/hook.ts`**
- Removed the separate API function switching (`listPipelineDatasetLogs`
vs `listDataPipelineLogDocument`).
- Unified both tabs to call `listDataPipelineLogDocument` with the new
`log_type` query parameter (`"file"` or `"dataset"`).
  - Ensured `keywords` and filter values are passed through correctly.

## Behavior After Fix

| Tab | `log_type` | Returned Records | Searchable Field |
|---|---|---|---|
| File Logs | `file` | Real document-level logs | `document_name` (file
name) |
| Dataset Logs | `dataset` | GraphRAG / RAPTOR / MindMap logs |
`document_name` (task type) |
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: noob <yixiao121314@outlook.com>
Co-authored-by: Wang Qi <wangq8@outlook.com>
Co-authored-by: Yingfeng Zhang <yingfeng.zhang@gmail.com>
2026-04-29 22:10:24 +08:00
5018459112 Fix metadata config (#14480)
### What problem does this PR solve?

Fix metadata config

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-29 21:09:54 +08:00
d4147efc66 Docs: (#14492)
### What problem does this PR solve?

Added v0.25.1 release notes

### Type of change


- [x] Documentation Update
2026-04-29 20:29:58 +08:00
c4d0b0ebcf Fix visit dataset error (#14490)
### What problem does this PR solve?

Fix visit dataset error

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-29 20:17:00 +08:00
1692f0928f Fix: The pipeline column header in the FileLogsTable is displaying incorrectly. (#14489)
### What problem does this PR solve?
Fix: The pipeline column header in the FileLogsTable is displaying
incorrectly.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-29 19:52:28 +08:00
9280c64518 Docs: Updated Title chunker references (#14483)
### What problem does this PR solve?

Updated Title chunker references

### Type of change

- [x] Documentation Update
2026-04-29 19:37:24 +08:00
261be81127 Go: add drop instance models (#14485)
### What problem does this PR solve?

1. drop instance model
2. Fix issue of drop instance but not drop models.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-29 19:18:49 +08:00
0e1477eb23 Go: implement provider: MiniMax (#14478)
### What problem does this PR solve?

implement MiniMax provider

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
2026-04-29 19:06:40 +08:00
de8c6ad0f3 Feat: enable sync deleted file for Discord (#14451)
### What problem does this PR solve?

Feat: enable sync deleted file for Discord

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-04-29 19:05:40 +08:00
2bc8c6d35e feat(dropbox): support deleted-file sync (#14476)
### What problem does this PR solve?

Partially addresses #14362 by adding deleted-file sync support for the
Dropbox data source.

Dropbox previously did not provide the slim current-file snapshot
required by stale document reconciliation, and its sync runner returned
only document batches. As a result, enabling deleted-file sync could not
remove local documents that had been deleted from Dropbox.

This PR:
- Adds `retrieve_all_slim_docs_perm_sync()` to `DropboxConnector`.
- Reuses Dropbox metadata traversal to collect current remote file IDs
without downloading file contents.
- Wires incremental Dropbox sync to return `(document_generator,
file_list)` when `sync_deleted_files` is enabled.
- Enables the deleted-file sync toggle for Dropbox in the data source
settings UI.
- Adds regression coverage for slim snapshots, nested folders, paginated
listings, duplicate filenames, and full reindex behavior.

Tests:
- `uv run pytest test/unit_test/common/test_dropbox_connector.py -q`
- `uv run pytest test/unit_test/rag/test_sync_data_source.py -q`
- `uv run pytest test/unit_test/common/test_dropbox_connector.py
test/unit_test/rag/test_sync_data_source.py -q`
- `uv run ruff check common/data_source/dropbox_connector.py
rag/svr/sync_data_source.py
test/unit_test/common/test_dropbox_connector.py
test/unit_test/rag/test_sync_data_source.py`
- `./node_modules/.bin/eslint
src/pages/user-setting/data-source/constant/index.tsx`

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-04-29 19:05:11 +08:00
db1a73b255 Feat: enable sync deleted files in gitlab (#14481)
### What problem does this PR solve?

Feat: enable sync deleted files in gitlab

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-04-29 19:04:10 +08:00
a0f9ae16d2 Fix: RAPTOR "Generation scope" reset to "Single file" when selecting "Dataset" (#14477)
## Problem
In the Dataset Configuration page, changing the RAPTOR **Generation
scope** from "Single file" to "Dataset" and clicking **Save** did not
persist the change. After refreshing or re-entering the page, the scope
always reverted to "Single file".

## Root Cause
1. **Backend**: The `RaptorConfig` Pydantic model in
`api/utils/validation_utils.py` was configured with `extra="forbid"` but
did not declare a `scope` field. When the frontend sent `"scope":
"dataset"`, Pydantic rejected the request.
2. **Frontend**: The `extractRaptorConfigExt` utility in
`web/src/hooks/parser-config-utils.ts` treated `scope` as an unknown
field and moved it into the nested `ext` object. Consequently, the
backend could not read `raptor_config.get("scope", "file")` correctly,
so the default `"file"` was always used.

## Changes
- Added `scope: Literal["file", "dataset"]` to the backend
`RaptorConfig` model with a default of `"file"`.
- Added `scope` to the known-field whitelist in the frontend
`extractRaptorConfigExt` helper so it is transmitted as a top-level
raptor field instead of being buried in `ext`.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Signed-off-by: noob <yixiao121314@outlook.com>
2026-04-29 18:46:28 +08:00
1b84892e3a Fix delete graph (#14484)
### What problem does this PR solve?

Fix delete graph

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
2026-04-29 18:09:10 +08:00
3991bdfaf5 Fix graph task type (#14475)
### What problem does this PR solve?
Fix graph task type

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-29 17:05:56 +08:00
bb05a8bd7e Update create model instance command (#14441)
### What problem does this PR solve?

1. support command:

```
RAGFlow(user)> create provider 'vllm' instance 'test' key 'test-key' url 'base-url' region 'abc';
SUCCESS
RAGFlow(user)> list instances from 'vllm';
+----------+----------------------------------------+----------------------------------+--------------+----------------------------------+--------+
| apiKey   | extra                                  | id                               | instanceName | providerID                       | status |
+----------+----------------------------------------+----------------------------------+--------------+----------------------------------+--------+
| test-key | {"base_url":"base-url","region":"abc"} | 40213c89430311f1a7cf38a74640adcc | test         | b4d40e6142d311f1a4f938a74640adcc | enable |
+----------+----------------------------------------+----------------------------------+--------------+----------------------------------+--------+
```
2. support add vllm model
```
RAGFlow(user)> add model 'Qwen/Qwen2-0.5B' to provider 'vllm' instance 'test' with tokens 131072 chat;
SUCCESS
```
3. add vllm chat

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-29 17:05:08 +08:00
486ca463aa Port PR14454 to GO (PruneDeletedChunks) (#14463)
### What problem does this PR solve?

Port PR14454 to GO (PruneDeletedChunks)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-29 17:04:22 +08:00
e0b3070012 Feat: enable sync deleted files for Gmail && fix google drive issues (#14462)
### What problem does this PR solve?

Feat: enable sync deleted files for Gmail && fix google drive issues

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: bill <yibie_jingnian@163.com>
Co-authored-by: balibabu <assassin_cike@163.com>
2026-04-29 17:03:56 +08:00
a736948493 Fix: Clicking the button in the bottom-right corner of the /chats/widget page fails to display the dialog box. (#14465)
### What problem does this PR solve?

Fix: Clicking the button in the bottom-right corner of the
`/chats/widget` page fails to display the dialog box.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-29 17:03:33 +08:00
6afb1957d8 Fix query param type (#14471)
### What problem does this PR solve?

Fix query param type

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-29 16:53:28 +08:00
9690923516 Fix delete graphrag raptor (#14469)
### What problem does this PR solve?

Fix delete graphrag raptor

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-29 16:47:42 +08:00
decf673049 Go: implement provider: volcengine (#14460)
### What problem does this PR solve?

implement `volcengine` provider 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-04-29 15:45:08 +08:00
b684c89950 Add backward compat APIs (#14427)
### What problem does this PR solve?

Add backward compat APIs:

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-29 15:15:49 +08:00
c08ced09a7 Fix: add retrieval fallback comments (#14457)
### What problem does this PR solve?

add retrieval fallback comments

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-29 14:44:31 +08:00
f3c232cf47 Remove model_bundle.go, modify chat_session.go (#14458)
### What problem does this PR solve?

Remove model_bundle.go, modify chat_session.go

### Type of change

- [x] Refactoring
2026-04-29 14:44:12 +08:00
ce933357c6 Fix: Dataset: When configuring the "general chunk method," options such as chunk size and parent-child slicing are unavailable. (#14459)
### What problem does this PR solve?

Fix: Dataset: When configuring the "general chunk method," options such
as chunk size and parent-child slicing are unavailable.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: balibabu <assassin_cike@163.com>
2026-04-29 14:37:48 +08:00
a7ce1b1677 Fix: prune deleted doc chunks from retrieval (#14454)
### What problem does this PR solve?

prune deleted doc chunks from retrieval

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-29 13:03:09 +08:00
b493a33316 Go: update chat URL (#14453)
### What problem does this PR solve?

Update the URL to: /api/v1/chat/completions

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-29 11:45:06 +08:00
3b7a6eaa6c Feat: sync deleted files in Bitbucket (#14450)
### What problem does this PR solve?

Feat: sync deleted files in Bitbucket

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-04-29 11:29:17 +08:00
74fa54f122 feat(google-drive): optimize memory payload and enable sync deletion (#14372)
**Addresses the Google Drive integration for #14362**

This PR completely overhauls the Google Drive sync logic to accurately
detect remote deletions, while drastically reducing the memory footprint
during the snapshot phase.

### What changed under the hood:

* **Killed the memory bloat:** Swapped out the massive document
dictionary objects for a lightweight `collections.namedtuple` (`SlimDoc
= namedtuple('SlimDoc', ['id'])`). This prevents RAM spikes during
`retrieve_all_slim_docs_perm_sync` on massive enterprise drives.
* **Flawless downstream integration:** The `SlimDoc` object relies on
simple duck typing. It perfectly delivers the `.id` attribute required
by `ConnectorService.cleanup_stale_documents_for_task`, meaning your
core `hash128` vector cleanup logic runs natively without modification.
* **Fixed the Shared Drive blindspot:** The standard API query was
missing team folders. Injected the `corpora="allDrives"` and
`includeItemsFromAllDrives=True` override flags so the connector now
accurately maps state across both personal workspaces and organizational
Shared Drives.

### Testing:
Isolated the Google API retrieval logic locally to prove the `SlimDoc`
mapping works and correctly registers state drops when a file is trashed
remotely.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Performance Improvement
2026-04-29 10:04:36 +08:00
345bec812d refactor: improve QwenRerank logic (#14388)
### What problem does this PR solve?

improve QwenRerank logic

### Type of change

- [x] Refactoring
2026-04-28 20:17:34 +08:00
0d18b293f5 Fix: enable sync deleted file in airtable (#14438)
### What problem does this PR solve?

Fix: enable sync deleted file in airtable

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-28 20:09:08 +08:00
926efbd29b Fix: update based on #14436 (#14440)
### What problem does this PR solve?

Fix: update based on #14436

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-28 20:08:42 +08:00
35f6d81b73 Refactor: migrate chunk retrieval_test and knowledge_graph to REST API endpoints (#14402)
### What problem does this PR solve?

## Summary

Migrate two web API endpoints to REST-style HTTP API endpoints,
following the pattern established in #14222:

| Old Endpoint | New Endpoint |
|---|---|
| `POST /v1/chunk/retrieval_test` | `POST
/api/v1/datasets/<dataset_id>/search` |
| `GET /v1/chunk/knowledge_graph` | `GET
/api/v1/datasets/<dataset_id>/graph` |
2026-04-28 20:00:26 +08:00
85575259ac Fix: google authentication - gmail && google-drive (#14422)
### What problem does this PR solve?

Fix: google authentication - gmail && google-drive

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-28 18:09:02 +08:00
dcce864d4c Simplify Encode (#14437)
### What problem does this PR solve?

Simplify Encode

### Type of change

- [x] Refactoring
2026-04-28 18:07:42 +08:00
d532151be0 Feat: more model for paddle (#14436)
### What problem does this PR solve?

Feat: more model for paddle
### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-04-28 18:07:00 +08:00
4e5a093ac5 Go: implement provider: Moonshot (#14433)
### What problem does this PR solve?

implement `Moonshot` provider

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-04-28 18:06:25 +08:00
c330005659 Fix: document level auto metadata config missing after save (#14421)
### What problem does this PR solve?

Steps to re-produce (existing bug before API migration):

create a new dataset
upload a file 
click on "General" in "Parse" column and then click on "switch or
configure ingestion pipeline"
click on "Settings" (at right of "Auto metadata")
click "Add" to add new metadata
click on "Save"
re-open "Settings" and the newly added metadata is not there

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-28 17:09:23 +08:00
e6e80041f5 Fix: agent toolcall null response & schema validation & DeepSeek think history (#14425)
### What problem does this PR solve?
agent toolcall null response & schema validation & DeepSeek think
history

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-28 17:09:08 +08:00
f670913bb4 Refactor model type to model class (#14426)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-28 16:05:15 +08:00