### What problem does this PR solve?
Closes#14674.
This PR improves RAPTOR configuration and tree construction while
preserving the existing RAPTOR behavior as the default.
RAPTOR currently builds summary layers with the original UMAP + GMM
clustering path. This PR keeps that default path, and adds:
- A hidden backend tree-builder option:
- `tree_builder="raptor"`: default, existing RAPTOR behavior.
- `tree_builder="psi"`: rank-aware Psi-style tree builder using original
embedding-space cosine ranking.
- A user-facing clustering method option for the default RAPTOR builder:
- `clustering_method="gmm"`: existing default.
- `clustering_method="ahc"`: agglomerative hierarchical clustering path.
- A RAPTOR UI setting for `Clustering method` and `Max cluster`.
### What changed
#### Backend
- Added `tree_builder` support for RAPTOR/Psi.
- Added `clustering_method` support for GMM/AHC.
- Kept existing RAPTOR + GMM as the default.
- Added Psi tree building from original-space cosine similarity.
- Added bucketed Psi building controls for large inputs:
- `raptor.ext.psi_exact_max_leaves`
- `raptor.ext.psi_bucket_size`
- Added method-aware RAPTOR summary metadata using existing
`extra.raptor_method`.
- Avoided adding a dedicated DB schema field for experimental method
tracking.
- Added cleanup/migration logic to avoid mixing stale RAPTOR summary
trees.
- Added defensive checks for Psi tree construction and summary failures.
#### Frontend/UI
- Added `Clustering method` in RAPTOR settings with `GMM` and `AHC`.
- Added/kept `Max cluster` in RAPTOR settings.
- Enlarged max cluster UI limit to `1024`, matching backend validation.
- Kept AHC editable even when a RAPTOR task has already finished.
- Fixed the UI save payload so `clustering_method` and `tree_builder`
are serialized through `parser_config.raptor.ext`, avoiding backend
validation errors for extra top-level RAPTOR fields.
Example saved RAPTOR config:
```json
{
"raptor": {
"max_cluster": 317,
"ext": {
"clustering_method": "ahc",
"tree_builder": "raptor"
}
}
}
Co-authored-by: CaptainTimon <CaptainTimon@users.noreply.github.com>
### What problem does this PR solve?
Currently, RAGFlow's Search and Chat interfaces display only raw
vectorized text chunks during retrieval, without contextual information
about their source documents. Users cannot see document titles, page
numbers, upload dates, or custom metadata fields that would help them
understand and trust the retrieved results.
This PR introduces an **optional metadata display feature** that
enriches retrieved chunks with document-level metadata in both the
Search tab and Chatbot interface.
**Key improvements:**
- **Search results**: Display document metadata as styled badges beneath
chunk snippets
- **Chat citations**: Show metadata in citation popovers and reference
lists for better source context
- **LLM context**: Metadata is injected into the LLM prompt to enable
more accurate, citation-aware responses
- **External API support**: Applications using RAGFlow's SDK retrieval
endpoints (`/v1/retrieval`, `/v1/searchbots/retrieval_test`) can opt-in
via request parameters
- **User control**: Multi-select dropdown UI allows users to choose
which metadata fields to display
**Implementation approach:**
- ✅ Reuses existing `DocMetadataService` infrastructure (no new database
tables or indices)
- ✅ Settings stored in existing JSON configuration fields
(`search_config.reference_metadata`, `prompt_config.reference_metadata`)
- ✅ No database migrations required
- ✅ Disabled by default (fully opt-in and backward-compatible)
- ✅ Dynamic metadata field selection populated from actual document
metadata keys
- ✅ Fixed critical bug where Python's builtin `set()` was shadowed by a
route handler function
**Modified endpoints (all backward-compatible):**
- `POST /v1/retrieval` (Public SDK)
- `POST /v1/searchbots/retrieval_test` (Searchbots)
- `POST /v1/chunk/retrieval_test` (UI/Internal)
- Chat completions endpoints (via `extra_body.reference_metadata` or
`prompt_config`)
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
###Images
-
<img width="879" height="1275" alt="image"
src="https://github.com/user-attachments/assets/95b2d731-31ae-45a1-b081-bf5893f52aeb"
/>
<br><br>
<br><br>
<img width="1532" height="362" alt="image"
src="https://github.com/user-attachments/assets/9cebc65b-b7a7-459f-b25e-3b13fa9b638e"
/>
<br><br>
<br><br>
<img width="2586" height="1320" alt="image"
src="https://github.com/user-attachments/assets/2153d493-d899-461f-a7a9-041391e07776"
/>
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Attili-sys <Attili-sys@users.noreply.github.com>
Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>
### What problem does this PR solve?
Fix: Dataset: When configuring the "general chunk method," options such
as chunk size and parent-child slicing are unavailable.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: balibabu <assassin_cike@163.com>
### What problem does this PR solve?
Before migration
Web API: POST /v1/document/change_parser
HTTP API: PATCH /api/v1/datasets/<dataset_id>/documents
After consolidation, Restful API
PATCH /api/v1/datasets/<dataset_id>/documents
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Fix: The number of chunks in the file list is not displayed.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Before consolidation
Web API: POST /v1/document/list
Http API - GET /api/v1/datasets/<dataset_id>/documents
After consolidation, Restful API -- GET
/api/v1/datasets/<dataset_id>/documents
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Fix: The file count in the file header did not change after uploading or
deleting files.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
### What problem does this PR solve?
Closes#13907
The template catalog had duplicate files (e.g. `*_r.json`) only to place
the same template into multiple sidebar groups.
This increases maintenance cost and makes template updates error-prone.
This PR adds first-class support for multiple template categories in a
single file via `canvas_types`, then removes duplicate template files.
What changed:
- Added `canvas_types` to `CanvasTemplate` model and DB migration.
- Added normalization logic when loading templates:
- accepts legacy `canvas_type`
- accepts new `canvas_types`
- merges/deduplicates values
- preserves backward compatibility by keeping `canvas_type` as first
normalized value.
- Updated template import flow to load only `.json` files and in stable
sorted order.
- Updated frontend template filtering to match on `canvas_types` first,
with fallback to legacy `canvas_type`.
- Consolidated duplicated template pairs into single files and removed:
- `deep_search_r.json`
- `reflective_academic_paper_generator_r.json`
- `seo_article_writer_r.json`
- Added regression/edge-case tests for category normalization and route
serialization expectations.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
1. Split dataset api to gateway and service, and modify web UI to use
restful http api.
2. Old KB releated APIs are commented.
### Type of change
- [x] Refactoring
---------
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
### What problem does this PR solve?
Feat: Export Agent Logs.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: balibabu <assassin_cike@163.com>
### What problem does this PR solve?
1. Split dataset api to gateway and service, and modify web UI to use
restful http api.
2. Old KB releated APIs are commented.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
1. Split dataset api to gateway and service, and modify web UI to use
restful http api.
2. Old KB releated APIs are commented.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Feat: Modify the style of the release confirmation box.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
Co-authored-by: balibabu <assassin_cike@163.com>
Co-authored-by: 6ba3i <isbaaoui09@gmail.com>
### What problem does this PR solve?
Feat: Display release status in agent version history.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: balibabu <assassin_cike@163.com>
### What problem does this PR solve?
feat: Adds the tenant model ID field to the interface definition
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Feat: Add a web search button to the chat box on the chat page.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Add a think button to the chat box. #12742
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fix:Automatically enable metadata and optimize parser dialog logic
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Refactor: Refactoring VolcEngine and Yiyan modal using shadcn. #10427
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Back-end may returns empty array on `"doc_type_kwd"` property which
causes translation key malformed.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Improve image table context.
Current strategy in attach_media_context:
- Order by position when possible: if any chunk has page/position info,
sort by (page, top, left), otherwise keep original order.
- Apply only to media chunks: images use image_context_size, tables use
table_context_size.
- Primary matching: on the same page, choose a text chunk whose vertical
span overlaps the media, then pick the one with the closest vertical
midpoint.
- Fallback matching: if no overlap on that page, choose the nearest text
chunk on the same page (page-head uses the next text; page-tail uses the
previous text).
- Context extraction: inside the chosen text chunk, find a mid-sentence
boundary near the text midpoint, then take context_size tokens split
before/after (total budget).
- No multi-chunk stitching: context comes from a single text chunk to
avoid mixing unrelated segments.
### Type of change
- [x] Refactoring
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Fix: Add a no-data filter condition to MetaData
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix:Bugs Fixed
- Text overflow issues that caused rendering problems
- Metadata saving, copywriting and other related issues
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix: Metadata bugs.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Feat: Add memory multi-select dropdown to recall and message operator
forms. #4213
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
feature: Complete metadata functionality
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Display chunk type in chunk editor and dialog, may be one of below:
- Image
- Table
- Text
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add support for multiple Retrieval tools under an agent
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: Remove HMAC from the webhook #10427
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Only support MinerU-API now, still need to complete frontend for
pipeline to allow the configuration of MinerU options.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Feat: Add mineru as a model manufacturer to the system. #10621
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: balibabu <assassin_cike@163.com>
### What problem does this PR solve?
Retrieval metadata filtering adds semi-automatic mode, and users can
manually check the metadata key that participates in LLM to generate
filter conditions.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)