### What problem does this PR solve?
Currently, RAGFlow's Search and Chat interfaces display only raw
vectorized text chunks during retrieval, without contextual information
about their source documents. Users cannot see document titles, page
numbers, upload dates, or custom metadata fields that would help them
understand and trust the retrieved results.
This PR introduces an **optional metadata display feature** that
enriches retrieved chunks with document-level metadata in both the
Search tab and Chatbot interface.
**Key improvements:**
- **Search results**: Display document metadata as styled badges beneath
chunk snippets
- **Chat citations**: Show metadata in citation popovers and reference
lists for better source context
- **LLM context**: Metadata is injected into the LLM prompt to enable
more accurate, citation-aware responses
- **External API support**: Applications using RAGFlow's SDK retrieval
endpoints (`/v1/retrieval`, `/v1/searchbots/retrieval_test`) can opt-in
via request parameters
- **User control**: Multi-select dropdown UI allows users to choose
which metadata fields to display
**Implementation approach:**
- ✅ Reuses existing `DocMetadataService` infrastructure (no new database
tables or indices)
- ✅ Settings stored in existing JSON configuration fields
(`search_config.reference_metadata`, `prompt_config.reference_metadata`)
- ✅ No database migrations required
- ✅ Disabled by default (fully opt-in and backward-compatible)
- ✅ Dynamic metadata field selection populated from actual document
metadata keys
- ✅ Fixed critical bug where Python's builtin `set()` was shadowed by a
route handler function
**Modified endpoints (all backward-compatible):**
- `POST /v1/retrieval` (Public SDK)
- `POST /v1/searchbots/retrieval_test` (Searchbots)
- `POST /v1/chunk/retrieval_test` (UI/Internal)
- Chat completions endpoints (via `extra_body.reference_metadata` or
`prompt_config`)
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
###Images
-
<img width="879" height="1275" alt="image"
src="https://github.com/user-attachments/assets/95b2d731-31ae-45a1-b081-bf5893f52aeb"
/>
<br><br>
<br><br>
<img width="1532" height="362" alt="image"
src="https://github.com/user-attachments/assets/9cebc65b-b7a7-459f-b25e-3b13fa9b638e"
/>
<br><br>
<br><br>
<img width="2586" height="1320" alt="image"
src="https://github.com/user-attachments/assets/2153d493-d899-461f-a7a9-041391e07776"
/>
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Attili-sys <Attili-sys@users.noreply.github.com>
Co-authored-by: Ahmad Intisar <ahmadintisar@Ahmads-MacBook-M4-Pro.local>
### What problem does this PR solve?
## Summary
Fixed a bug where the **File Logs** tab in the dataset ingestion page
always showed "No logs" even after files were parsed successfully.
## Root Cause
Both the **File Logs** and **Dataset Logs** tabs on the frontend called
the same backend endpoint `/datasets/{dataset_id}/ingestions`. However,
the backend only queried `get_dataset_logs_by_kb_id`, which
hard-filtered records by `document_id == GRAPH_RAPTOR_FAKE_DOC_ID`
(dataset-level logs). As a result, real file-level logs were never
returned, causing the table to appear empty.
## Changes
### Backend
- **`api/apps/restful_apis/dataset_api.py`**
- Added two new query parameters to `list_ingestion_logs`:
- `log_type` — `"file"` or `"dataset"` (default: `"dataset"`)
- `keywords` — search keyword for filtering by document / task name
- **`api/apps/services/dataset_api_service.py`**
- Updated `list_ingestion_logs` signature to accept `log_type` and
`keywords`.
- Added conditional routing:
- When `log_type == "file"`, call
`PipelineOperationLogService.get_file_logs_by_kb_id`
- Otherwise, call
`PipelineOperationLogService.get_dataset_logs_by_kb_id`
- **`api/db/services/pipeline_operation_log_service.py`**
- Extended `get_dataset_logs_by_kb_id` with an optional `keywords`
parameter so dataset logs can also be searched.
### Frontend
- **`web/src/pages/dataset/dataset-overview/hook.ts`**
- Removed the separate API function switching (`listPipelineDatasetLogs`
vs `listDataPipelineLogDocument`).
- Unified both tabs to call `listDataPipelineLogDocument` with the new
`log_type` query parameter (`"file"` or `"dataset"`).
- Ensured `keywords` and filter values are passed through correctly.
## Behavior After Fix
| Tab | `log_type` | Returned Records | Searchable Field |
|---|---|---|---|
| File Logs | `file` | Real document-level logs | `document_name` (file
name) |
| Dataset Logs | `dataset` | GraphRAG / RAPTOR / MindMap logs |
`document_name` (task type) |
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Signed-off-by: noob <yixiao121314@outlook.com>
Co-authored-by: Wang Qi <wangq8@outlook.com>
Co-authored-by: Yingfeng Zhang <yingfeng.zhang@gmail.com>
### What problem does this PR solve?
Feat: enable sync delted files for connectors
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Fixes#14196
## Problem
When using DeepDOC to parse large PDFs (over 1000 pages), the parser
silently truncated processing at 300 pages due to a hardcoded default
`page_to=299` in `RAGFlowPdfParser.__images__()`. This caused:
- **Errors** on pages beyond the limit
- **Poor image quality** as the parser attempted to compensate with
missing page data
- **Inconsistent chunk splitting** between full PDF imports and partial
imports
Additionally, the codebase scattered magic numbers (`299`, `600`,
`10000`, `100000`, `100000000`, `10000000000`, `10**9`) across 22 files
as sentinel values for "parse all pages", making future maintenance
error-prone.
## Root Cause
```python
# deepdoc/parser/pdf_parser.py (before)
def __images__(self, fnm, zoomin=3, page_from=0, page_to=299, callback=None):
# Only the first 300 pages were rendered; everything beyond was silently dropped
```
While most callers in `rag/app/*.py` correctly passed `to_page=100000`,
the base class `RAGFlowPdfParser.__call__()` and `parse_into_bboxes()`
invoked `__images__` **without** forwarding `page_from`/`page_to`,
falling back to the restrictive default of 299.
## Solution
### 1. Define constants in `common/constants.py`
```python
MAXIMUM_PAGE_NUMBER = 100000 # Used by the parsing layer
MAXIMUM_TASK_PAGE_NUMBER = MAXIMUM_PAGE_NUMBER * 1000 # Used by the task/DB layer
```
### 2. Replace all hardcoded sentinel values
| Layer | Files Changed | Old Values | New Value |
|---|---|---|---|
| **Deepdoc parsers** | `pdf_parser.py`, `mineru_parser.py`,
`docling_parser.py`, `opendataloader_parser.py`, `paddleocr_parser.py`,
`docx_parser.py` | `299`, `600`, `10**9`, `100000000` |
`MAXIMUM_PAGE_NUMBER` |
| **Chunk parsers** | `naive.py`, `book.py`, `qa.py`, `one.py`,
`manual.py`, `paper.py`, `presentation.py`, `laws.py`, `resume.py`,
`email.py`, `table.py` | `100000`, `10000`, `10000000000` |
`MAXIMUM_PAGE_NUMBER` |
| **Task/DB layer** | `db_models.py`, `task_service.py`,
`document_service.py`, `file_service.py` | `100000000` |
`MAXIMUM_TASK_PAGE_NUMBER` |
### 3. Fix `parse_into_bboxes()` missing parameters
Added `from_page`/`to_page` parameters to `parse_into_bboxes()` so that
the `rag/flow/parser/parser.py` DeepDOC path no longer falls back to the
restrictive default.
## Files Changed (22)
- `common/constants.py`
- `deepdoc/parser/pdf_parser.py`
- `deepdoc/parser/mineru_parser.py`
- `deepdoc/parser/docling_parser.py`
- `deepdoc/parser/opendataloader_parser.py`
- `deepdoc/parser/paddleocr_parser.py`
- `deepdoc/parser/docx_parser.py`
- `rag/app/naive.py`
- `rag/app/book.py`
- `rag/app/qa.py`
- `rag/app/one.py`
- `rag/app/manual.py`
- `rag/app/paper.py`
- `rag/app/presentation.py`
- `rag/app/laws.py`
- `rag/app/resume.py`
- `rag/app/email.py`
- `rag/app/table.py`
- `api/db/db_models.py`
- `api/db/services/task_service.py`
- `api/db/services/document_service.py`
- `api/db/services/file_service.py`
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
---------
Signed-off-by: noob <yixiao121314@outlook.com>
### What problem does this PR solve?
The POST /upload_info?url=<url> endpoint accepted a user-supplied URL
and passed it directly to AsyncWebCrawler without any validation. There
were no restrictions on URL scheme, destination hostname, or resolved IP
address. This allowed any authenticated user to instruct the server to
make outbound HTTP requests to internal infrastructure — including RFC
1918 private networks, loopback addresses, and cloud metadata services
such as http://169.254.169.254 — effectively using the server as a proxy
for internal network reconnaissance or credential theft.
This PR adds an SSRF guard (_validate_url_for_crawl) that runs before
any crawl is initiated. It enforces an allowlist of safe schemes
(http/https), resolves the hostname at validation time, and rejects any
URL whose resolved IP falls within a private or reserved network range.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Allow image2text models (multimodal) to be used as chat models.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
The Langfuse Python SDK v3+ removed `start_generation()` method.
RagFlow's code called this non-existent method, causing AttributeError
when Langfuse tracing is enabled.
Replace all `start_generation()` calls with
`start_observation(as_type="generation")` which is the correct v4 SDK
API.
Affected files:
- api/db/services/llm_service.py (12 occurrences)
- api/db/services/dialog_service.py (1 occurrence)
Fixes#14204
Related to #9243
### What problem does this PR solve?
_Briefly describe what this PR aims to solve. Include background context
that will help reviewers understand the purpose of the PR._
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
### What problem does this PR solve?
normalize think tags in final chat answer
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Before consolidation
Web API: POST /v1/document/filter
Http API - GET /api/v1/datasets/<dataset_id>/documents
After consolidation, Restful API -- GET
/api/v1/datasets/<dataset_id>/documents?type=filter
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Before consolidation
Web API: POST /v1/document/list
Http API - GET /api/v1/datasets/<dataset_id>/documents
After consolidation, Restful API -- GET
/api/v1/datasets/<dataset_id>/documents
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Trivial fix log creation, follow on PR:
https://github.com/infiniflow/ragflow/pull/14136
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fixes#6034
Changes the `size` field in both `Document` and `File` models from
`IntegerField` (32-bit, max ~2GB) to `BigIntegerField` (64-bit, max
~9.2EB), and adds corresponding database migrations.
## Problem
When uploading a file larger than 2GB, the `size` value overflows a
32-bit signed integer (max 2,147,483,647). This causes:
- The stored `size` wraps around to an incorrect value (e.g., a 3GB file
shows as 2,097,152 KB in File Management).
- Subsequent file operations (e.g., download) fail because the corrupted
size leads to invalid storage lookups.
## Changes
- `Document.size`: `IntegerField` → `BigIntegerField`
- `File.size`: `IntegerField` → `BigIntegerField`
- Added `alter_db_column_type` migrations in `migrate_db()` for both
`document.size` and `file.size` columns to ensure existing deployments
are upgraded automatically.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: noob <yixiao121314@outlook.com>
### What problem does this PR solve?
Resolve#14115 .
## Problem
On the shared chat link page (`/chats/share?shared_id=...`), querying
the knowledge base returns "no relevant information was found", while
the same query works correctly on the editor chat page.
## Root Cause
Knowledge base retrieval in `async_chat()` is gated by the check `if
"knowledge" in param_keys` (line 598), where `param_keys` is derived
from `prompt_config["parameters"]`. If `parameters` is empty or missing
the `{"key": "knowledge", "optional": false}` entry, retrieval is
entirely skipped.
This can happen because `_apply_prompt_defaults()` — which ensures
`parameters` contains the `knowledge` entry — is only called in the
`create` (POST) and `update_chat` (PUT) handlers, but **not** in
`patch_chat` (PATCH). If a chat's `prompt_config` was updated via PATCH
without including `parameters`, the `knowledge` entry would be absent.
Additionally, `prompt_config["parameters"]` would raise a `KeyError` if
the key was missing entirely.
## Fix
Added a defensive safety net in `async_chat()`
(`api/db/services/dialog_service.py`) that auto-injects the `knowledge`
parameter when:
- `dialog.kb_ids` is set (knowledge bases are configured)
- `"knowledge"` is not already in `param_keys`
- `{knowledge}` placeholder exists in the system prompt
Also changed `prompt_config["parameters"]` to
`prompt_config.get("parameters", [])` to prevent `KeyError` when the key
is absent.
## Files Changed
- `api/db/services/dialog_service.py` — added auto-injection of
`knowledge` parameter and safe `.get()` access for `parameters`
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: noob <yixiao121314@outlook.com>
## What's the problem
Both `async_chat()` and `async_ask()` call `decorate_answer()` to build
the final SSE payload — it inserts citation markers (`##N$$`) into the
answer text and prunes `doc_aggs` to only the cited documents.
Immediately after, both functions overwrite `final["answer"]` with `""`:
```python
# async_chat(), line ~774 (issue #13828)
final = decorate_answer(thought + full_answer)
final["final"] = True
final["audio_binary"] = None
final["answer"] = "" # discards decorated text
yield final
# async_ask(), line ~1444 (same bug, different path)
final = decorate_answer(full_answer)
final["final"] = True
final["answer"] = "" # discards decorated text
yield final
```
The client receives filtered references (built for a citation-decorated
answer it never sees) while displaying the raw, undecorated streaming
text. Citations can never match.
## Root cause
`final["answer"] = ""` was left over from an earlier design where
clients were meant to reconstruct the full answer purely from delta
events. Once `decorate_answer()` started placing citation markers, this
blank-out broke the contract: the final event is where the decorated
answer should land.
## Fix
Remove the two blank-override lines — one in `async_chat()`, one in
`async_ask()`:
```diff
- final["answer"] = ""
```
`decorate_answer()` already sets `final["answer"]` to the correct
decorated string; there is nothing to override.
## Relation to #13828
Issue #13828 and PR #13835 identify the bug in `async_chat()`. This PR
absorbs that fix and also corrects the identical pattern in
`async_ask()` (used by the `/retrieval` route in `chat_api.py`), which
PR #13835 does not touch.
## Regression test
Added
`test/unit_test/api/db/services/test_dialog_service_final_answer.py`
with three tests:
| Test | Purpose |
|------|---------|
| `test_buggy_pattern_drops_answer` | Documents the old behaviour:
blank-override empties the final answer |
| `test_fixed_pattern_preserves_decorated_answer` | Core invariant:
final event carries the decorated text from `decorate_answer()` |
| `test_final_event_reference_matches_decorated_result` | Citation
markers in the answer must match the pruned `doc_aggs` in the same event
|
Local run result:
```
test_dialog_service_final_answer.py::test_buggy_pattern_drops_answer PASSED
test_dialog_service_final_answer.py::test_fixed_pattern_preserves_decorated_answer PASSED
test_dialog_service_final_answer.py::test_final_event_reference_matches_decorated_result PASSED
3 passed in 0.04s
```
`ruff check` passes with no issues on all changed files.
---------
Co-authored-by: edenfunf <edenfunf@gmail.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
## What problem does this PR solve?
Add a warning log when `get_flatted_meta_by_kbs` returns 10,000 results,
which indicates the query limit has been reached and metadata may be
silently truncated.
## Type of change
- [x] Improvement (non-breaking change which improves observability)
### What problem does this PR solve?
Fixes#14051.
The chat UI already sends an `internet` flag with each request, but the
backend previously triggered Tavily web retrieval whenever
`prompt_config.tavily_api_key` was configured. As a result, web search
could still run even when the internet toggle was off.
This PR makes web search an explicit opt-in at request time:
- `tavily_api_key` only indicates that web search is available
- Tavily retrieval runs only when `internet` is explicitly enabled
- the same behavior now applies to both the normal retrieval path and
the deep-research / reasoning path
This also fixes the no-KB fallback case so chats without KBs fall back
to normal solo chat when `internet` is off.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Closes#13907
The template catalog had duplicate files (e.g. `*_r.json`) only to place
the same template into multiple sidebar groups.
This increases maintenance cost and makes template updates error-prone.
This PR adds first-class support for multiple template categories in a
single file via `canvas_types`, then removes duplicate template files.
What changed:
- Added `canvas_types` to `CanvasTemplate` model and DB migration.
- Added normalization logic when loading templates:
- accepts legacy `canvas_type`
- accepts new `canvas_types`
- merges/deduplicates values
- preserves backward compatibility by keeping `canvas_type` as first
normalized value.
- Updated template import flow to load only `.json` files and in stable
sorted order.
- Updated frontend template filtering to match on `canvas_types` first,
with fallback to legacy `canvas_type`.
- Consolidated duplicated template pairs into single files and removed:
- `deep_search_r.json`
- `reflective_academic_paper_generator_r.json`
- `seo_article_writer_r.json`
- Added regression/edge-case tests for category normalization and route
serialization expectations.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
## Summary
Fixes#13996
Replace `json.load(open(...))` with `with open(...) as f: json.load(f)`
in two files to ensure file descriptors are properly closed.
**Affected files:**
- `common/doc_store/infinity_conn_base.py` — schema loading for Infinity
doc store
- `api/db/init_data.py` — agent template loading at startup
## Why this matters
In a long-running server process like RAGFlow, leaked file descriptors
from `json.load(open(...))` can accumulate over time. While CPython's
refcounting usually cleans these up, it's not guaranteed (especially
under memory pressure or with alternative Python runtimes), and can lead
to `OSError: [Errno 24] Too many open files`.
## Test plan
- [ ] Verify Infinity doc store schema loading still works correctly
- [ ] Verify agent templates load correctly on startup
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Refactor**
* Improved file handling in internal data processing to ensure proper
resource cleanup.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Co-authored-by: easonysliu <easonysliu@tencent.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
### What problem does this PR solve?
As title
### Type of change
- [x] Refactoring
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Chores**
* Improved authentication error logging to better distinguish between
JWT and API token failures.
* Enhanced code documentation with clarifying comments for better
maintainability.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: Jin Hai <haijin.chn@gmail.com>
### What problem does this PR solve?
Feat: enable sync deleted files for connector
1. first comes with github
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Added "sync deleted files" feature for data sources, enabling
automatic removal of files deleted from the source system.
* Added multilingual support for the new sync deleted files setting
across multiple languages.
* **UI Improvements**
* Improved checkbox form field rendering and layout.
* Enhanced full-width display for authentication token input fields.
### What problem does this PR solve?
Implements automatic adjustment of knowledge base chunk recall weights
based on user feedback (upvotes/downvotes). When users upvote or
downvote a response, the system locates the corresponding knowledge
snippets and adjusts their recall weight to improve future retrieval
quality.
**Closes #12670**
**How it works:**
1. User upvotes/downvotes a response via `POST /thumbup`
2. System extracts chunk IDs from the conversation reference
3. For each referenced chunk:
- Reads current `pagerank_fea` value from document store
- Increments (+1) for upvote or decrements (-1) for downvote
- Clamps weight to [0, 100] range
- Updates chunk in ES/Infinity/OceanBase
4. Future retrievals score these chunks higher/lower based on
accumulated feedback
**Files changed:**
- `api/db/services/chunk_feedback_service.py` - New service for updating
chunk pagerank weights
- `api/apps/conversation_app.py` - Integrated feedback service into
thumbup endpoint
- `test/testcases/test_web_api/test_chunk_feedback/` - Unit tests
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Chat message feedback now updates per-chunk relevance weights
(feature-flag gated), with configurable weighting and atomic updates
across storage backends.
* **Bug Fixes**
* Stricter validation for message feedback inputs and more robust
handling of feedback transitions.
* **Tests**
* Expanded test coverage for chunk-feedback behavior, weighting
strategies, storage backends, and thumb-flip scenarios.
* **Chores**
* CI workflow extended to run the new chunk-feedback web API tests.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: mkdev11 <YOUR_GITHUB_ID+MkDev11@users.noreply.github.com>
Co-authored-by: mkdev11 <MkDev11@users.noreply.github.com>
### What problem does this PR solve?
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
---------
Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
### What problem does this PR solve?
fixes issue #13799 where team members get model not authorized when
running RAG on an admin-shared knowledge base after the admin changes
the KB embedding model (for example to bge-m3).
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Fix
migrate_add_unique_email-silently-skips-unique-constraint-when-non-unique-user_email-index-exists.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
1. Split dataset api to gateway and service, and modify web UI to use
restful http api.
2. Old KB releated APIs are commented.
### Type of change
- [x] Refactoring
---------
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
### What problem does this PR solve?
Fixes [#13505](https://github.com/infiniflow/ragflow/issues/13505): Jira
incremental sync could miss updated issues after initial sync,
especially near time boundaries.
Root cause:
- Jira JQL uses minute-level precision for `updated` filters.
- Incremental windows had no overlap buffer, so boundary updates could
be skipped.
- Sync log cursor tracking used a backward-facing update for
`poll_range_start`.
- Existing-doc updates in `upload_document` lacked a KB ownership guard
for doc-id collisions.
What changed:
- Added Jira incremental overlap buffer (`time_buffer_seconds`,
defaulting to `JIRA_SYNC_TIME_BUFFER_SECONDS`) when building JQL
lower-bound time.
- Preserved second-level post-filtering to avoid duplicate reprocessing
while still catching boundary updates.
- Improved Jira sync logging to include start/end window and overlap
configuration.
- Updated sync cursor tracking in `increase_docs` to keep
`poll_range_start` moving forward with max update time.
- Added KB ID safety check before updating existing document records in
`upload_document`.
Verification performed:
- Python syntax compile checks passed for modified files.
- Manual verification flow:
1. Run full Jira sync.
2. Edit an already-indexed Jira issue.
3. Run next incremental sync.
4. Confirm updated content is re-ingested into KB.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
### What problem does this PR solve?
using builtin model when parsing gave an error because it expects
fid==builtin. split_model_name_and_factory returns id=None. pr allows
the model to be accepted wheter with or without @Builtin
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Export Agent Logs.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: balibabu <assassin_cike@163.com>
### What problem does this PR solve?
1. Split dataset api to gateway and service, and modify web UI to use
restful http api.
2. Old KB releated APIs are commented.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
1. Split dataset api to gateway and service, and modify web UI to use
restful http api.
2. Old KB releated APIs are commented.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Fix: model selecton rule in get_model_config_by_type_and_name
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: Modify the style of the release confirmation box.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
Co-authored-by: balibabu <assassin_cike@163.com>
Co-authored-by: 6ba3i <isbaaoui09@gmail.com>
### What problem does this PR solve?
Get user_id from canvas and record it.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
## Summary
Fixes#13544: PostgreSQL startup crash because
`update_tenant_llm_to_id_primary_key()` unconditionally uses
MySQL-specific SQL.
- Split `update_tenant_llm_to_id_primary_key()` into
`_update_tenant_llm_to_id_primary_key_mysql()` and
`_update_tenant_llm_to_id_primary_key_postgres()`, dispatching on
`settings.DATABASE_TYPE`
- MySQL path: unchanged (existing `DATABASE()`, `SET @row = 0`,
`AUTO_INCREMENT`, `DROP PRIMARY KEY` logic)
- PostgreSQL path: uses `current_database()`, `ROW_NUMBER() OVER (ORDER
BY ...)` for sequential IDs, `CREATE SEQUENCE` + `nextval()` for
auto-increment, and `information_schema.table_constraints` to find the
PK constraint name
- Also fix `migrate_add_unique_email()`: MySQL-only
`information_schema.statistics` is replaced with `pg_indexes` on
PostgreSQL
## Test plan
- [ ] Start RAGFlow with `DB_TYPE=postgres` — startup should complete
without `function database() does not exist` error
- [ ] Start RAGFlow with `DB_TYPE=mysql` (default) — existing behaviour
unchanged, migration runs as before
- [ ] Fresh PostgreSQL install: verify `tenant_llm.id` column is created
as a serial primary key after migration
- [ ] Idempotency: running migration twice on PostgreSQL should be a
no-op (column already exists check passes)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: gambletan <gambletan@github>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>