Commit Graph

5699 Commits

Author SHA1 Message Date
b7744e053e fix: support dense_vector from ES fields response (ES 9.x compatibility) (#13972)
fix: support dense_vector from ES fields response (ES 9.x compatibility)

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Configuration Chore (non-breaking change which updates
configuration)


## Summary by CodeRabbit

* **Bug Fixes**
* More accurate handling and unwrapping of dense-vector fields so
returned values have correct shapes.
* Field selection reliably limits returned data and falls back to
alternate result locations when needed.
* Use of consistent result IDs and tolerant handling when score values
are missing.

* **Chores / Configuration**
* Increased build memory and adjusted build-time flags for the frontend
build.
* Simplified runtime model/GPU checks and removed an automated runtime
GPU-install attempt.

* **Build Fixes**
* `web/vite.config.ts`: make `build.minify` and `build.sourcemap`
respect `VITE_MINIFY` and `VITE_BUILD_SOURCEMAP` env vars from
Dockerfile instead of hardcoding `terser` and `true`.

* **Environment**
* Allow stack version override and default the runtime image tag to
"latest".

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Correct unwrapping of dense-vector fields and reliable field selection
with fallback locations.
* Consistent use of hit-level IDs and tolerant handling when score
values are missing.

* **Chores / Configuration**
* Increased frontend build memory and added build-time minify/sourcemap
flags; build minification and sourcemap now configurable.
* Removed runtime GPU detection for model initialization; force CPU
initialization.

* **Environment**
* Allow stack version override and default runtime image tag to
"latest".

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 17:44:13 +08:00
107fe6cf90 Feat: support doc for pipeline parser in word (#14005)
### What problem does this PR solve?

Feat: support doc for pipeline parser in word

### Type of change

- [x] New Feature (non-breaking change which adds functionality)



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added support for processing legacy Word `.doc` file formats,
extending document compatibility.

* **Bug Fixes**
* Enhanced error handling during document parsing to improve reliability
and prevent processing failures.
2026-04-09 16:40:42 +08:00
8d52ef2893 Feat: enable sync deleted files for connector (#14000)
### What problem does this PR solve?

Feat: enable sync deleted files for connector
1. first comes with github

### Type of change

- [x] New Feature (non-breaking change which adds functionality)



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added "sync deleted files" feature for data sources, enabling
automatic removal of files deleted from the source system.
* Added multilingual support for the new sync deleted files setting
across multiple languages.

* **UI Improvements**
  * Improved checkbox form field rendering and layout.
  * Enhanced full-width display for authentication token input fields.
2026-04-09 16:40:14 +08:00
577c96bf2a Refactor: Merge document update API (#13962)
### What problem does this PR solve?

Refactor: merge document.rename into document.update_document

### Type of change

- [x] Refactoring


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added a unified document update API (PUT) supporting name, metadata,
parser/chunk settings, and status changes.

* **Breaking Changes**
* Legacy single-parameter rename endpoint removed; renames now require
dataset + document identifiers.
  * `/list` now reads dataset id from a different query parameter.

* **Validation / Bug Fixes**
* Stricter meta_fields and parser-config validation; unauthenticated
requests return 401.

* **Frontend**
  * UI now sends dataset id when saving document names.

* **Tests**
* Numerous unit and HTTP tests adjusted or removed to match new API and
validations.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: MkDev11 <94194147+MkDev11@users.noreply.github.com>
Co-authored-by: mkdev11 <YOUR_GITHUB_ID+MkDev11@users.noreply.github.com>
Co-authored-by: mkdev11 <MkDev11@users.noreply.github.com>
Co-authored-by: Qi Wang <wangq8@outlook.com>
Co-authored-by: dataCenter430 <161712630+dataCenter430@users.noreply.github.com>
Co-authored-by: balibabu <cike8899@users.noreply.github.com>
2026-04-09 11:17:38 +08:00
c13f8856a1 fix: correct typos in agent component filename and templates (#13930)
## Summary
- Rename misspelled file `varaiable_aggregator.py` →
`variable_aggregator.py`
- Fix `unkown` → `unknown` in template and frontend constant (3
instances)
- Fix `Finale` → `Final` in customer feedback template (2 instances)

## Test plan
- [ ] Verify variable aggregator component loads correctly
- [ ] Verify agent templates render properly

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: yuj <yuj@ztjzsoft.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-04-09 11:06:01 +08:00
dbfb439239 Feat: migrate script (#13976)
### What problem does this PR solve?

Add stage for migrate tenant_llm data into table tenant_model_instance
and tenant_model.

### Type of change

- [x] Other (please describe): tool script


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* Added two new migration stages to move tenant model and instance
records into new target tables, with dry-run, full-execute, and "create
table only" modes; migration skips already-migrated rows to avoid
duplicates.
* **Bug Fixes**
  * Cleaned up migration header logging for clearer output.
* **Documentation**
* Added usage guide describing stages, options, modes, config format,
examples, and expected logs.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-09 11:03:39 +08:00
c5871c1078 Fix: dsl import/export (#13992)
### What problem does this PR solve?

Fix: dsl import/export
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Enhanced JSON import functionality for agents to automatically
populate components from imported graph structures.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-09 10:55:22 +08:00
82fa85c837 Implement Delete in GO and refactor functions (#13974)
### What problem does this PR solve?

Implement Delete in GO and refactor functions

### Type of change

- [x] Refactoring

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added a remove_chunks command to delete specific or all chunks from a
document.
  * Added new endpoints for chunk removal and chunk update.

* **Refactor**
* Renamed index commands to dataset/metadata table terminology and
updated REST routes accordingly.
* Updated chunk update flow to a JSON POST style and improved metadata
error messages.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
2026-04-09 09:52:31 +08:00
3b7723855c Fix: revert xgboost version to 1.6.0 (#13984)
### What problem does this PR solve?

Revert xgboost version to 1.6.0

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
  * Updated xgboost dependency from version 3.2.0 to 1.6.0

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-08 19:53:47 +08:00
5fe6f7c9ac Go CLI: Add list configs and set log level command (#13983)
### What problem does this PR solve?

1. list configs
2. set log level debug/info/warn/error/fatal/panic

```

RAGFlow(user)> list configs;
+--------------------+-----------------------+
| key                | value                 |
+--------------------+-----------------------+
| redis_host         | localhost:6379        |
| doc_engine         | elasticsearch         |
| elasticsearch_host | http://localhost:1200 |
| log_level          | info                  |
| database           | mysql                 |
| database_host      | localhost:3306        |
| admin              | 0.0.0.0:9383          |
| storage_engine     | minio                 |
| minio_host         | localhost:9000        |
+--------------------+-----------------------+
```

### Type of change

- [x] New Feature (non-breaking change which adds functionality)


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
* Added `LIST CONFIGS` command to view system configuration details
(Redis, database, log level, storage engine, and host settings).
* Added `SET LOG LEVEL` command to adjust logging verbosity at runtime.

* **Improvements**
* Enhanced log level configuration defaults and runtime state
management.
* Reorganized token management and system endpoints under `/system/`
routes for better API organization.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-08 19:32:53 +08:00
86900dca99 Refactor: Remove unused API code (#13978)
### What problem does this PR solve?

Refactor: Remove unused API code

### Type of change


- [x] New Feature (non-breaking change which adds functionality)



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Style**
* Updated table header styling in dataset settings by removing a
hard-coded background color class, allowing the header to use default or
inherited component styling instead.

* **Refactor**
* Removed token management endpoints from the API service. Token
creation, listing, and removal functions are no longer available.
  * Removed the statistics data endpoint from available API routes.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-08 18:46:08 +08:00
c0c3287af4 Fix: Error message: Use 'const' instead. (#13982)
### What problem does this PR solve?

Fix: Linter error message: Use 'const' instead.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Refactor**
* Updated variable declarations across form components, agent utilities,
memory management hooks, and data handling functions to enhance code
consistency and maintainability throughout the application codebase.

* **Style**
* Added ESLint suppressions to document intentional constant-condition
patterns in asynchronous event streaming operations.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-08 18:13:14 +08:00
3064895bbb Fix: import error in sandbox provider (#13971)
### What problem does this PR solve?

Fix import error in sandbox provider.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Updated internal configuration import mechanism for sandbox provider
initialization. No end-user impact.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-08 15:35:30 +08:00
fa75aee3b9 Refactor system API (#13958)
### What problem does this PR solve?

- ping
- token
- log level

### Type of change

- [x] Refactoring


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Refactor**
* System endpoints consolidated under /api/v1/system: ping, health
check, and token management moved to the centralized API surface.
* Token management unified at /api/v1/system/tokens with
list/create/delete behavior.

* **Documentation**
  * API reference updated to reflect the new /api/v1/system paths.

* **Tests**
* Client fixtures and test utilities updated to use
/api/v1/system/tokens; one unit test for health/oceanbase status
removed.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-08 15:26:18 +08:00
ad789f5c43 Fix list files (#13960)
### What problem does this PR solve?

As title.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
* Standardized the query parameter used when listing documents so
listings behave consistently across the web and client interfaces.
* Clarified the error message shown when a required dataset ID is
missing to give clearer guidance to users.

* **Tests**
* Updated test coverage to reflect the standardized dataset identifier
usage.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-08 13:38:30 +08:00
b8764cfa11 Fix: The document management table cannot be displayed. (#13967)
### What problem does this PR solve?

Fix: The document management table cannot be displayed.
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Improved table layout and overflow behavior in the files view to
ensure proper scrolling and display.

* **Chores**
* Removed unused system status functionality and cleaned up service
methods.
  * Updated TypeScript configuration for compatibility.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-08 11:37:27 +08:00
62a1333cf2 Feat: expose parent-child chunking configuration via HTTP API and Python SDK (#13940)
…
### What problem does this PR solve?

Closes #13857

Parent-child chunking was introduced in v0.23.0 but is only configurable
through the web UI. Users managing datasets programmatically cannot
enable it via the HTTP API or Python SDK because `ParserConfig` uses
`extra="forbid"`, rejecting the `children_delimiter` field at
validation.

### What does this PR change?

Adds a `parent_child` nested config to `ParserConfig`, following the
same pattern as `raptor` and `graphrag`:

```json
"parser_config": {
  "parent_child": {
    "use_parent_child": true,
    "children_delimiter": "\n"
  }
}
```

- api/utils/validation_utils.py — new ParentChildConfig model, added to
ParserConfig
- api/utils/api_utils.py — naive defaults + flatten to
children_delimiter for the execution layer
- api/apps/services/dataset_api_service.py — flatten on the update path
- test/testcases/configs.py — updated DEFAULT_PARSER_CONFIG
-
test/testcases/test_http_api/test_dataset_management/test_create_dataset.py
— 4 valid + 2 invalid test cases

No changes to the execution layer (rag/app/naive.py, rag/nlp/search.py).
Existing UI flow via ext is unaffected.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added parent-child chunking configuration for dataset creation and
updates with new `use_parent_child` toggle and customizable
`children_delimiter` setting to specify how parent chunks are split into
child chunks.

* **Documentation**
* Updated HTTP and Python API references with parent-child chunking
configuration details and examples.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-08 11:36:57 +08:00
0ced071a0b Use uv run python3 x.py instead of uv run x.py (#13966)
### Use uv run python3 x.py instead of uv run x.py

When directly call `uv run x.py` it will use the python in shebang, it
does not work if the default python lack of some packages, so change it
to best practices `uv run python3 x.py`

### Type of change

- [x] Documentation Update

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **Documentation**
* Updated development setup instructions across all README files
(English and multiple language translations) to use explicit Python
interpreter invocation for the dependency download command.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-08 10:33:46 +08:00
cfee2bc9db feat: Auto-adjust chunk recall weights based on user feedback (#12689)
### What problem does this PR solve?

Implements automatic adjustment of knowledge base chunk recall weights
based on user feedback (upvotes/downvotes). When users upvote or
downvote a response, the system locates the corresponding knowledge
snippets and adjusts their recall weight to improve future retrieval
quality.

**Closes #12670**

**How it works:**
1. User upvotes/downvotes a response via `POST /thumbup`
2. System extracts chunk IDs from the conversation reference
3. For each referenced chunk:
   - Reads current `pagerank_fea` value from document store
   - Increments (+1) for upvote or decrements (-1) for downvote
   - Clamps weight to [0, 100] range
   - Updates chunk in ES/Infinity/OceanBase
4. Future retrievals score these chunks higher/lower based on
accumulated feedback

**Files changed:**
- `api/db/services/chunk_feedback_service.py` - New service for updating
chunk pagerank weights
- `api/apps/conversation_app.py` - Integrated feedback service into
thumbup endpoint
- `test/testcases/test_web_api/test_chunk_feedback/` - Unit tests

### Type of change

- [x] New Feature (non-breaking change which adds functionality)


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Chat message feedback now updates per-chunk relevance weights
(feature-flag gated), with configurable weighting and atomic updates
across storage backends.

* **Bug Fixes**
* Stricter validation for message feedback inputs and more robust
handling of feedback transitions.

* **Tests**
* Expanded test coverage for chunk-feedback behavior, weighting
strategies, storage backends, and thumb-flip scenarios.

* **Chores**
  * CI workflow extended to run the new chunk-feedback web API tests.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: mkdev11 <YOUR_GITHUB_ID+MkDev11@users.noreply.github.com>
Co-authored-by: mkdev11 <MkDev11@users.noreply.github.com>
2026-04-08 09:52:18 +08:00
4a2a17c27a Fix typos (#13961)
### What problem does this PR solve?

as title.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
  * Internal code quality improvements with no user-facing changes.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-07 23:16:52 +08:00
931021875a Refactor system/version API to RESTful style (#13956)
### What problem does this PR solve?

Refactor version API to RESTful style. Python and go server API also
updated.
### Type of change

- [x] Refactoring



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

* **Refactor**
* Migrated core API endpoints to the `/api/v1/` namespace for improved
consistency and organization.
* Standardized system version, search, and chat list endpoints under the
new API versioning structure.

* **New Features**
* Added MinIO region configuration support, allowing specification of
storage engine regional settings via environment variables or
configuration files.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-07 19:07:47 +08:00
bc8d67ce78 feat: add region parameter support to MinIO connection (#13954)
## Summary
- Add optional `region` parameter to `Minio()` client constructor in
`rag/utils/minio_conn.py`
- Reads from `MINIO.region` in settings, defaults to `None` when not
configured
- Required by some S3-compatible storage services (e.g., AWS S3, Tencent
COS) for proper bucket access

## Motivation
When using RAGFlow with S3-compatible storage that requires a region
(such as AWS S3 or Tencent Cloud COS), the MinIO client fails to access
buckets because the `region` parameter is not passed through.

The `Minio()` Python client already supports the `region` parameter
natively — this PR simply wires it up from the RAGFlow configuration.

## Changes
- `rag/utils/minio_conn.py`: Pass `region=settings.MINIO.get("region",
None) or None` to `Minio()` constructor

## Backward Compatibility
- No breaking changes. When `region` is not configured, it defaults to
`None`, preserving the existing behavior exactly.

## Test Plan
- [ ] Verified with MinIO (no region set) — works as before
- [x] Verified with S3-compatible storage requiring region — bucket
access succeeds

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Bug Fixes**
* Enhanced MinIO client initialization with regional configuration
support for improved compatibility with region-specific deployments.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Jarry Wang <code-better-life@users.noreply.github.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-04-07 16:38:23 +08:00
68f665be7a CLI: Add float parsing (#13955)
### What problem does this PR solve?

Add float parsing

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-07 15:09:45 +08:00
393efa9b7c Refactor variable of front end (#13953)
### What problem does this PR solve?

api_host -> webAPI
ExternalApi -> restAPIv1

### Type of change

- [x] Refactoring


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Refactor**
* Updated internal API endpoint configuration to use consolidated base
URL constants for improved maintainability and consistency across the
application.

* **Chores**
* Updated server-side protocol validation for admin connectivity checks.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-07 15:08:11 +08:00
38acf34724 Fix: The agent selected a knowledge base, but the API returned the error: "No dataset is selected". (#13950)
### What problem does this PR solve?

Fix: The agent selected a knowledge base, but the API returned the
error: "No dataset is selected".

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: balibabu <assassin_cike@163.com>
2026-04-07 14:16:37 +08:00
fa08fa2a17 docs: fix broken internal links in guides (#13935)
### What problem does this PR solve?

This fixes two broken internal documentation links in the guides:

- `docs/develop/mcp/launch_mcp_server.md` linked
`./acquire_ragflow_api_key.md`, but the target page lives one level up
as `../acquire_ragflow_api_key.md`.
- `docs/guides/dataset/run_retrieval_test.md` linked
`./construct_knowledge_graph.md`, but the actual page lives under
`./advanced/construct_knowledge_graph.md`.

These broken links make it harder to follow the MCP and retrieval-test
docs from the local docs tree.

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2026-04-07 14:01:12 +08:00
9ac5d28f06 Refactor context command (#13952)
### What problem does this PR solve?

Refactor context search command

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-07 13:59:27 +08:00
424aee5bec fix: correct typos in code comments, docstrings and docs (#13931)
## Summary
- Fix `a image` → `an image` in README and log message
- Fix `colomn` → `column` in table structure recognizer comment
- Fix `formated` → `formatted` in confluence connector docstring
- Fix `tabel of content` → `table of contents` in TOC prompt

## Test plan
- [ ] Documentation and comment changes, no functional impact

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: yuj <yuj@ztjzsoft.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-04-07 13:05:39 +08:00
29cf8aba48 fix: correct typos in locale files and search hooks (#13932)
## Summary
- Fix `Refrence` → `Reference` in zh, id, zh-traditional locale files
(en.ts already correct)
- Fix `from from` → `from` and `this files` → `this file` in en.ts
- Fix variable name `reponse` → `response` in search hooks

## Test plan
- [ ] Verify UI strings display correctly
- [ ] Verify search functionality works with renamed variable

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: yuj <yuj@ztjzsoft.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-04-07 12:26:25 +08:00
112007243d Refa: refine code_exec component (#13925)
### What problem does this PR solve?

Refine code_exec component.

### Type of change

- [x] Refactoring
2026-04-07 11:48:29 +08:00
c4b0aaa874 Fix: #6098 - Add validation logic for parser_config when update document (#13911)
### What problem does this PR solve?

Add validation logic for parser_config.
Refactor the processing flow. Before change, validation logics and
update logics are mixed up - some validation logis executes followed by
some update logic executes and then another such
"validation-and-then-update" which is not good. After change, all
validation logic executes firstly. Update logic will be executed after
ALL validation logic executed.
Validation logic for parameters (that come from front end) will be
checked using Pydantic. For validation logic that depends on data from
DB, they will be in separate methods.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2026-04-07 11:33:05 +08:00
5673245134 Refactor context command (#13948)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-07 11:30:09 +08:00
ff27ce86d6 fix: gpt-5 name-based config clearing from base chat path (#13949)
### What problem does this PR solve?

fix #13944 where OpenAI-compatible custom endpoints failed verification
when model names contained `gpt-5` becauser of incorrect name-based
handling in the Base/backend=`base` path.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-07 11:24:47 +08:00
a0be7c7ca7 Fix(connector): expose id_column, timestamp_column, metadata_columns for MySQL/PostgreSQL incremental sync (#13849)
### What problem does this PR solve?
The MySQL and PostgreSQL sync classes in `sync_data_source.py` were not
passing `id_column`, `timestamp_column`, and `metadata_columns` to
`RDBMSConnector`,
making incremental sync and document update impossible even when
configured.
   
- Without `id_column`: updated records generate new documents instead of
overwriting existing ones (doc ID is derived from content hash, so any
change produces a new ID).
- Without `timestamp_column`: `poll_source` always falls back to full
sync,
ignoring the configured time range.
- The three fields existed in the frontend default values but had no
form
inputs, so users had no way to fill them in.
### Type of change
  - [x] Bug Fix (non-breaking change which fixes an issue)        
  - [x] New Feature (non-breaking change which adds functionality)

### Changes
   
- **Backend** (`rag/svr/sync_data_source.py`): pass `id_column`,
    `timestamp_column`, and `metadata_columns` from `self.conf` to
`RDBMSConnector` for both `MySQL` and `PostgreSQL` sync classes.
- **Frontend**
(`web/src/pages/user-setting/data-source/constant/index.tsx`):
add `ID Column`, `Timestamp Column`, and `Metadata Columns` form fields
    to MySQL and PostgreSQL data source configuration UI with tooltips.

Signed-off-by: lixintao <lixintao@uniontech.com>
Co-authored-by: lixintao <lixintao@uniontech.com>
2026-04-07 10:24:30 +08:00
49386bc1b5 Implement UpdateDataset and UpdateMetadata in GO (#13928)
### What problem does this PR solve?

Implement UpdateDataset and UpdateMetadata in GO

Add cli:
UPDATE CHUNK <chunk_id> OF DATASET <dataset_name> SET <update_fields>
REMOVE TAGS 'tag1', 'tag2' from DATASET 'dataset_name';
SET METADATA OF DOCUMENT <doc_id> TO <meta>


### Type of change

- [ ] Refactoring
2026-04-07 09:44:51 +08:00
60ec5880e5 Feat: mysql data migrate script (#13927)
### What problem does this PR solve?

Add a script to migrate data in tenant_llm into tenant_model_provider.

### Type of change

- [x] Other (please describe): tool script.
2026-04-03 20:01:37 +08:00
69264b3a70 Feat: Refact pipeline (#13826)
### What problem does this PR solve?

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring

---------

Co-authored-by: Zhichang Yu <yuzhichang@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-03 19:26:45 +08:00
6d9430a125 Add think chat to CLI (#13922)
### What problem does this PR solve?

Now user can use 'think mode' to chat with LLM

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-03 18:11:23 +08:00
e518c20736 Update README (#13924)
### Type of change

- [x] Documentation Update
2026-04-03 17:29:48 +08:00
35b2a714f9 Fix: tag datasets not visible in tag sets dropdown (#13921)
## Problem Description

When a user creates Dataset A using the **Tag parser** (for CSV/Excel
files with tag definitions), and then creates Dataset B, the Tag Sets
dropdown in Dataset B's Configuration page cannot display Dataset A.

### Steps to Reproduce
1. Create Dataset A with **Tag** as the chunking method
2. Upload a CSV file to Dataset A to generate tags
3. Create Dataset B
4. Navigate to Dataset B → Configuration → Tag Sets
5. **Expected**: Dataset A should appear in the dropdown
6. **Actual**: The dropdown is empty, Dataset A is not visible

---

## Root Cause Analysis

After thorough code review, **the original code logic is correct**. The
`chunk_method` field flows properly through the system:

### Data Flow

```mermaid
sequenceDiagram
    participant Frontend
    participant Pydantic
    participant API
    participant Database

    Note over Frontend,Database: Creating a Tag Dataset
    Frontend->>Pydantic: POST {chunk_method: "tag"}
    Pydantic->>API: serialization_alias converts<br/>chunk_method → parser_id
    API->>Database: INSERT {parser_id: "tag"}

    Note over Frontend,Database: Querying Datasets
    Frontend->>API: GET /api/v1/datasets
    API->>Database: SELECT parser_id, ...
    Database-->>API: Returns {parser_id: "tag"}
    API->>API: remap_dictionary_keys()<br/>parser_id → chunk_method
    API-->>Frontend: {chunk_method: "tag"}

    Note over Frontend: Filter: x.chunk_method === 'tag'
    Note over Frontend:  Match found!
```

### Field Mapping

**Location**: `api/utils/api_utils.py:657-662`
```python
DEFAULT_KEY_MAP = {
    "chunk_num": "chunk_count",
    "doc_num": "document_count",
    "parser_id": "chunk_method",  # Maps DB field to API response
    "embd_id": "embedding_model",
}
```

### Frontend Filtering (Already Correct)

**Location**:
`web/src/pages/dataset/dataset-setting/components/tag-item.tsx:24`
```typescript
const knowledgeOptions = knowledgeList
  .filter((x) => x.chunk_method === 'tag')  //  Correct field
  .map((x) => ({...}));
```

---

## Actual Issue

The most likely causes for the "bug" are:

1. **Browser Cache**: Old data cached before proper deployment
2. **Stale Data**: Datasets created before the code was fully deployed
3. **Container Not Restarted**: Changes not applied to running container

---

## Resolution

**No code changes are needed.** The existing code correctly:

1. Accepts `chunk_method` from frontend
2. Converts to `parser_id` via Pydantic serialization_alias
3. Stores in database as `parser_id`
4. Maps back to `chunk_method` in API response
5. Frontend filters by `chunk_method === 'tag'`
2026-04-03 17:29:10 +08:00
0b724be521 chore(templates): Update the customer feedback dispatcher template (#13919)
### What problem does this PR solve?
Update the customer feedback dispatcher template and introduce a new
operator `Variable Aggregator`.

### Type of change

- [x] Other (please describe): Template change

---------

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-04-03 16:51:39 +08:00
5b43c7cf16 Feat: Place the language configuration in web/.env for easy user configuration. (#13920)
### What problem does this PR solve?

Feat: Place the language configuration in web/.env for easy user
configuration.

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-04-03 16:50:18 +08:00
354108922b fix: use f-string with separator in switch operator error message (#13915)
\`switch.py\` line 137 concatenates the operator directly after the text
without separator:
\`'Not supported operator' + operator\` → produces \`"Not supported
operatorXXX"\`

Changed to: \`f'Not supported operator: {operator}'\`
2026-04-03 16:49:28 +08:00
21af67f6f9 feat(File Management): Refactor File List API and Add Knowledge Base Document Initialization (#13914)
### What problem does this PR solve?

feat(File Management): Refactor File List API and Add Knowledge Base
Document Initialization

- Migrate the file list API endpoint from `/v1/file/list` to
`/api/v1/files` to align with the Python implementation.
- Add logic for initializing knowledge base documents; automatically
create the `.knowledgebase` folder and associated documents when
retrieving the root directory.
- Enhance parameter validation and error handling, including the
introduction of a new `CodeParamError` error code.
- Optimize the file list response structure to match the implementation
on the Python side.
- Update the Vite configuration to support proxying the new
`/api/v1/files` endpoint.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-04-03 15:08:43 +08:00
6263857c1e Agent templates regrouped and renamed (#13873)
### What problem does this PR solve?

Regrouped and renamed agent templates to increase user engagement.

### Type of change


- [x] Refactoring
2026-04-03 13:43:25 +08:00
ab358fe949 feat: make Azure cloud authority configurable for SPN auth (#13898)
## Summary
- The Azure SPN storage handler hardcoded
`AzureAuthorityHosts.AZURE_CHINA`, preventing users in Azure Public
Cloud regions (UK-South, EU, US, etc.) from authenticating
- Add a `cloud` config option (env: `AZURE_CLOUD`) supporting all four
Azure sovereignties: `public`, `china`, `government`, `germany`
- Defaults to `public` (global Azure) — the most common international
use case

Closes #13259

## Test plan
- [ ] Verify default (`cloud: public`) connects to Azure Public Cloud
endpoints
- [ ] Verify `cloud: china` retains existing behavior for Azure China
users
- [ ] Verify `AZURE_CLOUD` env var overrides the config file value

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-03 12:51:26 +08:00
384fa6fc6e Replace MinIO official image with pgsty/minio fork (#13896)
## Summary

- Replace `quay.io/minio/minio` with `pgsty/minio` community fork in
`docker/docker-compose-base.yml`

MinIO stopped distributing pre-built Docker images and changed its
license. The pgsty/minio fork provides drop-in compatible images under
AGPLv3.

Closes #13840

## Test plan

- [x] Verify `docker compose -f docker/docker-compose-base.yml up -d`
pulls the pgsty/minio image successfully
- [ ] Verify MinIO console accessible on port 9001
- [ ] Verify RAGFlow backend can connect to MinIO and perform file
operations normally

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-02 22:03:02 +08:00
b7daf6285b Refa: Chat conversations /convsersation API to RESTFul (#13893)
### What problem does this PR solve?

Chat conversations /convsersation API to RESTFul.

### Type of change

- [x] Refactoring
2026-04-02 20:49:23 +08:00
bbb9b1df85 feat: Implement file upload and folder creation features by GO (#13903)
### What problem does this PR solve?

feat: Implement file upload and folder creation features

- Add file upload route in router.go
- Add file operation methods in dao/file.go
- Add util/file.go for file type detection and filename handling
- Implement file upload and folder creation endpoints in handler/file.go
- Implement file upload and folder creation logic in service/file.go
- Modify response message format in memory.go
- Add document count method in dao/document.go

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-04-02 20:21:04 +08:00
6c29128de1 Refactor model provider and command (#13887)
### What problem does this PR solve?

Introduce 5 new tables, including model groups and provider instance.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-02 20:20:35 +08:00