178 Commits

Author SHA1 Message Date
57b24be6d6 Docs: Update version references to v0.25.2 in READMEs and docs (#14731)
### What problem does this PR solve?

- Update version tags in README files (including translations) from
v0.25.1 to v0.25.2
- Modify Docker image references and documentation to reflect new
version
- Update version badges and image descriptions
- Maintain consistency across all language variants of README files

### Type of change

- [x] Documentation Update
2026-05-09 19:06:05 +08:00
12f80f170c Bump to infinity v0.7.0-dev6 (#14606)
### What problem does this PR solve?

Bump to infinity v0.7.0-dev6

(`uv lock --upgrade-package infinity-sdk`)

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-07 10:51:17 +08:00
9e4f3614de Chore(deps-dev): Bump pillow from 12.1.1 to 12.2.0 (#14578)
As title
2026-05-06 11:08:38 +08:00
aa57b5bd8b Go: move logger to common module (#14545)
### What problem does this PR solve?

As title

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-05-06 10:41:58 +08:00
ce4c782fd7 Docs: Update version references to v0.25.1 in READMEs and docs (#14488)
### What problem does this PR solve?

- Update version tags in README files (including translations) from
v0.25.0 to v0.25.1
- Modify Docker image references and documentation to reflect new
version
- Update version badges and image descriptions
- Maintain consistency across all language variants of README files

### Type of change

- [x] Documentation Update
2026-04-30 10:49:26 +08:00
c41b5e8a5d fix: migrate Langfuse integration from start_generation to start_obse… (#14205)
The Langfuse Python SDK v4 removed the `start_generation()` method.
RAGFlow's code still called it, which raised an `AttributeError`
when Langfuse tracing was enabled.

Replace all `start_generation()` calls with
`start_observation(as_type="generation")`, which is the correct v4 SDK
API.

Affected files:
- api/db/services/llm_service.py (12 occurrences)
- api/db/services/dialog_service.py (1 occurrence)

Fixes #14204
Related to #9243

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 10:03:57 +08:00
a33d0737cd Docs: Update version references to v0.25.0 in READMEs and docs (#14257)
### What problem does this PR solve?

- Update version tags in README files (including translations) from
v0.24.0 to v0.25.0
- Modify Docker image references and documentation to reflect new
version
- Update version badges and image descriptions
- Maintain consistency across all language variants of README files

### Type of change

- [x] Documentation Update
2026-04-21 17:26:50 +08:00
ba7d3f6c31 Add debugpy dependency to pyproject.toml (#14225)
To attach a debugger to a running Docker container, debugpy must be
installed inside the Docker image.

### What problem does this PR solve?

[#14224](https://github.com/infiniflow/ragflow/issues/14224)

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-20 18:05:17 +08:00
b34a726acd Build(deps): Bump pypdf from 6.9.2 to 6.10.2 (#14184)
Bumps [pypdf](https://github.com/py-pdf/pypdf) from 6.9.2 to 6.10.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/py-pdf/pypdf/releases">pypdf's
releases</a>.</em></p>
<blockquote>
<h2>Version 6.10.2, 2026-04-15</h2>
<h2>What's new</h2>
<h3>Security (SEC)</h3>
<ul>
<li>Do not rely on possibly invalid /Size for incremental cloning (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3735">#3735</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
<li>Introduce limits for FlateDecode parameters and image decoding (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3734">#3734</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<p><a
href="https://github.com/py-pdf/pypdf/compare/6.10.1...6.10.2">Full
Changelog</a></p>
<h2>Version 6.10.1, 2026-04-14</h2>
<h2>What's new</h2>
<h3>Security (SEC)</h3>
<ul>
<li>Limit the allowed size of xref and object streams (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3733">#3733</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<h3>Robustness (ROB)</h3>
<ul>
<li>Consider strict mode setting for decryption errors (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3731">#3731</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<h3>Documentation (DOC)</h3>
<ul>
<li>Use new parameter names for compress_identical_objects by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<p><a
href="https://github.com/py-pdf/pypdf/compare/6.10.0...6.10.1">Full
Changelog</a></p>
<h2>Version 6.10.0, 2026-04-10</h2>
<h2>What's new</h2>
<h3>Security (SEC)</h3>
<ul>
<li>Disallow custom XML entity declarations for XMP metadata (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3724">#3724</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<h3>New Features (ENH)</h3>
<ul>
<li>Skip MD5 key derivation for AES-256 encrypted PDFs (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3694">#3694</a>)
by <a href="https://github.com/Ygnas"><code>@​Ygnas</code></a></li>
</ul>
<h3>Bug Fixes (BUG)</h3>
<ul>
<li>Use remove_orphans in compress_identical_objects (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3310">#3310</a>)
by <a href="https://github.com/j-t-1"><code>@​j-t-1</code></a></li>
<li>Fix PdfReadError when xref table contains comments before trailer
(<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3710">#3710</a>)
by <a href="https://github.com/rassie"><code>@​rassie</code></a></li>
<li>Correctly verify AES padding during decryption (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3699">#3699</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
<li>Fix stale object cache from non-authoritative object streams (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3698">#3698</a>)
by <a
href="https://github.com/astahlman"><code>@​astahlman</code></a></li>
<li>Fix extract_links pairing when annotations include non-links (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3687">#3687</a>)
by <a
href="https://github.com/ReinerBRO"><code>@​ReinerBRO</code></a></li>
</ul>
<h3>Documentation (DOC)</h3>
<ul>
<li>Add AI policy (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3717">#3717</a>)
by <a
href="https://github.com/stefan6419846"><code>@​stefan6419846</code></a></li>
</ul>
<p><a href="https://github.com/py-pdf/pypdf/compare/6.9.2...6.10.0">Full
Changelog</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md">pypdf's
changelog</a>.</em></p>
<blockquote>
<h2>Version 6.10.2, 2026-04-15</h2>
<h3>Security (SEC)</h3>
<ul>
<li>Do not rely on possibly invalid /Size for incremental cloning (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3735">#3735</a>)</li>
<li>Introduce limits for FlateDecode parameters and image decoding (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3734">#3734</a>)</li>
</ul>
<p><a
href="https://github.com/py-pdf/pypdf/compare/6.10.1...6.10.2">Full
Changelog</a></p>
<h2>Version 6.10.1, 2026-04-14</h2>
<h3>Security (SEC)</h3>
<ul>
<li>Limit the allowed size of xref and object streams (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3733">#3733</a>)</li>
</ul>
<h3>Robustness (ROB)</h3>
<ul>
<li>Consider strict mode setting for decryption errors (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3731">#3731</a>)</li>
</ul>
<h3>Documentation (DOC)</h3>
<ul>
<li>Use new parameter names for compress_identical_objects</li>
</ul>
<p><a
href="https://github.com/py-pdf/pypdf/compare/6.10.0...6.10.1">Full
Changelog</a></p>
<h2>Version 6.10.0, 2026-04-10</h2>
<h3>Security (SEC)</h3>
<ul>
<li>Disallow custom XML entity declarations for XMP metadata (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3724">#3724</a>)</li>
</ul>
<h3>New Features (ENH)</h3>
<ul>
<li>Skip MD5 key derivation for AES-256 encrypted PDFs (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3694">#3694</a>)</li>
</ul>
<h3>Bug Fixes (BUG)</h3>
<ul>
<li>Use remove_orphans in compress_identical_objects (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3310">#3310</a>)</li>
<li>Fix PdfReadError when xref table contains comments before trailer
(<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3710">#3710</a>)</li>
<li>Correctly verify AES padding during decryption (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3699">#3699</a>)</li>
<li>Fix stale object cache from non-authoritative object streams (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3698">#3698</a>)</li>
<li>Fix extract_links pairing when annotations include non-links (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3687">#3687</a>)</li>
</ul>
<h3>Documentation (DOC)</h3>
<ul>
<li>Add AI policy (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3717">#3717</a>)</li>
</ul>
<p><a href="https://github.com/py-pdf/pypdf/compare/6.9.2...6.10.0">Full
Changelog</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="c476b4f293"><code>c476b4f</code></a>
REL: 6.10.2</li>
<li><a
href="c50a0104cf"><code>c50a010</code></a>
SEC: Do not rely on possibly invalid /Size for incremental cloning (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3735">#3735</a>)</li>
<li><a
href="ac734dab4e"><code>ac734da</code></a>
SEC: Introduce limits for FlateDecode parameters and image decoding (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3734">#3734</a>)</li>
<li><a
href="b49e7eb454"><code>b49e7eb</code></a>
REL: 6.10.1</li>
<li><a
href="62338e9d36"><code>62338e9</code></a>
SEC: Limit the allowed size of xref and object streams (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3733">#3733</a>)</li>
<li><a
href="5dcc0aebaa"><code>5dcc0ae</code></a>
DEV: Update pytest-benchmark to 5.2.3</li>
<li><a
href="b42e4aa98a"><code>b42e4aa</code></a>
DEV: Update pinned pillow and pytest where possible (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3732">#3732</a>)</li>
<li><a
href="717446b121"><code>717446b</code></a>
ROB: Consider strict mode setting for decryption errors (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3731">#3731</a>)</li>
<li><a
href="9e461d361b"><code>9e461d3</code></a>
DEV: Bump softprops/action-gh-release from 2 to 3 (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3730">#3730</a>)</li>
<li><a
href="500d09d92f"><code>500d09d</code></a>
TST: Update <code>test_embedded_file__basic</code> to use
<code>tmp_path</code> fixture (<a
href="https://redirect.github.com/py-pdf/pypdf/issues/3726">#3726</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/py-pdf/pypdf/compare/6.9.2...6.10.2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=pypdf&package-manager=uv&previous-version=6.9.2&new-version=6.10.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/infiniflow/ragflow/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-17 18:43:19 +08:00
3b7723855c Fix: revert xgboost version to 1.6.0 (#13984)
### What problem does this PR solve?

Revert xgboost version to 1.6.0

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [ ] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
  * Updated xgboost dependency from version 3.2.0 to 1.6.0

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
2026-04-08 19:53:47 +08:00
c4b0aaa874 Fix: #6098 - Add validation logic for parser_config when update document (#13911)
### What problem does this PR solve?

Add validation logic for `parser_config`.
Refactor the processing flow. Previously, validation and update logic were
interleaved: some validation ran, then some updates, then another round of
validation followed by more updates, which is error-prone. Now all
validation runs first, and update logic runs only after every validation
check has passed.
Parameters coming from the front end are validated with Pydantic.
Validation that depends on data from the database lives in separate methods.
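
A minimal sketch of the validate-first, update-second flow described above. It is not the code from this PR: the field names (`chunk_token_num`, `delimiter`), the `run_status` check, and the helper functions are hypothetical, and a stdlib `dataclass` with `__post_init__` checks stands in for the Pydantic model the PR actually uses.

```python
from dataclasses import asdict, dataclass
from typing import Any


class ValidationError(ValueError):
    """Raised when a request fails validation; no state has been changed yet."""


@dataclass(frozen=True)
class ParserConfigUpdate:
    # Hypothetical fields; the real parser_config schema is not shown in this PR.
    chunk_token_num: int = 512
    delimiter: str = "\n"

    def __post_init__(self) -> None:
        # Request-level checks (the role Pydantic plays in the actual PR).
        if isinstance(self.chunk_token_num, bool) or not isinstance(self.chunk_token_num, int):
            raise ValidationError("chunk_token_num must be an integer")
        if not 1 <= self.chunk_token_num <= 2048:
            raise ValidationError("chunk_token_num must be between 1 and 2048")
        if not self.delimiter:
            raise ValidationError("delimiter must not be empty")


def validate_against_db(doc: dict[str, Any]) -> None:
    # Hypothetical DB-dependent rule, kept separate from request validation.
    if doc.get("run_status") == "RUNNING":
        raise ValidationError("cannot change parser_config while the document is being parsed")


def update_document(doc: dict[str, Any], payload: dict[str, Any]) -> dict[str, Any]:
    # Phase 1: run every validation step before touching any state.
    config = ParserConfigUpdate(**payload)
    validate_against_db(doc)
    # Phase 2: apply the update only after all checks have passed.
    updated = dict(doc)
    updated["parser_config"] = asdict(config)
    return updated
```

Because any failing check raises before `updated` is built, a rejected request cannot leave a half-applied update behind.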

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
2026-04-07 11:33:05 +08:00
ab358fe949 feat: make Azure cloud authority configurable for SPN auth (#13898)
## Summary
- The Azure SPN storage handler hardcoded
`AzureAuthorityHosts.AZURE_CHINA`, preventing users in Azure Public
Cloud regions (UK-South, EU, US, etc.) from authenticating
- Add a `cloud` config option (env: `AZURE_CLOUD`) supporting all four
Azure sovereignties: `public`, `china`, `government`, `germany`
- Defaults to `public` (global Azure) — the most common international
use case
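
A minimal sketch of the lookup described above, assuming the handler reads `cloud` from its config section and lets `AZURE_CLOUD` override it. The function name `resolve_authority_host` is hypothetical; the host strings are the standard Microsoft Entra ID (Azure AD) login endpoints exposed by `azure.identity.AzureAuthorityHosts`, which the real handler presumably passes to its credential.

```python
import os
from collections.abc import Mapping

# Login hosts for each Azure sovereign cloud (same values as the
# AZURE_PUBLIC_CLOUD, AZURE_CHINA, AZURE_GOVERNMENT and AZURE_GERMANY
# constants in azure.identity.AzureAuthorityHosts).
AUTHORITY_HOSTS = {
    "public": "login.microsoftonline.com",
    "china": "login.chinacloudapi.cn",
    "government": "login.microsoftonline.us",
    "germany": "login.microsoftonline.de",
}


def resolve_authority_host(
    config: Mapping[str, str], environ: Mapping[str, str] = os.environ
) -> str:
    # Precedence: AZURE_CLOUD env var, then the config file value, then "public".
    cloud = (environ.get("AZURE_CLOUD") or config.get("cloud") or "public").strip().lower()
    try:
        return AUTHORITY_HOSTS[cloud]
    except KeyError:
        raise ValueError(
            f"unsupported Azure cloud {cloud!r}; expected one of {sorted(AUTHORITY_HOSTS)}"
        ) from None
```

Failing loudly on an unknown value avoids silently falling back to the wrong sovereign cloud, which is the class of bug the hardcoded China authority caused.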

Closes #13259

## Test plan
- [ ] Verify default (`cloud: public`) connects to Azure Public Cloud
endpoints
- [ ] Verify `cloud: china` retains existing behavior for Azure China
users
- [ ] Verify `AZURE_CLOUD` env var overrides the config file value

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-03 12:51:26 +08:00
a8bbe167a9 Bump to infinity v0.7.0-dev5 (#13846)
### What problem does this PR solve?

Bump to infinity v0.7.0-dev5

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-30 10:19:06 +08:00
cb78ce0a7b feat: support rss datasource (#13721)
### What problem does this PR solve?

Support public RSS/Atom feed URLs as data sources for RAGFlow.

Related issue: https://github.com/infiniflow/ragflow/issues/12313
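
A minimal sketch of how entries from both feed formats can be normalized into document-like records using only the standard library. It is not RAGFlow's connector: the `parse_feed` name and the `title`/`link`/`content` record shape are assumptions, fetching the URL over HTTP is left out, and for Atom it simply takes the first `<link>` of each entry.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def parse_feed(xml_text: str) -> list[dict[str, str]]:
    root = ET.fromstring(xml_text)
    entries = []
    if root.tag == "rss":
        # RSS 2.0: <rss><channel><item>...</item></channel></rss>
        for item in root.iter("item"):
            entries.append({
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                "content": item.findtext("description", default=""),
            })
    elif root.tag == ATOM_NS + "feed":
        # Atom: elements live in the Atom namespace; links carry an href attribute.
        for entry in root.iter(ATOM_NS + "entry"):
            link = entry.find(ATOM_NS + "link")
            entries.append({
                "title": entry.findtext(ATOM_NS + "title", default=""),
                "link": link.get("href", "") if link is not None else "",
                "content": entry.findtext(ATOM_NS + "summary", default="")
                or entry.findtext(ATOM_NS + "content", default=""),
            })
    else:
        raise ValueError(f"unrecognized feed root element: {root.tag}")
    return entries
```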

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-03-27 22:58:44 +08:00
cdbbd2620c Fix: upgrade pyasn1 from 0.6.2 to 0.6.3 to address CVE-2026-30922 (#13773)
## Summary

- Adds `pyasn1>=0.6.3` to `constraint-dependencies` under `[tool.uv]` to
mitigate **CVE-2026-30922** (CVSS 7.5 HIGH)
- Regenerates `uv.lock` so the resolved pyasn1 version moves from
**0.6.2 to 0.6.3**

## Details

**CVE-2026-30922** is a Denial of Service vulnerability in pyasn1 caused
by unbounded recursion when decoding ASN.1 data with deeply nested
structures. An attacker can send crafted payloads with thousands of
nested SEQUENCE or SET tags to trigger a `RecursionError` crash or
memory exhaustion.

- **Severity:** HIGH (CVSS 7.5)
- **Affected versions:** pyasn1 < 0.6.3
- **Fixed in:** pyasn1 >= 0.6.3
- **NVD:** https://nvd.nist.gov/vuln/detail/CVE-2026-30922

`pyasn1` is not a direct dependency of RAGFlow but is pulled in
transitively: `google-auth` depends on `rsa` and `pyasn1-modules`, both
of which depend on `pyasn1`.
The `constraint-dependencies` mechanism in uv is the correct way to
enforce a minimum version for transitive dependencies without polluting
the direct dependency list.

## Test plan

- [x] `pyproject.toml` passes TOML validation
- [x] `uv lock` resolves successfully with the new constraint
- [x] pyasn1 version in `uv.lock` is now 0.6.3
- [ ] Existing CI/CD tests continue to pass

Closes #13686
2026-03-27 10:37:34 +08:00
ea1430bec5 Security: do not use litellm 1.82.7 and 1.82.8 (#13768)
### What problem does this PR solve?

See the LiteLLM [issue](https://github.com/BerriAI/litellm/issues/24518)
regarding these releases.

The litellm dependency is upgraded from `1.81.15` to `1.82.6`, which
avoids the affected releases, so RAGFlow remains safe.

### Type of change

- [x] Security

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
2026-03-25 22:39:33 +08:00
5b3bb25010 Fix: switch Python package mirror from Tsinghua to Aliyun (#13617)
### What problem does this PR solve?

Replace pypi.tuna.tsinghua.edu.cn with mirrors.aliyun.com to resolve
issues with missing packages on the Tsinghua mirror.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-16 12:12:25 +08:00
287637162c Revert "fix CVE-2026-26216. CVE-2026-26217 CVE 2025-66416" (#13613)
Reverts infiniflow/ragflow#13583, which caused `uv sync` to fail.
2026-03-16 10:19:29 +08:00
a67fa03584 fix CVE-2026-28804 CVE-2026-31826 (#13592)
What problem does this PR solve?

fix CVE-2026-28804 CVE-2026-31826

 Bug Fix (non-breaking change which fixes an issue)

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-13 16:34:28 +08:00
e90f0e8910 fix CVE-2026-26216. CVE-2026-26217 CVE 2025-66416 (#13583)
### What problem does this PR solve?

Fix CVE-2026-26216, CVE-2026-26217, and CVE-2025-66416.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-13 11:17:39 +08:00
7484298c82 Refa: convert download_img to async (#13477)
### What problem does this PR solve?

Convert download_img to async.

### Type of change

- [x] Refactoring
- [x] Performance Improvement
2026-03-09 19:00:17 +08:00
32d31284cc Fix: upgrade pypdf to 6.7.5 and migrate from deprecated pypdf2 to fix CVE-2026-28804 and CVE-2023-36464 (#13454)
### What problem does this PR solve?

This PR addresses security vulnerabilities in PDF processing
dependencies identified by Trivy security scan:

1. CVE-2026-28804 (MEDIUM): pypdf 6.7.4 vulnerable to inefficient
decoding of ASCIIHexDecode streams
2. CVE-2023-36464 (MEDIUM): pypdf2 3.0.1 susceptible to infinite loop
when parsing malformed comments

Since pypdf2 is deprecated with no available fixes, this PR migrates all
pypdf2 usage to the actively maintained pypdf library (version 6.7.5),
which resolves both vulnerabilities.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-03-09 12:06:00 +08:00
c217b8f3d8 Feat: add DingTalk AI Table connector and integration for data synch… (#13413)
### What problem does this PR solve?

Add DingTalk AI Table connector and integration for data synchronization

Issue #13400

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Co-authored-by: wangheyang <wangheyang@corp.netease.com>
2026-03-06 21:13:23 +08:00
6bb00e2762 Update graspologic to gitee (#13362)
### What problem does this PR solve?

Speed up Python module downloads by fetching graspologic from Gitee.

### Type of change

- [x] Refactoring

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-03-04 17:48:47 +08:00
860c4bd0bb Feat: UI testing automation with playwright (#12749)
### What problem does this PR solve?

This PR automates testing of the web UI using pytest and Playwright.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Other (please describe): test automation infrastructure

---------

Co-authored-by: Liu An <asiro@qq.com>
2026-03-02 13:04:08 +08:00
158503a1aa Feat: optimize ingestion pipeline with preprocess (#13211)
### What problem does this PR solve?

Feat: optimize ingestion pipeline with preprocess

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-02-26 10:24:13 +08:00
98e1d5aa5c Refact: switch from google-generativeai to google-genai (#13140)
### What problem does this PR solve?

Refact: switch from google-generativeai to google-genai #13132
Refact: comment out unused pywencai.

### Type of change

- [x] Refactoring
2026-02-24 10:28:33 +08:00
392ec99651 Docs: Update version references to v0.24.0 in READMEs and docs (#13095)
### What problem does this PR solve?

- Update version tags in README files (including translations) from
v0.23.1 to v0.24.0
- Modify Docker image references and documentation to reflect new
version
- Update version badges and image descriptions
- Maintain consistency across all language variants of README files

### Type of change

- [x] Documentation Update
2026-02-10 17:24:03 +08:00
38289084a8 Chore/upgrade dashscope to 1.25.11 (#13007)
## Description

Upgrade dashscope package to support text-embedding-v4 model.

## Changes

- Update dashscope version from 1.20.11 to 1.25.11 in pyproject.toml

## Reason

The text-embedding-v4 model requires dashscope >= 1.25.0 to function
properly. This upgrade ensures compatibility with the latest embedding
models.

Co-authored-by: Clint-chan <Clint-chan@users.noreply.github.com>
2026-02-06 19:06:41 +08:00
6f31c5fed2 feat/add MySQL and PostgreSQL data source connectors (#12817)
### What problem does this PR solve?

This PR adds MySQL and PostgreSQL as data source connectors, allowing
users to import data directly from relational databases into RAGFlow for
RAG workflows.

Many users store their knowledge in databases (product catalogs,
documentation, FAQs, etc.) and currently have no way to sync this data
into RAGFlow without exporting to files first. This feature lets them
connect directly to their databases, run SQL queries, and automatically
create documents from the results.

Closes #763
Closes #11560

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):

### What this PR does

**New capabilities:**
- Connect to MySQL and PostgreSQL databases
- Run custom SQL queries to extract data
- Map database columns to document content (vectorized) and metadata
(searchable)
- Support incremental sync using a timestamp column
- Full frontend UI with connection form and tooltips

**Files changed:**

Backend:
- `common/constants.py` - Added MYSQL/POSTGRESQL to FileSource enum
- `common/data_source/config.py` - Added to DocumentSource enum
- `common/data_source/rdbms_connector.py` - New connector (368 lines)
- `common/data_source/__init__.py` - Exported the connector
- `rag/svr/sync_data_source.py` - Added MySQL and PostgreSQL sync
classes
- `pyproject.toml` - Added mysql-connector-python dependency

Frontend:
- `web/src/pages/user-setting/data-source/constant/index.tsx` - Form
fields
- `web/src/locales/en.ts` - English translations
- `web/src/assets/svg/data-source/mysql.svg` - MySQL icon
- `web/src/assets/svg/data-source/postgresql.svg` - PostgreSQL icon

### Testing done

Tested with MySQL 8.0 and PostgreSQL 16:
- Connection validation works correctly
- Full sync imports all query results as documents
- Incremental sync only fetches rows updated since last sync
- Custom SQL queries filter data as expected
- Invalid credentials show clear error messages
- Lint checks pass (`ruff check` returns no errors)

---------

Co-authored-by: mkdev11 <YOUR_GITHUB_ID+MkDev11@users.noreply.github.com>
2026-02-04 10:14:32 +08:00
e385b19d67 Test: Add code coverage reporting to CI (#12874)
### What problem does this PR solve?

Add code coverage reporting to CI

### Type of change

- [x] Test (please describe): coverage report

---------

Co-authored-by: Liu An <asiro@qq.com>
2026-01-30 14:49:16 +08:00
f1c2fac03e Refa: remove ppt image. (#12909)
### What problem does this PR solve?

Remove the `aspose` dependency.

### Type of change

- [x] Refactoring
2026-01-30 13:35:42 +08:00
2c4499ec45 Fix: key error "content" #12844 (#12847)
### What problem does this PR solve?

Fix: key error "content" #12844

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2026-01-28 14:39:34 +08:00
fd11aca8e5 feat: Implement pluggable multi-provider sandbox architecture (#12820)
## Summary

Implement a flexible sandbox provider system supporting both
self-managed (Docker) and SaaS (Aliyun Code Interpreter) backends for
secure code execution in agent workflows.

**Key Changes:**
- Aliyun Code Interpreter provider using the official `agentrun-sdk>=0.0.16`
- Self-managed provider with gVisor (runsc) security
- Arguments parameter support for dynamic code execution
- Database-only configuration (removed fallback logic)
- Configuration scripts for quick setup

Issue #12479

## Features

### 🔌 Provider Abstraction Layer

**1. Self-Managed Provider** (`agent/sandbox/providers/self_managed.py`)
- Wraps existing executor_manager HTTP API
- gVisor (runsc) for secure container isolation
- Configurable pool size, timeout, retry logic
- Languages: Python, Node.js, JavaScript
- ⚠️ **Requires**: gVisor installation, Docker, base images

**2. Aliyun Code Interpreter** (`agent/sandbox/providers/aliyun_codeinterpreter.py`)
- SaaS integration using official agentrun-sdk
- Serverless microVM execution with auto-authentication
- Hard timeout: 30 seconds max
- Credentials: `AGENTRUN_ACCESS_KEY_ID`, `AGENTRUN_ACCESS_KEY_SECRET`,
`AGENTRUN_ACCOUNT_ID`, `AGENTRUN_REGION`
- Automatically wraps code to call `main()` function

**3. E2B Provider** (`agent/sandbox/providers/e2b.py`)
- Placeholder for future integration

### ⚙️ Configuration System

- `conf/system_settings.json`: Default provider =
`aliyun_codeinterpreter`
- `agent/sandbox/client.py`: Enforces database-only configuration
- Admin UI: `/admin/sandbox-settings`
- Configuration validation via `validate_config()` method
- Health checks for all providers

### 🎯 Key Capabilities

**Arguments Parameter Support:**
All providers support passing arguments to `main()` function:
```python
# User code
def main(name: str, count: int) -> dict:
    return {"message": f"Hello {name}!" * count}

# Executed with: arguments={"name": "World", "count": 3}
# Result: {"message": "Hello World!Hello World!Hello World!"}
```

**Self-Describing Providers:**
Each provider implements `get_config_schema()` returning form
configuration for Admin UI

**Error Handling:**
Structured `ExecutionResult` with stdout, stderr, exit_code,
execution_time
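
A minimal sketch of what a self-describing provider interface along these lines could look like. Only `get_config_schema()`, `validate_config()`, `ExecutionResult`, and its four fields come from the description above; the other method names, their signatures, the toy `EchoProvider`, and the registry are hypothetical and do not reproduce `agent/sandbox/providers/base.py`.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


@dataclass
class ExecutionResult:
    stdout: str
    stderr: str
    exit_code: int
    execution_time: float  # seconds


class SandboxProvider(ABC):
    @abstractmethod
    def get_config_schema(self) -> dict[str, dict[str, Any]]:
        """Describe the configuration fields so an admin UI can render a form."""

    @abstractmethod
    def validate_config(self, config: dict[str, Any]) -> tuple[bool, str]:
        """Return (ok, message) for a candidate configuration."""

    @abstractmethod
    def execute_code(self, code: str, language: str, arguments: dict[str, Any]) -> ExecutionResult:
        """Run code and return a structured result."""


class EchoProvider(SandboxProvider):
    # Hypothetical toy provider used only to exercise the interface.
    def get_config_schema(self):
        return {"timeout": {"type": "integer", "default": 30, "max": 30}}

    def validate_config(self, config):
        timeout = config.get("timeout", 30)
        if not isinstance(timeout, int) or not 0 < timeout <= 30:
            return False, "timeout must be an integer in (0, 30]"
        return True, "ok"

    def execute_code(self, code, language, arguments):
        # Does not run anything; just echoes the arguments back as stdout.
        return ExecutionResult(stdout=repr(arguments), stderr="", exit_code=0, execution_time=0.0)


PROVIDERS: dict[str, type[SandboxProvider]] = {"echo": EchoProvider}


def create_provider(provider_type: str, config: dict[str, Any]) -> SandboxProvider:
    # No fallback: an unknown type or an invalid config is an error, not a silent default.
    if provider_type not in PROVIDERS:
        raise ValueError(f"unknown sandbox provider type: {provider_type!r}")
    provider = PROVIDERS[provider_type]()
    ok, message = provider.validate_config(config)
    if not ok:
        raise ValueError(message)
    return provider
```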

## Configuration Scripts

Two scripts for quick Aliyun sandbox setup:

**Shell Script (requires jq):**
```bash
source scripts/configure_aliyun_sandbox.sh
```

**Python Script (interactive):**
```bash
python3 scripts/configure_aliyun_sandbox.py
```

## Testing

```bash
# Unit tests
uv run pytest agent/sandbox/tests/test_providers.py -v

# Aliyun provider tests
uv run pytest agent/sandbox/tests/test_aliyun_codeinterpreter.py -v

# Integration tests (requires credentials)
uv run pytest agent/sandbox/tests/test_aliyun_codeinterpreter_integration.py -v

# Quick SDK validation
python3 agent/sandbox/tests/verify_sdk.py
```

**Test Coverage:**
- 30 unit tests for provider abstraction
- Provider-specific tests for Aliyun
- Integration tests with real API
- Security tests for executor_manager

## Documentation

- `docs/develop/sandbox_spec.md` - Complete architecture specification
- `agent/sandbox/tests/MIGRATION_GUIDE.md` - Migration from legacy
sandbox
- `agent/sandbox/tests/QUICKSTART.md` - Quick start guide
- `agent/sandbox/tests/README.md` - Testing documentation

## Breaking Changes

⚠️ **Migration Required:**

1. **Directory Move**: `sandbox/` → `agent/sandbox/`
   - Update imports: `from sandbox.` → `from agent.sandbox.`

2. **Mandatory Configuration**:
   - SystemSettings must have `sandbox.provider_type` configured
   - Removed fallback default values
   - Configuration must exist in the database (from `conf/system_settings.json`)

3. **Aliyun Credentials**:
   - Requires `AGENTRUN_*` environment variables (not `ALIYUN_*`)
   - `AGENTRUN_ACCOUNT_ID` is now required (Aliyun primary account ID)

4. **Self-Managed Provider**:
   - gVisor (runsc) must be installed for security
   - Install: `go install gvisor.dev/gvisor/runsc@latest`

## Database Schema Changes

```python
# api/db/db_models.py
# SystemSettings.value: CharField → TextField, so config values have no length limit

# api/db/services/system_settings_service.py
# SystemSettingsService.get_by_name(): startswith match → exact match, for query precision
```
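
The `startswith` → exact-match change matters when one setting name is a prefix of another. A minimal illustration with the standard-library `sqlite3` module, not the Peewee query in `system_settings_service.py`; the table layout and the `sandbox.provider_type_legacy` key are hypothetical, while `sandbox.provider_type` is the setting named above.

```python
import sqlite3


def get_by_name(conn: sqlite3.Connection, name: str, exact: bool = True) -> list[tuple[str, str]]:
    if exact:
        sql, params = "SELECT name, value FROM system_settings WHERE name = ? ORDER BY name", (name,)
    else:
        # A literal prefix match, i.e. what a startswith() filter does.
        sql = "SELECT name, value FROM system_settings WHERE substr(name, 1, length(?)) = ? ORDER BY name"
        params = (name, name)
    return conn.execute(sql, params).fetchall()
```

With a prefix match, looking up `sandbox.provider_type` would also return any longer key that starts with it; the exact match returns only the intended row.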

## Files Changed

### Backend (Python)
- `agent/sandbox/providers/base.py` - SandboxProvider ABC interface
- `agent/sandbox/providers/manager.py` - ProviderManager
- `agent/sandbox/providers/self_managed.py` - Self-managed provider
- `agent/sandbox/providers/aliyun_codeinterpreter.py` - Aliyun provider
- `agent/sandbox/providers/e2b.py` - E2B provider (placeholder)
- `agent/sandbox/client.py` - Unified client (enforces DB-only config)
- `agent/tools/code_exec.py` - Updated to use provider system
- `admin/server/services.py` - SandboxMgr with registry & validation
- `admin/server/routes.py` - 5 sandbox API endpoints
- `conf/system_settings.json` - Default: aliyun_codeinterpreter
- `api/db/db_models.py` - TextField for SystemSettings.value
- `api/db/services/system_settings_service.py` - Exact match query

### Frontend (TypeScript/React)
- `web/src/pages/admin/sandbox-settings.tsx` - Settings UI
- `web/src/services/admin-service.ts` - Sandbox service functions
- `web/src/services/admin.service.d.ts` - Type definitions
- `web/src/utils/api.ts` - Sandbox API endpoints

### Documentation
- `docs/develop/sandbox_spec.md` - Architecture spec
- `agent/sandbox/tests/MIGRATION_GUIDE.md` - Migration guide
- `agent/sandbox/tests/QUICKSTART.md` - Quick start
- `agent/sandbox/tests/README.md` - Testing guide

### Configuration Scripts
- `scripts/configure_aliyun_sandbox.sh` - Shell script (jq)
- `scripts/configure_aliyun_sandbox.py` - Python script

### Tests
- `agent/sandbox/tests/test_providers.py` - 30 unit tests
- `agent/sandbox/tests/test_aliyun_codeinterpreter.py` - Provider tests
- `agent/sandbox/tests/test_aliyun_codeinterpreter_integration.py` -
Integration tests
- `agent/sandbox/tests/verify_sdk.py` - SDK validation

## Architecture

```
Admin UI → Admin API → SandboxMgr → ProviderManager → [SelfManaged|Aliyun|E2B]
                                      ↓
                                  SystemSettings
```

## Usage

### 1. Configure Provider

**Via Admin UI:**
1. Navigate to `/admin/sandbox-settings`
2. Select provider (Aliyun Code Interpreter / Self-Managed)
3. Fill in configuration
4. Click "Test Connection" to verify
5. Click "Save" to apply

**Via Configuration Scripts:**
```bash
# Aliyun provider
export AGENTRUN_ACCESS_KEY_ID="xxx"
export AGENTRUN_ACCESS_KEY_SECRET="yyy"
export AGENTRUN_ACCOUNT_ID="zzz"
export AGENTRUN_REGION="cn-shanghai"
source scripts/configure_aliyun_sandbox.sh
```
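According to the troubleshooting notes in this PR, the Aliyun provider expects the `AGENTRUN_*` prefix rather than `ALIYUN_*`. A small pre-flight check could catch a missing or misnamed variable before the script is sourced. The Python helpers below are a hypothetical sketch and are not part of the PR. They check only the four variables exported above.

```python
import os

# The four variables exported in the example above.
REQUIRED_VARS = (
    "AGENTRUN_ACCESS_KEY_ID",
    "AGENTRUN_ACCESS_KEY_SECRET",
    "AGENTRUN_ACCOUNT_ID",
    "AGENTRUN_REGION",
)


def missing_agentrun_vars(env=None):
    """Return the required AGENTRUN_* names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def aliyun_prefixed_vars(env=None):
    """Return ALIYUN_* names, which may indicate the wrong prefix was used."""
    env = os.environ if env is None else env
    return sorted(name for name in env if name.startswith("ALIYUN_"))
```

For example, a non-empty result from `missing_agentrun_vars()` means the variables need to be exported before sourcing the script. If `aliyun_prefixed_vars()` is also non-empty, the variables were probably set under the wrong prefix.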

### 2. Restart Service

```bash
cd docker
docker compose restart ragflow-server
```

### 3. Execute Code in Agent

```python
from agent.sandbox.client import execute_code

result = execute_code(
    code='def main(name: str) -> dict: return {"message": f"Hello {name}!"}',
    language="python",
    timeout=30,
    arguments={"name": "World"}
)

print(result.stdout)  # {"message": "Hello World!"}
```
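The snippet passed to `execute_code` follows a `main(**arguments)` convention. A snippet's logic can be checked locally before it goes to a sandbox provider, using a harness like the hypothetical one below. It assumes that a dict returned by `main` is reported as JSON, which is a guess based on the expected output shown above. It uses plain `exec`, so it provides no isolation at all and must only be used with trusted code.

```python
import json


def run_main_locally(code: str, arguments: dict) -> str:
    """Hypothetical local harness: exec `code`, call its `main(**arguments)`,
    and return the result serialized as JSON. NOT sandboxed."""
    namespace: dict = {}
    exec(code, namespace)  # trusted snippets only
    main = namespace.get("main")
    if not callable(main):
        raise ValueError("code must define a callable named 'main'")
    return json.dumps(main(**arguments))


snippet = 'def main(name: str) -> dict: return {"message": f"Hello {name}!"}'
print(run_main_locally(snippet, {"name": "World"}))  # {"message": "Hello World!"}
```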

## Troubleshooting

### "Container pool is busy" (Self-Managed)
- **Cause**: Pool exhausted (default: 1 container in `.env`)
- **Fix**: Increase `SANDBOX_EXECUTOR_MANAGER_POOL_SIZE` to 5+
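A pool of one container produces this error as soon as two agent runs overlap. The toy model below shows the effect with a fixed-size pool that rejects work when every slot is taken. It is an illustrative Python sketch only: `ContainerPool` is hypothetical, and the real executor manager may queue requests or time out rather than fail immediately.

```python
import threading
from contextlib import contextmanager


class ContainerPool:
    """Toy fixed-size pool (hypothetical). Acquisition fails fast instead of waiting."""

    def __init__(self, size: int):
        self._slots = threading.BoundedSemaphore(size)

    @contextmanager
    def container(self):
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("Container pool is busy")
        try:
            yield
        finally:
            self._slots.release()


pool = ContainerPool(size=1)
with pool.container():
    try:
        with pool.container():  # a second, overlapping request
            pass
    except RuntimeError as err:
        print(err)  # Container pool is busy
```

Raising the pool size to 5 or more would let that many overlapping executions hold a slot at the same time in this model.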

### "Sandbox provider type not configured"
- **Cause**: Database missing configuration
- **Fix**: Run config script or set via Admin UI

### "gVisor not found"
- **Cause**: runsc not installed
- **Fix**: `go install gvisor.dev/gvisor/runsc@latest && sudo cp ~/go/bin/runsc /usr/local/bin/`

### Aliyun authentication errors
- **Cause**: Wrong environment variable names
- **Fix**: Use `AGENTRUN_*` prefix (not `ALIYUN_*`)

## Checklist

- [x] All tests passing (30 unit tests + integration tests)
- [x] Documentation updated (spec, migration guide, quickstart)
- [x] Type definitions added (TypeScript)
- [x] Admin UI implemented
- [x] Configuration validation
- [x] Health checks implemented
- [x] Error handling with structured results
- [x] Breaking changes documented
- [x] Configuration scripts created
- [x] gVisor requirements documented

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-28 13:28:21 +08:00
6404af0a91 Bump to infinity v0.7.0-dev2 (#12839)
### What problem does this PR solve?

Bump to infinity v0.7.0-dev2

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-01-27 11:48:02 +08:00
4fbaa4aae9 Bump to infinity v0.7.0-dev1 (#12699)
### What problem does this PR solve?

Bump to infinity v0.7.0-dev1

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-01-19 16:36:03 +08:00
18867daba7 chore: bump pyobvector from 0.2.18 to 0.2.22 (#12640)
### What problem does this PR solve?

Update the OceanBase client (`pyobvector`).

### Type of change

- [x] Other (please describe): dependency upgrade
2026-01-15 15:21:34 +08:00
5b22f94502 Feat: Benchmark CLI additions and documentation (#12536)
### What problem does this PR solve?

This PR adds a dedicated HTTP benchmark CLI for RAGFlow chat and
retrieval endpoints so we can measure latency/QPS.
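A benchmark summary of this kind can be derived from raw per-request timings, as in the minimal Python sketch below. It assumes latencies are collected in seconds and that QPS means completed requests divided by the run's wall-clock time. It uses the nearest-rank percentile definition. These choices are assumptions; the CLI added in this PR may compute its statistics differently.

```python
import math
import statistics


def summarize(latencies_s, wall_time_s):
    """Summarize per-request latencies (seconds) from one benchmark run."""
    if not latencies_s or wall_time_s <= 0:
        raise ValueError("need at least one sample and a positive wall time")
    ordered = sorted(latencies_s)

    def pct(p):
        # Nearest-rank percentile: smallest sample with at least p% of samples <= it.
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]

    return {
        "qps": len(ordered) / wall_time_s,
        "mean_ms": statistics.fmean(ordered) * 1000,
        "p50_ms": pct(50) * 1000,
        "p95_ms": pct(95) * 1000,
    }


print(summarize([0.12, 0.15, 0.11, 0.40], wall_time_s=0.5))
```

Tail percentiles such as p95 are usually more informative than the mean for chat and retrieval endpoints, because a few slow requests can dominate perceived latency.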

### Type of change

- [x] Documentation Update
- [x] Other (please describe): Adds a CLI benchmarking tool for
chat/retrieval latency/QPS

---------

Co-authored-by: Liu An <asiro@qq.com>
2026-01-14 13:49:16 +08:00
44bada64c9 Feat: support tree structured deep-research policy. (#12559)
### What problem does this PR solve?

#12558
### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-01-13 09:41:35 +08:00
6abf55c048 Feat: support openapi (#12521)
### What problem does this PR solve?
Support OpenAPI interface description.

This also fixes the Swagger interface, which stopped working after the
system framework was migrated from Flask to Quart.

Resolved https://github.com/infiniflow/ragflow/issues/5264

### Type of change
- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: puhaiyang <761396462@qq.com>
2026-01-09 17:48:20 +08:00
07ef35b7e6 Docs: Update version references to v0.23.1 in READMEs and docs (#12349)
### What problem does this PR solve?

- Update version tags in README files (including translations) from
v0.23.0 to v0.23.1
- Modify Docker image references and documentation to reflect new
version
- Update version badges and image descriptions
- Maintain consistency across all language variants of README files

### Type of change

- [x] Documentation Update
2025-12-31 12:49:42 +08:00
5903d1c8f1 Feat: GitHub connector (#12314)
### What problem does this PR solve?

Feat: GitHub connector

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-30 15:09:52 +08:00
c2c079886f Revert "Feat: github connector" (#12296)
Reverts infiniflow/ragflow#12292
2025-12-29 17:06:40 +08:00
c3ae1aaecd Feat: Gitlab connector (#12248)
### What problem does this PR solve?

Feat: Gitlab connector
Fix: submit button in dark mode

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-12-29 17:05:20 +08:00
f099bc1236 Feat: github connector (#12292)
### What problem does this PR solve?

Feat: github connector

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-29 16:57:20 +08:00
a764f0a5b2 Feat: Add Asana data source integration and configuration options (#12239)
### What problem does this PR solve?

change: Add Asana data source integration and configuration options

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
2025-12-29 13:28:37 +08:00
8dc5b4dc56 Docs: Update version references to v0.23.0 in READMEs and docs (#12253)
### What problem does this PR solve?

- Update version tags in README files (including translations) from
v0.22.1 to v0.23.0
- Modify Docker image references and documentation to reflect new
version
- Update version badges and image descriptions
- Maintain consistency across all language variants of README files

### Type of change

- [x] Documentation Update

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2025-12-27 20:44:35 +08:00
050534e743 Bump infinity to 0.6.15 (#12264)
### What problem does this PR solve?

As title

### Type of change

- [x] Other (please describe): update doc engine

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2025-12-27 19:48:17 +08:00
1812491679 Feat: add Airtable connector and integration for data synchronization (#12211)
### What problem does this PR solve?
change: Add Airtable connector and integration for data synchronization

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2025-12-25 17:50:41 +08:00
02b976ffa4 Bump infinity to 0.6.13 (#12181)
### What problem does this PR solve?

Bump infinity to 0.6.13

### Type of change

- [x] Refactoring
2025-12-25 12:13:11 +08:00