### What problem does this PR solve?
This PR adds a new `Browser` operator to Agent workflows, enabling
prompt-driven browser automation in RAGFlow.Technically based
‘Browser-Use’
It includes:
- Backend browser component execution with tenant LLM integration
- Upload source support (file IDs, URLs, variables, CSV/JSON array)
- Downloaded file persistence to RAGFlow storage
- Frontend node/operator integration, form config, icon, and i18n
updates
- Unit tests for upload/download and ID parsing logic
- Dependency and Docker updates for browser-use runtime support
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: add local & ssh provider in admin panel
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
## What problem does this PR solve?
Closes#12582.
When a Retrieval component sits inside an Iteration with a **manual**
metadata filter that references the iteration variable (e.g.
`{IterationItem:abc@item}`), every iteration reuses the value resolved
on the **first** pass.
Root cause: [`_resolve_manual_filter` in
`agent/tools/retrieval.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/retrieval.py#L144-L171)
mutated `flt["value"]` in place. The `filters` list passed in is the
live `self._param.meta_data_filter["manual"]` (see
[`apply_meta_data_filter` in
`common/metadata_utils.py:257-261`](https://github.com/infiniflow/ragflow/blob/main/common/metadata_utils.py#L257-L261)),
so after the first iteration the param dict permanently held the
resolved string instead of the original variable reference.
```text
iter #1: flt["value"] = "{IterationItem:abc@item}" → resolved to "AI"
after mutation: flt["value"] = "AI" ← written back into _param
iter #2: flt["value"] = "AI" ← no {…} matches
retrieval keeps filtering by "AI" forever
```
This PR returns a shallow copy with the resolved value instead, leaving
the original filter (and its variable reference) intact for the next
iteration.
## Type of change
- [x] Bug fix (non-breaking change which fixes an issue)
## Test plan
- [ ] Build an agent: `Agent (structured output → list of areas) →
Iteration → Retrieval (manual filter: Area = {IterationItem/Item}) →
Message`. Run with a multi-area query and confirm each iteration's
Retrieval result matches its own item, not the first item.
- [ ] Regression: Retrieval with a manual metadata filter outside an
Iteration still resolves the variable correctly on each request.
- [ ] Regression: Retrieval with no metadata filter and with `auto` /
`semi_auto` filters behave unchanged.
## What problem does this PR solve?
Closes#12017.
TTS output is deterministic for a given `(model, text)` pair, so
re-running the same text through the same TTS model produces the same
bytes — yet `Canvas.tts` and `dialog_service.tts` re-synthesized on
every request. That's slow and wastes provider quota whenever the same
assistant response is replayed, shared across users, or repeated within
a session.
### Change
New helper `rag/utils/tts_cache.py` with `synthesize_with_cache(tts_mdl,
cleaned_text)`:
- **Key:** `tts:cache:{model_id}:{sha256(text)}` — separate namespace
per model, identical cleaned text reuses a single entry across both call
sites.
- **Value:** the hex-encoded audio blob both call sites already
returned. No format change for downstream consumers.
- **TTL:** 7 days by default, configurable via
`RAGFLOW_TTS_CACHE_TTL_SECONDS`.
- **Failure modes:** a Redis hiccup falls back to direct synthesis; a
failed synthesis still returns `None` (existing contract preserved).
[`Canvas.tts`](https://github.com/infiniflow/ragflow/blob/main/agent/canvas.py#L683-L724)
and
[`dialog_service.tts`](https://github.com/infiniflow/ragflow/blob/main/api/db/services/dialog_service.py#L1367-L1380)
now route through the helper; the per-file bytes-accumulation/hex-encode
loop has been removed in favor of one shared implementation.
## Type of change
- [x] New Feature (non-breaking change which adds functionality)
## Test plan
- [ ] **Cache hit, chat path:** Configure a dialog with TTS enabled, ask
the same question twice with `stream=false`. Verify the second response
returns the same `audio_binary` and that the second invocation doesn't
hit the TTS provider (e.g., observe provider-side logs / usage counters;
check no `LLMBundle.tts can't update token usage` log line on the second
run).
- [ ] **Cache hit, agent path:** Same exercise via a Conversational
Agent that includes a Message component playing back the answer.
- [ ] **Cache isolation per model:** Switch tenant's `tts_id` between
two models, run the same text against each — confirm the second model's
first synthesis still happens (no cross-model hits).
- [ ] **TTL override:** Set `RAGFLOW_TTS_CACHE_TTL_SECONDS=120`, confirm
the entry expires after 2 minutes.
- [ ] **Redis unavailable:** Stop Redis (or break the connection).
Verify the TTS endpoint still works — synthesis falls back to direct
calls, with a `TTS cache lookup failed` / `TTS cache store failed`
warning logged.
- [ ] **Failure path:** Configure a TTS model with an invalid API key,
ensure the response still returns successfully with `audio_binary=None`
(no regression vs. current behavior).
## Summary
This PR fixes 3 bugs in agent components:
### Bug 1: `DataOperations._invoke()` dispatches `"literal_eval"` to
wrong handler
**File:** `agent/component/data_operations.py`, line 76
The `_invoke()` method compares `self._param.operations` against
`"recursive_eval"` (line 76), but the valid value defined in
`DataOperationsParam.__init__()` (line 29) and validated in `check()`
(line 43) is `"literal_eval"`. This means selecting the `literal_eval`
operation from the frontend would never match, and the method
`_literal_eval()` would never be called.
**Fix:** Change `"recursive_eval"` to `"literal_eval"` in the dispatch
condition.
### Bug 2: `VariableAssigner._clear()` — `bool` branch unreachable
**File:** `agent/component/variable_assigner.py`, lines 95–100
In Python, `bool` is a subclass of `int` (`True` is `isinstance(True,
int) == True`). The `isinstance(variable, int)` check on line 95 catches
boolean values before the `isinstance(variable, bool)` check on line 99,
making the bool branch unreachable. A boolean variable would be cleared
to `0` instead of `False`.
**Fix:** Move the `isinstance(variable, bool)` check before
`isinstance(variable, int)`.
### Bug 3: `LoopItem.evaluate_condition()` — `bool` branch unreachable
**File:** `agent/component/loopitem.py`, lines 67–93
Same issue as Bug 2: `isinstance(var, (int, float))` on line 67 catches
boolean values before `isinstance(var, bool)` on line 85. Boolean
variables would be evaluated with numeric operators (`=`, `≠`, `>`,
etc.) instead of boolean operators (`is`, `is not`).
**Fix:** Move the `isinstance(var, bool)` check before `isinstance(var,
(int, float))`.
## Test plan
- [ ] Verify `DataOperations` with `literal_eval` operation correctly
invokes `_literal_eval()`
- [ ] Verify `VariableAssigner._clear()` returns `False` for boolean
variables (not `0`)
- [ ] Verify `LoopItem.evaluate_condition()` uses boolean operators for
`True`/`False` values
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Fixed operation routing logic to correctly dispatch the "literal_eval"
operation to its handler.
* **Refactor**
* Reorganized conditional branch ordering in agent components to improve
code structure and maintainability without affecting functional
behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Closes#14753
## What changed
| File | Change |
|---|---|
| `pyproject.toml` | `requires-python` → `>=3.13,<3.15`; remove
`strenum==0.4.15` |
| `Dockerfile` | `uv python install 3.13`, `uv sync --python 3.13` |
| `.github/workflows/tests.yml` | `uv sync --python 3.13` on both matrix
legs |
| `CLAUDE.md` | dev setup command + requirements note updated |
| `deepdoc/parser/mineru_parser.py` | `from strenum import StrEnum` →
`from enum import StrEnum` |
| `agent/tools/code_exec.py` | same |
`StrEnum` has been in the stdlib since Python 3.11 — the `strenum`
backport package is no longer needed once the floor is 3.13.
## Why uv.lock is not regenerated
`uv lock --python 3.13` fails because:
1. The infiniflow/graspologic fork pins `numpy>=1.26.4,<2.0.0`
2. `tensorflow-cpu>=2.20.0` (the first release with cp313 wheels)
depends on `ml-dtypes>=0.5.1`, which requires `numpy>=2.1.0`
3. These two constraints are irreconcilable on Python 3.13
The lockfile regeneration requires loosening the `numpy` upper bound in
the `infiniflow/graspologic` fork. Once that fork commit is updated and
the SHA in `pyproject.toml:49` is bumped, `uv lock --python 3.13` will
succeed.
## RFC corrections
Two claims in the original RFC (#14753) did not hold up under code
review:
- **"graspologic hard-blocks 3.13"** — the infiniflow fork at the pinned
commit has no `<3.13` Python constraint. The blocker is the transitive
`numpy<2.0.0` conflict with tensorflow-cpu's test dependency, not a
direct Python version cap.
- **"free-threading throughput gains for I/O-bound workload"** — Python
3.13 free-threading requires a special `--disable-gil` build and
provides no benefit for async I/O code (the GIL is already released
during I/O). The real motivation is forward compatibility and improved
error messages.
### What problem does this PR solve?
This PR fixes two issues in Agent Retrieval behavior and configuration
UX:
1. `top_k` configured in Agent Retrieval was not passed down to the
backend retriever call, so retrieval could ignore the configured vector
recall limit.
2. Similarity weight slider semantics were confusing in Agent forms
because the Agent field stores `keywords_similarity_weight` while UI
interactions were interpreted as vector weight. This could cause
displayed values and actual behavior to diverge.
This PR ensures Agent retrieval uses configured `top_k`, and makes the
slider behavior consistent and explicit for both vector and keyword
weight modes.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
When multiple MCP servers expose tools with the same name, the agent
currently registers those tools using their original MCP names. This can
lead to two issues:
- later MCP tools may overwrite earlier ones in the agent tool map
- duplicate function names may be exposed to the LLM
This PR fixes duplicate MCP tool-name handling by applying the same
indexed naming strategy already used for native agent tools. Native
tools are exposed with generated names such as `<tool_name>_<index>` to
avoid collisions, and MCP tools now follow the same convention for
consistency.
Specifically, this PR:
- assigns unique indexed function names to MCP tools exposed to the LLM
- preserves each MCP tool's original server-side name in an
`MCPToolBinding`
- dispatches MCP calls using the original MCP tool name while keeping
the indexed name in the agent tool map
- allows MCP metadata conversion to override only the OpenAI function
name without modifying the original MCP tool metadata
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Validation
The validation was performed using two MCP servers. Both servers exposed
a tool with the same name: `mcp0`. Both tools take no input parameters.
**MCP Server One:**
<img width="1780" height="625" alt="ONE"
src="https://github.com/user-attachments/assets/801a2654-fc10-4b71-b31c-81841fd40c55"
/>
**MCP Server Two:**
<img width="1777" height="624" alt="Second"
src="https://github.com/user-attachments/assets/c095151d-7bdf-47c8-9bfe-6aaf4a01b944"
/>
**Before the fix:**
When invoking `mcp0`, only the `mcp0` tool from the MCP server injected
later could be called successfully. As shown below, both `mcp0` tools
were present, but only the later-registered one was actually invokable.
<img width="694" height="935" alt="Three"
src="https://github.com/user-attachments/assets/3b9d7ab2-1765-492c-b8e0-bf05a69933ca"
/>
**After the fix:**
Both `mcp0` tools can now be invoked correctly.
<img width="737" height="1095" alt="F"
src="https://github.com/user-attachments/assets/6e896627-2b7f-41bb-becc-daa0c73ff58f"
/>
<img width="730" height="1090" alt="six"
src="https://github.com/user-attachments/assets/aba75593-26ae-4e3b-951d-b45ff177fd32"
/>
## Summary
`Graph.set_variable_param_value()` in `agent/canvas.py` has a bug in its
nested path traversal logic. The `for` loop iterates through **all**
keys in the path (including the last one), descending into every level.
After the loop, it then tries to set `cur[keys[-1]] = value`, but `cur`
has already descended one level too deep.
**Example:** For `path = "a.b"`, `value = "hello"`:
- **Before (bug):** `obj["a"]["b"]` becomes `{"b": "hello"}` instead of
`"hello"`
- **After (fix):** `obj["a"]["b"]` becomes `"hello"` as expected
The fix changes `for key in keys:` to `for key in keys[:-1]:`, so the
loop only navigates to the parent dict, and the final key is set
directly. This is consistent with how the read-side counterpart
`get_variable_param_value()` works.
This method is called by `set_variable_value()` when assigning to nested
variable paths (e.g., `component@root.nested.key`), which is used by the
`VariableAssigner` component.
## Test plan
- [ ] Create a canvas with a VariableAssigner that writes to a nested
path (e.g., `component@obj.nested.key`)
- [ ] Verify the value is set correctly at the expected path, not
wrapped in an extra dict layer
- [ ] Verify single-key paths (e.g., `component@key`) still work
correctly
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
* Fixed a bug in variable parameter assignment where nested structures
were being incorrectly modified, ensuring values are now properly set at
their intended locations without unintended overwrites.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
\`assert \"string\"\` always passes in Python because non-empty strings
are truthy. This silently skips input validation:
- **variable_assigner.py line 51**: \`assert \"Variable is not
complete.\"\` → \`raise ValueError(\"Variable is not complete.\")\`
- **loop.py line 59**: \`assert \"Loop Variable is not complete.\"\` →
\`raise ValueError(\"Loop Variable is not complete.\")\`
Without this fix, incomplete variables pass validation silently and
cause a confusing KeyError on the next line.
### What problem does this PR solve?
fix:
update null checks to use 'is None' for better clarity
replace RAGFlowSelect with SelectWithSearch in DebugContent
add max height and overflow to DialogContent in ParameterDialog
remove unused types from DataOperationsForm
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.6.3 to 2.7.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/urllib3/urllib3/releases">urllib3's
releases</a>.</em></p>
<blockquote>
<h2>2.7.0</h2>
<h2>🚀 urllib3 is fundraising for HTTP/2 support</h2>
<p><a
href="https://sethmlarson.dev/urllib3-is-fundraising-for-http2-support">urllib3
is raising ~$40,000 USD</a> to release HTTP/2 support and ensure
long-term sustainable maintenance of the project after a sharp decline
in financial support. If your company or organization uses Python and
would benefit from HTTP/2 support in Requests, pip, cloud SDKs, and
thousands of other projects <a
href="https://opencollective.com/urllib3">please consider contributing
financially</a> to ensure HTTP/2 support is developed sustainably and
maintained for the long-haul.</p>
<p>Thank you for your support.</p>
<h2>Security</h2>
<p>Addressed high-severity security issues. Impact was limited to
specific use cases detailed in the accompanying advisories; overall user
exposure was estimated to be marginal.</p>
<ul>
<li>
<p>Decompression-bomb safeguards of the streaming API were bypassed:</p>
<ol>
<li>When <code>HTTPResponse.drain_conn()</code> was called after the
response had been read and decompressed partially. (Reported by <a
href="https://github.com/Cycloctane"><code>@Cycloctane</code></a>)</li>
<li>During the second <code>HTTPResponse.read(amt=N)</code> or
<code>HTTPResponse.stream(amt=N)</code> call when the response was
decompressed using the official <a
href="https://pypi.org/project/brotli/">Brotli</a> library. (Reported by
<a
href="https://github.com/kimkou2024"><code>@kimkou2024</code></a>)</li>
</ol>
<p>See GHSA-mf9v-mfxr-j63j for details.</p>
</li>
<li>
<p>HTTP pools created using
<code>ProxyManager.connection_from_url</code> did not strip sensitive
headers specified in <code>Retry.remove_headers_on_redirect</code> when
redirecting to a different host. (GHSA-qccp-gfcp-xxvc reported by <a
href="https://github.com/christos-spearbit"><code>@christos-spearbit</code></a>)</p>
</li>
</ul>
<h2>Deprecations and Removals</h2>
<ul>
<li>Used <code>FutureWarning</code> instead of
<code>DeprecationWarning</code> for better visibility of existing
deprecation notices. Rescheduled the removal of deprecated features to
version 3.0. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3763">urllib3/urllib3#3763</a>)</li>
<li>Removed support for end-of-life Python 3.9. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3720">urllib3/urllib3#3720</a>)</li>
<li>Removed support for end-of-life PyPy3.10. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/4979">urllib3/urllib3#4979</a>)</li>
<li>Bumped the minimum supported pyOpenSSL version to 19.0.0. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3777">urllib3/urllib3#3777</a>)</li>
</ul>
<h2>Bugfixes</h2>
<ul>
<li>Fixed a bug where <code>HTTPResponse.read(amt=None)</code> was
ignoring decompressed data buffered from previous partial reads. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3636">urllib3/urllib3#3636</a>)</li>
<li>Fixed a bug where <code>HTTPResponse.read()</code> could cache only
part of the response after a partial read when
<code>cache_content=True</code>. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/4967">urllib3/urllib3#4967</a>)</li>
<li>Fixed <code>HTTPResponse.stream()</code> and
<code>HTTPResponse.read_chunked()</code> to handle <code>amt=0</code>.
(<a
href="https://redirect.github.com/urllib3/urllib3/issues/3793">urllib3/urllib3#3793</a>)</li>
<li>Updated <code>_TYPE_BODY</code> type alias to include missing
<code>Iterable[str]</code>, matching the documented and runtime behavior
of chunked request bodies. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3798">urllib3/urllib3#3798</a>)</li>
<li>Fixed <code>LocationParseError</code> when paths resembling
schemeless URIs were passed to
<code>HTTPConnectionPool.urlopen()</code>. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3352">urllib3/urllib3#3352</a>)</li>
<li>Fixed <code>BaseHTTPResponse.readinto()</code> type annotation to
accept <code>memoryview</code> in addition to <code>bytearray</code>,
matching the <code>io.RawIOBase.readinto</code> contract and enabling
use with <code>io.BufferedReader</code> without type errors. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3764">urllib3/urllib3#3764</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/urllib3/urllib3/blob/main/CHANGES.rst">urllib3's
changelog</a>.</em></p>
<blockquote>
<h1>2.7.0 (2026-05-07)</h1>
<h2>Security</h2>
<p>Addressed high-severity security issues.
Impact was limited to specific use cases detailed in the accompanying
advisories; overall user exposure was estimated to be marginal.</p>
<ul>
<li>
<p>Decompression-bomb safeguards of the streaming API were bypassed:</p>
<ol>
<li>When <code>HTTPResponse.drain_conn()</code> was called after the
response had been
read and decompressed partially.</li>
<li>During the second <code>HTTPResponse.read(amt=N)</code> or
<code>HTTPResponse.stream(amt=N)</code> call when the response was
decompressed
using the official <code>Brotli
<https://pypi.org/project/brotli/></code>__ library.</li>
</ol>
<p>See <code>GHSA-mf9v-mfxr-j63j
<https://github.com/urllib3/urllib3/security/advisories/GHSA-mf9v-mfxr-j63j></code>__
for details.</p>
</li>
<li>
<p>HTTP pools created using
<code>ProxyManager.connection_from_url</code> did not strip
sensitive headers specified in
<code>Retry.remove_headers_on_redirect</code> when
redirecting to a different host.
(<code>GHSA-qccp-gfcp-xxvc
<https://github.com/urllib3/urllib3/security/advisories/GHSA-qccp-gfcp-xxvc></code>__)</p>
</li>
</ul>
<h2>Deprecations and Removals</h2>
<ul>
<li>Used <code>FutureWarning</code> instead of
<code>DeprecationWarning</code> for better
visibility of existing deprecation notices. Rescheduled the removal of
deprecated features to version 3.0.
(<code>[#3763](https://github.com/urllib3/urllib3/issues/3763)
<https://github.com/urllib3/urllib3/issues/3763></code>__)</li>
<li>Removed support for end-of-life Python 3.9.
(<code>[#3720](https://github.com/urllib3/urllib3/issues/3720)
<https://github.com/urllib3/urllib3/issues/3720></code>__)</li>
<li>Removed support for end-of-life PyPy3.10.
(<code>[#4979](https://github.com/urllib3/urllib3/issues/4979)
<https://github.com/urllib3/urllib3/issues/4979></code>__)</li>
<li>Bumped the minimum supported pyOpenSSL version to 19.0.0.
(<code>[#3777](https://github.com/urllib3/urllib3/issues/3777)
<https://github.com/urllib3/urllib3/issues/3777></code>__)</li>
</ul>
<h2>Bugfixes</h2>
<ul>
<li>Fixed a bug where <code>HTTPResponse.read(amt=None)</code> was
ignoring decompressed
data buffered from previous partial reads.
(<code>[#3636](https://github.com/urllib3/urllib3/issues/3636)
<https://github.com/urllib3/urllib3/issues/3636></code>__)</li>
<li>Fixed a bug where <code>HTTPResponse.read()</code> could cache only
part of the
response after a partial read when <code>cache_content=True</code>.</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="9a950b92d9"><code>9a950b9</code></a>
Release 2.7.0</li>
<li><a
href="5ec0de499b"><code>5ec0de4</code></a>
Merge commit from fork</li>
<li><a
href="2bdcc44d1e"><code>2bdcc44</code></a>
Merge commit from fork</li>
<li><a
href="f45b0df09d"><code>f45b0df</code></a>
Fix a misleading example for <code>ProxyManager</code> (<a
href="https://redirect.github.com/urllib3/urllib3/issues/4970">#4970</a>)</li>
<li><a
href="577193ca02"><code>577193c</code></a>
Switch to nightly PyPy3.11 in CI for now (<a
href="https://redirect.github.com/urllib3/urllib3/issues/4984">#4984</a>)</li>
<li><a
href="e90af45bb0"><code>e90af45</code></a>
Avoid infinite loop in <code>HTTPResponse.read_chunked</code> when
<code>amt=0</code> (<a
href="https://redirect.github.com/urllib3/urllib3/issues/4974">#4974</a>)</li>
<li><a
href="67ed74fdae"><code>67ed74f</code></a>
Bump dev dependencies (<a
href="https://redirect.github.com/urllib3/urllib3/issues/4972">#4972</a>)</li>
<li><a
href="3abd481097"><code>3abd481</code></a>
Upgrade mypy to version 1.20.2 (<a
href="https://redirect.github.com/urllib3/urllib3/issues/4978">#4978</a>)</li>
<li><a
href="2b8725dfca"><code>2b8725d</code></a>
Drop support for EOL PyPy3.10 (<a
href="https://redirect.github.com/urllib3/urllib3/issues/4979">#4979</a>)</li>
<li><a
href="2944b2a0a6"><code>2944b2a</code></a>
Upgrade <code>setup-chrome</code> and <code>setup-firefox</code> to fix
warnings (<a
href="https://redirect.github.com/urllib3/urllib3/issues/4973">#4973</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/urllib3/urllib3/compare/2.6.3...2.7.0">compare
view</a></li>
</ul>
</details>
<br />
[](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/infiniflow/ragflow/network/alerts).
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
### What problem does this PR solve?
fix some comments to improve readability
### Type of change
- [x] Documentation Update
---------
Signed-off-by: box4wangjing <box4wangjing@outlook.com>
## Summary
Two bypass vectors in the sandbox code security analyzer allowed
malicious code to pass the safety check undetected and reach the Docker
executor.
### 1. JavaScript: template-literal bypass of `require()` block
The `SecureJavaScriptAnalyzer` regex patterns used `['"]` to match
module names, covering only single and double quotes. An attacker could
use ES6 template literals to bypass all three `require` checks:
`javascript
const cp = require(`child_process`);
async function main() {
return cp.execSync('cat /etc/passwd').toString();
}
`
The same bypass applied to `fs` and `worker_threads`.
**Fix:** Updated all three `require` patterns from `['"]` to `['"\]` to
also match backtick template literals.
### 2. Python: `builtins` not blocked + attribute-call blind spot in
`visit_Call`
`visit_Call` only checked `ast.Name` nodes, so attribute-style calls
like `module.func()` were invisible to the analyzer. Additionally,
`builtins` was absent from `DANGEROUS_IMPORTS`. Combined, this allowed:
`python
import builtins
def main():
builtins.exec('import os; os.system("id")')
`
Neither the import nor the exec call triggered any flag.
**Fix:** Added `builtins` to `DANGEROUS_IMPORTS` and added an
`ast.Attribute` branch to `visit_Call` so that `module.dangerous_func()`
style calls are caught alongside bare `dangerous_func()` calls.
## Tests
Added four regression tests covering each new bypass vector:
- `test_javascript_child_process_template_literal_is_rejected`
- `test_javascript_fs_template_literal_is_rejected`
- `test_python_builtins_import_is_rejected`
- `test_python_attribute_eval_call_is_rejected`
---------
Co-authored-by: bounty-hunter <bounty@hunter.local>
### What problem does this PR solve?
Resolves#14447. *(Note: This supersedes stalled PR #14448 and
implements the requested CodeRabbitAI fixes).*
Currently, the Dockerfiles inside `agent/sandbox/sandbox_base_image`
(both Python and Node.js) have hardcoded Chinese package mirrors. This
forces the mirrors on all users globally, which causes build network
timeouts for contributors outside of China.
This PR introduces an enhancement to fix the issue by:
1. Implementing the `NEED_MIRROR` build argument in the sandbox
Dockerfiles.
2. Replacing static `ENV` instructions with conditional shell logic
inside `RUN` blocks to dynamically set the package registries.
3. Allowing the build to cleanly fall back to default global registries
(`pypi.org` and `npmjs.org`) when `--build-arg NEED_MIRROR=0` is passed.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
---------
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
## Summary
- Tool-type components (Email, Invoke, etc.) fail to resolve template
strings that mix variable references with literal text in their
parameters.
- This adds template string resolution to `get_input()` in
`ComponentBase`, reusing existing `get_input_elements_from_text()` and
`string_format()` methods.
## Problem
`get_input()` in `ComponentBase` handles two cases:
1. **Pure reference** (`{Component:ID@field}`) — resolved via
`is_reff()` + `get_variable_value()`
2. **Literal value** — passed through as-is
But template strings like `{UserFillUp:X@name}@duke.edu` or `Question
from {Agent:Y@topic}` fall through to the literal branch because
`is_reff()` returns `False` (it expects the entire string to be a single
reference). The unresolved template is passed directly to the tool.
This affects **all** tool components (Email, Invoke, etc.) that need
mixed reference + text parameters — for example, constructing email
addresses or subjects dynamically.
## Fix
```python
# In get_input(), between is_reff check and literal fallback:
elif isinstance(v, str) and re.search(self.variable_ref_patt, v):
elements = self.get_input_elements_from_text(v)
kv = {k: e.get('value', '') for k, e in elements.items()}
self.set_input_value(var, self.string_format(v, kv))
```
This reuses `get_input_elements_from_text()` and `string_format()` which
are already used by `Message` components for the same purpose. The fix
only activates when the string contains at least one variable reference
pattern but is not a pure reference.
## Test plan
- [x] Pure references (`{Component:ID@field}`) still resolve correctly
via `is_reff()` path
- [x] Literal values without references pass through unchanged
- [x] Template strings like `{ref}@duke.edu` resolve the reference and
keep the literal suffix
- [x] Template strings like `Question from {ref}` resolve correctly
- [x] Multiple references in one string (`{ref1} and {ref2}`) both
resolve
- [x] Message components unaffected (they use their own template
resolution in `_run`)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: wanghualoong <wanghualoong@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- When an agent workflow has multiple `UserFillUp` pause points,
`canvas.run()` calls `reset(True)` on **all** components at the start of
each run. This clears outputs from components that completed in prior
runs, so downstream references like `{Agent:XXX@content}` resolve to
`None`.
- This fix only resets components on the **current execution path**
(`self.path`), preserving outputs from previously completed components.
## Problem
In a multi-step agent (e.g. draft email → user confirms → send email):
1. First `run()`: Agent drafts content, UserFillUp pauses for user input
→ Agent output is saved
2. Second `run()`: User submits input, but `reset(True)` clears **all**
components including the Agent that already completed
3. Email component references `{Agent:XXX@content}` → gets `None`
instead of the draft
This affects **all** agents that reference upstream component outputs
after a UserFillUp pause point.
## Fix
```python
# Before: reset ALL components
for k, cpn in self.components.items():
self.components[k]["obj"].reset(True)
# After: only reset components on current execution path
path_set = set(self.path)
for k, cpn in self.components.items():
if k in path_set:
self.components[k]["obj"].reset(True)
```
`self.path` already tracks the current execution path. For agents
without UserFillUp (single run), `path` contains all components, so
behavior is unchanged.
## Test plan
- [x] Agent with single UserFillUp: outputs from prior components are
preserved after resume
- [x] Agent with multiple UserFillUp: each resume preserves all
previously completed outputs
- [x] Agent without UserFillUp: behavior unchanged (all components in
path, all reset)
- [x] Webhook-triggered agents: unaffected (path includes all components
on first run)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: wanghualoong <wanghualoong@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
### What problem does this PR solve?
Fixes#14412.
`common.metadata_utils.meta_filter` evaluates user-defined metadata
conditions in Python after `DocMetadataService.get_flatted_meta_by_kbs`
loads the entire `meta_fields` table into memory. Past a few thousand
documents per knowledge base this becomes a memory bottleneck and a
wasted ES round-trip — every filter request currently fetches up to
10000 metadata rows even when the resulting `doc_ids` list is tiny.
This PR adds an ES push-down path that translates the same filter
language into a `bool` query and returns just the matching document IDs.
**Changes**
- `common/metadata_es_filter.py` *(new)*: pure-Python translator from
the RAGflow filter list to ES DSL. Covers every operator the in-memory
path supports (`=`, `≠`, `>`, `<`, `≥`, `≤`, `in`, `not in`, `contains`,
`not contains`, `start with`, `end with`, `empty`, `not empty`) with
`case_insensitive: true` on `prefix` and `wildcard` for parity with the
existing lower-cased Python comparisons. User wildcard metacharacters
are escaped before being injected into `wildcard` patterns. Negative
operators (`≠`, `not in`, `not contains`, ranges) are wrapped with an
`exists` guard so they do not accidentally match documents missing the
key, matching the legacy `if k not in metas` behaviour.
- `api/db/services/doc_metadata_service.py`: new
`DocMetadataService.filter_doc_ids_by_meta_pushdown(kb_ids, filters,
logic)` that returns the doc IDs ES matched, or `None` to signal the
caller should fall back to the in-memory path. Returns `None` when the
active doc store is Infinity (`meta_fields` is a JSON column, not a
dotted-object mapping), when any filter cannot be expressed in DSL
(`UnsupportedMetaFilter`), or when the ES request or metadata index
lookup errors.
- `common/metadata_utils.py`: `apply_meta_data_filter` accepts an
optional `kb_ids` argument. When supplied, conditions go through
push-down first via a new `_try_meta_pushdown` helper; on `None` the
function falls back to the original `meta_filter` call. Default
behaviour is unchanged for callers that don't pass `kb_ids`.
- Updated all four callers (`agent/tools/retrieval.py`,
`api/db/services/dialog_service.py` ×2,
`api/apps/services/dataset_api_service.py`, `api/apps/sdk/session.py`)
to forward `kb_ids` so the push-down path is exercised in production.
- `test/unit_test/common/test_metadata_es_filter.py` *(new)*: 35 unit
tests covering every operator's DSL shape, value coercion
(`ast.literal_eval`, lowercasing, ISO-date pass-through), wildcard
escaping, OR-logic wrapping that protects negative clauses, and the
doc-ID extractor.
**Behaviour preserved**
- The in-memory `meta_filter` is untouched and still services every
fallback case (Infinity backend, unknown operators, ES outages).
- The eligibility / credibility / issue-multiplier semantics described
in the LLM-driven `auto` and `semi_auto` modes still hand the LLM the
full in-memory `metas` dict to choose conditions from. Only the
*evaluation* of those generated conditions is pushed down.
- Existing tests in
`test/unit_test/common/test_metadata_filter_operators.py` continue to
pass (14/14).
**Test plan**
- `pytest test/unit_test/common/test_metadata_es_filter.py` — 35 passed.
- `pytest test/unit_test/common/test_metadata_filter_operators.py` — 14
passed.
- `ruff check` clean on every modified file.
- Reviewer please validate the ES query shapes against a live cluster —
particularly `case_insensitive` on `wildcard` and `prefix` (requires ES
7.10+) and the `exists` + `must_not` pairing for `≠`.
**Notes**
- The first cut caps each push-down request at 10000 results, matching
the existing `get_flatted_meta_by_kbs` limit, and logs a warning when
the cap is hit. A `search_after` follow-up would let us drop the cap
entirely once the push-down path is validated.
- Operator parity with the in-memory path is exact for the canonical
unicode operators (`≥`, `≤`, `≠`) used internally; the ASCII aliases
(`>=`, `<=`, `!=`) are normalised by `convert_conditions` before they
reach the translator.
### Type of change
- [x] Performance Improvement
---------
Co-authored-by: sxxtony <sxxtony@users.noreply.github.com>
### What problem does this PR solve?
Feat: support local provider for code exec component & remove some
outdated models
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
preserve doc generator download metadata
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
agent toolcall null response & schema validation & DeepSeek think
history
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
prioritize explore session ID and reset default conversation variables
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Before migration: GET /v1/document/artifact/<filename>
After migration: GET /api/v1/documents/artifact/<filename>
### Type of change
- [x] Refactoring
### What problem does this PR solve?
The POST /upload_info?url=<url> endpoint accepted a user-supplied URL
and passed it directly to AsyncWebCrawler without any validation. There
were no restrictions on URL scheme, destination hostname, or resolved IP
address. This allowed any authenticated user to instruct the server to
make outbound HTTP requests to internal infrastructure — including RFC
1918 private networks, loopback addresses, and cloud metadata services
such as http://169.254.169.254 — effectively using the server as a proxy
for internal network reconnaissance or credential theft.
This PR adds an SSRF guard (_validate_url_for_crawl) that runs before
any crawl is initiated. It enforces an allowlist of safe schemes
(http/https), resolves the hostname at validation time, and rejects any
URL whose resolved IP falls within a private or reserved network range.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Add tips for installing Chinse fonts under code sandbox. Otherwise,
`matplotlib `won't render Chinese correctly.
<img width="2082" height="1186" alt="sales_analysis"
src="https://github.com/user-attachments/assets/57e675ab-1e92-4662-9aeb-ad72a6121eb5"
/>
### Type of change
- [x] Documentation Update
### What problem does this PR solve?
Add a new agent template that demonstrates how to leverage the
`CodeExec` component to do the data analysis.
### Type of change
- [x] Other (please describe): Agent template
### What problem does this PR solve?
Updated ingestion pipeline template descriptions for better technical
accuracy and readability.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Sandbox don't attach attachment metadata
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Feat: update templates && add resume template
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
- Implemented a helper function to convert markdown cell text to native
numeric types for Excel output.
- Ensured that leading zeros are preserved and handled various numeric
formats, including those with thousand separators and scientific
notation.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Closes#13907
The template catalog had duplicate files (e.g. `*_r.json`) only to place
the same template into multiple sidebar groups.
This increases maintenance cost and makes template updates error-prone.
This PR adds first-class support for multiple template categories in a
single file via `canvas_types`, then removes duplicate template files.
What changed:
- Added `canvas_types` to `CanvasTemplate` model and DB migration.
- Added normalization logic when loading templates:
- accepts legacy `canvas_type`
- accepts new `canvas_types`
- merges/deduplicates values
- preserves backward compatibility by keeping `canvas_type` as first
normalized value.
- Updated template import flow to load only `.json` files and in stable
sorted order.
- Updated frontend template filtering to match on `canvas_types` first,
with fallback to legacy `canvas_type`.
- Consolidated duplicated template pairs into single files and removed:
- `deep_search_r.json`
- `reflective_academic_paper_generator_r.json`
- `seo_article_writer_r.json`
- Added regression/edge-case tests for category normalization and route
serialization expectations.
### Type of change
- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
### What problem does this PR solve?
Sandbox cannot accept large args list.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Close#14018
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Problem
In Agent applications, even with the cite option enabled, only inline
[ID: x] citation markers are visible (showing chunk content on hover).
The Agent does not display the referenced file cards below the response,
unlike Chat applications.
### Root Cause
The Agent's Retrieval tool (agent/tools/retrieval.py) calls
retriever.retrieval() with aggs=False, which means the retrieval results
do not include doc_aggs (document aggregation) data. Without doc_aggs,
the frontend ReferenceDocumentList component has no data to render the
file cards.
In contrast, the Chat application (api/db/services/dialog_service.py)
calls the same retriever.retrieval() method with aggs=True.
### Fix
Changed aggs=False to aggs=True in agent/tools/retrieval.py so that
document aggregation data is returned along with the retrieved chunks.