461 Commits

Author SHA1 Message Date
e7d45dd645 Feat: Expose Doc Generator file metadata as discrete outputs (#15080)
Declare doc_id, filename, mime_type, and size as separate outputs on the
Document Generation component so downstream nodes (e.g., the Code
component) can consume them via the variable picker. The existing
download JSON blob is preserved unchanged for the Message component's
download-chip rendering.

### What problem does this PR solve?

The Document Generation component previously exposed only a single
`download` output: a JSON-encoded blob containing the file's `doc_id`,
`filename`, `mime_type`, `size`, and base64 payload. On top of that, the
variable picker actively hides this `download` entry from every consumer
except the Message component (because the embedded base64 is too heavy to
splat into arbitrary downstream nodes).

The combined effect: users wiring the Doc Generator's output into a Code
component had no way to retrieve basic file info such as `filename` or
`doc_id` from the picker, blocking workflows that need to post-process the
generated file (e.g., registering it elsewhere, custom delivery, follow-up
API calls).

This PR declares `doc_id`, `filename`, `mime_type`, and `size` as
**discrete outputs** on the Document Generation component, alongside the
existing `download` blob. The new fields:

- Appear in the variable picker for **all** downstream nodes, including
  the Code component, so users can bind them directly to script arguments.
- Are cheap scalars only, so no base64 payload leaks into other components.
- Leave the existing `download` JSON blob completely untouched, so the
  Message component's download-chip rendering (which parses that blob via
  `_is_download_info`) keeps working with no behavior change.

Changes:

- `agent/component/docs_generator.py`: declare the four new outputs in
  `DocGeneratorParam` and emit them via `set_output(...)` in `_invoke`.
- `web/src/pages/agent/constant/index.tsx`: extend
  `initialDocGeneratorValues.outputs` with the new keys.
- `web/src/pages/agent/form/doc-generator-form/index.tsx`: mirror the new
  outputs in the zod schema so the form is valid.

No changes are needed to the picker's existing `download`-hiding filter: it
matches only on the literal output name `download`, so the new metadata
entries fall through naturally.

Reported in: https://github.com/infiniflow/ragflow/issues/14461.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-25 16:05:00 +08:00
8f90740d2e feat: pass chat_template_kwargs through agent chat completion (#14542)
### What problem does this PR solve?

The agent API currently does not pass chat_template_kwargs to the
underlying LLM call path, so clients cannot control template-level model
behavior (such as thinking-mode toggles) when invoking
/agents/chat/completion. This PR adds passthrough support for
chat_template_kwargs across agent execution flows (session and
non-session, streaming and non-streaming) by propagating it through
canvas runtime state and into LLM invocation kwargs. This addresses the
feature gap raised in [Issue
#14182](https://github.com/infiniflow/ragflow/issues/14182).

Closes #14182 

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-22 15:15:49 +08:00
3e5b11a523 Feat(browser control):Add new agent component 'browser' to control browser by AI (#14888)
### What problem does this PR solve?
This PR adds a new `Browser` operator to Agent workflows, enabling
prompt-driven browser automation in RAGFlow. It is technically based on
Browser-Use.

It includes:
- Backend browser component execution with tenant LLM integration
- Upload source support (file IDs, URLs, variables, CSV/JSON array)
- Downloaded file persistence to RAGFlow storage
- Frontend node/operator integration, form config, icon, and i18n
updates
- Unit tests for upload/download and ID parsing logic
- Dependency and Docker updates for browser-use runtime support

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-21 15:32:32 +08:00
b28e134944 Feat: add local & ssh provider in admin panel (#15039)
### What problem does this PR solve?

Feat: add local & ssh provider in admin panel

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-20 16:56:20 +08:00
7edabdf7c3 fix(retrieval): keep manual metadata filter reusable inside Iteration (#14849)
## What problem does this PR solve?

Closes #12582.

When a Retrieval component sits inside an Iteration with a **manual**
metadata filter that references the iteration variable (e.g.
`{IterationItem:abc@item}`), every iteration reuses the value resolved
on the **first** pass.

Root cause: [`_resolve_manual_filter` in
`agent/tools/retrieval.py`](https://github.com/infiniflow/ragflow/blob/main/agent/tools/retrieval.py#L144-L171)
mutated `flt["value"]` in place. The `filters` list passed in is the
live `self._param.meta_data_filter["manual"]` (see
[`apply_meta_data_filter` in
`common/metadata_utils.py:257-261`](https://github.com/infiniflow/ragflow/blob/main/common/metadata_utils.py#L257-L261)),
so after the first iteration the param dict permanently held the
resolved string instead of the original variable reference.

```text
iter #1: flt["value"] = "{IterationItem:abc@item}"  →  resolved to "AI"
         after mutation: flt["value"] = "AI"        ← written back into _param

iter #2: flt["value"] = "AI"                         ← no {…} matches
         retrieval keeps filtering by "AI" forever
```

This PR returns a shallow copy with the resolved value instead, leaving
the original filter (and its variable reference) intact for the next
iteration.
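
The difference between the two behaviors can be sketched in isolation. This is a minimal illustration, not the code in `agent/tools/retrieval.py`: the `VARIABLES` lookup table, the `{...}` regex, and the function names are assumptions made for the example.

```python
import re

# Hypothetical stand-in for the variable lookup the canvas performs.
VARIABLES = {"IterationItem:abc@item": "AI"}

# Assumed reference syntax: anything wrapped in a single pair of braces.
_REF = re.compile(r"\{([^{}]+)\}")


def _resolve(text: str) -> str:
    """Replace every {reference} with its current value."""
    return _REF.sub(lambda m: VARIABLES[m.group(1)], text)


def resolve_in_place(flt: dict) -> dict:
    """Buggy shape: overwrites the caller's filter with the resolved value."""
    flt["value"] = _resolve(flt["value"])
    return flt


def resolve_copy(flt: dict) -> dict:
    """Fixed shape: returns a shallow copy and leaves the original untouched."""
    resolved = dict(flt)
    resolved["value"] = _resolve(flt["value"])
    return resolved


def run_two_iterations(resolver) -> list:
    """Resolve the same stored filter once per iteration item."""
    stored = {"key": "Area", "op": "=", "value": "{IterationItem:abc@item}"}
    seen = []
    for item in ("AI", "Databases"):
        VARIABLES["IterationItem:abc@item"] = item
        seen.append(resolver(stored)["value"])
    return seen
```

With `resolve_in_place`, the second pass finds no `{...}` left to substitute and keeps the first item's value; with `resolve_copy`, the stored filter still holds the reference, so each pass sees its own item.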

## Type of change

- [x] Bug fix (non-breaking change which fixes an issue)

## Test plan

- [ ] Build an agent: `Agent (structured output → list of areas) →
Iteration → Retrieval (manual filter: Area = {IterationItem/Item}) →
Message`. Run with a multi-area query and confirm each iteration's
Retrieval result matches its own item, not the first item.
- [ ] Regression: Retrieval with a manual metadata filter outside an
Iteration still resolves the variable correctly on each request.
- [ ] Regression: Retrieval with no metadata filter and with `auto` /
`semi_auto` filters behave unchanged.
2026-05-19 15:08:31 +08:00
f169ab4b39 feat(tts): cache synthesized speech in Redis to avoid redundant calls (#14851)
## What problem does this PR solve?

Closes #12017.

TTS output is deterministic for a given `(model, text)` pair, so
re-running the same text through the same TTS model produces the same
bytes — yet `Canvas.tts` and `dialog_service.tts` re-synthesized on
every request. That's slow and wastes provider quota whenever the same
assistant response is replayed, shared across users, or repeated within
a session.

### Change

New helper `rag/utils/tts_cache.py` with `synthesize_with_cache(tts_mdl,
cleaned_text)`:

- **Key:** `tts:cache:{model_id}:{sha256(text)}` — separate namespace
per model, identical cleaned text reuses a single entry across both call
sites.
- **Value:** the hex-encoded audio blob both call sites already
returned. No format change for downstream consumers.
- **TTL:** 7 days by default, configurable via
`RAGFLOW_TTS_CACHE_TTL_SECONDS`.
- **Failure modes:** a Redis hiccup falls back to direct synthesis; a
failed synthesis still returns `None` (existing contract preserved).


[`Canvas.tts`](https://github.com/infiniflow/ragflow/blob/main/agent/canvas.py#L683-L724)
and
[`dialog_service.tts`](https://github.com/infiniflow/ragflow/blob/main/api/db/services/dialog_service.py#L1367-L1380)
now route through the helper; the per-file bytes-accumulation/hex-encode
loop has been removed in favor of one shared implementation.
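
A rough sketch of the caching flow described above, under stated assumptions: the real helper takes `(tts_mdl, cleaned_text)`, whereas this version takes an explicit cache object, model id, and synthesis callable so it can run without Redis. `FakeCache`, `BrokenCache`, and `synthesize_with_cache_sketch` are hypothetical names for illustration only.

```python
import hashlib
import logging

DEFAULT_TTL_SECONDS = 7 * 24 * 60 * 60  # 7 days, the default stated above


class FakeCache:
    """In-memory stand-in for Redis (hypothetical interface)."""

    def __init__(self):
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def set(self, key, value, ttl):
        self.store[key] = value  # TTL is ignored in this sketch


class BrokenCache:
    """Simulates a Redis outage: every call raises."""

    def get(self, key):
        raise ConnectionError("cache down")

    def set(self, key, value, ttl):
        raise ConnectionError("cache down")


def cache_key(model_id: str, text: str) -> str:
    """Build the `tts:cache:{model_id}:{sha256(text)}` key."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"tts:cache:{model_id}:{digest}"


def synthesize_with_cache_sketch(cache, model_id, synthesize, text,
                                 ttl=DEFAULT_TTL_SECONDS):
    """Return cached hex audio if present, else synthesize and store it."""
    key = cache_key(model_id, text)
    try:
        hit = cache.get(key)
        if hit is not None:
            return hit
    except Exception:
        logging.warning("TTS cache lookup failed")
    audio_hex = synthesize(text)
    if audio_hex is None:
        return None  # failed synthesis keeps returning None
    try:
        cache.set(key, audio_hex, ttl)
    except Exception:
        logging.warning("TTS cache store failed")
    return audio_hex
```

Hashing the text keeps keys short and uniform regardless of input length, and placing the model id in the key keeps entries for different models from colliding.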

## Type of change

- [x] New Feature (non-breaking change which adds functionality)

## Test plan

- [ ] **Cache hit, chat path:** Configure a dialog with TTS enabled, ask
the same question twice with `stream=false`. Verify the second response
returns the same `audio_binary` and that the second invocation doesn't
hit the TTS provider (e.g., observe provider-side logs / usage counters;
check no `LLMBundle.tts can't update token usage` log line on the second
run).
- [ ] **Cache hit, agent path:** Same exercise via a Conversational
Agent that includes a Message component playing back the answer.
- [ ] **Cache isolation per model:** Switch tenant's `tts_id` between
two models, run the same text against each — confirm the second model's
first synthesis still happens (no cross-model hits).
- [ ] **TTL override:** Set `RAGFLOW_TTS_CACHE_TTL_SECONDS=120`, confirm
the entry expires after 2 minutes.
- [ ] **Redis unavailable:** Stop Redis (or break the connection).
Verify the TTS endpoint still works — synthesis falls back to direct
calls, with a `TTS cache lookup failed` / `TTS cache store failed`
warning logged.
- [ ] **Failure path:** Configure a TTS model with an invalid API key,
ensure the response still returns successfully with `audio_binary=None`
(no regression vs. current behavior).
2026-05-19 14:20:40 +08:00
ff318aba7a fix: correct literal_eval dispatch and bool isinstance ordering in agent components (#13988)
## Summary

This PR fixes 3 bugs in agent components:

### Bug 1: `DataOperations._invoke()` dispatches `"literal_eval"` to
wrong handler

**File:** `agent/component/data_operations.py`, line 76

The `_invoke()` method compares `self._param.operations` against
`"recursive_eval"` (line 76), but the valid value defined in
`DataOperationsParam.__init__()` (line 29) and validated in `check()`
(line 43) is `"literal_eval"`. This means selecting the `literal_eval`
operation from the frontend would never match, and the method
`_literal_eval()` would never be called.

**Fix:** Change `"recursive_eval"` to `"literal_eval"` in the dispatch
condition.

### Bug 2: `VariableAssigner._clear()` — `bool` branch unreachable

**File:** `agent/component/variable_assigner.py`, lines 95–100

In Python, `bool` is a subclass of `int`, so `isinstance(True, int)`
evaluates to `True`. The `isinstance(variable, int)` check on line 95
therefore catches boolean values before the `isinstance(variable, bool)`
check on line 99, making the bool branch unreachable. A boolean variable
would be cleared to `0` instead of `False`.

**Fix:** Move the `isinstance(variable, bool)` check before
`isinstance(variable, int)`.
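
The ordering problem can be reproduced with a small standalone sketch. The function names below are hypothetical and only mirror the shape of the check, not the actual body of `VariableAssigner._clear()`.

```python
def clear_value_buggy(variable):
    """int is tested first, so booleans never reach the bool branch."""
    if isinstance(variable, int):
        return 0
    if isinstance(variable, bool):  # unreachable: bool is a subclass of int
        return False
    return None


def clear_value_fixed(variable):
    """bool is tested first, so booleans and integers are told apart."""
    if isinstance(variable, bool):
        return False
    if isinstance(variable, int):
        return 0
    return None
```

The general rule is to test the more specific type before its base type whenever the two need different handling.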

### Bug 3: `LoopItem.evaluate_condition()` — `bool` branch unreachable

**File:** `agent/component/loopitem.py`, lines 67–93

Same issue as Bug 2: `isinstance(var, (int, float))` on line 67 catches
boolean values before `isinstance(var, bool)` on line 85. Boolean
variables would be evaluated with numeric operators (`=`, `≠`, `>`,
etc.) instead of boolean operators (`is`, `is not`).

**Fix:** Move the `isinstance(var, bool)` check before `isinstance(var,
(int, float))`.

## Test plan

- [ ] Verify `DataOperations` with `literal_eval` operation correctly
invokes `_literal_eval()`
- [ ] Verify `VariableAssigner._clear()` returns `False` for boolean
variables (not `0`)
- [ ] Verify `LoopItem.evaluate_condition()` uses boolean operators for
`True`/`False` values



Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-18 09:58:45 +08:00
14c0985182 feat: bump Python minimum from 3.12 to 3.13, drop strenum backport (#14767)
Closes #14753

## What changed

| File | Change |
|---|---|
| `pyproject.toml` | `requires-python` → `>=3.13,<3.15`; remove `strenum==0.4.15` |
| `Dockerfile` | `uv python install 3.13`, `uv sync --python 3.13` |
| `.github/workflows/tests.yml` | `uv sync --python 3.13` on both matrix legs |
| `CLAUDE.md` | dev setup command + requirements note updated |
| `deepdoc/parser/mineru_parser.py` | `from strenum import StrEnum` → `from enum import StrEnum` |
| `agent/tools/code_exec.py` | same |

`StrEnum` has been in the stdlib since Python 3.11 — the `strenum`
backport package is no longer needed once the floor is 3.13.

## Why uv.lock is not regenerated

`uv lock --python 3.13` fails because:

1. The infiniflow/graspologic fork pins `numpy>=1.26.4,<2.0.0`
2. `tensorflow-cpu>=2.20.0` (the first release with cp313 wheels)
depends on `ml-dtypes>=0.5.1`, which requires `numpy>=2.1.0`
3. These two constraints are irreconcilable on Python 3.13

The lockfile regeneration requires loosening the `numpy` upper bound in
the `infiniflow/graspologic` fork. Once that fork commit is updated and
the SHA in `pyproject.toml:49` is bumped, `uv lock --python 3.13` will
succeed.

## RFC corrections

Two claims in the original RFC (#14753) did not hold up under code
review:

- **"graspologic hard-blocks 3.13"** — the infiniflow fork at the pinned
commit has no `<3.13` Python constraint. The blocker is the transitive
`numpy<2.0.0` conflict with tensorflow-cpu's test dependency, not a
direct Python version cap.
- **"free-threading throughput gains for I/O-bound workload"** — Python
3.13 free-threading requires a special `--disable-gil` build and
provides no benefit for async I/O code (the GIL is already released
during I/O). The real motivation is forward compatibility and improved
error messages.
2026-05-15 14:40:53 +08:00
4c68a6b86c fix(agent): pass top_k and fix similarity weight slider behavior (#14760)
### What problem does this PR solve?

This PR fixes two issues in Agent Retrieval behavior and configuration
UX:

1. `top_k` configured in Agent Retrieval was not passed down to the
backend retriever call, so retrieval could ignore the configured vector
recall limit.
2. Similarity weight slider semantics were confusing in Agent forms
because the Agent field stores `keywords_similarity_weight` while UI
interactions were interpreted as vector weight. This could cause
displayed values and actual behavior to diverge.

This PR ensures Agent retrieval uses configured `top_k`, and makes the
slider behavior consistent and explicit for both vector and keyword
weight modes.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-15 10:49:14 +08:00
63df01fe3f fix(agent): handle duplicate MCP tool names (#14217)
### What problem does this PR solve?

When multiple MCP servers expose tools with the same name, the agent
currently registers those tools using their original MCP names. This can
lead to two issues:

- later MCP tools may overwrite earlier ones in the agent tool map
- duplicate function names may be exposed to the LLM

This PR fixes duplicate MCP tool-name handling by applying the same
indexed naming strategy already used for native agent tools. Native
tools are exposed with generated names such as `<tool_name>_<index>` to
avoid collisions, and MCP tools now follow the same convention for
consistency.

Specifically, this PR:

- assigns unique indexed function names to MCP tools exposed to the LLM
- preserves each MCP tool's original server-side name in an
`MCPToolBinding`
- dispatches MCP calls using the original MCP tool name while keeping
the indexed name in the agent tool map
- allows MCP metadata conversion to override only the OpenAI function
name without modifying the original MCP tool metadata
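
The naming scheme above could look roughly like the following sketch. `ToolBinding`, `register_tools`, and `dispatch` are hypothetical names, and the `(server, tool_name)` tuple input is an assumption; only the `<tool_name>_<index>` convention and the idea of keeping the original name in a binding come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolBinding:
    """Hypothetical record, loosely mirroring the MCPToolBinding idea."""

    server: str
    original_name: str


def register_tools(tools):
    """Map unique exposed names to bindings.

    `tools` is a list of (server, tool_name) pairs; the exposed name is
    `<tool_name>_<index>`, so identical tool names no longer collide.
    """
    tool_map = {}
    for index, (server, name) in enumerate(tools):
        tool_map[f"{name}_{index}"] = ToolBinding(server, name)
    return tool_map


def dispatch(tool_map, exposed_name):
    """Resolve an exposed name to the server and original tool name."""
    binding = tool_map[exposed_name]
    return binding.server, binding.original_name
```

The LLM only ever sees the indexed names, while the call that reaches the MCP server still uses the name that server registered.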

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)


### Validation

The validation was performed using two MCP servers. Both servers exposed
a tool with the same name: `mcp0`. Both tools take no input parameters.

**MCP Server One:**
<img width="1780" height="625" alt="ONE"
src="https://github.com/user-attachments/assets/801a2654-fc10-4b71-b31c-81841fd40c55"
/>

**MCP Server Two:**
<img width="1777" height="624" alt="Second"
src="https://github.com/user-attachments/assets/c095151d-7bdf-47c8-9bfe-6aaf4a01b944"
/>

**Before the fix:**
When invoking `mcp0`, only the `mcp0` tool from the MCP server injected
later could be called successfully. As shown below, both `mcp0` tools
were present, but only the later-registered one was actually invokable.

<img width="694" height="935" alt="Three"
src="https://github.com/user-attachments/assets/3b9d7ab2-1765-492c-b8e0-bf05a69933ca"
/>

**After the fix:**
Both `mcp0` tools can now be invoked correctly.

<img width="737" height="1095" alt="F"
src="https://github.com/user-attachments/assets/6e896627-2b7f-41bb-becc-daa0c73ff58f"
/>

<img width="730" height="1090" alt="six"
src="https://github.com/user-attachments/assets/aba75593-26ae-4e3b-951d-b45ff177fd32"
/>
2026-05-14 15:28:39 +08:00
4bfdb1e123 fix: correct nested path traversal in set_variable_param_value (#13986)
## Summary

`Graph.set_variable_param_value()` in `agent/canvas.py` has a bug in its
nested path traversal logic. The `for` loop iterates through **all**
keys in the path (including the last one), descending into every level.
After the loop, it then tries to set `cur[keys[-1]] = value`, but `cur`
has already descended one level too deep.

**Example:** For `path = "a.b"`, `value = "hello"`:
- **Before (bug):** `obj["a"]["b"]` becomes `{"b": "hello"}` instead of
`"hello"`
- **After (fix):** `obj["a"]["b"]` becomes `"hello"` as expected

The fix changes `for key in keys:` to `for key in keys[:-1]:`, so the
loop only navigates to the parent dict, and the final key is set
directly. This is consistent with how the read-side counterpart
`get_variable_param_value()` works.

This method is called by `set_variable_value()` when assigning to nested
variable paths (e.g., `component@root.nested.key`), which is used by the
`VariableAssigner` component.
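
A standalone sketch of the off-by-one traversal, assuming the loop creates or replaces non-dict levels as it descends (that detail is inferred from the example above, not copied from `agent/canvas.py`). The function names are hypothetical.

```python
def set_nested_buggy(obj: dict, path: str, value) -> None:
    """Descends through every key, including the last one."""
    keys = path.split(".")
    cur = obj
    for key in keys:  # bug: the final key is traversed too
        if not isinstance(cur.get(key), dict):
            cur[key] = {}
        cur = cur[key]
    cur[keys[-1]] = value  # lands one level too deep


def set_nested_fixed(obj: dict, path: str, value) -> None:
    """Descends only to the parent dict, then sets the final key."""
    keys = path.split(".")
    cur = obj
    for key in keys[:-1]:
        if not isinstance(cur.get(key), dict):
            cur[key] = {}
        cur = cur[key]
    cur[keys[-1]] = value
```

For a single-key path, `keys[:-1]` is empty, so the fixed version skips the loop and writes straight into the top-level dict.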

## Test plan
- [ ] Create a canvas with a VariableAssigner that writes to a nested
path (e.g., `component@obj.nested.key`)
- [ ] Verify the value is set correctly at the expected path, not
wrapped in an extra dict layer
- [ ] Verify single-key paths (e.g., `component@key`) still work
correctly


Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-14 13:27:04 +08:00
cc21dc7f00 fix: replace broken assert with raise ValueError in variable_assigner and loop (#13906)
`assert "string"` always passes in Python because non-empty strings
are truthy. This silently skips input validation:

- **variable_assigner.py line 51**: `assert "Variable is not complete."` → `raise ValueError("Variable is not complete.")`
- **loop.py line 59**: `assert "Loop Variable is not complete."` → `raise ValueError("Loop Variable is not complete.")`

Without this fix, incomplete variables pass validation silently and
cause a confusing KeyError on the next line.
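
The pitfall is easy to reproduce on its own. The two functions below are hypothetical and only mimic the shape of the validation, not the actual `check()` methods.

```python
def check_buggy(variable):
    """The assert tests a non-empty string literal, which is always truthy."""
    if not variable:
        assert "Variable is not complete."  # never fails
    return "checked"


def check_fixed(variable):
    """Raises explicitly, so incomplete input is rejected."""
    if not variable:
        raise ValueError("Variable is not complete.")
    return "checked"
```

An explicit `raise` also survives `python -O`, which strips `assert` statements entirely.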
2026-05-14 12:33:17 +08:00
f85e18afbc Refact: sandbox quickstart.md & add tutorial for code exec component (#14786)
### What problem does this PR solve?

Refact: sandbox quickstart.md && add tutorial for code exec component

### Type of change

- [x] Refactoring


<img width="700" alt="img_v3_0211j_dcff835b-e3bb-4c77-9bc5-3b31a983229g"
src="https://github.com/user-attachments/assets/7842fc0f-639a-458f-b164-bc81a99ce4a5"
/>

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2026-05-12 14:42:20 +08:00
e8adc977bd Fix: some agent bug (#14829)
### What problem does this PR solve?

Fixes:

- Update null checks to use `is None` for better clarity
- Replace RAGFlowSelect with SelectWithSearch in DebugContent
- Add max height and overflow to DialogContent in ParameterDialog
- Remove unused types from DataOperationsForm

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-12 14:41:49 +08:00
02c2587ca4 fix(agent): support iteration item aliases in child nodes (#14146)
## Summary
This PR fixes the iteration variable mismatch reported in #14142.

Changes:
- restore compatibility for `IterationItem@result` by exposing `result`
alongside `item`
- support bare iteration aliases like `{item}`, `{index}`, and
`{result}` inside iteration child-node inputs
- add focused unit/runtime tests covering both alias styles and
multi-item iteration execution

## Validation
```bash
pytest -q --noconftest \
  test/testcases/test_web_api/test_canvas_app/test_iterationitem_unit.py \
  test/testcases/test_web_api/test_canvas_app/test_iteration_runtime_unit.py \
  test/testcases/test_web_api/test_canvas_app/test_invoke_component_unit.py
```

Result: `12 passed`

Closes #14142
2026-05-12 13:05:21 +08:00
139b76d2b1 Chore(deps): Bump urllib3 from 2.6.3 to 2.7.0 in /agent/sandbox (#14824)
Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.6.3 to 2.7.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/urllib3/urllib3/releases">urllib3's
releases</a>.</em></p>
<blockquote>
<h2>2.7.0</h2>
<h2>🚀 urllib3 is fundraising for HTTP/2 support</h2>
<p><a
href="https://sethmlarson.dev/urllib3-is-fundraising-for-http2-support">urllib3
is raising ~$40,000 USD</a> to release HTTP/2 support and ensure
long-term sustainable maintenance of the project after a sharp decline
in financial support. If your company or organization uses Python and
would benefit from HTTP/2 support in Requests, pip, cloud SDKs, and
thousands of other projects <a
href="https://opencollective.com/urllib3">please consider contributing
financially</a> to ensure HTTP/2 support is developed sustainably and
maintained for the long-haul.</p>
<p>Thank you for your support.</p>
<h2>Security</h2>
<p>Addressed high-severity security issues. Impact was limited to
specific use cases detailed in the accompanying advisories; overall user
exposure was estimated to be marginal.</p>
<ul>
<li>
<p>Decompression-bomb safeguards of the streaming API were bypassed:</p>
<ol>
<li>When <code>HTTPResponse.drain_conn()</code> was called after the
response had been read and decompressed partially. (Reported by <a
href="https://github.com/Cycloctane"><code>@​Cycloctane</code></a>)</li>
<li>During the second <code>HTTPResponse.read(amt=N)</code> or
<code>HTTPResponse.stream(amt=N)</code> call when the response was
decompressed using the official <a
href="https://pypi.org/project/brotli/">Brotli</a> library. (Reported by
<a
href="https://github.com/kimkou2024"><code>@​kimkou2024</code></a>)</li>
</ol>
<p>See GHSA-mf9v-mfxr-j63j for details.</p>
</li>
<li>
<p>HTTP pools created using
<code>ProxyManager.connection_from_url</code> did not strip sensitive
headers specified in <code>Retry.remove_headers_on_redirect</code> when
redirecting to a different host. (GHSA-qccp-gfcp-xxvc reported by <a
href="https://github.com/christos-spearbit"><code>@​christos-spearbit</code></a>)</p>
</li>
</ul>
<h2>Deprecations and Removals</h2>
<ul>
<li>Used <code>FutureWarning</code> instead of
<code>DeprecationWarning</code> for better visibility of existing
deprecation notices. Rescheduled the removal of deprecated features to
version 3.0. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3763">urllib3/urllib3#3763</a>)</li>
<li>Removed support for end-of-life Python 3.9. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3720">urllib3/urllib3#3720</a>)</li>
<li>Removed support for end-of-life PyPy3.10. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/4979">urllib3/urllib3#4979</a>)</li>
<li>Bumped the minimum supported pyOpenSSL version to 19.0.0. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3777">urllib3/urllib3#3777</a>)</li>
</ul>
<h2>Bugfixes</h2>
<ul>
<li>Fixed a bug where <code>HTTPResponse.read(amt=None)</code> was
ignoring decompressed data buffered from previous partial reads. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3636">urllib3/urllib3#3636</a>)</li>
<li>Fixed a bug where <code>HTTPResponse.read()</code> could cache only
part of the response after a partial read when
<code>cache_content=True</code>. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/4967">urllib3/urllib3#4967</a>)</li>
<li>Fixed <code>HTTPResponse.stream()</code> and
<code>HTTPResponse.read_chunked()</code> to handle <code>amt=0</code>.
(<a
href="https://redirect.github.com/urllib3/urllib3/issues/3793">urllib3/urllib3#3793</a>)</li>
<li>Updated <code>_TYPE_BODY</code> type alias to include missing
<code>Iterable[str]</code>, matching the documented and runtime behavior
of chunked request bodies. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3798">urllib3/urllib3#3798</a>)</li>
<li>Fixed <code>LocationParseError</code> when paths resembling
schemeless URIs were passed to
<code>HTTPConnectionPool.urlopen()</code>. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3352">urllib3/urllib3#3352</a>)</li>
<li>Fixed <code>BaseHTTPResponse.readinto()</code> type annotation to
accept <code>memoryview</code> in addition to <code>bytearray</code>,
matching the <code>io.RawIOBase.readinto</code> contract and enabling
use with <code>io.BufferedReader</code> without type errors. (<a
href="https://redirect.github.com/urllib3/urllib3/issues/3764">urllib3/urllib3#3764</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="9a950b92d9"><code>9a950b9</code></a>
Release 2.7.0</li>
<li><a
href="5ec0de499b"><code>5ec0de4</code></a>
Merge commit from fork</li>
<li><a
href="2bdcc44d1e"><code>2bdcc44</code></a>
Merge commit from fork</li>
<li><a
href="f45b0df09d"><code>f45b0df</code></a>
Fix a misleading example for <code>ProxyManager</code> (<a
href="https://redirect.github.com/urllib3/urllib3/issues/4970">#4970</a>)</li>
<li><a
href="577193ca02"><code>577193c</code></a>
Switch to nightly PyPy3.11 in CI for now (<a
href="https://redirect.github.com/urllib3/urllib3/issues/4984">#4984</a>)</li>
<li><a
href="e90af45bb0"><code>e90af45</code></a>
Avoid infinite loop in <code>HTTPResponse.read_chunked</code> when
<code>amt=0</code> (<a
href="https://redirect.github.com/urllib3/urllib3/issues/4974">#4974</a>)</li>
<li><a
href="67ed74fdae"><code>67ed74f</code></a>
Bump dev dependencies (<a
href="https://redirect.github.com/urllib3/urllib3/issues/4972">#4972</a>)</li>
<li><a
href="3abd481097"><code>3abd481</code></a>
Upgrade mypy to version 1.20.2 (<a
href="https://redirect.github.com/urllib3/urllib3/issues/4978">#4978</a>)</li>
<li><a
href="2b8725dfca"><code>2b8725d</code></a>
Drop support for EOL PyPy3.10 (<a
href="https://redirect.github.com/urllib3/urllib3/issues/4979">#4979</a>)</li>
<li><a
href="2944b2a0a6"><code>2944b2a</code></a>
Upgrade <code>setup-chrome</code> and <code>setup-firefox</code> to fix
warnings (<a
href="https://redirect.github.com/urllib3/urllib3/issues/4973">#4973</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/urllib3/urllib3/compare/2.6.3...2.7.0">compare
view</a></li>
</ul>
</details>
<br />



Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-05-12 11:10:15 +08:00
daf8a58c4b Fix: add codeexec attachments output (#14787)
### What problem does this PR solve?

add codeexec attachments output

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-11 19:16:33 +08:00
292b0b8bce chore: fix some comments to improve readability (#14756)
### What problem does this PR solve?

fix some comments to improve readability

### Type of change

- [x] Documentation Update

---------

Signed-off-by: box4wangjing <box4wangjing@outlook.com>
2026-05-11 16:48:48 +08:00
e6cb9faace fix: close two security analyzer bypass paths in sandbox executor (#14690)
## Summary

Two bypass vectors in the sandbox code security analyzer allowed
malicious code to pass the safety check undetected and reach the Docker
executor.

### 1. JavaScript: template-literal bypass of `require()` block

The `SecureJavaScriptAnalyzer` regex patterns used `['"]` to match
module names, covering only single and double quotes. An attacker could
use ES6 template literals to bypass all three `require` checks:

```javascript
const cp = require(`child_process`);
async function main() {
  return cp.execSync('cat /etc/passwd').toString();
}
```

The same bypass applied to `fs` and `worker_threads`.

**Fix:** Updated all three `require` patterns from `['"]` to `` ['"`] `` to
also match backtick template literals.
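
The effect of widening the quote character class can be sketched in Python. The two patterns below are hypothetical stand-ins, since the actual `SecureJavaScriptAnalyzer` patterns are not shown in this description; only the `['"]` versus `` ['"`] `` difference comes from the PR.

```python
import re

# Hypothetical patterns for illustration; the analyzer's real regexes may differ.
OLD_REQUIRE = re.compile(r"""require\s*\(\s*['"]child_process['"]\s*\)""")
NEW_REQUIRE = re.compile(r"""require\s*\(\s*['"`]child_process['"`]\s*\)""")

# The bypass from the PR: the module name is wrapped in backticks.
snippet = "const cp = require(`child_process`);"


def is_blocked(pattern: re.Pattern, code: str) -> bool:
    """Return True when the pattern finds a forbidden require() call."""
    return pattern.search(code) is not None


print(is_blocked(OLD_REQUIRE, snippet))  # old class misses the backtick form
print(is_blocked(NEW_REQUIRE, snippet))  # widened class catches it
```

Both patterns still match the single- and double-quoted forms, so the widened class only adds coverage.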

### 2. Python: `builtins` not blocked + attribute-call blind spot in
`visit_Call`

`visit_Call` only checked `ast.Name` nodes, so attribute-style calls
like `module.func()` were invisible to the analyzer. Additionally,
`builtins` was absent from `DANGEROUS_IMPORTS`. Combined, this allowed:

```python
import builtins
def main():
    builtins.exec('import os; os.system("id")')
```

Neither the import nor the exec call triggered any flag.

**Fix:** Added `builtins` to `DANGEROUS_IMPORTS` and added an
`ast.Attribute` branch to `visit_Call` so that `module.dangerous_func()`
style calls are caught alongside bare `dangerous_func()` calls.
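
A minimal sketch of the idea, assuming a plain `ast.NodeVisitor`: the class name, the contents of the two sets, and the issue strings are illustrative and are not taken from the analyzer itself.

```python
import ast

# Illustrative sets only; the analyzer's real lists are not given in the PR text.
DANGEROUS_IMPORTS = {"os", "subprocess", "builtins"}
DANGEROUS_CALLS = {"exec", "eval"}


class CallChecker(ast.NodeVisitor):
    """Collects dangerous imports and calls found in a module."""

    def __init__(self) -> None:
        self.issues: list[str] = []

    def visit_Import(self, node: ast.Import) -> None:
        for alias in node.names:
            if alias.name.split(".")[0] in DANGEROUS_IMPORTS:
                self.issues.append(f"import {alias.name}")
        self.generic_visit(node)

    def visit_Call(self, node: ast.Call) -> None:
        func = node.func
        if isinstance(func, ast.Name) and func.id in DANGEROUS_CALLS:
            # Bare call such as exec(...)
            self.issues.append(f"call {func.id}")
        elif isinstance(func, ast.Attribute) and func.attr in DANGEROUS_CALLS:
            # Attribute call such as builtins.exec(...)
            self.issues.append(f"call .{func.attr}")
        self.generic_visit(node)


def find_issues(source: str) -> list[str]:
    checker = CallChecker()
    checker.visit(ast.parse(source))
    return checker.issues
```

With this sketch, the `builtins.exec(...)` example above is flagged twice: once for the import and once for the attribute call.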

## Tests

Added four regression tests covering each new bypass vector:
- `test_javascript_child_process_template_literal_is_rejected`
- `test_javascript_fs_template_literal_is_rejected`
- `test_python_builtins_import_is_rejected`
- `test_python_attribute_eval_call_is_rejected`

---------

Co-authored-by: bounty-hunter <bounty@hunter.local>
2026-05-11 11:46:27 +08:00
51b73850e1 feat: make sandbox Dockerfile mirrors optional with ARG (#14553)
### What problem does this PR solve?

Resolves #14447. *(Note: This supersedes stalled PR #14448 and
implements the requested CodeRabbitAI fixes).*

Currently, the Dockerfiles inside `agent/sandbox/sandbox_base_image`
(both Python and Node.js) have hardcoded Chinese package mirrors. This
forces the mirrors on all users globally, which causes build network
timeouts for contributors outside of China.

This PR introduces an enhancement to fix the issue by:
1. Implementing the `NEED_MIRROR` build argument in the sandbox
Dockerfiles.
2. Replacing static `ENV` instructions with conditional shell logic
inside `RUN` blocks to dynamically set the package registries.
3. Allowing the build to cleanly fall back to default global registries
(`pypi.org` and `npmjs.org`) when `--build-arg NEED_MIRROR=0` is passed.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring

---------

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-11 11:01:43 +08:00
ed01ac9994 Fix: resolve template strings in tool component parameters (#14601)
## Summary

- Tool-type components (Email, Invoke, etc.) fail to resolve template
strings that mix variable references with literal text in their
parameters.
- This adds template string resolution to `get_input()` in
`ComponentBase`, reusing existing `get_input_elements_from_text()` and
`string_format()` methods.

## Problem

`get_input()` in `ComponentBase` handles two cases:
1. **Pure reference** (`{Component:ID@field}`) — resolved via
`is_reff()` + `get_variable_value()`
2. **Literal value** — passed through as-is

But template strings like `{UserFillUp:X@name}@duke.edu` or `Question
from {Agent:Y@topic}` fall through to the literal branch because
`is_reff()` returns `False` (it expects the entire string to be a single
reference). The unresolved template is passed directly to the tool.

This affects **all** tool components (Email, Invoke, etc.) that need
mixed reference + text parameters — for example, constructing email
addresses or subjects dynamically.

## Fix

```python
# In get_input(), between is_reff check and literal fallback:
elif isinstance(v, str) and re.search(self.variable_ref_patt, v):
    elements = self.get_input_elements_from_text(v)
    kv = {k: e.get('value', '') for k, e in elements.items()}
    self.set_input_value(var, self.string_format(v, kv))
```

This reuses `get_input_elements_from_text()` and `string_format()` which
are already used by `Message` components for the same purpose. The fix
only activates when the string contains at least one variable reference
pattern but is not a pure reference.
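
The substitution step can be illustrated in isolation. This is a standalone sketch: `REF_PATTERN` and `resolve_template` are hypothetical names, and the regex only approximates the `{Component:ID@field}` syntax rather than reproducing `variable_ref_patt`.

```python
import re

# Assumed reference syntax {Component:ID@field}; the real pattern may differ.
REF_PATTERN = re.compile(r"\{([A-Za-z0-9_:]+@[A-Za-z0-9_.]+)\}")


def resolve_template(text: str, values: dict[str, str]) -> str:
    """Replace each {ref} with its value; unknown refs become empty strings."""
    return REF_PATTERN.sub(lambda m: str(values.get(m.group(1), "")), text)


values = {"UserFillUp:X@name": "alice", "Agent:Y@topic": "billing"}
print(resolve_template("{UserFillUp:X@name}@duke.edu", values))
print(resolve_template("Question from {Agent:Y@topic}", values))
```

Literal text around each reference is left untouched, which is what keeps the `@duke.edu` suffix intact.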

## Test plan

- [x] Pure references (`{Component:ID@field}`) still resolve correctly
via `is_reff()` path
- [x] Literal values without references pass through unchanged
- [x] Template strings like `{ref}@duke.edu` resolve the reference and
keep the literal suffix
- [x] Template strings like `Question from {ref}` resolve correctly
- [x] Multiple references in one string (`{ref1} and {ref2}`) both
resolve
- [x] Message components unaffected (they use their own template
resolution in `_run`)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: wanghualoong <wanghualoong@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-11 10:01:41 +08:00
decb5dcb6f Fix: path-aware reset in canvas.run() to preserve cross-run outputs (#14600)
## Summary

- When an agent workflow has multiple `UserFillUp` pause points,
`canvas.run()` calls `reset(True)` on **all** components at the start of
each run. This clears outputs from components that completed in prior
runs, so downstream references like `{Agent:XXX@content}` resolve to
`None`.
- This fix only resets components on the **current execution path**
(`self.path`), preserving outputs from previously completed components.

## Problem

In a multi-step agent (e.g. draft email → user confirms → send email):

1. First `run()`: Agent drafts content, UserFillUp pauses for user input
→ Agent output is saved
2. Second `run()`: User submits input, but `reset(True)` clears **all**
components including the Agent that already completed
3. Email component references `{Agent:XXX@content}` → gets `None`
instead of the draft

This affects **all** agents that reference upstream component outputs
after a UserFillUp pause point.

## Fix

```python
# Before: reset ALL components
for k, cpn in self.components.items():
    self.components[k]["obj"].reset(True)

# After: only reset components on current execution path
path_set = set(self.path)
for k, cpn in self.components.items():
    if k in path_set:
        self.components[k]["obj"].reset(True)
```

`self.path` already tracks the current execution path. For agents
without UserFillUp (single run), `path` contains all components, so
behavior is unchanged.
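
The draft-then-confirm scenario can be reproduced with a toy model. `FakeComponent` and `reset_on_path` are hypothetical stand-ins for the canvas objects, assuming only that a reset clears a component's output.

```python
class FakeComponent:
    """Stand-in for a canvas component; it only tracks one output value."""

    def __init__(self, output=None):
        self.output = output

    def reset(self) -> None:
        self.output = None


def reset_on_path(components: dict, path: list) -> None:
    """Reset only the components that appear on the current execution path."""
    path_set = set(path)
    for name, component in components.items():
        if name in path_set:
            component.reset()


# Second run: the Agent finished earlier, only the Email step is on the path.
components = {
    "Agent:Draft": FakeComponent("draft text"),
    "Email:Send": FakeComponent("stale"),
}
reset_on_path(components, ["Email:Send"])
print(components["Agent:Draft"].output)  # the earlier draft is still available
```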

## Test plan

- [x] Agent with single UserFillUp: outputs from prior components are
preserved after resume
- [x] Agent with multiple UserFillUp: each resume preserves all
previously completed outputs
- [x] Agent without UserFillUp: behavior unchanged (all components in
path, all reset)
- [x] Webhook-triggered agents: unaffected (path includes all components
on first run)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: wanghualoong <wanghualoong@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-05-08 15:10:15 +08:00
59c35100c5 Perf: push metadata filters down to Elasticsearch (#14576)
### What problem does this PR solve?

Fixes #14412.

`common.metadata_utils.meta_filter` evaluates user-defined metadata
conditions in Python after `DocMetadataService.get_flatted_meta_by_kbs`
loads the entire `meta_fields` table into memory. Past a few thousand
documents per knowledge base this becomes a memory bottleneck and a
wasted ES round-trip — every filter request currently fetches up to
10000 metadata rows even when the resulting `doc_ids` list is tiny.

This PR adds an ES push-down path that translates the same filter
language into a `bool` query and returns just the matching document IDs.

**Changes**

- `common/metadata_es_filter.py` *(new)*: pure-Python translator from
the RAGFlow filter list to ES DSL. Covers every operator the in-memory
path supports (`=`, `≠`, `>`, `<`, `≥`, `≤`, `in`, `not in`, `contains`,
`not contains`, `start with`, `end with`, `empty`, `not empty`) with
`case_insensitive: true` on `prefix` and `wildcard` for parity with the
existing lower-cased Python comparisons. User wildcard metacharacters
are escaped before being injected into `wildcard` patterns. Negative
operators (`≠`, `not in`, `not contains`, ranges) are wrapped with an
`exists` guard so they do not accidentally match documents missing the
key, matching the legacy `if k not in metas` behaviour.
- `api/db/services/doc_metadata_service.py`: new
`DocMetadataService.filter_doc_ids_by_meta_pushdown(kb_ids, filters,
logic)` that returns the doc IDs ES matched, or `None` to signal the
caller should fall back to the in-memory path. Returns `None` when the
active doc store is Infinity (`meta_fields` is a JSON column, not a
dotted-object mapping), when any filter cannot be expressed in DSL
(`UnsupportedMetaFilter`), or when the ES request or metadata index
lookup errors.
- `common/metadata_utils.py`: `apply_meta_data_filter` accepts an
optional `kb_ids` argument. When supplied, conditions go through
push-down first via a new `_try_meta_pushdown` helper; on `None` the
function falls back to the original `meta_filter` call. Default
behaviour is unchanged for callers that don't pass `kb_ids`.
- Updated all four callers (`agent/tools/retrieval.py`,
`api/db/services/dialog_service.py` ×2,
`api/apps/services/dataset_api_service.py`, `api/apps/sdk/session.py`)
to forward `kb_ids` so the push-down path is exercised in production.
- `test/unit_test/common/test_metadata_es_filter.py` *(new)*: 35 unit
tests covering every operator's DSL shape, value coercion
(`ast.literal_eval`, lowercasing, ISO-date pass-through), wildcard
escaping, OR-logic wrapping that protects negative clauses, and the
doc-ID extractor.
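
To make the translation concrete, here is a reduced sketch covering four operators. The function names, the `meta_fields.<key>` field layout, and the use of `ValueError` for unsupported operators are assumptions for illustration; the module in this PR covers more operators and signals unsupported filters with `UnsupportedMetaFilter`.

```python
def escape_wildcard(value: str) -> str:
    """Escape ES wildcard metacharacters so user input is matched literally."""
    return value.replace("\\", "\\\\").replace("*", "\\*").replace("?", "\\?")


def translate(key: str, op: str, value: str) -> dict:
    """Translate one metadata condition into an Elasticsearch query clause."""
    field = f"meta_fields.{key}"  # assumed field layout
    if op == "=":
        return {"term": {field: value}}
    if op == "≠":
        # The exists guard keeps documents that lack the key from matching.
        return {
            "bool": {
                "must": [{"exists": {"field": field}}],
                "must_not": [{"term": {field: value}}],
            }
        }
    if op == "contains":
        pattern = f"*{escape_wildcard(value)}*"
        return {"wildcard": {field: {"value": pattern, "case_insensitive": True}}}
    if op == "start with":
        return {"prefix": {field: {"value": value, "case_insensitive": True}}}
    raise ValueError(f"unsupported operator: {op}")
```

Escaping the backslash first matters: doing it last would double the backslashes just added for `*` and `?`.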

**Behaviour preserved**

- The in-memory `meta_filter` is untouched and still services every
fallback case (Infinity backend, unknown operators, ES outages).
- The eligibility / credibility / issue-multiplier semantics described
in the LLM-driven `auto` and `semi_auto` modes still hand the LLM the
full in-memory `metas` dict to choose conditions from. Only the
*evaluation* of those generated conditions is pushed down.
- Existing tests in
`test/unit_test/common/test_metadata_filter_operators.py` continue to
pass (14/14).

**Test plan**

- `pytest test/unit_test/common/test_metadata_es_filter.py` — 35 passed.
- `pytest test/unit_test/common/test_metadata_filter_operators.py` — 14
passed.
- `ruff check` clean on every modified file.
- Reviewer please validate the ES query shapes against a live cluster —
particularly `case_insensitive` on `wildcard` and `prefix` (requires ES
7.10+) and the `exists` + `must_not` pairing for `≠`.

**Notes**

- The first cut caps each push-down request at 10000 results, matching
the existing `get_flatted_meta_by_kbs` limit, and logs a warning when
the cap is hit. A `search_after` follow-up would let us drop the cap
entirely once the push-down path is validated.
- Operator parity with the in-memory path is exact for the canonical
unicode operators (`≥`, `≤`, `≠`) used internally; the ASCII aliases
(`>=`, `<=`, `!=`) are normalised by `convert_conditions` before they
reach the translator.

### Type of change

- [x] Performance Improvement

---------

Co-authored-by: sxxtony <sxxtony@users.noreply.github.com>
2026-05-07 21:23:43 +08:00
c29335cbff Feat: support local provider for code exec component & remove some outdated models (#14637)
### What problem does this PR solve?

Feat: support local provider for code exec component & remove some
outdated models

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-05-07 21:23:13 +08:00
0501134820 Fix: support tool call config (#14616)
### What problem does this PR solve?
support tool call config

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-07 15:54:57 +08:00
5b162a0c46 Fix: preserve doc generator download metadata in message (#14626)
### What problem does this PR solve?

preserve doc generator download metadata

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-07 15:48:36 +08:00
3e396c0a72 Fix: add base64 to doc generator output (#14599)
### What problem does this PR solve?
add base64 to doc generator output
### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-06 20:33:08 +08:00
e6e80041f5 Fix: agent toolcall null response & schema validation & DeepSeek think history (#14425)
### What problem does this PR solve?
agent toolcall null response & schema validation & DeepSeek think
history

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-28 17:09:08 +08:00
c949096db0 Refactor: optimize agent reset conversation variable defaults (#14401)
### What problem does this PR solve?
optimize agent reset conversation variable defaults
### Type of change
- [x] Refactoring
2026-04-27 19:57:56 +08:00
82313020c7 Refa: align list operations and strict mode (#14387)
### What problem does this PR solve?

align list operations and strict mode

### Type of change
- [x] Refactoring
2026-04-27 19:13:00 +08:00
4f6651968a Fix: prioritize explore session ID and reset default conversation variables (#14399)
### What problem does this PR solve?

 prioritize explore session ID and reset default conversation variables

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-27 18:52:40 +08:00
290f0294d6 Refactor: migrate artifact API (#14348)
### What problem does this PR solve?

Before migration: GET /v1/document/artifact/<filename>
After migration:  GET /api/v1/documents/artifact/<filename>

### Type of change

- [x] Refactoring
2026-04-27 15:19:41 +08:00
fb95136f39 Fix: validate URL scheme and resolved IP before crawling to prevent SSRF (#14090)
### What problem does this PR solve?

The POST /upload_info?url=<url> endpoint accepted a user-supplied URL
and passed it directly to AsyncWebCrawler without any validation. There
were no restrictions on URL scheme, destination hostname, or resolved IP
address. This allowed any authenticated user to instruct the server to
make outbound HTTP requests to internal infrastructure — including RFC
1918 private networks, loopback addresses, and cloud metadata services
such as http://169.254.169.254 — effectively using the server as a proxy
for internal network reconnaissance or credential theft.

This PR adds an SSRF guard (_validate_url_for_crawl) that runs before
any crawl is initiated. It enforces an allowlist of safe schemes
(http/https), resolves the hostname at validation time, and rejects any
URL whose resolved IP falls within a private or reserved network range.
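
A guard of this kind might look like the following sketch. It is not the `_validate_url_for_crawl` implementation from the PR; `check_crawl_url` is a hypothetical helper built only from the standard library, and the exact set of rejected ranges is an assumption.

```python
import ipaddress
import socket
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}


def check_crawl_url(url: str) -> None:
    """Raise ValueError if the URL should not be crawled (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"scheme not allowed: {parsed.scheme!r}")
    if not parsed.hostname:
        raise ValueError("URL has no hostname")

    # Resolve at validation time and inspect every address returned.
    for info in socket.getaddrinfo(parsed.hostname, None):
        address = info[4][0].split("%")[0]  # drop any IPv6 zone suffix
        ip = ipaddress.ip_address(address)
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_reserved or ip.is_multicast or ip.is_unspecified):
            raise ValueError(f"{parsed.hostname} resolves to blocked address {ip}")
```

A hostname that fails to resolve raises `socket.gaierror` from `getaddrinfo`; a caller would probably want to treat that as a rejection as well.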

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-25 14:30:15 +08:00
620088be2f fix: check isinstance before len in VariableAssigner _remove_first/_remove_last (#14281)
fix: check isinstance before len in VariableAssigner _remove_first/_remove_last
2026-04-24 19:09:44 +08:00
75a5548b85 Feat: optimize title chunk (#14325)
### What problem does this PR solve?

Feat: optimize title chunk
1. Add a new button to enable "Use root chunk as H0 heading", so that
the first chunk is carried over to all remaining chunks.
2. Update resume agent template

### Type of change

- [x] New Feature (non-breaking change which adds functionality)


<img width="700" alt="img_v3_02111_63b04951-b3d7-4001-a08b-539db6d5298g"
src="https://github.com/user-attachments/assets/4179ac4d-90e7-4353-9b93-d649a455e634"
/>

<img width="700" alt="image"
src="https://github.com/user-attachments/assets/c0ba0f3c-05aa-4f2c-b418-e808ca1a2641"
/>
2026-04-23 18:55:55 +08:00
9c7c105007 Fix: Doc generator (#14223)
### What problem does this PR solve?

Doc generator

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-20 16:37:33 +08:00
d053317c4d Fix: variable in doc generator (#14180)
### What problem does this PR solve?

Fix: variable in doc generator

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-20 14:19:42 +08:00
f554f6ae85 chore(docs): tips for installing CN fonts (#14189)
### What problem does this PR solve?
Add tips for installing Chinese fonts in the code sandbox. Otherwise,
`matplotlib` won't render Chinese text correctly.

<img width="2082" height="1186" alt="sales_analysis"
src="https://github.com/user-attachments/assets/57e675ab-1e92-4662-9aeb-ad72a6121eb5"
/>



### Type of change

- [x] Documentation Update
2026-04-20 12:11:23 +08:00
c3bf8d9d60 feat(templates): add a data analysis agent template (#14130)
### What problem does this PR solve?

Add a new agent template that demonstrates how to use the
`CodeExec` component for data analysis.

### Type of change

- [x] Other (please describe): Agent template
2026-04-17 11:32:04 +08:00
0df5d830d4 Refact: Updated agent template descriptions. (#14175)
### What problem does this PR solve?

Updated ingestion pipeline template descriptions for better technical
accuracy and readability.

### Type of change

- [x] Refactoring
2026-04-17 10:46:06 +08:00
901023a80a Fix: literal eval http request input (#14145)
### What problem does this PR solve?

Fix: literal eval http request input

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

<img width="700" alt="img_v3_0210q_f4b49ff7-e670-4054-ab0e-9443a09215fg"
src="https://github.com/user-attachments/assets/089300be-06f9-4bb6-97af-61bf5f4a5e8c"
/>


<img width="700" alt="img_v3_0210q_398cd52a-2ad9-42be-8d5b-4e6e68a7d22g"
src="https://github.com/user-attachments/assets/239b43cd-a2a5-49d8-9200-991bb26336c8"
/>
2026-04-16 16:52:34 +08:00
356ba5650a Fix: sandbox don't attach attachment metadata (#14135)
### What problem does this PR solve?

Sandbox don't attach attachment metadata

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-16 12:08:54 +08:00
d51789e2be Feat: update templates && add resume template (#14124)
### What problem does this PR solve?

Feat: update templates  && add resume template

### Type of change


- [x] New Feature (non-breaking change which adds functionality)
2026-04-15 18:42:29 +08:00
1376c004a9 Fix: update docs generator (#14070)
### What problem does this PR solve?

Refactor: update docs generator

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

1. Support multiple document generator components and correctly display
messages in the message component. The document generator will not
overwrite other messages.

<img width="700" alt="Screenshot from 2026-04-13 13-56-17"
src="https://github.com/user-attachments/assets/3f3e06e8-33ce-4df1-8b05-510c86af70a4"
/>

2. Support Chinese content and ensure correct Markdown rendering in PDF
and DOCX
<img width="700" alt="image"
src="https://github.com/user-attachments/assets/69bf1f7b-261d-48e5-a9f3-8e94462b90ed"
/>

3. Simplify the configuration page and support more output formats
 
<img height="700" alt="image"
src="https://github.com/user-attachments/assets/8647374c-c055-4daa-ad71-cd9052eb138e"
/>

4. Hide the `download` output from all components except Message
<img width="700" alt="image"
src="https://github.com/user-attachments/assets/a723dfcb-b60d-4eb5-b2f6-d41ca5955eb4"
/>

<img width="700" alt="image"
src="https://github.com/user-attachments/assets/a8762ac4-807b-4f0b-9287-65f82f7c9c98"
/>

5. Sanitize filename
 
<img width="700" alt="image"
src="https://github.com/user-attachments/assets/df49509f-37c0-40f9-b03d-bd6ce7fdefa8"
/>


6. Other usability improvements
2026-04-14 15:24:43 +08:00
2b6c50734f Sync code from EE (#14080)
### What problem does this PR solve?

As title.

### Type of change

- [x] Refactoring

---------

Signed-off-by: Jin Hai <haijin.chn@gmail.com>
2026-04-14 15:03:46 +08:00
8723c3aa86 Feat: more templates (#14075)
### What problem does this PR solve?

Feat: more templates
<img width="700" alt="image"
src="https://github.com/user-attachments/assets/533e88f1-fc56-4337-a026-6623fc978893"
/>


### Type of change

- [x] New Feature (non-breaking change which adds functionality)

---------

Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>
2026-04-14 10:00:55 +08:00
356d45fda1 Feat: add cell type coercion for Excel export (#13808)
### What problem does this PR solve?

- Implemented a helper function to convert markdown cell text to native
numeric types for Excel output.
- Ensured that leading zeros are preserved and handled various numeric
formats, including those with thousand separators and scientific
notation.
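
One way such a helper could behave is sketched below. `coerce_cell` and its regex are hypothetical, not the function added in this PR; the sketch assumes that any value with a leading zero followed by another digit (for example `007`) should stay text.

```python
import re

# Optional sign, digits with optional thousand separators, optional fraction
# and optional exponent.
_NUMERIC = re.compile(r"^[+-]?(\d{1,3}(,\d{3})+|\d+)(\.\d+)?([eE][+-]?\d+)?$")


def coerce_cell(text: str):
    """Return an int or float for numeric-looking text, else the text unchanged."""
    stripped = text.strip()
    if not _NUMERIC.match(stripped):
        return text
    digits = stripped.lstrip("+-")
    # Preserve leading zeros such as "007" (IDs, postal codes).
    if len(digits) > 1 and digits[0] == "0" and digits[1].isdigit():
        return text
    plain = stripped.replace(",", "")
    if re.fullmatch(r"[+-]?\d+", plain):
        return int(plain)
    return float(plain)


for cell in ["1,234", "007", "1.5e3", "0.5", "n/a"]:
    print(repr(cell), "->", repr(coerce_cell(cell)))
```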

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
2026-04-13 20:54:57 +08:00
853021ff2a feat: support multiple canvas_types for agent templates and remove duplicate files (#14030)
### What problem does this PR solve?

Closes #13907

The template catalog had duplicate files (e.g. `*_r.json`) only to place
the same template into multiple sidebar groups.
This increases maintenance cost and makes template updates error-prone.

This PR adds first-class support for multiple template categories in a
single file via `canvas_types`, then removes duplicate template files.

What changed:
- Added `canvas_types` to `CanvasTemplate` model and DB migration.
- Added normalization logic when loading templates:
  - accepts legacy `canvas_type`
  - accepts new `canvas_types`
  - merges/deduplicates values
- preserves backward compatibility by keeping `canvas_type` as first
normalized value.
- Updated template import flow to load only `.json` files and in stable
sorted order.
- Updated frontend template filtering to match on `canvas_types` first,
with fallback to legacy `canvas_type`.
- Consolidated duplicated template pairs into single files and removed:
  - `deep_search_r.json`
  - `reflective_academic_paper_generator_r.json`
  - `seo_article_writer_r.json`
- Added regression/edge-case tests for category normalization and route
serialization expectations.
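
The normalization rules listed above can be sketched as a small function. `normalize_canvas_types` is a hypothetical name, and placing the legacy `canvas_type` value ahead of the `canvas_types` entries is an assumption about ordering that the description does not spell out.

```python
def normalize_canvas_types(template: dict) -> dict:
    """Merge legacy canvas_type and new canvas_types into one deduplicated list."""
    legacy = template.get("canvas_type")
    candidates = ([legacy] if legacy else []) + list(template.get("canvas_types") or [])

    merged: list[str] = []
    for raw in candidates:
        name = str(raw).strip()
        if name and name not in merged:
            merged.append(name)

    result = dict(template)  # leave the caller's dict untouched
    result["canvas_types"] = merged
    # Keep canvas_type populated for consumers that only read the legacy field.
    result["canvas_type"] = merged[0] if merged else None
    return result


print(normalize_canvas_types({"canvas_type": "Agent", "canvas_types": ["Recommended", "Agent"]}))
```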

### Type of change

- [ ] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [ ] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
2026-04-13 20:26:30 +08:00
1638083e18 Fix: sandbox cannot accept large args list (#14063)
### What problem does this PR solve?

Sandbox cannot accept large args list.

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-04-13 14:14:08 +08:00
3911d90993 Fix: agent application can not show Cite (#14047)
Close #14018

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

### Problem
In Agent applications, even with the cite option enabled, only inline
[ID: x] citation markers are visible (showing chunk content on hover).
The Agent does not display the referenced file cards below the response,
unlike Chat applications.

### Root Cause
The Agent's Retrieval tool (agent/tools/retrieval.py) calls
retriever.retrieval() with aggs=False, which means the retrieval results
do not include doc_aggs (document aggregation) data. Without doc_aggs,
the frontend ReferenceDocumentList component has no data to render the
file cards.

In contrast, the Chat application (api/db/services/dialog_service.py)
calls the same retriever.retrieval() method with aggs=True.

### Fix
Changed aggs=False to aggs=True in agent/tools/retrieval.py so that
document aggregation data is returned along with the retrieved chunks.
2026-04-13 11:06:14 +08:00