## Summary
This PR fully addresses all CodeRabbit review feedback and enhances the
robustness of the reranking module with 100% backward compatibility.
## Key Fixes
1. Fixed JinaRerank hardcoded base_url to support subclass endpoint
overrides
2. Corrected GPUStackRerank exception handling to use proper requests
exceptions and preserve stack traces
3. Added 30s timeout to all API calls to prevent service hanging
4. Added empty input validation for all rerank providers
5. Replaced direct dict key access with .get() to eliminate KeyError
crashes
6. Fixed _normalize_rank edge case for empty arrays
7. Implemented missing functionality for Ai302Rerank
8. Standardized type hints and fixed typo issues
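Fixes 4 to 6 can be illustrated with a minimal sketch. The response shape (`results`, `index`, `relevance_score`), the helper names, and the choice to map constant scores to zeros are assumptions for illustration, not the module's actual code.

```python
from typing import Any


def normalize_rank(scores: list[float]) -> list[float]:
    """Min-max normalize scores; empty or constant input is handled explicitly."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi - lo < 1e-3:
        # Assumed behavior: a constant score list carries no ranking signal.
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def parse_rerank_response(payload: dict[str, Any], n_texts: int) -> list[float]:
    """Read scores with .get() so a malformed response cannot raise KeyError."""
    scores = [0.0] * n_texts
    for item in payload.get("results", []):
        idx = item.get("index")
        if isinstance(idx, int) and 0 <= idx < n_texts:
            scores[idx] = float(item.get("relevance_score", 0.0))
    return scores


def rerank(query: str, texts: list[str], payload: dict[str, Any]) -> list[float]:
    # Empty input short-circuits before any API call would be made.
    if not texts:
        return []
    return normalize_rank(parse_rerank_response(payload, len(texts)))
```

Here `payload` stands in for the decoded JSON body that a provider would return; a real provider class would obtain it from the HTTP call.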
## Compatibility
- No breaking changes to any existing functionality
- All rerank providers work as originally intended
- Fully compatible with existing configurations and workflows
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
---------
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
### What problem does this PR solve?
Multiple `requests.post()` calls across the LLM integration layer lack a
`timeout` parameter. Without a timeout, a single unresponsive upstream
service can block the calling thread **indefinitely**, eventually
exhausting the thread pool and degrading the entire system.
This is a well-known issue — Python's `requests` library defaults to
`timeout=None` (infinite wait), and [the library docs explicitly
recommend](https://requests.readthedocs.io/en/latest/user/advanced/#timeouts)
always setting a timeout.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Change
Added `timeout` to all `requests.post()` calls missing it:
| File | Calls fixed | Timeout |
|------|-------------|---------|
| `rag/llm/rerank_model.py` | 9 | 30s |
| `rag/llm/embedding_model.py` | 8 | 30s |
| `rag/llm/cv_model.py` | 3 | 60s |
| `rag/llm/tts_model.py` | 2 | 60s |
| `rag/llm/sequence2txt_model.py` | 2 | 60s |
Embedding/rerank calls use 30s (lightweight API calls). Vision, TTS, and
audio transcription use 60s (heavier workloads with file uploads).
Note: other functions in the codebase (e.g. `check_minio_alive`,
`check_ragflow_server_alive`) already use `timeout=10`, so this PR
brings the LLM layer in line with existing practice.
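One way to keep these values in a single place is sketched below. The `TIMEOUTS` mapping and the `with_timeout` helper are hypothetical and not part of this PR, which passes `timeout=` directly at each call site; `post` is expected to be `requests.post` or any callable that accepts a `timeout` keyword.

```python
import functools
from typing import Any, Callable

# Per-module timeouts in seconds, taken from the table above.
TIMEOUTS = {
    "rerank_model": 30,
    "embedding_model": 30,
    "cv_model": 60,
    "tts_model": 60,
    "sequence2txt_model": 60,
}


def with_timeout(post: Callable[..., Any], module: str) -> Callable[..., Any]:
    """Return `post` with the module's default timeout pre-bound."""
    return functools.partial(post, timeout=TIMEOUTS[module])
```

A call would then look like `with_timeout(requests.post, "rerank_model")(url, json=payload)`. Note that in `requests` the timeout bounds the connect and read phases individually rather than the total request duration.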
Signed-off-by: Ricardo-M-L <Sibyl_Hartmanbnb@webname.com>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>
## Summary
- `CvModel["Bedrock"]` was absent from `rag/llm/cv_model.py`, causing
`model_instance()` to return `None` when a Bedrock model was used as a
PDF parser — even after correct model resolution.
- This PR adds `BedrockCV`, enabling Bedrock vision models (e.g.
`amazon.nova-pro-v1:0`, `anthropic.claude-3-5-sonnet`) to be used as PDF
parsers.
## What problem does this PR solve?
When a Bedrock model is selected as the PDF parser in a knowledge base,
ingestion failed with:
```
'LiteLLMBase' object has no attribute 'describe_with_prompt'
```
The root cause: `LiteLLMBase` (the Bedrock chat implementation) was the
only registered handler for the Bedrock factory. It does not implement
`describe_with_prompt`. `CvModel` had no Bedrock entry, so
`model_instance()` returned `None` for `image2text` requests.
## Type of change
- [x] New Feature (non-breaking change which adds functionality)
## Changes
**`rag/llm/cv_model.py`**
Adds `BedrockCV(Base)` with `_FACTORY_NAME = "Bedrock"`:
- Uses `litellm.completion` with the `bedrock/` prefix (consistent with
`LiteLLMBase`)
- Parses AWS credentials from the JSON key assembled by `add_llm`
(`auth_mode`, `bedrock_ak`, `bedrock_sk`, `bedrock_region`,
`aws_role_arn`)
- Supports three auth modes: `access_key_secret`, `iam_role` (via STS
`assume_role`), and default credential chain (IRSA, instance profile)
- Implements `describe_with_prompt` and `describe`
## Test plan
- [ ] Configure a Bedrock vision model (e.g. `amazon.nova-pro-v1:0`)
with valid AWS credentials
- [ ] Select it as PDF parser in a knowledge base
- [ ] Verify ingestion of a PDF document completes without errors
- [ ] Verify `CvModel["Bedrock"]` resolves to `BedrockCV`
🤖 Generated with [Claude Code](https://claude.ai/claude-code)
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
### What problem does this PR solve?
`HuggingfaceRerank.post()` unconditionally prepends `http://` to `base_url`,
which already contains a protocol. This creates invalid URLs like
`http://http://127.0.0.1:8080/rerank`, breaking all requests. The fix
normalizes URL handling to match the rest of the codebase, removing the
redundant `http://`.
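A minimal sketch of scheme-aware normalization, assuming a hypothetical helper name `normalize_base_url`; the actual fix may simply use `base_url` as given.

```python
def normalize_base_url(base_url: str) -> str:
    """Prepend http:// only when the URL does not already carry a scheme."""
    url = base_url.strip()
    if not url.startswith(("http://", "https://")):
        url = "http://" + url
    return url
```

With this guard, a `base_url` that already includes a protocol passes through unchanged, so the doubled prefix cannot occur.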
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### Related Issues
- #7318
- #7796
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
### What
19 methods across `rag/llm/chat_model.py` and `rag/llm/cv_model.py`
declare `gen_conf={}` (or `gen_conf: dict = {}`) as a parameter default
and then mutate `gen_conf` in place — typically `del
gen_conf["max_tokens"]`, `gen_conf["penalty_score"] = ...`, or
`gen_conf.pop(...)` as part of provider-specific normalization.
### The two bugs in this pattern
**1. Mutable default argument (Python footgun).** Python evaluates
default values **once** at function-definition time, so the single `{}`
dict is *shared* across every caller that doesn't pass `gen_conf`. The
first such call's mutations leak into the default seen by every
subsequent call.
```python
# Before
def chat_streamly(self, system, history, gen_conf={}, **kwargs):
    if "max_tokens" in gen_conf:
        del gen_conf["max_tokens"]  # mutates the SHARED default dict
    ...
```
After call N with `max_tokens` set, call N+1 that omits `gen_conf` no
longer sees `max_tokens` — even though the caller never touched it.
**2. Caller-dict pollution.** When the caller *does* pass a `gen_conf`
dict, the same in-place mutations modify the caller's dict. A reused
`gen_conf` (very common for chat-loop callers that build the config once
and pass it on every turn) silently loses `max_tokens`,
`presence_penalty`, etc. after the first round.
### The fix
In every affected method:
- Change `gen_conf={}` (or `gen_conf: dict = {}`) → `gen_conf=None`.
- Add `gen_conf = dict(gen_conf or {})` as the first statement of the
body so all subsequent mutations operate on a fresh local copy.
```python
# After
def chat_streamly(self, system, history, gen_conf=None, **kwargs):
    gen_conf = dict(gen_conf or {})
    if "max_tokens" in gen_conf:
        del gen_conf["max_tokens"]  # local copy, safe
    ...
```
This is byte-for-byte identical provider-side behavior for callers that
already pass a fresh `gen_conf` per call. The new `dict(...)` copy is
O(small constant) per call.
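The caller-dict pollution and its fix can be reproduced in isolation. The two functions below are illustrative stand-ins, not methods from `chat_model.py`.

```python
def strip_max_tokens_buggy(gen_conf={}):
    gen_conf.pop("max_tokens", None)  # mutates the caller's dict (or the shared default)
    return gen_conf


def strip_max_tokens_fixed(gen_conf=None):
    gen_conf = dict(gen_conf or {})  # fresh shallow copy on every call
    gen_conf.pop("max_tokens", None)
    return gen_conf


shared = {"max_tokens": 512, "temperature": 0.1}

strip_max_tokens_fixed(shared)
kept = "max_tokens" in shared  # caller's dict is untouched

strip_max_tokens_buggy(shared)
lost = "max_tokens" not in shared  # caller's dict was silently changed
```

Because `dict(...)` makes a shallow copy, nested values such as a list inside `gen_conf` are still shared with the caller; that is sufficient here only as long as the mutations touch top-level keys.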
### Files changed
- `rag/llm/chat_model.py` — 17 methods
- `rag/llm/cv_model.py` — 2 methods
### Tests
Adds `test/unit_test/rag/llm/test_gen_conf_no_mutable_default.py` — an
`ast`-based regression guard that walks both modules and asserts no
parameter named `gen_conf` ever has a mutable literal (`{}` or `[]`) as
its default. The test caught **five additional `gen_conf: dict = {}`
sites** that an initial `gen_conf={}` text grep had missed (annotated
parameters with whitespace), and would fail again if the pattern is ever
reintroduced.
```
$ pytest test/unit_test/rag/llm/test_gen_conf_no_mutable_default.py -v
============================== 3 passed in 0.04s ===============================
```
`ruff check` passes on all touched files.
### Notes
- This PR is intentionally focused on **just** the `gen_conf` default +
copy fix. There's a related (but separate) `history.insert(0, ...)`
pattern in the same files that mutates the caller's history list in 12
places — left for a follow-up so this PR stays mechanical and easy to
review.
### Latest revision (`700bb54a7`) — addresses CodeRabbit review
- Type annotation: `gen_conf: dict = None` → `gen_conf: dict | None =
None` (5 occurrences in `chat_model.py`). The old annotation was a
static-checker mismatch since `None` isn't a `dict`.
- Regression test: the AST check accessed `default.keys` directly.
`ast.List` has no `.keys` attribute — a future `gen_conf=[]` would crash
with `AttributeError` instead of being caught. Use `getattr` for both
`.keys` (Dict) and `.elts` (List). Manually verified the updated check
correctly catches both `gen_conf={}` and `gen_conf=[]` while ignoring
`gen_conf=None` and non-empty literals.
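A guard of this kind could look roughly like the sketch below. It is an illustration of the approach, not the contents of `test_gen_conf_no_mutable_default.py`; the function name is hypothetical and it takes source text rather than file paths.

```python
import ast


def mutable_gen_conf_defaults(source: str) -> list[str]:
    """Return names of functions whose `gen_conf` parameter defaults to {} or []."""
    offenders = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        args = node.args
        positional = args.posonlyargs + args.args
        # Defaults align with the tail of the positional parameters.
        first_default = len(positional) - len(args.defaults)
        pairs = list(zip(positional[first_default:], args.defaults))
        # Keyword-only parameters use None as the "no default" marker.
        pairs += [(a, d) for a, d in zip(args.kwonlyargs, args.kw_defaults) if d is not None]
        for arg, default in pairs:
            is_literal = isinstance(default, (ast.Dict, ast.List))
            # getattr keeps this safe: Dict has .keys, List has .elts, not both.
            is_empty = not getattr(default, "keys", None) and not getattr(default, "elts", None)
            if arg.arg == "gen_conf" and is_literal and is_empty:
                offenders.append(node.name)
    return offenders
```

Working on the AST rather than on text is what lets the check see annotated parameters such as `gen_conf: dict = {}`, which a plain `gen_conf={}` grep does not match.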
---------
Co-authored-by: Ricardo <ricardo@example.com>
## Summary
- Add MiniMax provider GroupId query parameter support in `LiteLLMBase`
- Extract `group_id` from key configuration in `__init__`
- Append `GroupId` as query parameter to `api_base` in
`_construct_complete_args`
## Why this change is needed
MiniMax provides an OpenAI-compatible API endpoint
(`/v1/chat/completions`), but `GroupId` is a MiniMax-specific account
identifier required for billing and rate limiting - it is not part of
the OpenAI standard.
Looking at LiteLLM's `MinimaxChatConfig`:
- `get_complete_url()` only constructs the base URL (e.g.,
`https://api.minimaxi.com/v1/chat/completions`)
- LiteLLM does **not** automatically inject `GroupId` into requests
- This must be handled by the caller (ragflow's chat_model.py)
The implementation appends `GroupId` as a query parameter to `api_base`:
```python
api_base = completion_args.get("api_base", self.base_url)
separator = "&" if "?" in api_base else "?"
completion_args["api_base"] = f"{api_base}{separator}GroupId={self.group_id}"
```
This matches MiniMax's official API format (as documented by
LlamaFactory):
```bash
curl --location 'https://api.minimaxi.chat/v1/text/chatcompletion?GroupId=<your GroupId>' \
  --header 'Authorization: Bearer <your API_Key>'
```
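The separator logic can be exercised on its own with a small standalone function. The name `append_group_id` is hypothetical; skipping an empty `group_id` and percent-encoding the value are additions in this sketch that the PR snippet does not show.

```python
from urllib.parse import quote


def append_group_id(api_base: str, group_id: str) -> str:
    """Append GroupId as a query parameter, respecting an existing query string."""
    if not group_id:
        return api_base  # assumption: nothing to add when no GroupId is configured
    separator = "&" if "?" in api_base else "?"
    return f"{api_base}{separator}GroupId={quote(str(group_id), safe='')}"
```

For example, `append_group_id("https://api.minimaxi.com/v1", "123")` yields `https://api.minimaxi.com/v1?GroupId=123`, while a base URL that already has a query string gets `&GroupId=123` appended instead.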
## Test plan
- [ ] Verify MiniMax API calls work with GroupId query parameter
- [ ] Verify backward compatibility for other providers
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
### What problem does this PR solve?
Fixes #14340
## Problem Description
When using an **Agentic Agent** (not Workflow) with one or more
Retrieval tools (e.g., Dataset Retrieval + Memory Retrieval), the agent
silently returns an empty response (`agent_response: ""`) after hanging
for several minutes. The server logs show:
```
AttributeError: 'ChatCompletionMessageToolCall' object has no attribute 'index'
```
This error propagates as a `GENERIC_ERROR`, causing the canvas to return
an empty response. The subsequent Memory save task then receives the
empty `agent_response` and logs:
```
Document for referred_document_id XXXX not found
```
## Reproduction Steps
1. Set `DOC_ENGINE=infinity` (or `elasticsearch` — the engine itself is
not the root cause).
2. Create a blank **Agentic Agent** (not a Workflow).
3. Add **two Retrieval tools** to the Agent node:
   - `Retrieval_DS` → Dataset (Knowledge Base)
   - `Retrieval_Mem` → Memory component
4. Add a **Message** node with **Save to Memory** enabled.
5. Launch the agent and send any message (e.g., "hola").
6. The agent hangs and returns an empty response.
## Root Cause Analysis
The crash occurs in `_append_history` and `_append_history_batch` inside
`rag/llm/chat_model.py`. These methods directly access `.index` on tool
call objects:
```python
# _append_history_batch
{
    "index": tc.index,  # <-- crashes here
    ...
}
```
However, **non-streaming** LLM responses (`stream=False`) return
`ChatCompletionMessageToolCall` objects, which **do not have an `index`
field** according to the OpenAI API specification. The `index` field
only exists on `ChoiceDeltaToolCall` objects returned in **streaming**
responses (`stream=True`).
When the agentic agent triggers an internal `full_question` call (used
to compress multi-turn conversation history), the request is incorrectly
routed through `async_chat_with_tools` because `is_tools=True` is set at
the `LLMBundle` level. If the LLM decides to emit `tool_calls` during
this auxiliary request, the code enters the non-streaming tool loop and
crashes when trying to append history.
## Fix
Replaced all direct `.index` accesses with `getattr(..., "index", None)`
for safe, backward-compatible access:
| Method | File | Line | Change |
|--------|------|------|--------|
| `_append_history` | `rag/llm/chat_model.py` | ~L304 | `tool_call.index` → `getattr(tool_call, "index", None)` |
| `_append_history_batch` | `rag/llm/chat_model.py` | ~L332 | `tc.index` → `getattr(tc, "index", None)` |
| `_append_history` | `rag/llm/chat_model.py` | ~L1467 | `tool_call.index` → `getattr(tool_call, "index", None)` |
| `_append_history_batch` | `rag/llm/chat_model.py` | ~L1496 | `tc.index` → `getattr(tc, "index", None)` |
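The difference between the two object shapes can be shown with stand-in classes. The dataclasses below are simplified placeholders for `ChoiceDeltaToolCall` and `ChatCompletionMessageToolCall`, and `tool_call_entry` is a hypothetical helper, not the real history-building code.

```python
from dataclasses import dataclass


@dataclass
class StreamingToolCall:  # stands in for ChoiceDeltaToolCall
    id: str
    index: int


@dataclass
class NonStreamingToolCall:  # stands in for ChatCompletionMessageToolCall
    id: str


def tool_call_entry(tc) -> dict:
    """Build a history entry without assuming the object has `index`."""
    return {"index": getattr(tc, "index", None), "id": tc.id}
```

With direct attribute access, the second class would raise `AttributeError`; with `getattr`, the entry simply carries `"index": None`.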
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Signed-off-by: noob <yixiao121314@outlook.com>
### What problem does this PR solve?
Both the tokenizer (`rag/flow/tokenizer/tokenizer.py`) and `BuiltinEmbed.encode` (`rag/llm/embedding_model.py`) currently accumulate embedding batches via `np.concatenate` inside the per-batch loop. `np.concatenate` allocates a new array and copies all existing data on every call, so accumulating N batches is O(N²) in time and in total bytes copied.
Replacing the incremental concatenate with a list of batches plus a single `np.vstack` at the end gives O(N) total work.
For the tokenizer, the title-vector broadcast `np.concatenate([vts[0]] * N)` is also replaced by `np.tile`, which does the same job with a single contiguous allocation instead of building a Python list of references.
This is purely a CPU/memory optimisation — output shape and dtype are
unchanged. Measured impact grows with document size:
- 1k chunks (batch 512, 2 iters): ~negligible
- 10k chunks (20 iters): ~10× speedup on this stage
- 100k chunks (195 iters): ~100× speedup, and peak RAM
drops from O(N) extra to near-zero
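The quadratic cost can be checked with simple arithmetic. The sketch below is a simplified cost model, not a benchmark: it counts only rows copied, assumes every batch has the same size, and uses illustrative function names.

```python
def rows_copied_incremental(n_batches: int, batch_rows: int) -> int:
    """Rows written when concatenating onto a growing array after every batch."""
    # After batch k the result holds k * batch_rows rows, all freshly copied.
    return sum(k * batch_rows for k in range(1, n_batches + 1))


def rows_copied_single_stack(n_batches: int, batch_rows: int) -> int:
    """Rows written when batches are collected in a list and stacked once."""
    return n_batches * batch_rows


def copy_ratio(n_batches: int, batch_rows: int = 512) -> float:
    """How many times more rows the incremental approach copies."""
    return rows_copied_incremental(n_batches, batch_rows) / rows_copied_single_stack(n_batches, batch_rows)
```

Under this model the ratio is (N + 1) / 2, that is 10.5 for 20 batches and 98 for 195 batches, which is in the same range as the speedups reported above.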
### Type of change
- [x] Performance Improvement
Co-authored-by: yoan sapienza <Yoan Sapienza yoan.sapienza@orange.fr Yoan Sapienza zappy@macbookpro.home>
### What problem does this PR solve?
Fixes agent tool-call null responses, schema validation, and DeepSeek
thinking-history handling.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
## Add Astraflow Provider Support
This PR integrates [Astraflow](https://astraflow.ucloud.cn/) (by UCloud
/ 优刻得) as a new AI model provider in RAGFlow, with support for both
global and China endpoints.
### About Astraflow
Astraflow is an OpenAI-compatible AI model aggregation platform
supporting 200+ models from major providers including DeepSeek, Qwen,
GPT, Claude, Gemini, Llama, Mistral, and more.
| Variant | Factory Name | Endpoint | Env Var |
|---------|-------------|----------|---------|
| Global | `Astraflow` | `https://api-us-ca.umodelverse.ai/v1` | `ASTRAFLOW_API_KEY` |
| China | `Astraflow-CN` | `https://api.modelverse.cn/v1` | `ASTRAFLOW_CN_API_KEY` |
- **API key signup**: https://astraflow.ucloud.cn/
---
### Files Changed
| File | Change |
|------|--------|
| `rag/llm/__init__.py` | Register `Astraflow` and `Astraflow-CN` in `SupportedLiteLLMProvider` enum, `FACTORY_DEFAULT_BASE_URL`, and `LITELLM_PROVIDER_PREFIX` |
| `rag/llm/chat_model.py` | Add `AstraflowChat` and `AstraflowCNChat` (OpenAI-compatible `Base` subclass) |
| `rag/llm/embedding_model.py` | Add `AstraflowEmbed` and `AstraflowCNEmbed` (subclasses of `OpenAIEmbed`) |
| `rag/llm/rerank_model.py` | Add `AstraflowRerank` and `AstraflowCNRerank` (subclasses of `OpenAI_APIRerank`) |
| `rag/llm/cv_model.py` | Add `AstraflowCV` and `AstraflowCNCV` (subclasses of `GptV4`) |
| `rag/llm/tts_model.py` | Add `AstraflowTTS` and `AstraflowCNTTS` (subclasses of `OpenAITTS`) |
| `rag/llm/sequence2txt_model.py` | Add `AstraflowSeq2txt` and `AstraflowCNSeq2txt` (subclasses of `GPTSeq2txt`) |
| `conf/llm_factories.json` | Register `Astraflow` and `Astraflow-CN` factories with a curated list of popular models |
---
### Supported Model Types
- ✅ **Chat / LLM** — DeepSeek-V3/R1, Qwen3, GPT-4o/4.1, Claude 3.5/3.7,
Gemini 2.0/2.5 Flash, Llama 3.3/4, Mistral, and 200+ more
- ✅ **Text Embedding** — text-embedding-3-small/large
- ✅ **Image / Vision (IMAGE2TEXT)** — GPT-4o, GPT-4.1, Claude, Gemini,
Llama-4, etc.
- ✅ **Text Re-Rank**
- ✅ **TTS** — tts-1
- ✅ **Speech-to-Text (SPEECH2TEXT)** — whisper-1
### Implementation Notes
- Uses the `openai/` LiteLLM prefix — consistent with other
OpenAI-compatible aggregation platforms (SILICONFLOW, DeerAPI, CometAPI,
OpenRouter, n1n, Avian, etc.)
- `Astraflow` (global, rank 250) and `Astraflow-CN` (China, rank 249)
are separate factory entries, allowing users to choose the optimal
endpoint based on their region.
- All model classes cleanly subclass existing base classes (`Base`,
`OpenAIEmbed`, `OpenAI_APIRerank`, `GptV4`, `OpenAITTS`, `GPTSeq2txt`)
with no custom logic needed — the provider is fully OpenAI-compatible.
---------
Co-authored-by: user <user@xzaaaMacBook-Air.local>
https://bailian.console.aliyun.com/cn-beijing?tab=api#/api/?type=model&url=2780056
### What problem does this PR solve?
### Type of change
- [x] Other (please describe): add gte-rerank-v2 and qwen3-rerank
### What problem does this PR solve?
Fixes #13944, where OpenAI-compatible custom endpoints failed verification
when model names contained `gpt-5` because of incorrect name-based
handling in the Base/backend=`base` path.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
Adds Perplexity contextualized embeddings API as a new model provider,
as requested in #13610.
- `PerplexityEmbed` provider in `rag/llm/embedding_model.py` supporting
both standard (`/v1/embeddings`) and contextualized
(`/v1/contextualizedembeddings`) endpoints
- All 4 Perplexity embedding models registered in
`conf/llm_factories.json`: `pplx-embed-v1-0.6b`, `pplx-embed-v1-4b`,
`pplx-embed-context-v1-0.6b`, `pplx-embed-context-v1-4b`
- Frontend entries (enum, icon mapping, API key URL) in
`web/src/constants/llm.ts`
- Updated `docs/guides/models/supported_models.mdx`
- 22 unit tests in `test/unit_test/rag/llm/test_perplexity_embed.py`
Perplexity's API returns `base64_int8` encoded embeddings (not
OpenAI-compatible), so this uses a custom `requests`-based
implementation. Contextualized vs standard model is auto-detected from
the model name.
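Decoding a `base64_int8` payload needs only the standard library. The sketch below assumes each embedding arrives as a base64 string of signed 8-bit values and that contextualized models can be recognized by `context` in the model name; both helper names are hypothetical and not taken from `PerplexityEmbed`.

```python
import base64
from array import array


def decode_base64_int8(encoded: str) -> list[int]:
    """Decode a base64 string into a list of signed 8-bit integers."""
    raw = base64.b64decode(encoded)
    return array("b", raw).tolist()  # "b" is the signed-char type code


def is_contextualized(model_name: str) -> bool:
    """Guess the endpoint family from the model name (assumed naming rule)."""
    return "context" in model_name
```

For instance, the three bytes `0x01 0xFF 0x80` decode to `[1, -1, -128]`, showing that values above 127 are interpreted as negative components.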
Closes #13610
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
### What problem does this PR solve?
Add a handler for GPT-5 models that do not accept certain parameters,
dropping those parameters, and centralize all model-specific parameter
handling into a single helper.
Solves issue #13639.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] Refactoring
## Summary
Add MiniMax's latest M2.5 model family to the model registry and update
the default API base URL to the international endpoint for broader
accessibility.
## Changes
- **Add MiniMax-M2.5 models** to `conf/llm_factories.json`:
  - `MiniMax-M2.5`: Peak Performance. Ultimate Value. Master the Complex.
  - `MiniMax-M2.5-highspeed`: Same performance, faster and more agile.
  - Both support a 204,800-token context window and tool calling (`is_tools: true`).
- **Update default MiniMax API base URL** in `rag/llm/__init__.py`:
  - From `https://api.minimaxi.com/v1` (domestic) to `https://api.minimax.io/v1` (international).
  - Chinese users can still override via the Base URL field in the UI settings (as documented in existing i18n strings).
## Supported Models
| Model | Context Window | Tool Calling | Description |
|-------|---------------|-------------|-------------|
| `MiniMax-M2.5` | 204,800 tokens | Yes | Peak Performance. Ultimate Value. |
| `MiniMax-M2.5-highspeed` | 204,800 tokens | Yes | Same performance, faster and more agile. |
## API Documentation
- OpenAI Compatible API:
https://platform.minimax.io/docs/api-reference/text-openai-api
## Testing
- [x] JSON validation passes
- [x] Python syntax validation passes
- [x] Ruff lint passes
- [x] MiniMax-M2.5 API call verified (returns valid response)
- [x] MiniMax-M2.5-highspeed API call verified (returns valid response)
Co-authored-by: PR Bot <pr-bot@minimaxi.com>
Co-authored-by: Jin Hai <haijin.chn@gmail.com>
Co-authored-by: Yingfeng <yingfeng.zhang@gmail.com>
## Summary
- Convert bare `open()` calls to `with` context managers or
`Path.read_text()`
- File handles leak if not properly closed, especially on exceptions
- Fixes in crypt.py, sequence2txt_model.py, term_weight.py,
deepdoc/vision/__init__.py
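The two replacement patterns look like this in isolation. The function names and the temporary file are illustrative only and do not come from the touched modules.

```python
import tempfile
from pathlib import Path


def read_with_context_manager(path: str) -> str:
    """The handle is closed when the block exits, even if read() raises."""
    with open(path, encoding="utf-8") as f:
        return f.read()


def read_with_pathlib(path: str) -> str:
    """Path.read_text opens, reads, and closes the file in one call."""
    return Path(path).read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("hello", encoding="utf-8")
    via_with = read_with_context_manager(str(sample))
    via_pathlib = read_with_pathlib(str(sample))
```

A bare `open(path).read()` relies on garbage collection to close the handle, which is not guaranteed to happen promptly on every interpreter.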
## Test plan
- [x] File operations work correctly with context managers
- [x] Resources properly cleaned up on exceptions
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
### What problem does this PR solve?
This PR extends the list of available providers. It adds a new provider,
"RAGcon", within the Ollama Modal. It provides all model types except OCR
via OpenAI-compatible endpoints.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
---------
Co-authored-by: Jakob <16180662+hauberj@users.noreply.github.com>
### What problem does this PR solve?
Refer to issue: #13236
The base URL for the GPUStack chat model currently requires a `/v1` suffix.
For other model types such as `Embedding` or `Rerank`, the `/v1` suffix is
not required because it is appended in code.
This PR applies the same logic to the chat model as to the other model types.
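A minimal sketch of suffix handling that tolerates both forms of the base URL. The helper name `ensure_v1_suffix` is hypothetical, and accepting URLs that already end in `/v1` is an assumption of this sketch rather than something the PR states.

```python
def ensure_v1_suffix(base_url: str) -> str:
    """Append /v1 unless the URL already ends with it."""
    url = base_url.rstrip("/")
    return url if url.endswith("/v1") else url + "/v1"
```

Written this way, a user who previously entered the `/v1` suffix by hand would not end up with `/v1/v1`.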
### Type of change
- [X] Bug Fix (non-breaking change which fixes an issue)
### What problem does this PR solve?
This PR adds [Avian](https://avian.io) as a new LLM provider to RAGFlow.
Avian provides an OpenAI-compatible API with competitive pricing,
offering access to models like DeepSeek V3.2, Kimi K2.5, GLM-5, and
MiniMax M2.5.
**Provider details:**
- API Base URL: `https://api.avian.io/v1`
- Auth: Bearer token via API key
- OpenAI-compatible (chat completions, streaming, function calling)
- Models:
  - `deepseek/deepseek-v3.2`: 164K context, $0.26/$0.38 per 1M tokens
  - `moonshotai/kimi-k2.5`: 131K context, $0.45/$2.20 per 1M tokens
  - `z-ai/glm-5`: 131K context, $0.30/$2.55 per 1M tokens
  - `minimax/minimax-m2.5`: 1M context, $0.30/$1.10 per 1M tokens
**Changes:**
- `rag/llm/chat_model.py` — Add `AvianChat` class extending `Base`
- `rag/llm/__init__.py` — Register in `SupportedLiteLLMProvider`,
`FACTORY_DEFAULT_BASE_URL`, `LITELLM_PROVIDER_PREFIX`
- `conf/llm_factories.json` — Add Avian factory with model definitions
- `web/src/constants/llm.ts` — Add to `LLMFactory` enum, `IconMap`,
`APIMapUrl`
- `web/src/components/svg-icon.tsx` — Register SVG icon
- `web/src/assets/svg/llm/avian.svg` — Provider icon
- `docs/references/supported_models.mdx` — Add to supported models table
This follows the same pattern as other OpenAI-compatible providers
(e.g., n1n #12680, TokenPony).
cc @KevinHuSh @JinHai-CN
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
- [x] Documentation Update
### What problem does this PR solve?
Refact: switch from google-generativeai to google-genai #13132
Refact: comment out unused pywencai.
### Type of change
- [x] Refactoring
### What problem does this PR solve?
Update stepfun list.
Add TTS and Sequence2Text functionalities.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Add support for the `doubao-embedding-vision` model.
`doubao-embedding-large-text` is deprecated.
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
## What problem does this PR solve?
This PR addresses three specific issues to improve agent reliability and
model support:
1. **`codeExec` Output Limitation**: Previously, the `codeExec` tool was
strictly limited to returning `string` types. I updated the output
constraint to `object` to support structured data (Dicts, Lists, etc.)
required for complex downstream tasks.
2. **`codeExec` Error Handling**: Improved the execution logic so that
when runtime errors occur, the tool captures the exception and returns
the error message as the output instead of causing the process to abort
or fail silently.
3. **Spark Model Configuration**:
- Added support for the `MAX-32k` model variant.
- Fixed the `Spark-Lite` mapping from `general` to `lite` to match the
latest API specifications.
## Type of change
- [x] Bug Fix (fixes execution logic and model mapping)
- [x] New Feature / Enhancement (adds model support and improves tool
flexibility)
## Key Changes
### `agent/tools/code_exec.py`
- Changed the output type definition from `string` to `object`.
- Refactored the execution flow to gracefully catch exceptions and
return error messages as part of the tool output.
### `rag/llm/chat_model.py`
- Added `"Spark-Max-32K": "max-32k"` to the model list.
- Updated `"Spark-Lite"` value from `"general"` to `"lite"`.
## Checklist
- [x] My code follows the style guidelines of this project.
- [x] I have performed a self-review of my own code.
Signed-off-by: evilhero <2278596667@qq.com>
### Issue
Closes #12424.
When using Qwen3 models (`qwen3-32b`, `qwen3-max`) through the
Tongyi-Qianwen provider for non-streaming calls (e.g., knowledge graph
generation), the API fails with:
```
parameter.enable_thinking must be set to false for non-streaming calls
```
### Root Cause
In `LiteLLMBase.async_chat()`, the `extra_body={"enable_thinking":
False}` was set in `kwargs` but never forwarded to
`_construct_completion_args()`.
### What problem does this PR solve?
Pass merged kwargs to `_construct_completion_args()` using
`**{**gen_conf, **kwargs}` to safely handle potential duplicate
parameters.
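Why the merge is needed can be seen with a stand-in function. `construct_completion_args` below is a hypothetical placeholder with an invented signature, not the real `_construct_completion_args()`.

```python
def construct_completion_args(history, stream=False, **kwargs):
    """Hypothetical stand-in: collects everything it is given into one dict."""
    return {"messages": history, "stream": stream, **kwargs}


gen_conf = {"temperature": 0.2, "extra_body": {"top_k": 5}}
kwargs = {"extra_body": {"enable_thinking": False}}

# Unpacking both mappings separately raises TypeError on the shared key.
try:
    construct_completion_args([], **gen_conf, **kwargs)
    duplicate_error = False
except TypeError:
    duplicate_error = True

# Merging first resolves the conflict; the right-hand mapping (kwargs) wins.
merged = construct_completion_args([], **{**gen_conf, **kwargs})
```

The merge is shallow: when both mappings carry `extra_body`, the value from `kwargs` replaces the one from `gen_conf` entirely rather than being combined with it.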
### Changes
- `rag/llm/chat_model.py`: Forward kwargs containing `extra_body` to
`_construct_completion_args()` in `async_chat()`
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
Contribution by Gittensor, see my contribution statistics at
https://gittensor.io/miners/details?githubId=42954461
### What problem does this PR solve?
Add PaddleOCR as a new PDF parser.
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Feat: bedrock iam authentication #12008
### Type of change
- [x] New Feature (non-breaking change which adds functionality)
### What problem does this PR solve?
Only MinerU-API is supported for now; the pipeline frontend still needs to
be completed to allow configuration of MinerU options.
### Type of change
- [x] Refactoring
I have repeated the explanation in Chinese in the comments below.
### What problem does this PR solve?
## Summary
This PR enhances the MinerU document parser with additional
configuration options, giving users more control over PDF parsing
behavior and improving support for multilingual documents.
## Changes
### Backend (`deepdoc/parser/mineru_parser.py`)
- Added configurable parsing options:
  - **Parse Method**: `auto`, `txt`, or `ocr`; allows users to choose the
    extraction strategy
  - **Formula Recognition**: Toggle for enabling/disabling formula
    extraction (useful to disable for Cyrillic documents where it may cause
    issues)
  - **Table Recognition**: Toggle for enabling/disabling table extraction
- Added language code mapping (`LANGUAGE_TO_MINERU_MAP`) to translate
RAGFlow language settings to MinerU-compatible language codes for better
OCR accuracy
- Improved parser configuration handling to pass these options through
the processing pipeline
### Frontend (`web/`)
- Created new `MinerUOptionsFormField` component that conditionally
renders when MinerU is selected as the layout recognition engine
- Added UI controls for:
  - Parse method selection (dropdown)
  - Formula recognition toggle (switch)
  - Table recognition toggle (switch)
- Added i18n translations for English and Chinese
- Integrated the options into both the dataset creation dialog and
dataset settings page
### Integration
- Updated `rag/app/naive.py` to forward MinerU options to the parser
- Updated task service to handle the new configuration parameters
## Why
MinerU is a powerful document parser, but the default settings don't
work well for all document types. This PR allows users to:
1. Choose the best parsing method for their documents
2. Disable formula recognition for Cyrillic/non-Latin scripts where it
causes issues
3. Control table extraction based on document needs
4. Benefit from automatic language detection for better OCR results
## Testing
- [x] Tested MinerU parsing with different parse methods
- [x] Verified UI renders correctly when MinerU is selected/deselected
- [x] Confirmed settings persist correctly in dataset configuration
### Type of change
- [x] Bug Fix (non-breaking change which fixes an issue)
- [x] New Feature (non-breaking change which adds functionality)
- [ ] Documentation Update
- [x] Refactoring
- [ ] Performance Improvement
- [ ] Other (please describe):
---------
Co-authored-by: user210 <user210@rt>
Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>