Commit Graph

3 Commits

Author SHA1 Message Date
85d0b46d8e fix(mistral): handle structured content from magistral reasoning models (#14805)
### What problem does this PR solve?

`MistralModel.ChatWithMessages` (in the driver merged via #14807)
assumes that `choices[0].message.content` from `/v1/chat/completions` is
always a string and falls through to `return nil, fmt.Errorf("invalid
content format")` on anything else.

That assumption breaks for the **magistral reasoning family**
(`magistral-small-*`, `magistral-medium-*`). When the model needs a
chain-of-thought to answer, Mistral returns `content` as a **structured
array of typed parts**:

```json
"content": [
  {"type": "thinking",
   "thinking": [{"type": "text", "text": "Combined speed is 150 mph. 300 / 150 = 2 hours."}],
   "closed": true},
  {"type": "text", "text": "They will meet after **2 hours**."}
]
```

Concretely, this is what the live API returns today (probed against
`api.mistral.ai/v1`):

```
$ curl -H "Authorization: Bearer <key>" -H "Content-Type: application/json" \
       -X POST https://api.mistral.ai/v1/chat/completions \
       -d '{"model":"magistral-medium-latest",
            "messages":[{"role":"user","content":"two trains 60mph and 90mph, 300mi apart, when do they meet? step by step."}],
            "max_tokens":1024}'
HTTP 200
{ "choices":[{"message":{
    "role":"assistant",
    "content":[
      {"type":"thinking","thinking":[{"type":"text","text":"Okay, let's see..."}],"closed":true},
      {"type":"text","text":"To determine when the two trains meet..."}
    ]}}] }
```

With the current driver, every call like that returns the generic
`"invalid content format"` error. Trivial prompts that happen to fit in
a string answer still succeed, so the breakage is **non-deterministic
from the tenant's POV**: same model, same provider, sometimes works,
sometimes 500s with no useful error.

A secondary issue: `conf/models/mistral.json` does not include any
magistral model. The picker hid the broken path, which is why this
wasn't caught during #14807's review.

### What this PR includes

- New helper `extractMistralContent(raw interface{}) (answer,
reasonContent string, err error)` in
`internal/entity/models/mistral.go`, which normalizes both shapes
Mistral can return:
- `string` → historical path. `Answer = content`, `ReasonContent = ""`.
Preserves behavior for every non-reasoning model (`mistral-large-*`,
`mistral-small-*`, `ministral-*`, `codestral-*`, `pixtral-*`,
`open-mistral-nemo`).
- `[]interface{}` → walk the parts. Concatenate every `{"type":"text",
"text":...}` part into `Answer`; concatenate the inner text inside every
`{"type":"thinking", "thinking":[...]}` part into `ReasonContent`.
- `ChatWithMessages` now calls the helper instead of doing the raw
`.(string)` cast.
- Unknown part types are **skipped, not failed**. Mistral has been
adding new content variants quickly (audio chunks, citations, etc.);
this driver should not 500 every call when a new part type appears.
- `conf/models/mistral.json`: add `magistral-medium-latest` and
`magistral-small-latest`. Both are visible in `/v1/models` today.

No interface change. No factory change. No new dependencies.

### How was this tested?

**Unit tests** — 5 new tests in `internal/entity/models/mistral_test.go`
on top of the 27 already shipped via #14807:

- `TestMistralChatHandlesStringContent` — regression net for the
historical path
- `TestMistralChatExtractsReasoningFromStructuredContent` — the fixture
body is a trimmed copy of the actual `magistral-medium-latest` response
captured above; asserts both `Answer` and `ReasonContent` are populated
correctly
- `TestMistralChatHandlesStructuredContentWithoutThinking` —
`magistral-*` with a trivial answer returns a structured shape that has
only a `text` part; `ReasonContent` must stay empty
- `TestMistralChatIgnoresUnknownContentPartTypes` — `audio_url` and
`future_part_type` parts are skipped, `text` parts still flow through
- `TestExtractMistralContent` — table-driven unit coverage of the helper
for string, empty string, nil, empty array, text-only, thinking+text,
unsupported root type

```
$ go test -vet=off -run "TestMistral|TestExtractMistralContent" -count=1 -v ./internal/entity/models/...
=== RUN   TestMistralChatHandlesStringContent
--- PASS: TestMistralChatHandlesStringContent (0.00s)
=== RUN   TestMistralChatExtractsReasoningFromStructuredContent
--- PASS: TestMistralChatExtractsReasoningFromStructuredContent (0.00s)
=== RUN   TestMistralChatHandlesStructuredContentWithoutThinking
--- PASS: TestMistralChatHandlesStructuredContentWithoutThinking (0.00s)
=== RUN   TestMistralChatIgnoresUnknownContentPartTypes
--- PASS: TestMistralChatIgnoresUnknownContentPartTypes (0.00s)
=== RUN   TestExtractMistralContent
=== RUN   TestExtractMistralContent/plain_string
=== RUN   TestExtractMistralContent/empty_string
=== RUN   TestExtractMistralContent/nil
=== RUN   TestExtractMistralContent/empty_array
=== RUN   TestExtractMistralContent/text_only
=== RUN   TestExtractMistralContent/thinking_then_text
=== RUN   TestExtractMistralContent/unknown_root_type
--- PASS: TestExtractMistralContent (0.00s)
PASS
ok      ragflow/internal/entity/models  0.046s
```

All 32 Mistral tests pass on go 1.25. `go build
./internal/entity/models/...` exits 0.

**Live integration test** — driver exercised against `api.mistral.ai/v1`
with the patched code:

```
=== RUN   TestMistralMagistralSmoke
    [OK] "magistral-small-latest" present upstream
    [OK] "magistral-medium-latest" present upstream
    [OK trivial]    Answer="7"  ReasonContent=""
    [OK reasoning]  Answer len=797   head="To determine when the two trains meet, we can follow these steps:\n\n1. **Identify..."
                    ReasonContent len=1069 head="Okay, let's see. There are two trains, one going 60 mph and the other going 90 mph. They're moving towards each other, s..."

MAGISTRAL SMOKE PASSED
--- PASS: TestMistralMagistralSmoke (18.09s)
PASS
ok      ragflow/internal/entity/models  18.112s
```

What the live run proves on the wire:

- `magistral-small-latest` with a trivial prompt still uses the
string-content shape; the regression-net path is exercised against the
real server, not just the mock.
- `magistral-medium-latest` with a reasoning prompt uses the
structured-array shape; the new code path extracts a 1069-character
reasoning trace into `ChatResponse.ReasonContent` and a 797-character
visible answer into `ChatResponse.Answer`. Before this fix, the same
call returned `"invalid content format"` and the caller saw nothing.

The smoke-test file itself is not committed (live tests live outside the
PR diff, same convention used for prior provider PRs).

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)
2026-05-21 15:33:14 +08:00
bf41d35729 Go: implement PaddleOCR provider and implement ASR for CoHere (#14954)
### What problem does this PR solve?

This PR implement implement OCR for Baidu and Mistral, implement
PaddleOCR provider and implement ASR for CoHere

**Verified examples from the CLI:**

```
RAGFlow(user)> ocr with 'mistral-ocr-2512@test@mistral' file './internal/text.jpg'
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| text                                                                                                                                                                                                                                                             |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Parallel to these organizational innovations there were significant complementary technical innovations (e.g., improved methods of manufacturing cast-iron pipe and of coating interiors for pressure maintenance, and newer paving and construction material... |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+


RAGFlow(user)> ocr with 'paddleocr-vl-0.9b@test@baidu' file './internal/text.jpg'
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| text                                                                                                                                                                                                                                                             |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Parallel to these organizational innovations there were significant complementary technical innovations (e.g., improved methods of manufacturing cast-iron pipe and of coating interiors for pressure maintenance, and newer paving and construction material... |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

# PaddleOCR
RAGFlow(user)> ocr with 'PaddleOCR-VL-1.5@test@paddleocr' file './internal/test.pdf'
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| text                                                                                                                                                                                                                                                             |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| # Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation

Bingxin Ke

Nando Metzger

Photogra

Anton Obukhov

Rodrigo Caye Daudt

netry and Remote Sensing,

Shengyu Huang

Konrad Schindler

ETH Zürich





<div style="text-align: c...  |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

# Cohere

RAGFlow(user)> asr with 'cohere-transcribe-03-2026@test@cohere' audio './internal/test.wav' param '{"language": "en"}'
+-----------------------------------------------------------------------------------------------------------------------+
| text                                                                                                                  |
+-----------------------------------------------------------------------------------------------------------------------+
|  The examination and testimony of the experts enabled the Commission to conclude that five shots may have been fired. |
+-----------------------------------------------------------------------------------------------------------------------+
```

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
- [x] Refactoring
2026-05-15 18:41:43 +08:00
7d3836907a Go: implement Embed (embeddings) in Mistral driver (#14807)
### What problem does this PR solve?

The Mistral Go driver landed in #14805 with chat, list models, and check
connection. `Embed` was left as a stub that returns `"not implemented"`.
This PR fills the gap.

`conf/models/mistral.json` did not list any embedding model out of the
box, so a tenant who wanted to use Mistral end to end (chat +
embeddings) could not run an embedding call. This PR adds
`mistral-embed` to the config and a real `/v1/embeddings`
implementation.

### What this PR includes

- `conf/models/mistral.json`: add `"embedding": "embeddings"` under
`url_suffix` so the driver can build the URL from config (matches the
`URLSuffix.Embedding` field already used by openai, siliconflow,
zhipu-ai), and add a `mistral-embed` entry under `models`
(1024-dimensional vectors, 8192 max input tokens).
- `internal/entity/models/mistral.go`: replace the `Embed` stub with a
real implementation that POSTs to `/v1/embeddings`. Adds local response
types `mistralEmbeddingData` and `mistralEmbeddingResponse`.

No factory change. No interface change.

### How the implementation works

- Validate `apiConfig`, the API key, and the model name. Use the
existing `baseURLForRegion` helper so an unknown region fails fast with
a clear error.
- Wrap the request with `context.WithTimeout(nonStreamCallTimeout)` so
the call has a clear deadline. Same pattern as `ChatWithMessages` and
`ListModels` already use in this file.
- Send all input texts in one request. The Mistral API accepts the
`input` field as an array.
- Parse `data[*].embedding` and copy each slice into a `[]EmbeddingData`
indexed by `data[*].index` so the output order matches the input order
even if the API returns items in a different order.
- An empty input slice returns `[]EmbeddingData{}` with no HTTP call.
- Non-200 responses propagate the upstream status line and body.
- A final pass checks that every input slot got a vector. If any slot is
still empty, return a clear error so the caller does not silently use a
zero vector.

### Note on stacking

This PR builds on #14805 (the Mistral driver). Until #14805 merges, this
PR's diff on GitHub will include both that PR's commits and this one.
After #14805 lands on `main`, GitHub will auto-reduce this PR to only
the `Embed` changes (one commit, ~111 line diff in `mistral.go` plus 8
lines in `mistral.json`).

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

### How was this tested?

- `go build ./internal/entity/models/...` returns exit 0 on go 1.25 (the
`go.mod` minimum).
- The full method set on `MistralModel` still matches the `ModelDriver`
interface.
- Pattern parity with the existing OpenAI Embed implementation
(`internal/entity/models/openai.go`).

Closes #14806
Depends on #14805
Tracking: #14736

---------

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-12 17:45:48 +08:00