mirror of
https://github.com/infiniflow/ragflow.git
synced 2026-05-27 19:25:58 +08:00
### What problem does this PR solve?

Add a Go driver for Novita.ai (https://novita.ai), one of the unchecked providers on the umbrella tracking issue #14736. Novita exposes an OpenAI-compatible REST API at `https://api.novita.ai/v3/openai` and proxies a large catalog of third-party models (DeepSeek, Llama, Qwen3, Kimi, Gemma, Mistral, MiniMax, GLM, etc.) behind a single OpenAI-shaped surface (102 models live at the time of writing).

Until this PR, a tenant who configured `novita` as a model provider in the Go layer fell through to the default branch of `internal/entity/models/factory.go` and got the dummy driver.

### What this PR includes

- New `internal/entity/models/novita.go` with a `NovitaModel` implementing the `ModelDriver` interface (~520 lines).
- New `conf/models/novita.json` with 7 representative chat models (DeepSeek-V4, Llama-3.3-70B, Qwen3-30B/235B reasoning, Kimi-K2, Gemma-3-27B, Mistral-Nemo).
- `factory.go`: route `"novita"` to `NewNovitaModel`.
- `internal/entity/models/novita_test.go`: 23 unit tests.

### Notable design point: `<think>...</think>` reasoning extraction

Novita-routed reasoning models like `qwen3-*` and `deepseek-r1-*` embed their chain-of-thought **inline inside content as `<think>...</think>` tags**, rather than in a separate `reasoning_content` field. Verified live by probing `api.novita.ai`:

```
content head 200: <think> Okay, let's see. I need to find 15% of 80. Hmm, percentages can sometimes be tricky, but I think
content tail 100: h, that works. Alternatively, 0.15 × 80. If I move the decimal two places to the left for </think>
```

Without handling, a tenant picking qwen3 via Novita would see raw `<think>` tags in their UI answer, unlike with every other reasoning provider in the Go layer.
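
Separating inline `<think>` blocks from a complete content string could look roughly like the following. This is a minimal sketch and not the code in `novita.go`: the name `splitThink`, the whitespace trimming, and the choice to treat an unclosed block as reasoning are assumptions made for illustration.

```go
package main

import "strings"

const (
	openTag  = "<think>"
	closeTag = "</think>"
)

// splitThink is a hypothetical helper: it returns the visible answer and the
// concatenated text of every <think>...</think> block found in content.
func splitThink(content string) (answer, reasoning string) {
	var ans, rsn strings.Builder
	rest := content
	for {
		start := strings.Index(rest, openTag)
		if start < 0 {
			// No further blocks: everything left is visible text.
			ans.WriteString(rest)
			break
		}
		ans.WriteString(rest[:start])
		rest = rest[start+len(openTag):]

		end := strings.Index(rest, closeTag)
		if end < 0 {
			// Unclosed tag (assumed: the model was cut off mid-reasoning),
			// so the remainder is treated as reasoning.
			rsn.WriteString(rest)
			break
		}
		rsn.WriteString(rest[:end])
		rest = rest[end+len(closeTag):]
	}
	// Trimming is an assumption of this sketch, not a documented behavior.
	return strings.TrimSpace(ans.String()), strings.TrimSpace(rsn.String())
}
```

With this sketch, `splitThink("<think>15% of 80 is 12</think>The answer is 12.")` returns `"The answer is 12."` as the answer and `"15% of 80 is 12"` as the reasoning.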

The driver detects those tags and routes the inner text to `ChatResponse.ReasonContent` (non-stream) or the sender's second arg (stream), keeping the visible answer clean of tag clutter:

- **`splitNovitaThink`**: scans a complete content string. Used by the non-streaming path. Handles multiple `<think>` blocks, unclosed tags (the model got cut off mid-reasoning), and pure-text content with no tags.
- **`novitaThinkExtractor`**: stateful streaming version. Buffers trailing bytes that might be the start of a tag (e.g. `<thi` held back when the next chunk completes `nk>`), then emits segments in routing order so callers can pipe them to a UI. Tested with byte-level chunk boundaries and tag-spanning scenarios.

### Method coverage

| Method | Behavior |
|---|---|
| `ChatWithMessages` | `POST /v3/openai/chat/completions`, `<think>` extraction on response |
| `ChatStreamlyWithSender` | SSE stream, stateful `<think>` extraction across deltas |
| `ListModels` / `CheckConnection` | `GET /v3/openai/models` (102 live) |
| `Embed` / `Rerank` / `Balance` / `TranscribeAudio` / `AudioSpeech` / `OCRFile` | `"no such method"`; Novita's OpenAI-compatible surface does not expose any of them |

No interface change. No new dependencies.

### How was this tested?

**23 unit tests** in `internal/entity/models/novita_test.go`, all passing:

```
$ go test -vet=off -run "TestNovita|TestSplitNovita" -count=1 ./internal/entity/models/...
ok ragflow/internal/entity/models 0.020s
```

Coverage:

- `splitNovitaThink` (5 cases: pure text, single block, leading text, multiple blocks, unclosed tag)
- `novitaThinkExtractor` (6 cases: single-chunk, opening tag span, closing tag span, byte-level chunking, no tags, lone `<` not as tag start)
- `ChatWithMessages`: pure text, with `<think>` tags, missing API key, empty messages, HTTP error
- `ChatStreamlyWithSender`: tag-stripping with spanning deltas, pure content, sender-required, stream-true-required
- `ListModels` / `CheckConnection` (happy paths)
- All sentinel methods

`go build ./internal/entity/models/...` exits 0 on go 1.25.

**Live integration test** against `api.novita.ai/v3/openai`:

```
=== RUN TestNovitaLiveSmoke
[OK] Name() = "novita"
[OK] CheckConnection
[OK] ListModels: 102 models (showing first 6)
[deepseek/deepseek-v4-pro deepseek/deepseek-v4-flash deepseek/deepseek-v3.2 xiaomimimo/mimo-v2.5-pro moonshotai/kimi-k2.6 zai-org/glm-5.1]
[OK] Chat (llama-3.3) answer="ok" reason=""
[OK] Chat (qwen3) answer len=0 head=""
ReasonContent len=1657 head="Okay, so I need to figure out what 15% of 80 is. Hmm, percentages can sometimes trip me up, but let ..."
[OK] Stream content: 0 chunks, 0 chars; reasoning: 600 chunks, 1667 chars
[OK] Embed/Rerank/Balance/TranscribeAudio/AudioSpeech/OCRFile all return "novita, no such method"
NOVITA LIVE SMOKE PASSED
--- PASS: TestNovitaLiveSmoke (26.18s)
```

What the live run proves on the wire:

- Auth (`Bearer <key>`) accepted by `api.novita.ai`.
- `/v3/openai/models` parser handles the real 102-model response.
- Non-stream chat against `meta-llama/llama-3.3-70b-instruct`: clean string answer, empty ReasonContent (non-reasoning model, pure-text path).
- Non-stream chat against `qwen/qwen3-30b-a3b-fp8`: 1657-char reasoning extracted from `<think>...</think>` and routed to `ChatResponse.ReasonContent`.
  Visible answer is 0 chars in this run because qwen3 spent its 600-token budget entirely on reasoning before reaching the answer phase; that is the model's behavior, not a driver bug. The important thing: **no `<think>` tags leaked into the visible Answer field**.
- Streaming against qwen3: 600 reasoning chunks (1667 chars) emitted via the sender's 2nd arg across SSE deltas; **no `<think>` tag fragments leaked into the content channel** despite tag boundaries crossing chunk boundaries on the wire.
- All 6 sentinel methods return the documented `"no such method"` strings.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

Tracking: #14736
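
For reviewers, the hold-back technique described for the streaming path can be sketched as below. This is an illustrative approximation rather than the `novitaThinkExtractor` in this PR: the names `thinkExtractor`, `segment`, `Feed`, `Flush`, and `heldBack` are hypothetical, and the sketch assumes that a partial tag left over at end of stream is emitted as ordinary text.

```go
package main

import "strings"

const (
	openTag  = "<think>"
	closeTag = "</think>"
)

// segment is one routed piece of output (hypothetical type).
type segment struct {
	text      string
	reasoning bool // true: reasoning channel, false: visible content
}

// thinkExtractor keeps state between stream deltas (hypothetical type).
type thinkExtractor struct {
	buf     string // bytes not yet routed, possibly a partial tag
	inThink bool   // whether we are currently inside <think>...</think>
}

// heldBack returns the length of the longest suffix of s that is a proper
// prefix of tag, i.e. bytes that might still turn into the tag.
func heldBack(s, tag string) int {
	limit := len(tag) - 1
	if len(s) < limit {
		limit = len(s)
	}
	for n := limit; n > 0; n-- {
		if strings.HasPrefix(tag, s[len(s)-n:]) {
			return n
		}
	}
	return 0
}

// Feed consumes one delta and returns the segments that are safe to emit.
func (e *thinkExtractor) Feed(chunk string) []segment {
	e.buf += chunk
	var out []segment
	for {
		tag := openTag
		if e.inThink {
			tag = closeTag
		}
		if i := strings.Index(e.buf, tag); i >= 0 {
			if i > 0 {
				out = append(out, segment{text: e.buf[:i], reasoning: e.inThink})
			}
			e.buf = e.buf[i+len(tag):]
			e.inThink = !e.inThink
			continue
		}
		// No complete tag: emit everything except a possible tag prefix.
		keep := heldBack(e.buf, tag)
		if emit := e.buf[:len(e.buf)-keep]; emit != "" {
			out = append(out, segment{text: emit, reasoning: e.inThink})
		}
		e.buf = e.buf[len(e.buf)-keep:]
		return out
	}
}

// Flush emits whatever is still buffered once the stream has ended.
func (e *thinkExtractor) Flush() []segment {
	if e.buf == "" {
		return nil
	}
	out := []segment{{text: e.buf, reasoning: e.inThink}}
	e.buf = ""
	return out
}
```

Feeding the string `a<think>b</think>c<d` one byte at a time yields content `ac<d` and reasoning `b` in this sketch: the `<` before `d` is held back for one delta and then released as plain text once it can no longer be a tag start.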