ragflow/conf/models/xinference.json
Renzo 394cd5d116 Go: implement Embed in Xinference driver (#14932)
## Summary

- Replaces the `"no such method"` stub on `XinferenceModel.Embed`
(`internal/entity/models/xinference.go`) with a real implementation
against Xinference's OpenAI-compatible `/v1/embeddings` endpoint.
- Adds the `"embedding": "v1/embeddings"` URL suffix to
`conf/models/xinference.json`.
- Mirrors the Python `XinferenceEmbed` class in
`rag/llm/embedding_model.py:407` for payload shape (OpenAI-compatible
`model + input` → `data[*].index + data[*].embedding`) and tolerates the
same no-auth default Xinference deployments use. Authorization is only
sent when a non-empty API key is configured, via the existing
`setXinferenceAuth` helper.
- Reuses the existing `normalizeXinferenceBaseURL` + `baseURLForRegion`
helpers so both `http://127.0.0.1:9997` and `http://127.0.0.1:9997/v1`
resolve to the same `/v1/embeddings` target without doubled `/v1`.
- Validates response indices — duplicate, missing, or out-of-range
`data[*].index` values fail with a clear error rather than silently
producing misaligned vectors.
- Returns `[]EmbeddingData` in original input order (placed by `Index`)
so downstream callers can index positionally without re-sorting.
- Forwards `EmbeddingConfig.Dimension` as `dimensions` when `> 0`,
matching the OpenAI cluster pattern.
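
The index validation and positional placement described above can be sketched roughly as follows. This is a minimal illustration, not the code from `internal/entity/models/xinference.go`: the `embeddingItem` type and the `orderByIndex` function are hypothetical names introduced here, and the real driver returns `[]EmbeddingData` rather than bare float slices.

```go
package main

import "fmt"

// embeddingItem is a hypothetical stand-in for one entry of the
// OpenAI-compatible response's "data" array (index + embedding).
type embeddingItem struct {
	Index     int
	Embedding []float64
}

// orderByIndex places each vector at the slot named by its Index, where n is
// the number of inputs that were sent. It returns an error for out-of-range,
// duplicate, or missing indices instead of returning misaligned vectors.
func orderByIndex(items []embeddingItem, n int) ([][]float64, error) {
	out := make([][]float64, n)
	seen := make([]bool, n)
	for _, it := range items {
		if it.Index < 0 || it.Index >= n {
			return nil, fmt.Errorf("embedding index %d out of range [0, %d)", it.Index, n)
		}
		if seen[it.Index] {
			return nil, fmt.Errorf("duplicate embedding index %d", it.Index)
		}
		seen[it.Index] = true
		out[it.Index] = it.Embedding
	}
	// Any slot that was never filled means the server skipped an input.
	for i, ok := range seen {
		if !ok {
			return nil, fmt.Errorf("missing embedding for input %d", i)
		}
	}
	return out, nil
}

func main() {
	// The server may return items in any order; placement restores input order.
	items := []embeddingItem{
		{Index: 1, Embedding: []float64{0.3, 0.4}},
		{Index: 0, Embedding: []float64{0.1, 0.2}},
	}
	vectors, err := orderByIndex(items, 2)
	fmt.Println(vectors, err)

	// A repeated index is rejected rather than silently overwriting a slot.
	_, err = orderByIndex([]embeddingItem{{Index: 0}, {Index: 0}}, 2)
	fmt.Println(err)
}
```

Placing by index in a single pass avoids a sort and makes the duplicate and missing checks fall out of the same `seen` bookkeeping.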

Closes #14810

Co-authored-by: Jin Hai <haijin.chn@gmail.com>
2026-05-21 11:47:30 +08:00


{
  "name": "xinference",
  "url_suffix": {
    "chat": "v1/chat/completions",
    "embedding": "v1/embeddings",
    "models": "v1/models",
    "rerank": "v1/rerank"
  },
  "class": "local"
}
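
As an illustration of how a configured base URL and the `embedding` suffix above could combine without a doubled `/v1`, here is a minimal Go sketch. The `joinEndpoint` function is a hypothetical helper written for this example; it is not the repository's `normalizeXinferenceBaseURL` or `baseURLForRegion`, whose actual behavior may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// joinEndpoint is a hypothetical helper: it strips surrounding whitespace,
// trailing slashes, and a trailing "/v1" from the base URL, then appends the
// configured suffix (which already starts with "v1/").
func joinEndpoint(base, suffix string) string {
	b := strings.TrimRight(strings.TrimSpace(base), "/")
	b = strings.TrimSuffix(b, "/v1")
	return b + "/" + strings.TrimLeft(suffix, "/")
}

func main() {
	// Suffix value taken from the "embedding" entry in url_suffix.
	const embeddingSuffix = "v1/embeddings"

	// Both forms of the base URL resolve to the same target.
	fmt.Println(joinEndpoint("http://127.0.0.1:9997", embeddingSuffix))
	fmt.Println(joinEndpoint("http://127.0.0.1:9997/v1", embeddingSuffix))
}
```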