ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-06-01 05:17:51 +08:00

Files

Panda Dev fa53b93dd5 Go: implement Encode (embeddings) in vLLM driver (#14688 )

### What problem does this PR solve?

The vLLM Go driver shipped with a stub \`Encode\` method that returned
\`not implemented\`, even though vLLM is one of the most common
production-grade self-hosted inference servers and exposes an
OpenAI-compatible embeddings endpoint at \`/v1/embeddings\`.

Users who self-host \`BAAI/bge-m3\`, \`Qwen3-Embedding-*\`,
\`NV-Embed-v2\`, or similar models on vLLM could not run an embedding
call through the Go layer. The existing \`ListModels\` already discovers
the loaded models, but the embedding path failed because \`Encode\` was
a stub.

### What this PR includes

- \`conf/models/vllm.json\`: add \`\"embedding\": \"embeddings\"\` under
\`url_suffix\` so the driver can build the URL from config.
- \`internal/entity/models/vllm.go\`: replace the \`Encode\` stub with a
real implementation. Adds a small local response
  type that matches the OpenAI-compatible shape.

No factory change. No interface change.

### How the driver works

- Validate the model name. The API key is optional for self-hosted vLLM,
so the Authorization header is only set when both \`apiConfig\` and
\`ApiKey\` are non-nil and non-empty, the same pattern the recently
merged CheckConnection PR (#14614) uses.
- Resolve the region with a default fallback. Return a clear "missing
base URL" error when the user has not configured
  the local access address yet.
- Use a per-call \`context.WithTimeout(30s)\` and
\`http.NewRequestWithContext\`, the same pattern the merged
  Aliyun Encode (#14647) and in-flight Ollama Encode (#14664) use.
- Send \`{model, input: [texts]}\` in one request.
- Parse \`data[*].embedding\` and copy each slice into a \`[][]float64\`
indexed by \`data[*].index\`, so the output
  order matches the input order.
- Handle both \`float64\` and \`float32\` element types.
- Empty input returns \`[][]float64{}\` with no HTTP call.
- Length mismatch between input and result, out-of-range index, and any
missing slot all return clear errors instead
  of silent zero vectors.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

### How was this tested?

- \`go build ./internal/entity/models/...\` in a clean go 1.25 image
returns exit 0.
- The full method set on \`VllmModel\` still matches the \`ModelDriver\`
interface.
- Pattern parity with the merged Aliyun Encode (#14647), the in-flight
Ollama Encode (#14664), and the existing
  SiliconFlow Encode.

Closes #14687

2026-05-11 12:09:17 +08:00

models

Go: implement Encode (embeddings) in vLLM driver (#14688 )

2026-05-11 12:09:17 +08:00

api_token.go

Add rename model directory to entity to avoid name misunderstanding (#13829 )

2026-03-27 19:25:18 +08:00

base.go

Add rename model directory to entity to avoid name misunderstanding (#13829 )

2026-03-27 19:25:18 +08:00

canvas.go

Add rename model directory to entity to avoid name misunderstanding (#13829 )