ragflow/vllm.json at 67f7d87dff12bab2ea9f1ecf5992c080896ff049

youngkingdom/ragflow

mirror of https://github.com/infiniflow/ragflow.git synced 2026-05-29 20:17:35 +08:00

Hunnyboy1217 86bcf9767d Go: implement Rerank in vLLM driver (#14878) (#14880)

### What problem does this PR solve?

Closes #14878.

`VllmModel.Rerank()` in
[internal/entity/models/vllm.go:551](internal/entity/models/vllm.go#L551)
is currently a stub returning `nil, fmt.Errorf("%s, Rerank not
implemented", z.Name())`, and
[conf/models/vllm.json](conf/models/vllm.json) is missing a `rerank`
entry in `url_suffix`. Chat (long-standing) and embeddings (#14688)
already work, so rerank is the last missing leg of the retrieval
pipeline for operators running everything on a single self-hosted vLLM
server — today they have to point rerank at a different provider, which
defeats the point of a fully local deployment.

Upstream vLLM has supported a Jina/Cohere-compatible `POST /v1/rerank`
endpoint since v0.7
([vllm-project/vllm#12376](https://github.com/vllm-project/vllm/pull/12376)).
The request/response shape is essentially the same as that of the NVIDIA
driver that landed in #14778, so this PR mirrors that structure with two
vLLM-specific adjustments: a flat `documents` array and an optional
`Authorization` header.

This PR replaces the stub with a real implementation against vLLM's
`/v1/rerank`:

- `POST {baseURL}/rerank`
- Request body: `{"model": "<modelName>", "query": "<query>",
"documents": [...], "top_n": <int>}` — documents are a flat `[]string`,
**not** wrapped as `{text: "..."}` like NVIDIA's `/ranking`.
- Response body: `{"results": [{"index": int, "relevance_score": float},
...]}` (Jina-compatible; the optional `document` field is ignored since
callers reconstruct text via `Index`).
- `Authorization: Bearer <ApiKey>` is set **only when `APIConfig.ApiKey`
is non-empty**, matching the existing `Embed`/`ListModels` behaviour in
this file. vLLM is a local driver and can be deployed without an API
key.

The return shape matches the existing `*RerankResponse` contract used by
the NVIDIA ([nvidia.go:461](internal/entity/models/nvidia.go#L461)),
Aliyun ([aliyun.go:507](internal/entity/models/aliyun.go#L507)), and
ZhipuAI ([zhipu-ai.go:554](internal/entity/models/zhipu-ai.go#L554))
drivers, i.e. `Data []RerankResult` carrying `{Index, RelevanceScore}`
in the API's ranking order. Callers that need original-input order sort
by `Index`.
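
A caller that wants scores back in original-input order could re-sort along these lines. This is a minimal sketch: `RerankResult` is reduced to the two fields named above, and `inInputOrder` is a hypothetical helper, not part of the driver.

```go
package main

import (
	"fmt"
	"sort"
)

// RerankResult is reduced here to the two fields named in the description.
type RerankResult struct {
	Index          int
	RelevanceScore float64
}

// inInputOrder returns a copy of the ranked results sorted by Index,
// leaving the API's ranking order in the input slice untouched.
func inInputOrder(ranked []RerankResult) []RerankResult {
	out := make([]RerankResult, len(ranked))
	copy(out, ranked)
	sort.Slice(out, func(i, j int) bool { return out[i].Index < out[j].Index })
	return out
}

func main() {
	// Results as an API might return them: best match first.
	ranked := []RerankResult{
		{Index: 2, RelevanceScore: 0.91},
		{Index: 0, RelevanceScore: 0.40},
		{Index: 1, RelevanceScore: 0.07},
	}
	for _, r := range inInputOrder(ranked) {
		fmt.Printf("doc %d score %.2f\n", r.Index, r.RelevanceScore)
	}
}
```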

Behaviour requirements from the issue, all covered:

1. Empty `documents` → returns `&RerankResponse{}` without an HTTP call.
2. Missing `modelName` → `"model name is required"` validation error.
3. `rerankConfig.TopN` honored when `0 < TopN < len(documents)`;
otherwise `top_n` defaults to `len(documents)` so callers get a score
per input.
4. Non-200 responses return an error including upstream status and body
(`"vLLM rerank API error: <status>, body: <body>"`).
5. Response `index` values are bounds-checked against `len(documents)`.
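
Rules 2, 3, and 5 are small enough to sketch as pure functions. The helper names below (`validateModelName`, `effectiveTopN`, `checkIndex`) are hypothetical, and the wording of the out-of-range error is an assumption; only the `"model name is required"` message and the clamp condition come from the list above.

```go
package main

import (
	"errors"
	"fmt"
)

// validateModelName implements rule 2: a missing model name is rejected.
func validateModelName(modelName string) error {
	if modelName == "" {
		return errors.New("model name is required")
	}
	return nil
}

// effectiveTopN implements rule 3: TopN is honored only when
// 0 < topN < numDocs; otherwise every document gets a score.
func effectiveTopN(topN, numDocs int) int {
	if topN > 0 && topN < numDocs {
		return topN
	}
	return numDocs
}

// checkIndex implements rule 5: a response index must address one of
// the submitted documents. The error text is an assumption.
func checkIndex(index, numDocs int) error {
	if index < 0 || index >= numDocs {
		return fmt.Errorf("rerank result index %d out of range for %d documents", index, numDocs)
	}
	return nil
}

func main() {
	fmt.Println(validateModelName(""))
	fmt.Println(effectiveTopN(2, 5), effectiveTopN(0, 5), effectiveTopN(9, 5))
	fmt.Println(checkIndex(4, 5), checkIndex(5, 5))
}
```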

**Scope:**

- [internal/entity/models/vllm.go](internal/entity/models/vllm.go) —
replaces the `Rerank` stub at line 551 with a real implementation; adds
`vllmRerankRequest`/`vllmRerankResponse` types for the slim subset of
the payload we need. Region/baseURL resolution, 30s context timeout,
conditional bearer header, and error wrapping all follow the existing
patterns in this file.
- [conf/models/vllm.json](conf/models/vllm.json) — adds `"rerank":
"rerank"` to `url_suffix`, joined to the operator-configured vLLM base
URL the same way the NVIDIA driver joins at
[nvidia.go:485](internal/entity/models/nvidia.go#L485).
- [internal/entity/models/vllm_rerank_test.go](internal/entity/models/vllm_rerank_test.go)
— adds 7 `httptest`-backed tests mirroring `nvidia_rerank_test.go`:
happy path (out-of-order ranking → Index preservation), `top_n` clamp to
`RerankConfig.TopN`, empty-documents short-circuit, missing-model-name
validation, HTTP error propagation, out-of-range index rejection, and a
vLLM-specific `TestVllmRerankWithoutAPIKey` locking in the optional-auth
behaviour that distinguishes this driver from NVIDIA.

**Out of scope:** no interface change, no DDL, no frontend change. Chat,
embeddings, and balance paths are untouched. No new user-facing docs
required beyond the existing rerank model setup page — vLLM joins the
list of providers whose rerank model can be selected once `/v1/rerank`
is exposed by the server.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)

2026-05-15 13:27:22 +08:00

 {
   "name": "vllm",
   "url_suffix": {
     "chat": "chat/completions",
     "models": "models",
     "embedding": "embeddings",
     "rerank": "rerank"
   },
   "class": "local"
 }