Mirror of https://github.com/infiniflow/ragflow.git, synced 2026-05-29 20:17:35 +08:00
### What problem does this PR solve?

Closes #14878.

`VllmModel.Rerank()` in [internal/entity/models/vllm.go:551](internal/entity/models/vllm.go#L551) is currently a stub returning `nil, fmt.Errorf("%s, Rerank not implemented", z.Name())`, and [conf/models/vllm.json](conf/models/vllm.json) is missing a `rerank` entry in `url_suffix`. Chat (long-standing) and embeddings (#14688) already work, so rerank is the last missing leg of the retrieval pipeline for operators running everything on a single self-hosted vLLM server. Today they have to point rerank at a different provider, which defeats the point of a fully local deployment.

Upstream vLLM has supported a Jina/Cohere-compatible `POST /v1/rerank` endpoint since v0.7 ([vllm-project/vllm#12376](https://github.com/vllm-project/vllm/pull/12376)). The request/response shape is essentially identical to the NVIDIA driver landed in #14778, so this PR mirrors that structure with two vLLM-specific adjustments: a flat `documents` array and an optional `Authorization` header.

This PR replaces the stub with a real implementation against vLLM's `/v1/rerank`:

- `POST {baseURL}/rerank`
- Request body: `{"model": "<modelName>", "query": "<query>", "documents": [...], "top_n": <int>}`. Documents are a flat `[]string`, **not** wrapped as `{text: "..."}` like NVIDIA's `/ranking`.
- Response body: `{"results": [{"index": int, "relevance_score": float}, ...]}` (Jina-compatible; the optional `document` field is ignored since callers reconstruct text via `Index`).
- `Authorization: Bearer <ApiKey>` is set **only when `APIConfig.ApiKey` is non-empty**, matching the existing `Embed`/`ListModels` behaviour in this file. vLLM is a local driver and can be deployed without an API key.

The return shape matches the existing `*RerankResponse` contract used by the NVIDIA ([nvidia.go:461](internal/entity/models/nvidia.go#L461)), Aliyun ([aliyun.go:507](internal/entity/models/aliyun.go#L507)), and ZhipuAI ([zhipu-ai.go:554](internal/entity/models/zhipu-ai.go#L554)) drivers, i.e. `Data []RerankResult` carrying `{Index, RelevanceScore}` in the API's ranking order. Callers that need original-input order sort by `Index`.

Behaviour requirements from the issue, all covered:

1. Empty `documents` → returns `&RerankResponse{}` without an HTTP call.
2. Missing `modelName` → `"model name is required"` validation error.
3. `rerankConfig.TopN` is honoured when `0 < TopN < len(documents)`; otherwise `top_n` defaults to `len(documents)` so callers get a score per input.
4. Non-200 responses return an error including the upstream status and body (`"vLLM rerank API error: <status>, body: <body>"`).
5. Response `index` values are bounds-checked against `len(documents)`.

**Scope:**

- [internal/entity/models/vllm.go](internal/entity/models/vllm.go): replaces the `Rerank` stub at line 551 with a real implementation; adds `vllmRerankRequest`/`vllmRerankResponse` types for the slim subset of the payload we need. Region/baseURL resolution, 30s context timeout, conditional bearer header, and error wrapping all follow the existing patterns in this file.
- [conf/models/vllm.json](conf/models/vllm.json): adds `"rerank": "rerank"` to `url_suffix`, joined to the operator-configured vLLM base URL the same way the NVIDIA driver joins at [nvidia.go:485](internal/entity/models/nvidia.go#L485).
- [internal/entity/models/vllm_rerank_test.go](internal/entity/models/vllm_rerank_test.go): adds 7 `httptest`-backed tests mirroring `nvidia_rerank_test.go`: happy path (out-of-order ranking → Index preservation), `top_n` clamp to `RerankConfig.TopN`, empty-documents short-circuit, missing-model-name validation, HTTP error propagation, out-of-range index rejection, and a vLLM-specific `TestVllmRerankWithoutAPIKey` locking in the optional-auth behaviour that distinguishes this driver from NVIDIA.

**Out of scope:** no interface change, no DDL, no frontend change. Chat, embeddings, and balance paths are untouched.
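The payload shapes and two of the behaviour requirements above (the `top_n` default and the index bounds check) can be sketched in isolation. This is a minimal illustration, not the code in `vllm.go`: the type and function names (`rerankRequest`, `resolveTopN`, `parseResults`), the model name, and the sample scores are all hypothetical, and only the JSON field names are taken from the description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rerankRequest is a hypothetical stand-in for the request body described
// above: documents are a flat list of strings.
type rerankRequest struct {
	Model     string   `json:"model"`
	Query     string   `json:"query"`
	Documents []string `json:"documents"`
	TopN      int      `json:"top_n"`
}

// rerankResult keeps only the two response fields the description relies on;
// any other field in the response (such as "document") is ignored on decode.
type rerankResult struct {
	Index          int     `json:"index"`
	RelevanceScore float64 `json:"relevance_score"`
}

type rerankResponse struct {
	Results []rerankResult `json:"results"`
}

// resolveTopN honours a configured TopN only when 0 < topN < numDocs;
// otherwise it falls back to numDocs so every input gets a score.
func resolveTopN(topN, numDocs int) int {
	if topN > 0 && topN < numDocs {
		return topN
	}
	return numDocs
}

// parseResults decodes a response body and rejects any index that does not
// point at one of the numDocs input documents.
func parseResults(body []byte, numDocs int) ([]rerankResult, error) {
	var resp rerankResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode rerank response: %w", err)
	}
	for _, r := range resp.Results {
		if r.Index < 0 || r.Index >= numDocs {
			return nil, fmt.Errorf("rerank index %d out of range for %d documents", r.Index, numDocs)
		}
	}
	return resp.Results, nil
}

func main() {
	docs := []string{"alpha", "beta", "gamma"}
	req := rerankRequest{
		Model:     "example-reranker", // hypothetical model name
		Query:     "which letter comes last?",
		Documents: docs,
		TopN:      resolveTopN(0, len(docs)), // no TopN configured
	}
	payload, err := json.Marshal(req)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(payload))

	// Sample response in ranking order (made-up scores).
	body := []byte(`{"results":[{"index":2,"relevance_score":0.9},{"index":0,"relevance_score":0.4}]}`)
	results, err := parseResults(body, len(docs))
	if err != nil {
		panic(err)
	}
	for _, r := range results {
		// Index maps each score back to the original input document.
		fmt.Printf("%s -> %.2f\n", docs[r.Index], r.RelevanceScore)
	}
}
```

Keeping the clamp and the bounds check as pure functions makes them easy to cover without an HTTP server; the tests described in the PR use `httptest` because they also exercise the transport path.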
No new user-facing docs are required beyond the existing rerank model setup page: vLLM joins the list of providers whose rerank model can be selected once `/v1/rerank` is exposed by the server.

### Type of change

- [x] New Feature (non-breaking change which adds functionality)
10 lines
173 B
JSON
{
  "name": "vllm",
  "url_suffix": {
    "chat": "chat/completions",
    "models": "models",
    "embedding": "embeddings",
    "rerank": "rerank"
  },
  "class": "local"
}
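To illustrate how the `rerank` suffix in this config could turn into a request, here is a minimal Go sketch that joins a base URL with the suffix and sets the bearer header only when a key is present. It is an assumption-laden illustration, not the driver's code: `joinURL` and `newRerankRequest` are hypothetical helpers, and the base URL `http://localhost:8000/v1` is just an example of an operator-configured value.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"strings"
)

// joinURL joins a base URL and a suffix with exactly one slash between them,
// regardless of whether either side already carries one.
func joinURL(baseURL, suffix string) string {
	return strings.TrimRight(baseURL, "/") + "/" + strings.TrimLeft(suffix, "/")
}

// newRerankRequest builds a POST request for the rerank endpoint.
// The Authorization header is added only when apiKey is non-empty, which
// reflects the optional-auth behaviour described for the vLLM driver.
func newRerankRequest(baseURL, suffix, apiKey string, body []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, joinURL(baseURL, suffix), bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("build rerank request: %w", err)
	}
	req.Header.Set("Content-Type", "application/json")
	if apiKey != "" {
		req.Header.Set("Authorization", "Bearer "+apiKey)
	}
	return req, nil
}

func main() {
	// Example values only: a local server with no API key configured.
	req, err := newRerankRequest("http://localhost:8000/v1/", "rerank", "", []byte(`{}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	fmt.Println("has Authorization header:", req.Header.Get("Authorization") != "")
}
```

Nothing is sent over the network here; the request is only constructed, so the sketch runs without a vLLM server.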