[Misc] Fix examples openai_pooling_client.py (#24853)
Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
This commit is contained in:
@ -228,7 +228,7 @@ outputs = llm.embed(["Follow the white rabbit."],
|
||||
print(outputs[0].outputs)
|
||||
```
|
||||
|
||||
A code example can be found here: <gh-file:examples/offline_inference/embed_matryoshka_fy.py>
|
||||
A code example can be found here: <gh-file:examples/offline_inference/pooling/embed_matryoshka_fy.py>
|
||||
|
||||
### Online Inference
|
||||
|
||||
@ -258,4 +258,4 @@ Expected output:
|
||||
{"id":"embd-5c21fc9a5c9d4384a1b021daccaf9f64","object":"list","created":1745476417,"model":"jinaai/jina-embeddings-v3","data":[{"index":0,"object":"embedding","embedding":[-0.3828125,-0.1357421875,0.03759765625,0.125,0.21875,0.09521484375,-0.003662109375,0.1591796875,-0.130859375,-0.0869140625,-0.1982421875,0.1689453125,-0.220703125,0.1728515625,-0.2275390625,-0.0712890625,-0.162109375,-0.283203125,-0.055419921875,-0.0693359375,0.031982421875,-0.04052734375,-0.2734375,0.1826171875,-0.091796875,0.220703125,0.37890625,-0.0888671875,-0.12890625,-0.021484375,-0.0091552734375,0.23046875]}],"usage":{"prompt_tokens":8,"total_tokens":8,"completion_tokens":0,"prompt_tokens_details":null}}
|
||||
```
|
||||
|
||||
An OpenAI client example can be found here: <gh-file:examples/online_serving/openai_embedding_matryoshka_fy.py>
|
||||
An OpenAI client example can be found here: <gh-file:examples/online_serving/pooling/openai_embedding_matryoshka_fy.py>
|
||||
|
||||
@ -530,7 +530,7 @@ These models primarily support the [`LLM.score`](./pooling_models.md#llmscore) A
|
||||
```
|
||||
|
||||
!!! note
|
||||
Load the official original `Qwen3 Reranker` by using the following command. More information can be found at: <gh-file:examples/offline_inference/qwen3_reranker.py>.
|
||||
Load the official original `Qwen3 Reranker` by using the following command. More information can be found at: <gh-file:examples/offline_inference/pooling/qwen3_reranker.py>.
|
||||
|
||||
```bash
|
||||
vllm serve Qwen/Qwen3-Reranker-0.6B --hf_overrides '{"architectures": ["Qwen3ForSequenceClassification"],"classifier_from_token": ["no", "yes"],"is_original_qwen3_reranker": true}'
|
||||
|
||||
@ -239,7 +239,7 @@ you can use the [official OpenAI Python client](https://github.com/openai/openai
|
||||
If the model has a [chat template][chat-template], you can replace `inputs` with a list of `messages` (same schema as [Chat API][chat-api])
|
||||
which will be treated as a single prompt to the model.
|
||||
|
||||
Code example: <gh-file:examples/online_serving/openai_embedding_client.py>
|
||||
Code example: <gh-file:examples/online_serving/pooling/openai_embedding_client.py>
|
||||
|
||||
#### Multi-modal inputs
|
||||
|
||||
@ -313,7 +313,7 @@ and passing a list of `messages` in the request. Refer to the examples below for
|
||||
`MrLight/dse-qwen2-2b-mrl-v1` requires a placeholder image of the minimum image size for text query embeddings. See the full code
|
||||
example below for details.
|
||||
|
||||
Full example: <gh-file:examples/online_serving/openai_chat_embedding_client_for_multimodal.py>
|
||||
Full example: <gh-file:examples/online_serving/pooling/openai_chat_embedding_client_for_multimodal.py>
|
||||
|
||||
#### Extra parameters
|
||||
|
||||
@ -421,7 +421,7 @@ Our Pooling API encodes input prompts using a [pooling model](../models/pooling_
|
||||
|
||||
The input format is the same as [Embeddings API][embeddings-api], but the output data can contain an arbitrary nested list, not just a 1-D list of floats.
|
||||
|
||||
Code example: <gh-file:examples/online_serving/openai_pooling_client.py>
|
||||
Code example: <gh-file:examples/online_serving/pooling/openai_pooling_client.py>
|
||||
|
||||
[](){ #classification-api }
|
||||
|
||||
@ -431,7 +431,7 @@ Our Classification API directly supports Hugging Face sequence-classification mo
|
||||
|
||||
We automatically wrap any other transformer via `as_seq_cls_model()`, which pools on the last token, attaches a `RowParallelLinear` head, and applies a softmax to produce per-class probabilities.
|
||||
|
||||
Code example: <gh-file:examples/online_serving/openai_classification_client.py>
|
||||
Code example: <gh-file:examples/online_serving/pooling/openai_classification_client.py>
|
||||
|
||||
#### Example Requests
|
||||
|
||||
@ -760,7 +760,7 @@ endpoints are compatible with both [Jina AI's re-rank API interface](https://jin
|
||||
[Cohere's re-rank API interface](https://docs.cohere.com/v2/reference/rerank) to ensure compatibility with
|
||||
popular open-source tools.
|
||||
|
||||
Code example: <gh-file:examples/online_serving/jinaai_rerank_client.py>
|
||||
Code example: <gh-file:examples/online_serving/pooling/jinaai_rerank_client.py>
|
||||
|
||||
#### Example Request
|
||||
|
||||
|
||||
Reference in New Issue
Block a user