[Deprecation][2/N] Replace --task with --runner and --convert (#21470)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@@ -9,7 +9,7 @@ Launch the vLLM server with the following command:
 vllm serve llava-hf/llava-1.5-7b-hf
 
 (multi-image inference with Phi-3.5-vision-instruct)
-vllm serve microsoft/Phi-3.5-vision-instruct --task generate \
+vllm serve microsoft/Phi-3.5-vision-instruct --runner generate \
   --trust-remote-code --max-model-len 4096 --limit-mm-per-prompt '{"image":2}'
 
 (audio inference with Ultravox)
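For the multi-image server launched above, a request must fit within the `--limit-mm-per-prompt '{"image":2}'` budget. A minimal sketch of such a request body, following the OpenAI Chat Completions multimodal format that vLLM serves (the image URLs and prompt text here are placeholders, and the HTTP client wiring is omitted):

```python
# Sketch of a two-image chat request for the Phi-3.5-vision server above.
# The URLs are placeholders; at most 2 images are allowed per prompt because
# the server was started with --limit-mm-per-prompt '{"image":2}'.
payload = {
    "model": "microsoft/Phi-3.5-vision-instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in these images?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/a.jpg"}},
                {"type": "image_url", "image_url": {"url": "https://example.com/b.jpg"}},
            ],
        }
    ],
}

# Count the image parts to confirm the request stays within the per-prompt limit.
num_images = sum(
    1 for part in payload["messages"][0]["content"] if part["type"] == "image_url"
)
```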
@@ -92,7 +92,7 @@ def dse_qwen2_vl(inp: dict):
 def parse_args():
     parser = argparse.ArgumentParser(
         "Script to call a specified VLM through the API. Make sure to serve "
-        "the model with --task embed before running this."
+        "the model with `--runner pooling` before running this."
     )
     parser.add_argument(
         "--model",
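The hunk above only shows the opening lines of the script's parser. A self-contained sketch of that fragment, with the updated help text from this change (the `--model` argument's `type` and `help` below are placeholders, not taken from the diff):

```python
import argparse

# Minimal sketch of the parser in the hunk above; the real script defines
# more arguments than shown here, and the help text below is a placeholder.
parser = argparse.ArgumentParser(
    "Script to call a specified VLM through the API. Make sure to serve "
    "the model with `--runner pooling` before running this."
)
parser.add_argument(
    "--model",
    type=str,
    help="name of the model to query",  # placeholder help text
)

args = parser.parse_args(["--model", "my-model"])
```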
@@ -3,7 +3,7 @@
 """
 Example online usage of Score API.
 
-Run `vllm serve <model> --task score` to start up the server in vLLM.
+Run `vllm serve <model> --runner pooling` to start up the server in vLLM.
 """
 
 import argparse
@@ -3,7 +3,7 @@
 """
 Example online usage of Score API.
 
-Run `vllm serve <model> --task score` to start up the server in vLLM.
+Run `vllm serve <model> --runner pooling` to start up the server in vLLM.
 """
 
 import argparse
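For a server started with `vllm serve <model> --runner pooling` as in the docstrings above, a Score API request pairs one query text against candidate texts. A sketch of the request body, assuming the `text_1`/`text_2` field names of vLLM's Score API (the model name and sentences are placeholders, and the HTTP call is omitted):

```python
# Sketch of a request body for the Score API served by the command above.
# Field names follow vLLM's Score API; model name and texts are placeholders.
score_request = {
    "model": "BAAI/bge-reranker-v2-m3",
    "text_1": "What is the capital of France?",
    "text_2": [
        "The capital of France is Paris.",
        "Berlin is in Germany.",
    ],
}
```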
@@ -3,7 +3,7 @@
 """
 Example online usage of Pooling API.
 
-Run `vllm serve <model> --task <embed|classify|reward|score>`
+Run `vllm serve <model> --runner pooling`
 to start up the server in vLLM.
 """
 
@@ -10,7 +10,7 @@ This script demonstrates how to:
 
 Run the vLLM server first:
     vllm serve meta-llama/Llama-3.2-1B-Instruct \
-        --task generate \
+        --runner generate \
         --max-model-len 4096 \
         --enable-prompt-embeds
 
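Taken together, the hunks in this commit replace each deprecated `--task` value with a `--runner` flag. A sketch of that mapping as a helper, based only on the replacements visible above; how other `--task` values (and the `--convert` flag from the commit title) are handled is not shown in this diff and is left out here:

```python
# Sketch of the --task -> --runner migration applied in this commit, derived
# only from the replacements visible in the hunks above. The --convert flag
# mentioned in the commit title does not appear in these hunks, so it is
# deliberately not modeled here.
def migrate_task_flag(task: str) -> str:
    """Return the replacement CLI flags for a deprecated --task value."""
    if task == "generate":
        # --task generate -> --runner generate (Phi-3.5 and prompt-embeds hunks)
        return "--runner generate"
    if task in {"embed", "classify", "reward", "score"}:
        # pooling-style tasks all map to --runner pooling (Score/Pooling hunks)
        return "--runner pooling"
    raise ValueError(f"unknown --task value: {task}")
```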