[CI/Build] Replace vllm.entrypoints.openai.api_server entrypoint with vllm serve command (#25967)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Cyrus Leung
2025-10-03 01:04:57 +08:00
committed by yewentao256
parent 5c8a4a2208
commit fa179abde3
22 changed files with 101 additions and 66 deletions

@@ -67,8 +67,7 @@ docker run -it \
XPU platform supports **tensor parallel** inference/serving and also supports **pipeline parallel** as a beta feature for online serving. For **pipeline parallel**, we support it on single node with mp as the backend. For example, a reference execution like following:
```bash
- python -m vllm.entrypoints.openai.api_server \
- --model=facebook/opt-13b \
+ vllm serve facebook/opt-13b \
--dtype=bfloat16 \
--max_model_len=1024 \
--distributed-executor-backend=mp \