[CI/Build] Replace vllm.entrypoints.openai.api_server entrypoint with vllm serve command (#25967)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Author: Cyrus Leung
Date: 2025-10-03 01:04:57 +08:00
Committed by: GitHub
Parent: 3b279a84be
Commit: d00d652998
22 changed files with 101 additions and 66 deletions


@@ -64,8 +64,7 @@ To enable sleep mode in a vLLM server you need to initialize it with the flag `V
 When using the flag `VLLM_SERVER_DEV_MODE=1` you enable development endpoints, and these endpoints should not be exposed to users.
 ```bash
-VLLM_SERVER_DEV_MODE=1 python -m vllm.entrypoints.openai.api_server \
-    --model Qwen/Qwen3-0.6B \
+VLLM_SERVER_DEV_MODE=1 vllm serve Qwen/Qwen3-0.6B \
     --enable-sleep-mode \
     --port 8000
 ```
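Once a server launched this way is up, the development endpoints can be driven over plain HTTP. A minimal sketch that builds the request URLs, assuming the `/sleep` and `/wake_up` endpoint names and the `level` query parameter described in the vLLM sleep-mode docs (treat these as assumptions to verify against your installed version):

```python
from urllib.parse import urlencode, urlunsplit

SCHEME, HOST = "http", "localhost:8000"  # assumes the --port 8000 above

def dev_endpoint(path: str, **params: str) -> str:
    """Build a URL for a dev-mode endpoint (endpoint names assumed
    from the vLLM docs; verify against your installed version)."""
    return urlunsplit((SCHEME, HOST, path, urlencode(params), ""))

# Put the engine to sleep (level 1 offloads weights to CPU RAM and
# discards the KV cache), then wake it up again:
print(dev_endpoint("/sleep", level="1"))  # → http://localhost:8000/sleep?level=1
print(dev_endpoint("/wake_up"))           # → http://localhost:8000/wake_up
```

Both would be issued as POST requests (e.g. `curl -X POST`) against the running server.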


@@ -48,10 +48,9 @@ The following code configures vLLM in an offline mode to use speculative decodin
 To perform the same with an online mode launch the server:
 ```bash
-python -m vllm.entrypoints.openai.api_server \
+vllm serve facebook/opt-6.7b \
     --host 0.0.0.0 \
     --port 8000 \
-    --model facebook/opt-6.7b \
     --seed 42 \
     -tp 1 \
     --gpu_memory_utilization 0.8 \
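The migration pattern in these hunks is mechanical: drop the `python -m vllm.entrypoints.openai.api_server` prefix and promote the `--model` value to the positional argument of `vllm serve`, leaving all other flags untouched. A minimal sketch of that rewrite (the `to_vllm_serve` helper is hypothetical, for illustration only):

```python
import shlex

def to_vllm_serve(old_cmd: str) -> str:
    """Rewrite a legacy api_server invocation into the equivalent
    `vllm serve` command (hypothetical helper, illustration only)."""
    tokens = shlex.split(old_cmd)
    # Drop the `python -m vllm.entrypoints.openai.api_server` prefix.
    assert tokens[:3] == ["python", "-m", "vllm.entrypoints.openai.api_server"]
    args = tokens[3:]
    # The --model value becomes the positional argument of `vllm serve`.
    i = args.index("--model")
    model = args[i + 1]
    rest = args[:i] + args[i + 2:]
    return shlex.join(["vllm", "serve", model] + rest)

old = ("python -m vllm.entrypoints.openai.api_server "
       "--host 0.0.0.0 --port 8000 --model facebook/opt-6.7b --seed 42")
print(to_vllm_serve(old))
# → vllm serve facebook/opt-6.7b --host 0.0.0.0 --port 8000 --seed 42
```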