[V0 Deprecation] Remove VLLM_USE_V1 from docs and scripts (#26336)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Author: Cyrus Leung
Date: 2025-10-07 16:46:44 +08:00
Committed by: GitHub
parent 46b0779996
commit 7e4cd070b0

11 changed files with 17 additions and 26 deletions


@@ -97,7 +97,7 @@ python3 disagg_proxy_p2p_nccl_xpyd.py &
 ??? console "Command"
     ```shell
-    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=0 vllm serve {your model directory} \
+    CUDA_VISIBLE_DEVICES=0 vllm serve {your model directory} \
         --host 0.0.0.0 \
         --port 20001 \
         --tensor-parallel-size 1 \
@@ -118,7 +118,7 @@ python3 disagg_proxy_p2p_nccl_xpyd.py &
 ??? console "Command"
     ```shell
-    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=1 vllm serve {your model directory} \
+    CUDA_VISIBLE_DEVICES=1 vllm serve {your model directory} \
         --host 0.0.0.0 \
         --port 20002 \
         --tensor-parallel-size 1 \
@@ -139,7 +139,7 @@ python3 disagg_proxy_p2p_nccl_xpyd.py &
 ??? console "Command"
     ```shell
-    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=2 vllm serve {your model directory} \
+    CUDA_VISIBLE_DEVICES=2 vllm serve {your model directory} \
         --host 0.0.0.0 \
         --port 20003 \
         --tensor-parallel-size 1 \
@@ -160,7 +160,7 @@ python3 disagg_proxy_p2p_nccl_xpyd.py &
 ??? console "Command"
     ```shell
-    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=3 vllm serve {your model directory} \
+    CUDA_VISIBLE_DEVICES=3 vllm serve {your model directory} \
         --host 0.0.0.0 \
         --port 20004 \
         --tensor-parallel-size 1 \
@@ -190,7 +190,7 @@ python3 disagg_proxy_p2p_nccl_xpyd.py &
 ??? console "Command"
     ```shell
-    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=0 vllm serve {your model directory} \
+    CUDA_VISIBLE_DEVICES=0 vllm serve {your model directory} \
         --host 0.0.0.0 \
         --port 20001 \
         --tensor-parallel-size 1 \
@@ -211,7 +211,7 @@ python3 disagg_proxy_p2p_nccl_xpyd.py &
 ??? console "Command"
     ```shell
-    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=1 vllm serve {your model directory} \
+    CUDA_VISIBLE_DEVICES=1 vllm serve {your model directory} \
         --host 0.0.0.0 \
         --port 20002 \
         --tensor-parallel-size 1 \
@@ -232,7 +232,7 @@ python3 disagg_proxy_p2p_nccl_xpyd.py &
 ??? console "Command"
     ```shell
-    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=2 vllm serve {your model directory} \
+    CUDA_VISIBLE_DEVICES=2 vllm serve {your model directory} \
         --host 0.0.0.0 \
         --port 20003 \
         --tensor-parallel-size 1 \
@@ -253,7 +253,7 @@ python3 disagg_proxy_p2p_nccl_xpyd.py &
 ??? console "Command"
     ```shell
-    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=3 vllm serve {your model directory} \
+    CUDA_VISIBLE_DEVICES=3 vllm serve {your model directory} \
         --host 0.0.0.0 \
         --port 20004 \
         --tensor-parallel-size 1 \
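
For reference, here is one of the updated launch commands assembled end-to-end from the hunks above. This is a minimal sketch: the model path is a placeholder, and any flags truncated by the trailing `\` in the diff (such as the KV-transfer configuration) are intentionally omitted.

```shell
# Sketch of the first updated instance: the VLLM_USE_V1=1 prefix is gone,
# since this commit removes it now that V0 is deprecated.
# "/path/to/your/model" is a placeholder, and flags elided by the diff's
# trailing "\" are not reproduced here.
CUDA_VISIBLE_DEVICES=0 vllm serve /path/to/your/model \
    --host 0.0.0.0 \
    --port 20001 \
    --tensor-parallel-size 1
```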


@@ -2,7 +2,7 @@
 In vLLM's V1 architecture, `torch.compile` is enabled by default and is a critical part of the framework. This document gives a simple walk-through example to show how to understand the `torch.compile` usage.
-Throughout the example, we will run a common Llama model using v1, and turn on debug level logging to show all the details. The command to be used is `VLLM_USE_V1=1 VLLM_LOGGING_LEVEL=DEBUG vllm serve meta-llama/Llama-3.2-1B`.
+Throughout the example, we will run a common Llama model, and turn on debug level logging to show all the details. The command to be used is `VLLM_LOGGING_LEVEL=DEBUG vllm serve meta-llama/Llama-3.2-1B`.
 ## Compilation Cache
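
As a quick way to follow the `torch.compile` walkthrough, the updated command can be run with its debug output captured to a file. A minimal sketch; the `tee` redirection and the log filename are our additions, not part of the doc:

```shell
# Walkthrough command from the hunk above, without VLLM_USE_V1=1.
# Piping through tee (our addition) keeps the verbose debug log on disk
# so the torch.compile steps can be inspected afterwards.
VLLM_LOGGING_LEVEL=DEBUG vllm serve meta-llama/Llama-3.2-1B 2>&1 | tee vllm_torch_compile.log
```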