[V0 Deprecation] Remove VLLM_USE_V1 from docs and scripts (#26336)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
@@ -97,7 +97,7 @@ python3 disagg_proxy_p2p_nccl_xpyd.py &
 ??? console "Command"

     ```shell
-    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=0 vllm serve {your model directory} \
+    CUDA_VISIBLE_DEVICES=0 vllm serve {your model directory} \
         --host 0.0.0.0 \
         --port 20001 \
         --tensor-parallel-size 1 \
@@ -118,7 +118,7 @@ python3 disagg_proxy_p2p_nccl_xpyd.py &
 ??? console "Command"

     ```shell
-    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=1 vllm serve {your model directory} \
+    CUDA_VISIBLE_DEVICES=1 vllm serve {your model directory} \
         --host 0.0.0.0 \
         --port 20002 \
         --tensor-parallel-size 1 \
@@ -139,7 +139,7 @@ python3 disagg_proxy_p2p_nccl_xpyd.py &
 ??? console "Command"

     ```shell
-    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=2 vllm serve {your model directory} \
+    CUDA_VISIBLE_DEVICES=2 vllm serve {your model directory} \
         --host 0.0.0.0 \
         --port 20003 \
         --tensor-parallel-size 1 \
@@ -160,7 +160,7 @@ python3 disagg_proxy_p2p_nccl_xpyd.py &
 ??? console "Command"

     ```shell
-    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=3 vllm serve {your model directory} \
+    CUDA_VISIBLE_DEVICES=3 vllm serve {your model directory} \
         --host 0.0.0.0 \
         --port 20004 \
         --tensor-parallel-size 1 \
@@ -190,7 +190,7 @@ python3 disagg_proxy_p2p_nccl_xpyd.py &
 ??? console "Command"

     ```shell
-    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=0 vllm serve {your model directory} \
+    CUDA_VISIBLE_DEVICES=0 vllm serve {your model directory} \
         --host 0.0.0.0 \
         --port 20001 \
         --tensor-parallel-size 1 \
@@ -211,7 +211,7 @@ python3 disagg_proxy_p2p_nccl_xpyd.py &
 ??? console "Command"

     ```shell
-    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=1 vllm serve {your model directory} \
+    CUDA_VISIBLE_DEVICES=1 vllm serve {your model directory} \
         --host 0.0.0.0 \
         --port 20002 \
         --tensor-parallel-size 1 \
@@ -232,7 +232,7 @@ python3 disagg_proxy_p2p_nccl_xpyd.py &
 ??? console "Command"

     ```shell
-    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=2 vllm serve {your model directory} \
+    CUDA_VISIBLE_DEVICES=2 vllm serve {your model directory} \
         --host 0.0.0.0 \
         --port 20003 \
         --tensor-parallel-size 1 \
@@ -253,7 +253,7 @@ python3 disagg_proxy_p2p_nccl_xpyd.py &
 ??? console "Command"

     ```shell
-    VLLM_USE_V1=1 CUDA_VISIBLE_DEVICES=3 vllm serve {your model directory} \
+    CUDA_VISIBLE_DEVICES=3 vllm serve {your model directory} \
         --host 0.0.0.0 \
         --port 20004 \
         --tensor-parallel-size 1 \
@@ -2,7 +2,7 @@

 In vLLM's V1 architecture, `torch.compile` is enabled by default and is a critical part of the framework. This document gives a simple walk-through example to show how to understand the `torch.compile` usage.

-Throughout the example, we will run a common Llama model using v1, and turn on debug level logging to show all the details. The command to be used is `VLLM_USE_V1=1 VLLM_LOGGING_LEVEL=DEBUG vllm serve meta-llama/Llama-3.2-1B`.
+Throughout the example, we will run a common Llama model, and turn on debug level logging to show all the details. The command to be used is `VLLM_LOGGING_LEVEL=DEBUG vllm serve meta-llama/Llama-3.2-1B`.

 ## Compilation Cache