[Docs] Fix syntax highlighting of shell commands (#19870)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>

@@ -26,7 +26,7 @@ The easiest way to launch a Trainium or Inferentia instance with pre-installed N
 - After launching the instance, follow the instructions in [Connect to your instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html) to connect to the instance
 - Once inside your instance, activate the pre-installed virtual environment for inference by running

-```console
+```bash
 source /opt/aws_neuronx_venv_pytorch_2_6_nxd_inference/bin/activate
 ```

@@ -47,7 +47,7 @@ Currently, there are no pre-built Neuron wheels.

 To build and install vLLM from source, run:

-```console
+```bash
 git clone https://github.com/vllm-project/vllm.git
 cd vllm
 pip install -U -r requirements/neuron.txt
@@ -66,7 +66,7 @@ Refer to [vLLM User Guide for NxD Inference](https://awsdocs-neuron.readthedocs-

 To install the AWS Neuron fork, run the following:

-```console
+```bash
 git clone -b neuron-2.23-vllm-v0.7.2 https://github.com/aws-neuron/upstreaming-to-vllm.git
 cd upstreaming-to-vllm
 pip install -r requirements/neuron.txt
@@ -100,7 +100,7 @@ to perform most of the heavy lifting which includes PyTorch model initialization
 To configure NxD Inference features through the vLLM entrypoint, use the `override_neuron_config` setting. Provide the configs you want to override
 as a dictionary (or JSON object when starting vLLM from the CLI). For example, to disable auto bucketing, include

-```console
+```python
 override_neuron_config={
 "enable_bucketing":False,
 }
@@ -108,7 +108,7 @@ override_neuron_config={

 or when launching vLLM from the CLI, pass

-```console
+```bash
 --override-neuron-config "{\"enable_bucketing\":false}"
 ```

@@ -78,13 +78,13 @@ Currently, there are no pre-built CPU wheels.

 ??? Commands

-```console
-$ docker build -f docker/Dockerfile.cpu \
+```bash
+docker build -f docker/Dockerfile.cpu \
 --tag vllm-cpu-env \
 --target vllm-openai .

-# Launching OpenAI server
-$ docker run --rm \
+# Launching OpenAI server
+docker run --rm \
 --privileged=true \
 --shm-size=4g \
 -p 8000:8000 \
@@ -123,7 +123,7 @@ vLLM CPU backend supports the following vLLM features:

 - We highly recommend to use TCMalloc for high performance memory allocation and better cache locality. For example, on Ubuntu 22.4, you can run:

-```console
+```bash
 sudo apt-get install libtcmalloc-minimal4 # install TCMalloc library
 find / -name *libtcmalloc* # find the dynamic link library path
 export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc_minimal.so.4:$LD_PRELOAD # prepend the library to LD_PRELOAD
@@ -132,7 +132,7 @@ python examples/offline_inference/basic/basic.py # run vLLM

 - When using the online serving, it is recommended to reserve 1-2 CPU cores for the serving framework to avoid CPU oversubscription. For example, on a platform with 32 physical CPU cores, reserving CPU 30 and 31 for the framework and using CPU 0-29 for OpenMP:

-```console
+```bash
 export VLLM_CPU_KVCACHE_SPACE=40
 export VLLM_CPU_OMP_THREADS_BIND=0-29
 vllm serve facebook/opt-125m
@@ -140,7 +140,7 @@ vllm serve facebook/opt-125m

 or using default auto thread binding:

-```console
+```bash
 export VLLM_CPU_KVCACHE_SPACE=40
 export VLLM_CPU_NUM_OF_RESERVED_CPU=2
 vllm serve facebook/opt-125m
@@ -189,7 +189,7 @@ vllm serve facebook/opt-125m

 - Tensor Parallel is supported for serving and offline inferencing. In general each NUMA node is treated as one GPU card. Below is the example script to enable Tensor Parallel = 2 for serving:

-```console
+```bash
 VLLM_CPU_KVCACHE_SPACE=40 VLLM_CPU_OMP_THREADS_BIND="0-31|32-63" \
 vllm serve meta-llama/Llama-2-7b-chat-hf \
 -tp=2 \
@@ -198,7 +198,7 @@ vllm serve facebook/opt-125m

 or using default auto thread binding:

-```console
+```bash
 VLLM_CPU_KVCACHE_SPACE=40 \
 vllm serve meta-llama/Llama-2-7b-chat-hf \
 -tp=2 \

@@ -25,11 +25,11 @@ Currently the CPU implementation for macOS supports FP32 and FP16 datatypes.

 After installation of XCode and the Command Line Tools, which include Apple Clang, execute the following commands to build and install vLLM from the source.

-```console
+```bash
 git clone https://github.com/vllm-project/vllm.git
 cd vllm
 pip install -r requirements/cpu.txt
 pip install -e .
 ```

 !!! note

@@ -1,6 +1,6 @@
 First, install recommended compiler. We recommend to use `gcc/g++ >= 12.3.0` as the default compiler to avoid potential problems. For example, on Ubuntu 22.4, you can run:

-```console
+```bash
 sudo apt-get update -y
 sudo apt-get install -y gcc-12 g++-12 libnuma-dev python3-dev
 sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 10 --slave /usr/bin/g++ g++ /usr/bin/g++-12
@@ -8,14 +8,14 @@ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 10 --slave /

 Second, clone vLLM project:

-```console
+```bash
 git clone https://github.com/vllm-project/vllm.git vllm_source
 cd vllm_source
 ```

 Third, install Python packages for vLLM CPU backend building:

-```console
+```bash
 pip install --upgrade pip
 pip install "cmake>=3.26.1" wheel packaging ninja "setuptools-scm>=8" numpy
 pip install -v -r requirements/cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu
@@ -23,13 +23,13 @@ pip install -v -r requirements/cpu.txt --extra-index-url https://download.pytorc

 Finally, build and install vLLM CPU backend:

-```console
+```bash
 VLLM_TARGET_DEVICE=cpu python setup.py install
 ```

 If you want to develop vllm, install it in editable mode instead.

-```console
+```bash
 VLLM_TARGET_DEVICE=cpu python setup.py develop
 ```

@@ -26,7 +26,7 @@ Currently the CPU implementation for s390x architecture supports FP32 datatype o

 Install the following packages from the package manager before building the vLLM. For example on RHEL 9.4:

-```console
+```bash
 dnf install -y \
 which procps findutils tar vim git gcc g++ make patch make cython zlib-devel \
 libjpeg-turbo-devel libtiff-devel libpng-devel libwebp-devel freetype-devel harfbuzz-devel \
@@ -35,7 +35,7 @@ dnf install -y \

 Install rust>=1.80 which is needed for `outlines-core` and `uvloop` python packages installation.

-```console
+```bash
 curl https://sh.rustup.rs -sSf | sh -s -- -y && \
 . "$HOME/.cargo/env"
 ```
@@ -45,7 +45,7 @@ Execute the following commands to build and install vLLM from the source.
 !!! tip
 Please build the following dependencies, `torchvision`, `pyarrow` from the source before building vLLM.

-```console
+```bash
 sed -i '/^torch/d' requirements-build.txt # remove torch from requirements-build.txt since we use nightly builds
 pip install -v \
 --extra-index-url https://download.pytorch.org/whl/nightly/cpu \

@@ -68,7 +68,7 @@ For more information about using TPUs with GKE, see:

 Create a TPU v5e with 4 TPU chips:

-```console
+```bash
 gcloud alpha compute tpus queued-resources create QUEUED_RESOURCE_ID \
 --node-id TPU_NAME \
 --project PROJECT_ID \
@@ -156,13 +156,13 @@ See [deployment-docker-pre-built-image][deployment-docker-pre-built-image] for i

 You can use <gh-file:docker/Dockerfile.tpu> to build a Docker image with TPU support.

-```console
+```bash
 docker build -f docker/Dockerfile.tpu -t vllm-tpu .
 ```

 Run the Docker image with the following command:

-```console
+```bash
 # Make sure to add `--privileged --net host --shm-size=16G`.
 docker run --privileged --net host --shm-size=16G -it vllm-tpu
 ```
@@ -185,6 +185,6 @@ docker run --privileged --net host --shm-size=16G -it vllm-tpu

 Install OpenBLAS with the following command:

-```console
+```bash
 sudo apt-get install --no-install-recommends --yes libopenblas-base libopenmpi-dev libomp-dev
 ```

@@ -22,7 +22,7 @@ Therefore, it is recommended to install vLLM with a **fresh new** environment. I

 You can install vLLM using either `pip` or `uv pip`:

-```console
+```bash
 # Install vLLM with CUDA 12.8.
 # If you are using pip.
 pip install vllm --extra-index-url https://download.pytorch.org/whl/cu128
@@ -37,7 +37,7 @@ We recommend leveraging `uv` to [automatically select the appropriate PyTorch in

 As of now, vLLM's binaries are compiled with CUDA 12.8 and public PyTorch release versions by default. We also provide vLLM binaries compiled with CUDA 12.6, 11.8, and public PyTorch release versions:

-```console
+```bash
 # Install vLLM with CUDA 11.8.
 export VLLM_VERSION=0.6.1.post1
 export PYTHON_VERSION=312
@@ -52,7 +52,7 @@ LLM inference is a fast-evolving field, and the latest code may contain bug fixe

 ##### Install the latest code using `pip`

-```console
+```bash
 pip install -U vllm \
 --pre \
 --extra-index-url https://wheels.vllm.ai/nightly
@@ -62,7 +62,7 @@ pip install -U vllm \

 Another way to install the latest code is to use `uv`:

-```console
+```bash
 uv pip install -U vllm \
 --torch-backend=auto \
 --extra-index-url https://wheels.vllm.ai/nightly
@@ -72,7 +72,7 @@ uv pip install -U vllm \

 If you want to access the wheels for previous commits (e.g. to bisect the behavior change, performance regression), due to the limitation of `pip`, you have to specify the full URL of the wheel file by embedding the commit hash in the URL:

-```console
+```bash
 export VLLM_COMMIT=33f460b17a54acb3b6cc0b03f4a17876cff5eafd # use full commit hash from the main branch
 pip install https://wheels.vllm.ai/${VLLM_COMMIT}/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
 ```
@@ -83,7 +83,7 @@ Note that the wheels are built with Python 3.8 ABI (see [PEP 425](https://peps.p

 If you want to access the wheels for previous commits (e.g. to bisect the behavior change, performance regression), you can specify the commit hash in the URL:

-```console
+```bash
 export VLLM_COMMIT=72d9c316d3f6ede485146fe5aabd4e61dbc59069 # use full commit hash from the main branch
 uv pip install vllm \
 --torch-backend=auto \
@@ -99,7 +99,7 @@ The `uv` approach works for vLLM `v0.6.6` and later and offers an easy-to-rememb

 If you only need to change Python code, you can build and install vLLM without compilation. Using `pip`'s [`--editable` flag](https://pip.pypa.io/en/stable/topics/local-project-installs/#editable-installs), changes you make to the code will be reflected when you run vLLM:

-```console
+```bash
 git clone https://github.com/vllm-project/vllm.git
 cd vllm
 VLLM_USE_PRECOMPILED=1 pip install --editable .
@@ -118,7 +118,7 @@ This command will do the following:

 In case you see an error about wheel not found when running the above command, it might be because the commit you based on in the main branch was just merged and the wheel is being built. In this case, you can wait for around an hour to try again, or manually assign the previous commit in the installation using the `VLLM_PRECOMPILED_WHEEL_LOCATION` environment variable.

-```console
+```bash
 export VLLM_COMMIT=72d9c316d3f6ede485146fe5aabd4e61dbc59069 # use full commit hash from the main branch
 export VLLM_PRECOMPILED_WHEEL_LOCATION=https://wheels.vllm.ai/${VLLM_COMMIT}/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
 pip install --editable .
@@ -134,7 +134,7 @@ You can find more information about vLLM's wheels in [install-the-latest-code][i

 If you want to modify C++ or CUDA code, you'll need to build vLLM from source. This can take several minutes:

-```console
+```bash
 git clone https://github.com/vllm-project/vllm.git
 cd vllm
 pip install -e .
@@ -160,7 +160,7 @@ There are scenarios where the PyTorch dependency cannot be easily installed via

 To build vLLM using an existing PyTorch installation:

-```console
+```bash
 git clone https://github.com/vllm-project/vllm.git
 cd vllm
 python use_existing_torch.py
@@ -173,7 +173,7 @@ pip install --no-build-isolation -e .
 Currently, before starting the build process, vLLM fetches cutlass code from GitHub. However, there may be scenarios where you want to use a local version of cutlass instead.
 To achieve this, you can set the environment variable VLLM_CUTLASS_SRC_DIR to point to your local cutlass directory.

-```console
+```bash
 git clone https://github.com/vllm-project/vllm.git
 cd vllm
 VLLM_CUTLASS_SRC_DIR=/path/to/cutlass pip install -e .
@@ -184,7 +184,7 @@ VLLM_CUTLASS_SRC_DIR=/path/to/cutlass pip install -e .
 To avoid your system being overloaded, you can limit the number of compilation jobs
 to be run simultaneously, via the environment variable `MAX_JOBS`. For example:

-```console
+```bash
 export MAX_JOBS=6
 pip install -e .
 ```
@@ -194,7 +194,7 @@ A side effect is a much slower build process.

 Additionally, if you have trouble building vLLM, we recommend using the NVIDIA PyTorch Docker image.

-```console
+```bash
 # Use `--ipc=host` to make sure the shared memory is large enough.
 docker run \
 --gpus all \
@@ -205,14 +205,14 @@ docker run \

 If you don't want to use docker, it is recommended to have a full installation of CUDA Toolkit. You can download and install it from [the official website](https://developer.nvidia.com/cuda-toolkit-archive). After installation, set the environment variable `CUDA_HOME` to the installation path of CUDA Toolkit, and make sure that the `nvcc` compiler is in your `PATH`, e.g.:

-```console
+```bash
 export CUDA_HOME=/usr/local/cuda
 export PATH="${CUDA_HOME}/bin:$PATH"
 ```

 Here is a sanity check to verify that the CUDA Toolkit is correctly installed:

-```console
+```bash
 nvcc --version # verify that nvcc is in your PATH
 ${CUDA_HOME}/bin/nvcc --version # verify that nvcc is in your CUDA_HOME
 ```
@@ -223,7 +223,7 @@ vLLM can fully run only on Linux but for development purposes, you can still bui

 Simply disable the `VLLM_TARGET_DEVICE` environment variable before installing:

-```console
+```bash
 export VLLM_TARGET_DEVICE=empty
 pip install -e .
 ```
@@ -238,7 +238,7 @@ See [deployment-docker-pre-built-image][deployment-docker-pre-built-image] for i

 Another way to access the latest code is to use the docker images:

-```console
+```bash
 export VLLM_COMMIT=33f460b17a54acb3b6cc0b03f4a17876cff5eafd # use full commit hash from the main branch
 docker pull public.ecr.aws/q9t5s3a7/vllm-ci-postmerge-repo:${VLLM_COMMIT}
 ```

@@ -31,17 +31,17 @@ Currently, there are no pre-built ROCm wheels.

 Alternatively, you can install PyTorch using PyTorch wheels. You can check PyTorch installation guide in PyTorch [Getting Started](https://pytorch.org/get-started/locally/). Example:

-```console
+```bash
 # Install PyTorch
-$ pip uninstall torch -y
-$ pip install --no-cache-dir --pre torch --index-url https://download.pytorch.org/whl/nightly/rocm6.3
+pip uninstall torch -y
+pip install --no-cache-dir --pre torch --index-url https://download.pytorch.org/whl/nightly/rocm6.3
 ```

 1. Install [Triton flash attention for ROCm](https://github.com/ROCm/triton)

 Install ROCm's Triton flash attention (the default triton-mlir branch) following the instructions from [ROCm/triton](https://github.com/ROCm/triton/blob/triton-mlir/README.md)

-```console
+```bash
 python3 -m pip install ninja cmake wheel pybind11
 pip uninstall -y triton
 git clone https://github.com/OpenAI/triton.git
@@ -62,7 +62,7 @@ Currently, there are no pre-built ROCm wheels.

 For example, for ROCm 6.3, suppose your gfx arch is `gfx90a`. To get your gfx architecture, run `rocminfo |grep gfx`.

-```console
+```bash
 git clone https://github.com/ROCm/flash-attention.git
 cd flash-attention
 git checkout b7d29fb
@@ -76,7 +76,7 @@ Currently, there are no pre-built ROCm wheels.

 3. If you choose to build AITER yourself to use a certain branch or commit, you can build AITER using the following steps:

-```console
+```bash
 python3 -m pip uninstall -y aiter
 git clone --recursive https://github.com/ROCm/aiter.git
 cd aiter
@@ -148,7 +148,7 @@ If you choose to build this rocm_base image yourself, the steps are as follows.

 It is important that the user kicks off the docker build using buildkit. Either the user put DOCKER_BUILDKIT=1 as environment variable when calling docker build command, or the user needs to setup buildkit in the docker daemon configuration /etc/docker/daemon.json as follows and restart the daemon:

-```console
+```json
 {
 "features": {
 "buildkit": true
@@ -158,7 +158,7 @@ It is important that the user kicks off the docker build using buildkit. Either

 To build vllm on ROCm 6.3 for MI200 and MI300 series, you can use the default:

-```console
+```bash
 DOCKER_BUILDKIT=1 docker build \
 -f docker/Dockerfile.rocm_base \
 -t rocm/vllm-dev:base .
@@ -169,7 +169,7 @@ DOCKER_BUILDKIT=1 docker build \
 First, build a docker image from <gh-file:docker/Dockerfile.rocm> and launch a docker container from the image.
 It is important that the user kicks off the docker build using buildkit. Either the user put `DOCKER_BUILDKIT=1` as environment variable when calling docker build command, or the user needs to setup buildkit in the docker daemon configuration /etc/docker/daemon.json as follows and restart the daemon:

-```console
+```bash
 {
 "features": {
 "buildkit": true
@@ -187,13 +187,13 @@ Their values can be passed in when running `docker build` with `--build-arg` opt

 To build vllm on ROCm 6.3 for MI200 and MI300 series, you can use the default:

-```console
+```bash
 DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile.rocm -t vllm-rocm .
 ```

 To build vllm on ROCm 6.3 for Radeon RX7900 series (gfx1100), you should pick the alternative base image:

-```console
+```bash
 DOCKER_BUILDKIT=1 docker build \
 --build-arg BASE_IMAGE="rocm/vllm-dev:navi_base" \
 -f docker/Dockerfile.rocm \
@@ -205,7 +205,7 @@ To run the above docker image `vllm-rocm`, use the below command:

 ??? Command

-```console
+```bash
 docker run -it \
 --network=host \
 --group-add=video \

@@ -25,7 +25,7 @@ Currently, there are no pre-built XPU wheels.
 - First, install required driver and Intel OneAPI 2025.0 or later.
 - Second, install Python packages for vLLM XPU backend building:

-```console
+```bash
 git clone https://github.com/vllm-project/vllm.git
 cd vllm
 pip install --upgrade pip
@@ -34,7 +34,7 @@ pip install -v -r requirements/xpu.txt

 - Then, build and install vLLM XPU backend:

-```console
+```bash
 VLLM_TARGET_DEVICE=xpu python setup.py install
 ```

@@ -53,9 +53,9 @@ Currently, there are no pre-built XPU images.
 # --8<-- [end:pre-built-images]
 # --8<-- [start:build-image-from-source]

-```console
-$ docker build -f docker/Dockerfile.xpu -t vllm-xpu-env --shm-size=4g .
-$ docker run -it \
+```bash
+docker build -f docker/Dockerfile.xpu -t vllm-xpu-env --shm-size=4g .
+docker run -it \
 --rm \
 --network=host \
 --device /dev/dri \
@@ -68,7 +68,7 @@ $ docker run -it \

 XPU platform supports **tensor parallel** inference/serving and also supports **pipeline parallel** as a beta feature for online serving. We require Ray as the distributed runtime backend. For example, a reference execution like following:

-```console
+```bash
 python -m vllm.entrypoints.openai.api_server \
 --model=facebook/opt-13b \
 --dtype=bfloat16 \

@@ -24,7 +24,7 @@ please follow the methods outlined in the

 To verify that the Intel Gaudi software was correctly installed, run:

-```console
+```bash
 hl-smi # verify that hl-smi is in your PATH and each Gaudi accelerator is visible
 apt list --installed | grep habana # verify that habanalabs-firmware-tools, habanalabs-graph, habanalabs-rdma-core, habanalabs-thunk and habanalabs-container-runtime are installed
 pip list | grep habana # verify that habana-torch-plugin, habana-torch-dataloader, habana-pyhlml and habana-media-loader are installed
@@ -42,7 +42,7 @@ for more details.

 Use the following commands to run a Docker image:

-```console
+```bash
 docker pull vault.habana.ai/gaudi-docker/1.18.0/ubuntu22.04/habanalabs/pytorch-installer-2.4.0:latest
 docker run \
 -it \
@@ -65,7 +65,7 @@ Currently, there are no pre-built Intel Gaudi wheels.

 To build and install vLLM from source, run:

-```console
+```bash
 git clone https://github.com/vllm-project/vllm.git
 cd vllm
 pip install -r requirements/hpu.txt
@@ -74,7 +74,7 @@ python setup.py develop

 Currently, the latest features and performance optimizations are developed in Gaudi's [vLLM-fork](https://github.com/HabanaAI/vllm-fork) and we periodically upstream them to vLLM main repo. To install latest [HabanaAI/vLLM-fork](https://github.com/HabanaAI/vllm-fork), run the following:

-```console
+```bash
 git clone https://github.com/HabanaAI/vllm-fork.git
 cd vllm-fork
 git checkout habana_main
@@ -90,7 +90,7 @@ Currently, there are no pre-built Intel Gaudi images.

 ### Build image from source

-```console
+```bash
 docker build -f docker/Dockerfile.hpu -t vllm-hpu-env .
 docker run \
 -it \

@@ -1,6 +1,6 @@
 It's recommended to use [uv](https://docs.astral.sh/uv/), a very fast Python environment manager, to create and manage Python environments. Please follow the [documentation](https://docs.astral.sh/uv/#getting-started) to install `uv`. After installing `uv`, you can create a new Python environment and install vLLM using the following commands:

-```console
+```bash
 uv venv --python 3.12 --seed
 source .venv/bin/activate
 ```