[Docs] Fix syntax highlighting of shell commands (#19870)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
@@ -34,15 +34,15 @@ output = llm.generate("San Francisco is a")
 
 To run multi-GPU serving, pass in the `--tensor-parallel-size` argument when starting the server. For example, to run the API server on 4 GPUs:
 
-```console
+```bash
 vllm serve facebook/opt-13b \
     --tensor-parallel-size 4
 ```
 
 You can additionally specify `--pipeline-parallel-size` to enable pipeline parallelism. For example, to run the API server on 8 GPUs with both pipeline parallelism and tensor parallelism:
 
-```console
+```bash
 vllm serve gpt2 \
     --tensor-parallel-size 4 \
     --pipeline-parallel-size 2
 ```
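The same parallel configuration carries over to offline inference through the `LLM` class. A minimal sketch, assuming 4 local GPUs; the prompt and sampling settings are illustrative:

```python
from vllm import LLM, SamplingParams

# Shard facebook/opt-13b across 4 GPUs with tensor parallelism,
# mirroring the `--tensor-parallel-size 4` serving example above.
llm = LLM(model="facebook/opt-13b", tensor_parallel_size=4)

# Illustrative sampling settings.
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["San Francisco is a"], sampling_params)
print(outputs[0].outputs[0].text)
```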
@@ -55,7 +55,7 @@ The first step is to start containers and organize them into a cluster. We have
 
 Pick a node as the head node, and run the following command:
 
-```console
+```bash
 bash run_cluster.sh \
     vllm/vllm-openai \
     ip_of_head_node \
@@ -66,7 +66,7 @@ bash run_cluster.sh \
 
 On the rest of the worker nodes, run the following command:
 
-```console
+```bash
 bash run_cluster.sh \
     vllm/vllm-openai \
     ip_of_head_node \
@@ -87,7 +87,7 @@ Then, on any node, use `docker exec -it node /bin/bash` to enter the container,
 
 After that, on any node, use `docker exec -it node /bin/bash` to enter the container again. **In the container**, you can use vLLM as usual, as if you had all the GPUs on one node: vLLM can leverage the GPU resources of all nodes in the Ray cluster, so run the `vllm` command only on this node, not on the other nodes. The common practice is to set the tensor parallel size to the number of GPUs in each node and the pipeline parallel size to the number of nodes. For example, if you have 16 GPUs across 2 nodes (8 GPUs per node), you can set the tensor parallel size to 8 and the pipeline parallel size to 2:
 
-```console
+```bash
 vllm serve /path/to/the/model/in/the/container \
     --tensor-parallel-size 8 \
     --pipeline-parallel-size 2
@@ -95,7 +95,7 @@ After that, on any node, use `docker exec -it node /bin/bash` to enter the conta
 
 You can also use tensor parallelism without pipeline parallelism; just set the tensor parallel size to the total number of GPUs in the cluster. For example, if you have 16 GPUs across 2 nodes (8 GPUs per node), you can set the tensor parallel size to 16:
 
-```console
+```bash
 vllm serve /path/to/the/model/in/the/container \
     --tensor-parallel-size 16
 ```
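Once the multi-node server is up, it exposes vLLM's usual OpenAI-compatible API on the head node. A minimal query sketch using `requests`; the host is the same placeholder as in the commands above, and port 8000 is vLLM's default:

```python
import requests

# Query the OpenAI-compatible completions endpoint served by `vllm serve`.
# `ip_of_head_node` and the model path are the placeholders from above.
response = requests.post(
    "http://ip_of_head_node:8000/v1/completions",
    json={
        "model": "/path/to/the/model/in/the/container",
        "prompt": "San Francisco is a",
        "max_tokens": 64,
    },
)
print(response.json()["choices"][0]["text"])
```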
@@ -7,7 +7,7 @@ vLLM is also available via [LangChain](https://github.com/langchain-ai/langchain
 
 To install LangChain, run
 
-```console
+```bash
 pip install langchain langchain_community -q
 ```
 
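Once installed, the integration lives in the community package. A minimal usage sketch; the model name and generation settings are illustrative:

```python
from langchain_community.llms import VLLM

# Load a small model through LangChain's vLLM wrapper; the model
# name and sampling settings here are illustrative.
llm = VLLM(
    model="facebook/opt-125m",
    max_new_tokens=64,
    temperature=0.8,
)
print(llm.invoke("What is the capital of France?"))
```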
@@ -7,7 +7,7 @@ vLLM is also available via [LlamaIndex](https://github.com/run-llama/llama_index
 
 To install LlamaIndex, run
 
-```console
+```bash
 pip install llama-index-llms-vllm -q
 ```
 
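A minimal usage sketch for the installed integration; the model name and token limit are illustrative:

```python
from llama_index.llms.vllm import Vllm

# Load a small model through LlamaIndex's vLLM integration; the
# model name and token limit here are illustrative.
llm = Vllm(model="facebook/opt-125m", max_new_tokens=64)
print(llm.complete("What is the capital of France?"))
```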