[Docs] Fix syntax highlighting of shell commands (#19870)

Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Lukas Geiger
2025-06-23 18:59:09 +01:00
committed by GitHub
parent 53243e5c42
commit c3649e4fee
53 changed files with 220 additions and 220 deletions
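Each hunk below makes the same one-line change: the opening fence of a shell snippet switches from `console` to `bash` (one Dockerfile snippet switches to `dockerfile`), so the commands get proper syntax highlighting; in each pair of fence lines, the `console` fence is the removed line and the `bash` fence is its replacement. As a rough sketch only (the `docs/` path, the `*.md` glob, and the GNU `sed` invocation are assumptions, not taken from this commit), a sweep like the following could reproduce or verify the bulk of the change locally:

```bash
# Count the remaining ```console fences under the docs tree (path assumed).
grep -rn '^```console$' docs/ | wc -l

# Rewrite shell fences from console to bash across the Markdown docs.
# GNU sed shown; snippets that should become ```dockerfile need a manual pass.
find docs -name '*.md' -exec sed -i 's/^```console$/```bash/' {} +
```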

View File

@@ -10,7 +10,7 @@ title: Using Docker
vLLM offers an official Docker image for deployment.
The image can be used to run an OpenAI-compatible server and is available on Docker Hub as [vllm/vllm-openai](https://hub.docker.com/r/vllm/vllm-openai/tags).
```console
```bash
docker run --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HUGGING_FACE_HUB_TOKEN=<secret>" \
@@ -22,7 +22,7 @@ docker run --runtime nvidia --gpus all \
This image can also be used with other container engines such as [Podman](https://podman.io/).
```console
```bash
podman run --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
@@ -71,7 +71,7 @@ You can add any other [engine-args][engine-args] you need after the image tag (`
You can build and run vLLM from source via the provided <gh-file:docker/Dockerfile>. To build vLLM:
```console
```bash
# optionally specify: --build-arg max_jobs=8 --build-arg nvcc_threads=2
DOCKER_BUILDKIT=1 docker build . \
--target vllm-openai \
@@ -99,7 +99,7 @@ of PyTorch Nightly and should be considered **experimental**. Using the flag `--
??? Command
```console
```bash
# Example of building on Nvidia GH200 server. (Memory usage: ~15GB, Build time: ~1475s / ~25 min, Image size: 6.93GB)
python3 use_existing_torch.py
DOCKER_BUILDKIT=1 docker build . \
@@ -118,7 +118,7 @@ of PyTorch Nightly and should be considered **experimental**. Using the flag `--
Run the following command on your host machine to register QEMU user static handlers:
```console
```bash
docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
```
@@ -128,7 +128,7 @@ of PyTorch Nightly and should be considered **experimental**. Using the flag `--
To run vLLM with the custom-built Docker image:
```console
```bash
docker run --runtime nvidia --gpus all \
-v ~/.cache/huggingface:/root/.cache/huggingface \
-p 8000:8000 \

View File

@@ -15,7 +15,7 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac
- Start the vLLM server with the supported chat completion model, e.g.
```console
```bash
vllm serve Qwen/Qwen1.5-32B-Chat-AWQ --max-model-len 4096
```

View File

@@ -11,7 +11,7 @@ title: AutoGen
- Set up the [AutoGen](https://microsoft.github.io/autogen/0.2/docs/installation/) environment
```console
```bash
pip install vllm
# Install AgentChat and OpenAI client from Extensions
@@ -23,7 +23,7 @@ pip install -U "autogen-agentchat" "autogen-ext[openai]"
- Start the vLLM server with the supported chat completion model, e.g.
```console
```bash
python -m vllm.entrypoints.openai.api_server \
--model mistralai/Mistral-7B-Instruct-v0.2
```

View File

@@ -11,14 +11,14 @@ vLLM can be run on a cloud based GPU machine with [Cerebrium](https://www.cerebr
To install the Cerebrium client, run:
```console
```bash
pip install cerebrium
cerebrium login
```
Next, to create your Cerebrium project, run:
```console
```bash
cerebrium init vllm-project
```
@@ -58,7 +58,7 @@ Next, let us add our code to handle inference for the LLM of your choice (`mistr
Then, run the following code to deploy it to the cloud:
```console
```bash
cerebrium deploy
```

View File

@@ -15,7 +15,7 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac
- Start the vLLM server with the supported chat completion model, e.g.
```console
```bash
vllm serve qwen/Qwen1.5-0.5B-Chat
```

View File

@@ -18,13 +18,13 @@ This guide walks you through deploying Dify using a vLLM backend.
- Start the vLLM server with the supported chat completion model, e.g.
```console
```bash
vllm serve Qwen/Qwen1.5-7B-Chat
```
- Start the Dify server with docker compose ([details](https://github.com/langgenius/dify?tab=readme-ov-file#quick-start)):
```console
```bash
git clone https://github.com/langgenius/dify.git
cd dify
cd docker

View File

@@ -11,14 +11,14 @@ vLLM can be run on a cloud based GPU machine with [dstack](https://dstack.ai/),
To install the dstack client, run:
```console
```bash
pip install "dstack[all]"
dstack server
```
Next, to configure your dstack project, run:
```console
```bash
mkdir -p vllm-dstack
cd vllm-dstack
dstack init

View File

@@ -13,7 +13,7 @@ It allows you to deploy a large language model (LLM) server with vLLM as the bac
- Set up the vLLM and Haystack environment
```console
```bash
pip install vllm haystack-ai
```
@@ -21,7 +21,7 @@ pip install vllm haystack-ai
- Start the vLLM server with the supported chat completion model, e.g.
```console
```bash
vllm serve mistralai/Mistral-7B-Instruct-v0.1
```

View File

@@ -22,7 +22,7 @@ Before you begin, ensure that you have the following:
To install the chart with the release name `test-vllm`:
```console
```bash
helm upgrade --install --create-namespace --namespace=ns-vllm test-vllm . -f values.yaml --set secrets.s3endpoint=$ACCESS_POINT --set secrets.s3bucketname=$BUCKET --set secrets.s3accesskeyid=$ACCESS_KEY --set secrets.s3accesskey=$SECRET_KEY
```
@@ -30,7 +30,7 @@ helm upgrade --install --create-namespace --namespace=ns-vllm test-vllm . -f val
To uninstall the `test-vllm` deployment:
```console
```bash
helm uninstall test-vllm --namespace=ns-vllm
```

View File

@@ -18,7 +18,7 @@ And LiteLLM supports all models on VLLM.
- Set up the vLLM and litellm environment
```console
```bash
pip install vllm litellm
```
@@ -28,7 +28,7 @@ pip install vllm litellm
- Start the vLLM server with the supported chat completion model, e.g.
```console
```bash
vllm serve qwen/Qwen1.5-0.5B-Chat
```
@@ -56,7 +56,7 @@ vllm serve qwen/Qwen1.5-0.5B-Chat
- Start the vLLM server with the supported embedding model, e.g.
```console
```bash
vllm serve BAAI/bge-base-en-v1.5
```

View File

@@ -7,13 +7,13 @@ title: Open WebUI
2. Start the vLLM server with the supported chat completion model, e.g.
```console
```bash
vllm serve qwen/Qwen1.5-0.5B-Chat
```
1. Start the [Open WebUI](https://github.com/open-webui/open-webui) docker container (replace the vllm serve host and vllm serve port):
```console
```bash
docker run -d -p 3000:8080 \
--name open-webui \
-v open-webui:/app/backend/data \

View File

@@ -15,7 +15,7 @@ Here are the integrations:
- Set up the vLLM and langchain environment
```console
```bash
pip install -U vllm \
langchain_milvus langchain_openai \
langchain_community beautifulsoup4 \
@@ -26,14 +26,14 @@ pip install -U vllm \
- Start the vLLM server with the supported embedding model, e.g.
```console
```bash
# Start embedding service (port 8000)
vllm serve ssmits/Qwen2-7B-Instruct-embed-base
```
- Start the vLLM server with the supported chat completion model, e.g.
```console
```bash
# Start chat service (port 8001)
vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
```
@@ -52,7 +52,7 @@ python retrieval_augmented_generation_with_langchain.py
- Set up the vLLM and llamaindex environment
```console
```bash
pip install vllm \
llama-index llama-index-readers-web \
llama-index-llms-openai-like \
@@ -64,14 +64,14 @@ pip install vllm \
- Start the vLLM server with the supported embedding model, e.g.
```console
```bash
# Start embedding service (port 8000)
vllm serve ssmits/Qwen2-7B-Instruct-embed-base
```
- Start the vLLM server with the supported chat completion model, e.g.
```console
```bash
# Start chat service (port 8001)
vllm serve qwen/Qwen1.5-0.5B-Chat --port 8001
```

View File

@@ -15,7 +15,7 @@ vLLM can be **run and scaled to multiple service replicas on clouds and Kubernet
- Check that you have installed SkyPilot ([docs](https://skypilot.readthedocs.io/en/latest/getting-started/installation.html)).
- Check that `sky check` shows clouds or Kubernetes are enabled.
```console
```bash
pip install skypilot-nightly
sky check
```
@@ -71,7 +71,7 @@ See the vLLM SkyPilot YAML for serving, [serving.yaml](https://github.com/skypil
Start serving the Llama-3 8B model on any of the candidate GPUs listed (L4, A10g, ...):
```console
```bash
HF_TOKEN="your-huggingface-token" sky launch serving.yaml --env HF_TOKEN
```
@@ -83,7 +83,7 @@ Check the output of the command. There will be a shareable gradio link (like the
**Optional**: Serve the 70B model instead of the default 8B and use more GPUs:
```console
```bash
HF_TOKEN="your-huggingface-token" \
sky launch serving.yaml \
--gpus A100:8 \
@@ -159,7 +159,7 @@ SkyPilot can scale up the service to multiple service replicas with built-in aut
Start serving the Llama-3 8B model on multiple replicas:
```console
```bash
HF_TOKEN="your-huggingface-token" \
sky serve up -n vllm serving.yaml \
--env HF_TOKEN
@@ -167,7 +167,7 @@ HF_TOKEN="your-huggingface-token" \
Wait until the service is ready:
```console
```bash
watch -n10 sky serve status vllm
```
@@ -271,13 +271,13 @@ This will scale the service up to when the QPS exceeds 2 for each replica.
To update the service with the new config:
```console
```bash
HF_TOKEN="your-huggingface-token" sky serve update vllm serving.yaml --env HF_TOKEN
```
To stop the service:
```console
```bash
sky serve down vllm
```
@@ -317,7 +317,7 @@ It is also possible to access the Llama-3 service with a separate GUI frontend,
1. Start the chat web UI:
```console
```bash
sky launch \
-c gui ./gui.yaml \
--env ENDPOINT=$(sky serve status --endpoint vllm)

View File

@@ -15,13 +15,13 @@ It can be quickly integrated with vLLM as a backend API server, enabling powerfu
- Start the vLLM server with the supported chat completion model, e.g.
```console
```bash
vllm serve qwen/Qwen1.5-0.5B-Chat
```
- Install streamlit and openai:
```console
```bash
pip install streamlit openai
```
@@ -29,7 +29,7 @@ pip install streamlit openai
- Start the streamlit web UI and start to chat:
```console
```bash
streamlit run streamlit_openai_chatbot_webserver.py
# or specify the VLLM_API_BASE or VLLM_API_KEY

View File

@@ -7,7 +7,7 @@ vLLM is also available via [Llama Stack](https://github.com/meta-llama/llama-sta
To install Llama Stack, run:
```console
```bash
pip install llama-stack -q
```

View File

@@ -115,7 +115,7 @@ Next, start the vLLM server as a Kubernetes Deployment and Service:
We can verify that the vLLM server has started successfully via the logs (this might take a couple of minutes to download the model):
```console
```bash
kubectl logs -l app.kubernetes.io/name=vllm
...
INFO: Started server process [1]
@@ -358,14 +358,14 @@ INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
Apply the deployment and service configurations using `kubectl apply -f <filename>`:
```console
```bash
kubectl apply -f deployment.yaml
kubectl apply -f service.yaml
```
To test the deployment, run the following `curl` command:
```console
```bash
curl http://mistral-7b.default.svc.cluster.local/v1/completions \
-H "Content-Type: application/json" \
-d '{

View File

@@ -11,13 +11,13 @@ This document shows how to launch multiple vLLM serving containers and use Nginx
This guide assumes that you have just cloned the vLLM project and you're currently in the vllm root directory.
```console
```bash
export vllm_root=`pwd`
```
Create a file named `Dockerfile.nginx`:
```console
```dockerfile
FROM nginx:latest
RUN rm /etc/nginx/conf.d/default.conf
EXPOSE 80
@@ -26,7 +26,7 @@ CMD ["nginx", "-g", "daemon off;"]
Build the container:
```console
```bash
docker build . -f Dockerfile.nginx --tag nginx-lb
```
@@ -60,14 +60,14 @@ Create a file named `nginx_conf/nginx.conf`. Note that you can add as many serve
## Build vLLM Container
```console
```bash
cd $vllm_root
docker build -f docker/Dockerfile . --tag vllm
```
If you are behind a proxy, you can pass the proxy settings to the docker build command as shown below:
```console
```bash
cd $vllm_root
docker build \
-f docker/Dockerfile . \
@@ -80,7 +80,7 @@ docker build \
## Create Docker Network
```console
```bash
docker network create vllm_nginx
```
@@ -129,7 +129,7 @@ Notes:
## Launch Nginx
```console
```bash
docker run \
-itd \
-p 8000:80 \
@@ -142,7 +142,7 @@ docker run \
## Verify That vLLM Servers Are Ready
```console
```bash
docker logs vllm0 | grep Uvicorn
docker logs vllm1 | grep Uvicorn
```