Update deploying_with_k8s.rst (#10922)

This commit is contained in:
AlexHe99
2024-12-16 08:33:58 +08:00
committed by GitHub
parent 25ebed2f8c
commit da6f409246

@@ -162,7 +162,7 @@ To test the deployment, run the following ``curl`` command:
   curl http://mistral-7b.default.svc.cluster.local/v1/completions \
     -H "Content-Type: application/json" \
     -d '{
-      "model": "facebook/opt-125m",
+      "model": "mistralai/Mistral-7B-Instruct-v0.3",
       "prompt": "San Francisco is a",
       "max_tokens": 7,
       "temperature": 0
@@ -172,4 +172,4 @@ If the service is correctly deployed, you should receive a response from the vLLM
Conclusion
----------
Deploying vLLM with Kubernetes allows for efficient scaling and management of ML models leveraging GPU resources. By following the steps outlined above, you should be able to set up and test a vLLM deployment within your Kubernetes cluster. If you encounter any issues or have suggestions, please feel free to contribute to the documentation.