Cyrus Leung 8ceffbf315 [Doc][3/N] Reorganize Serving section (#11766 )

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

2025-01-07 11:20:01 +08:00

(deployment-lws)=

LWS

LeaderWorkerSet (LWS) is a Kubernetes API that addresses common deployment patterns for AI/ML inference workloads. A major use case is multi-host/multi-node distributed inference, where a group of pods (one leader plus its workers) is managed as a single unit.

vLLM can be deployed with LWS on Kubernetes for distributed model serving.

Please see this guide for more details on deploying vLLM on Kubernetes using LWS.
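
To give a rough idea of what such a deployment looks like, the following is a minimal sketch that builds a skeletal `LeaderWorkerSet` manifest and prints it as JSON, which `kubectl apply -f -` accepts. It is illustrative only: the helper `build_lws_manifest`, the resource name `vllm`, the container names, and the group size of 2 are assumptions made for this example, and the container commands, GPU resources, and networking that a real multi-node vLLM deployment needs are omitted on purpose. Refer to the guide for a complete, tested manifest.

```python
import json


def build_lws_manifest(name, image, group_size, replicas=1):
    """Return a skeletal LeaderWorkerSet manifest as a plain dict.

    Hypothetical helper for illustration; not part of vLLM or LWS.
    """

    def pod_template(container_name):
        # Bare pod template: a real deployment would also set the
        # command, GPU resource limits, ports, and probes here.
        return {
            "spec": {
                "containers": [{"name": container_name, "image": image}]
            }
        }

    return {
        "apiVersion": "leaderworkerset.x-k8s.io/v1",
        "kind": "LeaderWorkerSet",
        "metadata": {"name": name},
        "spec": {
            # Number of leader/worker groups (each group is one model replica).
            "replicas": replicas,
            "leaderWorkerTemplate": {
                # Pods per group, counting the leader.
                "size": group_size,
                "leaderTemplate": pod_template("vllm-leader"),
                "workerTemplate": pod_template("vllm-worker"),
            },
        },
    }


# Assumed values: one group of two pods (one leader, one worker).
manifest = build_lws_manifest("vllm", "vllm/vllm-openai:latest", group_size=2)
print(json.dumps(manifest, indent=2))
```

With these assumed values, each replica is a group of two pods that LWS creates and scales together; raising `replicas` adds more such groups rather than more workers per group.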
