[Docs] Add CUDA graph support to docs (#2148)

This commit is contained in:
Woosuk Kwon
2023-12-17 01:49:20 -08:00
committed by GitHub
parent c3372e87be
commit 26c52a5ea6
2 changed files with 4 additions and 2 deletions


@@ -30,6 +30,7 @@ vLLM is fast with:
 * State-of-the-art serving throughput
 * Efficient management of attention key and value memory with **PagedAttention**
 * Continuous batching of incoming requests
+* Fast model execution with CUDA/HIP graph
 * Quantization: `GPTQ <https://arxiv.org/abs/2210.17323>`_, `AWQ <https://arxiv.org/abs/2306.00978>`_, `SqueezeLLM <https://arxiv.org/abs/2306.07629>`_
 * Optimized CUDA kernels
@@ -40,7 +41,7 @@ vLLM is flexible and easy to use with:
 * Tensor parallelism support for distributed inference
 * Streaming outputs
 * OpenAI-compatible API server
-* Support NVIDIA GPUs and AMD GPUs.
+* Support NVIDIA GPUs and AMD GPUs
 For more information, check out the following: