[Docs] Add supported quantization methods to docs (#2135)
@@ -30,6 +30,7 @@ vLLM is fast with:
 * State-of-the-art serving throughput
 * Efficient management of attention key and value memory with **PagedAttention**
 * Continuous batching of incoming requests
+* Quantization: `GPTQ <https://arxiv.org/abs/2210.17323>`_, `AWQ <https://arxiv.org/abs/2306.00978>`_, `SqueezeLLM <https://arxiv.org/abs/2306.07629>`_
 * Optimized CUDA kernels
 
 vLLM is flexible and easy to use with:
@@ -39,7 +40,7 @@ vLLM is flexible and easy to use with:
 * Tensor parallelism support for distributed inference
 * Streaming outputs
 * OpenAI-compatible API server
-* Support NVIDIA CUDA and AMD ROCm.
+* Support NVIDIA GPUs and AMD GPUs.
 
 For more information, check out the following:
 