[Docs] Add supported quantization methods to docs (#2135)

This commit is contained in:
Woosuk Kwon
2023-12-15 13:29:22 -08:00
committed by GitHub
parent 0fbfc4b81b
commit b81a6a6bb3
2 changed files with 4 additions and 2 deletions


@@ -30,6 +30,7 @@ vLLM is fast with:
 * State-of-the-art serving throughput
 * Efficient management of attention key and value memory with **PagedAttention**
 * Continuous batching of incoming requests
+* Quantization: `GPTQ <https://arxiv.org/abs/2210.17323>`_, `AWQ <https://arxiv.org/abs/2306.00978>`_, `SqueezeLLM <https://arxiv.org/abs/2306.07629>`_
 * Optimized CUDA kernels

 vLLM is flexible and easy to use with:
@@ -39,7 +40,7 @@ vLLM is flexible and easy to use with:
 * Tensor parallelism support for distributed inference
 * Streaming outputs
 * OpenAI-compatible API server
-* Support NVIDIA CUDA and AMD ROCm.
+* Support NVIDIA GPUs and AMD GPUs.

 For more information, check out the following:
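
For reference, the quantization methods this change documents are selected at model load time through the `quantization` argument of vLLM's `LLM` class. A minimal sketch, assuming an AWQ-quantized checkpoint (the model name below is a placeholder and must point to weights already quantized with the chosen method):

from vllm import LLM

# Select one of the documented methods: "awq", "gptq", or "squeezellm".
# The checkpoint name is a placeholder; it must match the chosen method.
llm = LLM(model="TheBloke/Llama-2-7B-AWQ", quantization="awq")

# Generate a completion and print the text of the first output.
outputs = llm.generate(["Hello, my name is"])
print(outputs[0].outputs[0].text)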