vllm/quantization at 27a09dc52c8317b531b6d2b862198a8a0d2a88eb

youngkingdom/vllm

Kaixi Hou 27a09dc52c [NVIDIA] Fix an issue to use current stream for the nvfp4 quant (#13632)

2025-02-20 22:01:48 -08:00

[Kernel] fix types used in aqlm and ggml kernels to support dynamo (#7596)

2024-08-16 14:00:11 -07:00

[Kernel] Fix awq error when n is not divisable by 128 (#13227)

2025-02-13 20:07:05 -08:00

compressed_tensors

[MISC] Replace c10::optional with std::optional (#11730)

2025-01-05 10:20:34 +09:00

[Kernel][Bugfix] Refactor and Fix CUTLASS 2:4 Sparse Kernels (#13198)

2025-02-14 00:01:14 +00:00

[NVIDIA] Fix an issue to use current stream for the nvfp4 quant (#13632)

2025-02-20 22:01:48 -08:00

[ROCm] MI300A compile targets deprecation (#13560)

2025-02-19 23:05:00 -08:00

[torch.compile] Dynamic fp8 + rms_norm fusion (#10906)

2024-12-13 03:19:23 +00:00

[AMD] Add support for GGUF quantization on ROCm (#10254)

2024-11-22 21:14:49 -08:00

[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)

2024-06-09 16:23:30 -04:00

Update pre-commit hooks (#12475)

2025-01-27 17:23:08 -07:00

[CI/Build] Auto-fix Markdown files (#12941)

2025-02-08 04:25:15 -08:00

Update pre-commit hooks (#12475)

2025-01-27 17:23:08 -07:00

vectorization.cuh

[torch.compile] Dynamic fp8 + rms_norm fusion (#10906)

2024-12-13 03:19:23 +00:00