vllm/fp8 at 035fd2bd2cd2fb70f5834f5ca6c2ea30cdae9187 - vllm

youngkingdom/vllm

Aidyn-A bfe9380161 Apply fixes for CUDA 13 (#24599)

Signed-off-by: Aidyn-A <aidyn.b.aitzhan@gmail.com>

2025-09-17 09:15:42 -04:00

[MISC] Remove unused variableds in C++ (#19609)

2025-06-15 20:05:28 -07:00

[Perf] Use NVIDIA hardware-accelerated instruction for float to fp8_e4m3 quantization (#24757)

2025-09-13 00:16:24 -07:00

common.cu

Apply fixes for CUDA 13 (#24599)

2025-09-17 09:15:42 -04:00

common.cuh

[Perf] Use NVIDIA hardware-accelerated instruction for float to fp8_e4m3 quantization (#24757)

2025-09-13 00:16:24 -07:00

per_token_group_quant.cu

[Perf] Using __nv_fp8_e4m3 instead of c10::e4m3 for per_token_group_quant (#21867)

2025-07-29 21:50:46 -06:00