Commit Graph

69 Commits

Author SHA1 Message Date
e757a629e7 [Bug] Fix Cutlass Scaled MM Compilation Error (#24887)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-15 17:21:17 -04:00
c3aea10dc8 [Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-11 15:43:14 -07:00
b17109beea [Kernel] CUTLASS MoE FP8: Integrate cuda moe permute/unpermute (#23045)
Signed-off-by: Shixian Cui <shixian@amazon.com>
2025-08-20 10:35:26 -04:00
7f89ed248f [Fix] enable swap_ab for pplx problem size computation (#22991)
Signed-off-by: Shixian Cui <shixian@amazon.com>
Co-authored-by: Shixian Cui <shixian@amazon.com>
2025-08-15 14:02:12 -07:00
3303f134e0 [Kernel] Add support for block FP8 on SM120 (NVIDIA 5090 and RTX PRO 6000) (#22131)
Signed-off-by: Junhao Li <junhao@ubicloud.com>
2025-08-07 19:18:28 -07:00
ba5c5e5404 [Docs] Switch to better markdown linting pre-commit hook (#21851)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 19:45:08 -07:00
c6c9122d50 [Kernel] SM90 CUTLASS FP8 GEMM: add support for swap AB + kernel tuning (#20396)
Signed-off-by: Faqin Zhong <faqin.zhong@gmail.com>
Co-authored-by: Duncan Moss <djm.moss@gmail.com>
2025-07-28 23:13:58 +00:00
e8cb0d0495 [Bug] Fix Compressed Tensor NVFP4 cutlass_fp4_group_mm illegal memory access (#21465)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-24 08:13:24 -07:00
2c8db17cfd [feat]: add SM100 support for cutlass FP8 groupGEMM (#20447)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-22 07:27:12 -07:00
5780121c95 [Perf] Add swap_ab to SM90 FP8 non-block CUTLASS moe grouped gemm (#20911)
Signed-off-by: Shixian Cui <shixian@amazon.com>
Co-authored-by: Shixian Cui <shixian@amazon.com>
2025-07-18 04:34:43 +00:00
5923ab9524 [fix]: disable cutlass block scaled group gemm for EP (#20781)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
2025-07-11 02:39:18 +00:00
c438183e99 [Bugfix] Fix topk_ids indices_type for CUTLASS w8a8 FP8 MoE (#20166)
Signed-off-by: Ming Yang <yming@meta.com>
2025-07-08 23:10:57 +00:00
40b86aa05e [BugFix] Fix: ImportError when building on hopper systems (#20513)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-06 12:17:30 +08:00
3d184b95b8 [feat]: CUTLASS block scaled group gemm for SM100 (#19757)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Co-authored-by: Duncan Moss <dmoss@nvidia.com>
2025-07-04 12:58:04 -06:00
9e5552aa13 [NVIDIA] Support Cutlass w8a8 FP8 for Blackwell Geforce GPUs (sm120) (#17280)
Signed-off-by: kaln27 <liaojuncheng123@foxmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-02 06:47:19 -06:00
44d2e6af63 [Bugfix] Build moe_data for both sm100 and sm90 (#20086)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-26 20:50:12 -07:00
2d7779f888 [Perf] SM100 FP8 GEMM Optimizations after cutlass_profiler (#20071)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-06-26 20:50:09 -07:00
e13945f9dd [Perf] Further tunings for SM100 FP8 CUTLASS kernel (#19566) 2025-06-14 17:25:10 -07:00
84166fee97 [Kernel] Integrate CUTLASS MoE kernel with PPLX (#18762)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-06-06 18:26:11 -07:00
61059bee40 [Hardware][NVIDIA] FP4 MoE kernel optimization (#19110)
Signed-off-by: Chiyue Wei <chiyuew@nvidia.com>
Co-authored-by: Chiyue Wei <chiyuew@nvidia.com>
2025-06-05 09:48:26 -07:00
53a5a0ce30 [Perf] Tunings for SM100 FP8 CUTLASS kernel (#18778)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-04 10:46:28 -07:00
5f2cd251d2 Sm100 blockwise fp8 swap ab (#18564) 2025-06-04 07:48:45 -07:00
6e588da0f4 [Build/CI] Fix CUDA 11.8 build (#17679)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-22 12:13:54 -07:00
e23564cb70 use ceil_div in cutlass block scaling shape check (#17918) 2025-05-16 03:02:58 -07:00
e57e4d6e9e Fix Broken macro for cutlass moe (#18049)
Signed-off-by: drisspg <drisspguessous@gmail.com>
2025-05-12 23:31:06 -07:00
0c0fdae84f [Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model (#16362) 2025-05-09 16:24:41 -07:00
376786fac1 Add cutlass support for blackwell fp8 blockwise gemm (#14383)
Signed-off-by: Shu Wang <shuw@nvidia.com>
2025-05-08 15:09:55 -07:00
c12df53b60 [Bugfix] Fix cutlass dispatch for fp8/int8 to properly invoke M<=16 c… (#16751)
Signed-off-by: Ther-LF <2639852836@qq.com>
2025-04-27 19:38:42 -07:00
7b8a2ab76f [Kernel] Add expert_map support to Cutlass FP8 MOE (#16861)
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>
2025-04-21 20:44:32 -07:00
9239bf718e [Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com>
2025-03-27 00:54:44 +00:00
7caff01a7b [Build/BugFix] Fix hopper 12.8 build (#14354)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-08 08:11:56 +00:00
f89978ad7c add cutlass support for blackwell fp8 gemm (#13798) 2025-03-04 07:55:07 -08:00
288cc6c234 [Attention] MLA with chunked prefill (#12639)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Patrick Horn <patrick.horn@gmail.com>
Co-authored-by: simon-mo <xmo@berkeley.edu>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-21 15:30:12 -08:00
839b27c6cc [Kernel]Add streamK for block-quantized CUTLASS kernels (#12978) 2025-02-20 22:14:24 -08:00
c1e37bf71b [Kernel][Bugfix] Refactor and Fix CUTLASS 2:4 Sparse Kernels (#13198)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-14 00:01:14 +00:00
8a69e0e20e [CI/Build] Auto-fix Markdown files (#12941) 2025-02-08 04:25:15 -08:00
c11de33dad [Bugfix][Kernel] Fix per-token/per-channel quantization for Hopper scaled mm (#12696)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-03 13:04:59 -08:00
eb5741ad42 [Kernel][Quantization] Integrate block-quantized CUTLASS kernels for DeepSeekV3 (#12587)
Integrates the block-quantized kernels introduced in
https://github.com/vllm-project/vllm/pull/11868 for use in linear
layers.

Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-01-31 15:29:11 -08:00
9798b2fb00 [Kernel] Update cutlass_scaled_mm to support 2d group (blockwise) scaling (#11868) 2025-01-30 18:33:00 -08:00
4068f4b5b5 [MISC] Replace c10::optional with std::optional (#11730)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-01-05 10:20:34 +09:00
8936316d58 [Kernel] Refactor Cutlass c3x (#10049)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-12-19 07:00:18 +00:00
60508ffda9 [Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995)
Co-authored-by: Faraz Shahsavan <faraz.shahsavan@gmail.com>
Co-authored-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Rahul Tuli <rahul@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
2024-12-18 09:57:16 -05:00
96d999fbe8 [Kernel] Initial Machete W4A8 support + Refactors (#9855)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2024-11-18 12:59:29 -07:00
d1e8240875 [Bugfix] Fix spurious "No compiled cutlass_scaled_mm ..." for W8A8 on Turing (#9487) 2024-10-22 15:41:13 -07:00
aeb37c2a72 [CI/Build] Per file CUDA Archs (improve wheel size and dev build times) (#8845) 2024-10-03 22:55:25 -04:00
8d59dbb000 [Kernel] Add per-tensor and per-token AZP epilogues (#5941)
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-08-06 18:17:08 +00:00
8571ac4672 [Kernel] Update CUTLASS to 3.5.1 (#7085) 2024-08-05 15:13:43 -04:00
35e9c12bfa [Kernel] Tuned int8 Cutlass Kernels for SM75 (T4) (#6996)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-07-31 14:40:32 -07:00
93548eb37e [Kernel] Enable FP8 Cutlass for Ada Lovelace (#6950)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-07-31 14:40:22 -07:00
af647fb8b3 [Kernel] Tuned int8 kernels for Ada Lovelace (#6848)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-07-29 20:24:58 -06:00