|
|
dec66d253b
|
[Kernel] GGUF MMVQ kernel for multiple input vectors (#18754)
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
|
2025-06-16 17:33:26 +08:00 |
|
|
|
c6703d1e0d
|
[MISC] Remove unused variableds in C++ (#19609)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-06-15 20:05:28 -07:00 |
|
|
|
e13945f9dd
|
[Perf] Further tunings for SM100 FP8 CUTLASS kernel (#19566)
|
2025-06-14 17:25:10 -07:00 |
|
|
|
294fc1e2c9
|
[Hardware][NVIDIA][kernel] Fp4 MOE quant kernel optimization (#19500)
|
2025-06-14 09:34:28 -07:00 |
|
|
|
b6efafd9e4
|
[Perf] Vectorize static / dynamic INT8 quant kernels (#19233)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-12 06:51:41 -07:00 |
|
|
|
2f1c19b245
|
[CI] change spell checker from codespell to typos (#18711)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-06-11 19:57:10 -07:00 |
|
|
|
84166fee97
|
[Kernel] Integrate CUTLASS MoE kernel with PPLX (#18762)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-06-06 18:26:11 -07:00 |
|
|
|
61059bee40
|
[Hardware][NVIDIA] FP4 MoE kernel optimization (#19110)
Signed-off-by: Chiyue Wei <chiyuew@nvidia.com>
Co-authored-by: Chiyue Wei <chiyuew@nvidia.com>
|
2025-06-05 09:48:26 -07:00 |
|
|
|
53a5a0ce30
|
[Perf] Tunings for SM100 FP8 CUTLASS kernel (#18778)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-04 10:46:28 -07:00 |
|
|
|
5f2cd251d2
|
Sm100 blockwise fp8 swap ab (#18564)
|
2025-06-04 07:48:45 -07:00 |
|
|
|
e31446b6c8
|
[Perf] Tune scaled_fp8_quant by increasing vectorization (#18844)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-06-03 13:48:25 -07:00 |
|
|
|
02f0c7b220
|
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-06-03 11:20:17 -07:00 |
|
|
|
6e588da0f4
|
[Build/CI] Fix CUDA 11.8 build (#17679)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-05-22 12:13:54 -07:00 |
|
|
|
e73b7dfd69
|
[Bugfix] fix an illegal memory access was encountered of marlin kernel + act_order (#18245)
|
2025-05-16 16:02:44 -07:00 |
|
|
|
e23564cb70
|
use ceil_div in cutlass block scaling shape check (#17918)
|
2025-05-16 03:02:58 -07:00 |
|
|
|
7b2f28deba
|
[AMD][torch.compile] Enable silu+fp8_quant fusion for rocm (#18082)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-05-13 22:13:56 -07:00 |
|
|
|
e57e4d6e9e
|
Fix Broken macro for cutlass moe (#18049)
Signed-off-by: drisspg <drisspguessous@gmail.com>
|
2025-05-12 23:31:06 -07:00 |
|
|
|
d8487ef557
|
[ROCm]: Fix build from source failure with gcc14 and ROCm 6.3 (#13779)
Signed-off-by: Arjun Kathuria <arjun.kathuria8@gmail.com>
|
2025-05-12 20:36:33 -07:00 |
|
|
|
d74e5f37bc
|
[Kernel] fp4 marlin kernel (#17687)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
|
2025-05-10 19:58:49 -07:00 |
|
|
|
0c0fdae84f
|
[Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model (#16362)
|
2025-05-09 16:24:41 -07:00 |
|
|
|
376786fac1
|
Add cutlass support for blackwell fp8 blockwise gemm (#14383)
Signed-off-by: Shu Wang <shuw@nvidia.com>
|
2025-05-08 15:09:55 -07:00 |
|
|
|
f50dcb7c21
|
[Easy] Eliminate c10::optional usage in vllm/csrc (#17819)
|
2025-05-08 03:05:10 -07:00 |
|
|
|
1a45a61387
|
[Kernel] GGUF MoeVec kernel (#16780)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-05-06 23:07:23 -07:00 |
|
|
|
a17cef70ea
|
Removed unused marlin cuda code (#17684)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-05-06 17:59:47 -07:00 |
|
|
|
1d0c9d6b2d
|
[Kernel] some optimizations for dense marlin and moe marlin (#16850)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
|
2025-05-05 09:39:30 -07:00 |
|
|
|
460a2b1100
|
[torch.compile] Add torch inductor pass for fusing silu_and_mul with subsequent scaled_fp8_quant operations (#10867)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-05-01 07:59:28 -07:00 |
|
|
|
40896bdf3f
|
pre-commit autoupdate (#17380)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-29 06:46:55 -07:00 |
|
|
|
d6da8a8ff2
|
[Bugfix] Fix numel() downcast in fused_layernorm_dynamic_per_token_quant.cu (#17316)
|
2025-04-28 19:23:18 -07:00 |
|
|
|
c12df53b60
|
[Bugfix] Fix cutlass dispatch for fp8/int8 to properly invoke M<=16 c… (#16751)
Signed-off-by: Ther-LF <2639852836@qq.com>
|
2025-04-27 19:38:42 -07:00 |
|
|
|
ed7a29d9f8
|
[NVIDIA] Support Cutlass MLA for Blackwell GPUs (#16032)
Signed-off-by: kaixih <kaixih@nvidia.com>
|
2025-04-27 06:29:21 -07:00 |
|
|
|
7b8a2ab76f
|
[Kernel] Add expert_map support to Cutlass FP8 MOE (#16861)
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>
|
2025-04-21 20:44:32 -07:00 |
|
|
|
d06ba4ed3f
|
[Kernel] moe wna16 marlin kernel (#14447)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-04-14 20:05:22 -07:00 |
|
|
|
9351f91be9
|
[BugFix][ROCm] Fix GGUF MoE Dispatch Block_Dim for ROCm (#16247)
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>
|
2025-04-08 05:10:26 -07:00 |
|
|
|
2fa66ef713
|
[Bugfix] fix use_atomic_add support of marlin kernel when using v1 engine (#15946)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
|
2025-04-05 20:04:22 -07:00 |
|
|
|
230b131b54
|
[Bugfix][kernels] Fix half2float conversion in gguf kernels (#15995)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-04-04 09:38:58 -07:00 |
|
|
|
90969fb39a
|
[Kernel] Add more dtype support for GGUF dequantization (#15879)
Signed-off-by: lukas.bluebaum <lukas.bluebaum@aleph-alpha.com>
|
2025-04-02 01:58:48 -07:00 |
|
|
|
e85829450d
|
[Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-03-31 04:42:18 -07:00 |
|
|
|
9239bf718e
|
[Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com>
|
2025-03-27 00:54:44 +00:00 |
|
|
|
a608160027
|
[Kernel] Fix conflicting macro names for gguf kernels (#15456)
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
|
2025-03-25 13:50:49 +00:00 |
|
|
|
051da7efe3
|
Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 (#15160)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Richard Barnes <rbarnes@meta.com>
|
2025-03-25 15:36:45 +08:00 |
|
|
|
6b3cc75be0
|
[Kernel] allow non-contiguous input for marlin kernel (#14658)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
|
2025-03-24 09:21:33 -04:00 |
|
|
|
d3ccbd6350
|
Fix CUDA kernel index data type in vllm/csrc/quantization/fused_kernels/layernorm_utils.cuh +10 (#15159)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Richard Barnes <rbarnes@meta.com>
|
2025-03-21 10:01:11 +08:00 |
|
|
|
8c0d15d5c5
|
[Misc][Easy] Annotate unused vars in the csrc files (#14798)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-03-15 12:40:09 +08:00 |
|
|
|
977a16772c
|
[Bugfix][Kernel]: Fix AllSpark kernel compilation errors and enable for CUDA < 12.0 (#14430)
Signed-off-by: wyj371990 <wyj371990@alibaba-inc.com>
|
2025-03-14 09:55:14 -07:00 |
|
|
|
2a602b055a
|
forward fix PR 14245, restore build on ROCm 6.2 (#14709)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
|
2025-03-13 20:40:15 -07:00 |
|
|
|
debd6bbf09
|
[Kernel] Add ModelOpt FP4 Checkpoint Support (#12520)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-03-12 05:13:11 +00:00 |
|
|
|
e22ee1e7a2
|
[Kernel] GGUF MoE kernel (#14613)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
|
2025-03-12 03:33:27 +00:00 |
|
|
|
a1c8f3796c
|
dynamic distpatch of fp8 kernels (#14245)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
|
2025-03-11 10:54:56 -04:00 |
|
|
|
89cdaa83e7
|
[Kernel] Add more dtype support for GGUF kernels (#14043)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
|
2025-03-10 07:30:04 -07:00 |
|
|
|
7caff01a7b
|
[Build/BugFix] Fix hopper 12.8 build (#14354)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-03-08 08:11:56 +00:00 |
|