|
|
759ef49b15
|
Remove V0 Encoder-Decoder Support (#24907)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-09-15 21:17:14 -07:00 |
|
|
|
2891603efd
|
[ROCm][Bugfix] Fix the case where there's bias (#24895)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2025-09-15 20:05:12 -06:00 |
|
|
|
a0b26701c9
|
[Transform] Deterministic Hadacore Transforms (#24106)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-09-15 12:59:31 -06:00 |
|
|
|
59d7ffc17f
|
[CI Failure] Fix test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe (#24750)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-13 07:29:19 +00:00 |
|
|
|
98229db244
|
[Kernels][DP/EP] Optimize Silu Kernel for R1 (#24054)
Signed-off-by: elvircrn <elvircrn@gmail.com>
|
2025-09-13 00:17:27 -07:00 |
|
|
|
5febdc8750
|
[Chore] Remove unused batched RoPE op & kernel (#24789)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-13 00:08:20 -07:00 |
|
|
|
5fe643fc26
|
Add FLASHINFER_MLA to backend selector test (#24753)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
|
2025-09-12 22:30:07 +00:00 |
|
|
|
72fc8aa412
|
[Multi Modal] Add FA3 in VIT (#24347)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-09-12 21:27:24 +08:00 |
|
|
|
c3aea10dc8
|
[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-11 15:43:14 -07:00 |
|
|
|
074854b24f
|
[Kernel][B200] mxfp4 fused cutlass moe (#23696)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-11 17:04:56 -04:00 |
|
|
|
e26fef8397
|
fix some typos (#24616)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
|
2025-09-11 10:48:46 -07:00 |
|
|
|
9bd831f501
|
[Model] New model support for Motif-1-Tiny (#23414)
Signed-off-by: ca1207 <ca1207zzz@gmail.com>
Signed-off-by: TaehyunKim <73943231+ca1207@users.noreply.github.com>
Co-authored-by: WyldeCat <skan1543@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-10 23:29:40 -07:00 |
|
|
|
dcb28a332b
|
[Kernel] Flashinfer MLA (trtllm-gen) decode kernel integration (#21078)
Signed-off-by: hjjq <hanjieq@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-10 15:31:10 -07:00 |
|
|
|
6cbd41909e
|
Feature/vit attention unification# 23880 (#23978)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-10 06:10:14 -07:00 |
|
|
|
0efdb5c3ba
|
[gpt-oss] Cache permute indices for faster MXFP4 MoE layer loading (#24154)
Signed-off-by: Wei Wei <wwei6@meta.com>
|
2025-09-10 04:27:53 +00:00 |
|
|
|
7e7db04310
|
[CI] Retry flaky fp8 cutlass mla tests (#24536)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-09-09 20:33:10 -07:00 |
|
|
|
46876dff32
|
[Doc]: fixing typos to improve docs (#24480)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-08 23:06:04 -07:00 |
|
|
|
bba1042c6f
|
[Flashinfer] Support Flashinfer TRTLLM FP8-qkv BF16/FP16-out Attention Kernel (#23647)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-09-08 20:53:07 -07:00 |
|
|
|
e041314184
|
[Bugfix] Fix mamba2 prefill chunking (#23279)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-08 11:42:41 +00:00 |
|
|
|
86173ad593
|
[Kernel] Support decode context parallelism on Blackwell with CUTLASS MLA (#24385)
Signed-off-by: Ming Yang <minos.future@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-09-08 09:27:12 +08:00 |
|
|
|
e68dc2f014
|
[Bugfix] Fix unstable silu_mul+nvfp4 quant fusion test (#24370)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-09-06 20:39:34 +00:00 |
|
|
|
7555d6b34a
|
[Bugfix] Fix test_mixtral_moe (#24371)
|
2025-09-06 09:32:03 -07:00 |
|
|
|
adc3ddb430
|
[Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files (#23727)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-04 14:25:45 -07:00 |
|
|
|
402759d472
|
[Attention] FlashAttn MLA (#14258)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-09-04 02:47:59 -07:00 |
|
|
|
57b1ce94f7
|
[CPU] Refactor CPU unquantized linear (#24150)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-09-04 14:28:45 +08:00 |
|
|
|
a742322092
|
[Attention] Blackwell FP8 MLA support with CUTLASS_MLA backend (#23289)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-09-03 14:05:24 -04:00 |
|
|
|
e9b92dcd89
|
[Kernels] Overlap shared experts with send/recv (#23273)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-09-03 12:35:18 -04:00 |
|
|
|
d7e1e59972
|
[Doc]: fix typos in Python comments (#24093)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-02 21:05:45 -07:00 |
|
|
|
1bd007f234
|
fix some typos (#24071)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
|
2025-09-02 20:44:50 -07:00 |
|
|
|
e66ed3e675
|
[CI Failure] Skip failing nvfp4 silu test (#23959)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-02 13:18:15 -04:00 |
|
|
|
fad73be1a5
|
[Doc]: fix typos in Python comments (#24077)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-02 02:38:55 -07:00 |
|
|
|
16a45b3a28
|
[NVIDIA] Support SiluMul + NVFP4 quant fusion (#23671)
Signed-off-by: jindih <jindih@nvidia.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: jindih <jindih@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedic <lgovedic@redhat.com>
|
2025-08-28 19:36:50 +00:00 |
|
|
|
186aced5ff
|
[Kernel] cuda kernels for upcoming decode context parallel feature (#23791)
Co-authored-by: hongchao <hongchao@msh.team>
|
2025-08-28 15:29:11 +08:00 |
|
|
|
3af47c3cc6
|
[Feature] Add Hopper DeepGEMM E8M0 for DeepSeekV3.1 scale_fmt (#23666)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-08-27 14:09:08 +00:00 |
|
|
|
c37c0af990
|
[Misc] Fix comments in tests/kernels/quantization (#23675)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-08-26 19:31:20 +00:00 |
|
|
|
f66673a39d
|
[Kernel] Added flashinfer fp8 per-tensor gemms (#22895)
Signed-off-by: Julien Lin <jullin@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-26 06:54:04 -07:00 |
|
|
|
906e461ed6
|
[CI Fix] Pin deepep and pplx tags in tools/ep_kernels/, gate multigpu tests (#23568)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-25 18:29:00 -07:00 |
|
|
|
8a3cd90af5
|
[Kernel] Add fused grouped_topk kernel for MoE (#23274)
Signed-off-by: Xin Yang <xyangx@amazon.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-08-25 11:47:52 -07:00 |
|
|
|
e0329ed4b4
|
Updates to Flex + VLLm integration (#21416)
Signed-off-by: drisspg <drisspguessous@gmail.com>
|
2025-08-25 09:32:42 -04:00 |
|
|
|
e76e233540
|
[kernel] Support W4A8 on Hopper (#23198)
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
|
2025-08-24 06:18:04 +00:00 |
|
|
|
24d0c9e6ed
|
[NVIDIA][torch.compile] Support Flashinfer TRTLLM FP8-q/kv NVFP4-out Attention Kernel (#22703)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-08-22 22:09:05 +00:00 |
|
|
|
341923b982
|
fix(tests): Ensure reliable CUDA cache clearing in MoE test (#23416)
Signed-off-by: AzizCode92 <azizbenothman76@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-08-22 17:20:59 +00:00 |
|
|
|
19fe1a0510
|
[Kernel] Add FP8 support with FlashMLA backend (#22668)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
|
2025-08-22 02:26:32 +00:00 |
|
|
|
3bbe11cc13
|
[Perf] Small optimizations for silu_mul_fp8_quant_deep_gemm (#23265)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-21 17:56:15 -04:00 |
|
|
|
1d353b6352
|
[Core] Always use tensor cores for Flashinfer Decode Wrapper (#23214)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-08-21 16:02:11 -04:00 |
|
|
|
f8ce022948
|
add tg-mxfp4-moe-test (#22540)
Signed-off-by: siyuanf <siyuanf@nvidia.com>
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-21 17:05:47 +00:00 |
|
|
|
7be5d113d8
|
[CPU] Refactor CPU W8A8 scaled_mm (#23071)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-08-21 09:34:24 +08:00 |
|
|
|
0cdbf5e61c
|
[Kernel/Quant] Remove the original marlin format and qqq (#23204)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-20 15:13:36 -04:00 |
|
|
|
b17109beea
|
[Kernel] CUTLASS MoE FP8: Integrate cuda moe permute/unpermute (#23045)
Signed-off-by: Shixian Cui <shixian@amazon.com>
|
2025-08-20 10:35:26 -04:00 |
|
|
|
a38b8af4c3
|
[NVIDIA] Add SM100 Flashinfer Cutlass MoE fp8 backend (#22357)
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
|
2025-08-19 18:01:53 -04:00 |
|