youngkingdom/vllm - vllm

Author	SHA1	Message	Date
ihb2032	4f02b77de4	Fix: Add explicit #include <omp.h> for OpenMP compatibility on certain toolchains (#24951) Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn> Signed-off-by: ihb2032 <1355790728@qq.com>	2025-09-18 17:43:23 +08:00
Lumina	81b16a2bc9	[Kernel] Better inf handling for grouped topk cu (#24886) Signed-off-by: lumina37 <starry.qvq@gmail.com>	2025-09-18 05:53:55 +00:00
elvischenv	e6585ddb45	[Bugfix] Fix accuracy issue for silu_mul + nvfp4 quant fusion kernel (#24833) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-17 16:37:23 -07:00
Alexander Matveev	fedb75fa27	[Bugfix][B200] Fix `cutlass_mla` hang (#24966) Signed-off-by: Alexander Matveev <amatveev@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-17 18:06:38 -04:00
czhu-cohere	3c068c637b	[Kernel] Faster pre-processing time for W4A8 (#23972) Signed-off-by: czhu-cohere <conway.zhu@cohere.com>	2025-09-17 14:35:32 -07:00
Matthew Bonanni	8f3616f422	Remove old cutlass mla (#23961) Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-09-17 14:31:43 +00:00
Aidyn-A	bfe9380161	Apply fixes for CUDA 13 (#24599) Signed-off-by: Aidyn-A <aidyn.b.aitzhan@gmail.com>	2025-09-17 09:15:42 -04:00
Li, Jiang	9fccd04e30	[Bugfix] Fix Stream usage in CPU model runner and OneDNN kernel check (#25046) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-09-17 05:54:02 -07:00
Wentao Ye	e757a629e7	[Bug] Fix Cutlass Scaled MM Compilation Error (#24887) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-15 17:21:17 -04:00
Kyle Sayers	a0b26701c9	[Transform] Deterministic Hadacore Transforms (#24106) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-09-15 12:59:31 -06:00
xiao-llm	01413e0cf5	Fp8 paged attention update (#22222) Signed-off-by: Xiao Yu <xiao.yu@amd.com> Signed-off-by: xiao-llm <xiao.yu.dc@outlook.com> Co-authored-by: Xiao Yu <xiao.yu@metamaterial.com> Co-authored-by: Xiao Yu <xiao.yu@amd.com> Co-authored-by: Bowen Bao <bowenbao@amd.com>	2025-09-15 10:43:26 -04:00
Michael Goin	59d7ffc17f	[CI Failure] Fix test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe (#24750) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-13 07:29:19 +00:00
Elvir Crnčević	98229db244	[Kernels][DP/EP] Optimize Silu Kernel for R1 (#24054) Signed-off-by: elvircrn <elvircrn@gmail.com>	2025-09-13 00:17:27 -07:00
elvischenv	dbeee3844c	[Perf] Use NVIDIA hardware-accelerated instruction for float to fp8_e4m3 quantization (#24757) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-09-13 00:16:24 -07:00
Woosuk Kwon	5febdc8750	[Chore] Remove unused batched RoPE op & kernel (#24789) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-13 00:08:20 -07:00
Didier Durand	bcb06d7baf	[Doc]: fix typos in various files (#24726) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-12 06:43:12 -07:00
Michael Goin	c3aea10dc8	[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-11 15:43:14 -07:00
Ilya Markov	1fdd5c42d7	[Kernels] Enable Torch Symmetric Memory All-Reduce By Default (#24111) Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-11 09:45:31 -07:00
TaehyunKim	9bd831f501	[Model] New model support for Motif-1-Tiny (#23414) Signed-off-by: ca1207 <ca1207zzz@gmail.com> Signed-off-by: TaehyunKim <73943231+ca1207@users.noreply.github.com> Co-authored-by: WyldeCat <skan1543@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-10 23:29:40 -07:00
Ming Yang	86173ad593	[Kernel] Support decode context parallelism on Blackwell with CUTLASS MLA (#24385) Signed-off-by: Ming Yang <minos.future@gmail.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-09-08 09:27:12 +08:00
mohankku	0eadaeff7e	[Bugfix] Avoid uninitialized usage of azp_val when AZP is false. (#24335) Signed-off-by: Mohan Kumar Kumar <mohan.cbein@gmail.com> Signed-off-by: mohankku <mohan.cbein@gmail.com>	2025-09-06 08:17:03 -07:00
yzds	ac201a0eaf	[Feature] Support Decode Context Parallel (DCP) for MLA (#23734) Signed-off-by: hongchao <hongchao@msh.team> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: hongchao <hongchao@msh.team> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-09-06 13:24:05 +08:00
Didier Durand	35bf193864	[Doc]: fix typos in Python comments (#24294) Signed-off-by: Didier Durand <durand.didier@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-05 19:41:12 -07:00
elvischenv	adc3ddb430	[Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files (#23727) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-04 14:25:45 -07:00
Li, Jiang	57b1ce94f7	[CPU] Refactor CPU unquantized linear (#24150) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-09-04 14:28:45 +08:00
Qiming Zhang	e919d6f549	[Kernel][Bugfix] Fix grouped topk cu (#24146) Signed-off-by: mayuyuace <qiming1.zhang@intel.com>	2025-09-04 12:37:37 +08:00
Matthew Bonanni	a742322092	[Attention] Blackwell FP8 MLA support with CUTLASS_MLA backend (#23289) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-09-03 14:05:24 -04:00
Wentao Ye	c4ed78b14f	[Compile] Fix Compile Warning for `w4a8_mm_entry.cu` (#23660) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-02 20:45:52 -07:00
co63oc	1bd007f234	fix some typos (#24071) Signed-off-by: co63oc <co63oc@users.noreply.github.com>	2025-09-02 20:44:50 -07:00
Asaf Joseph Gardin	2b41cbbf03	[V1][Mamba1] - FP32 SSM Kernel Support (#23506) Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>	2025-09-01 20:53:00 -07:00
yzds	0dc9532065	[BUGFIX ] fix undefined silu_and_mul_nvfp4_quant (#23929) Signed-off-by: hongchao <hongchao@msh.team> Signed-off-by: Richard Zou <zou3519@gmail.com> Co-authored-by: hongchao <hongchao@msh.team> Co-authored-by: Richard Zou <zou3519@gmail.com> Co-authored-by: Richard Zou <zou3519@users.noreply.github.com>	2025-08-29 09:36:39 -07:00
Charlie Fu	006477e60b	[ROCm][Fix] Fix rocm build caused by #23791 (#23847) Signed-off-by: charlifu <charlifu@amd.com>	2025-08-28 19:52:27 -07:00
elvischenv	16a45b3a28	[NVIDIA] Support SiluMul + NVFP4 quant fusion (#23671) Signed-off-by: jindih <jindih@nvidia.com> Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: jindih <jindih@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Luka Govedic <lgovedic@redhat.com>	2025-08-28 19:36:50 +00:00
yzds	186aced5ff	[Kernel] cuda kernels for upcoming decode context parallel feature (#23791) Co-authored-by: hongchao <hongchao@msh.team>	2025-08-28 15:29:11 +08:00
Luka Govedič	4f35be10a9	[BugFix] Fix topk_softmax assert (#19764) Signed-off-by: Luka Govedic <lgovedic@redhat.com>	2025-08-27 09:47:28 -07:00
Xin Yang	8a3cd90af5	[Kernel] Add fused grouped_topk kernel for MoE (#23274) Signed-off-by: Xin Yang <xyangx@amazon.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-08-25 11:47:52 -07:00
czhu-cohere	e76e233540	[kernel] Support W4A8 on Hopper (#23198) Signed-off-by: czhu-cohere <conway.zhu@cohere.com>	2025-08-24 06:18:04 +00:00
Matthew Bonanni	19fe1a0510	[Kernel] Add FP8 support with FlashMLA backend (#22668) Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>	2025-08-22 02:26:32 +00:00
Elvir Crnčević	044931f97b	Make sure that vectorize_with_alignment produced vectorized global loads (#23182)	2025-08-21 20:06:54 +00:00
Wentao Ye	f94bf9b924	[Compile] Fix Compile Warning SM100 Cutlass MLA (#23287) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-21 03:09:39 +00:00
Li, Jiang	7be5d113d8	[CPU] Refactor CPU W8A8 scaled_mm (#23071) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-08-21 09:34:24 +08:00
Michael Goin	0cdbf5e61c	[Kernel/Quant] Remove the original marlin format and qqq (#23204) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-20 15:13:36 -04:00
shixianc	b17109beea	[Kernel] CUTLASS MoE FP8: Integrate cuda moe permute/unpermute (#23045) Signed-off-by: Shixian Cui <shixian@amazon.com>	2025-08-20 10:35:26 -04:00
Andy Lo	b2fd0b81e0	[Bugfix][CI] Machete kernels: deterministic ordering for more cache hits (#23055) Signed-off-by: Andy Lo <andy@mistral.ai>	2025-08-17 22:10:26 -07:00
Jee Jee Li	4d4061b6e7	[Kernel] Add cuda kernel for gpt_oss activation (#22951) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-17 05:03:24 +00:00
Michael Goin	4fc722eca4	[Kernel/Quant] Remove AQLM (#22943) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-08-16 19:38:21 +00:00
shixianc	7f89ed248f	[Fix] enable swap_ab for pplx problem size computation (#22991) Signed-off-by: Shixian Cui <shixian@amazon.com> Co-authored-by: Shixian Cui <shixian@amazon.com>	2025-08-15 14:02:12 -07:00
Woosuk Kwon	1c859a1387	[V0 Deprecation] Remove advance_step (#22969) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-15 08:22:31 -07:00
Simon Mo	f1f0d2fab8	Revert "[Kernel] Add cuda kernel for gpt_oss activation" (#22948)	2025-08-14 17:38:10 -07:00
Jee Jee Li	81f4b96481	[Kernel] Add cuda kernel for gpt_oss activation (#22538) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-14 17:21:29 -07:00

574 Commits