Commit Graph

9401 Commits

Author SHA1 Message Date
01efc7ef78 [ci] fix wheel names for arm wheels (#24898)
Signed-off-by: simon-mo <simon.mo@hey.com>
v0.10.2
2025-10-07 13:40:13 -07:00
26b999c71a [CI Failure] Fix test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe (#24750)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-13 09:30:00 -07:00
da3fa78dc9 [Compilation Bug] Fix Inductor Graph Output with Shape Issue (#24772)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
v0.10.2rc3
2025-09-12 23:03:56 -07:00
bbb70036cb Enable conversion of multimodal models to pooling tasks (#24451)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-09-12 23:02:15 -07:00
89da8d9d09 [Qwen3Next] Fixes the cuda graph capture conditions under large batch sizes (#24660) (#24667)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
2025-09-12 23:01:49 -07:00
01085b134d [Qwen3-Next] MoE configs for H100 TP=1,2 and TP2/EP (#24739)
Signed-off-by: elvircrn <elvircrn@gmail.com>
2025-09-12 23:01:25 -07:00
66160a9943 [BugFix] Fix Qwen3-Next PP (#24709)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-09-12 23:00:28 -07:00
eaca762c18 [Qwen3-Next] MoE configs for H20 TP=1,2,4,8 (#24707)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-12 23:00:09 -07:00
880c741bb6 [Bugfix] fixes the causal_conv1d_update kernel update non-speculative decoding cases (#24680)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
v0.10.2rc2
2025-09-11 18:16:43 -07:00
40b6c9122b [V1] feat:add engine v1 tracing (#20372)
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
Signed-off-by: Ye Zhang <zhysishu@gmail.com>
Signed-off-by: RichardoMu <44485717+RichardoMrMu@users.noreply.github.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Mu Huai <tianbowen.tbw@antgroup.com>
Co-authored-by: Ye Zhang <zhysishu@gmail.com>
Co-authored-by: Benjamin Bartels <benjamin@bartels.dev>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: 瑜琮 <ly186375@antfin.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-09-11 17:10:39 -07:00
2e6bc46821 [Startup] Make DeepGEMM warmup scale with max-num-batched-tokens (#24693)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-09-11 20:10:19 -04:00
fcba05c435 [Bug] Fix Layer weight_block_size Assertion Issue (#24674)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-11 19:47:59 -04:00
7a30fa8708 [Doc] Clarify cudagraph capture size logic and default behavior in scheduler (#18698)
Signed-off-by: Zazzle516 <2405677060@qq.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-11 23:18:09 +00:00
f82f7a8990 [Qwen3-Next] MOE configs for H100 TP4 (#24699)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-09-11 15:45:52 -07:00
c3aea10dc8 [Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-11 15:43:14 -07:00
d4fd2768ef [Bugfix][Attention] Fix FlashInfer MLA block size logic (#24692)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-09-11 22:39:42 +00:00
7a70a71892 [Qwen3-Next] Add B200 MoE configs for Qwen3-next (#24698)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-09-11 15:34:58 -07:00
7d4651997a [CI/Build] Add bc-linter to vLLM CI (#21234)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-09-11 15:34:36 -07:00
569bf1c9c0 [Qwen3-Next] MoE configs for H200 TP=1,2,4 (#24695)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-11 14:38:16 -07:00
1ec20355f5 [Bugfix] Set VLLM_ALLREDUCE_USE_SYMM_MEM default to False (#24696)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-11 14:32:27 -07:00
e42af78b18 [flashinfer] [kernel] support for fp8 kv cache for trtllm prefill attention (#24197)
Signed-off-by: Xiaozhu <mxz297@gmail.com>
2025-09-11 14:20:09 -07:00
074854b24f [Kernel][B200] mxfp4 fused cutlass moe (#23696)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-11 17:04:56 -04:00
79ac59f32e Update Spec Decode metrics to include drafted and accepted token throughput (#24127)
Signed-off-by: Andrew Xia <axia@meta.com>
2025-09-11 19:58:43 +00:00
b971f91504 [BugFix] Fix tokenize asyncio task leak (#24677)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-09-11 19:44:04 +00:00
c733bd5e87 [Qwen3-Next] Add MoE Config for H200 (#24688)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-11 12:40:15 -07:00
a892b259b4 [Doc] Remove Useless Comments (#24687)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-11 12:25:47 -07:00
127ded0a9e [Ultravox] Use wrapped_model_config to instantiate inner model (#24679)
Signed-off-by: Peter Salas <peter@fixie.ai>
2025-09-11 18:52:24 +00:00
bb2b5126da [VLM] Migrate remain DP-supported ViT models to use disable_tp (#24363)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-11 18:30:41 +00:00
361ae27f8a [Docs] Fix formatting of transcription doc (#24676)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-11 11:18:06 -07:00
e26fef8397 fix some typos (#24616)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-09-11 10:48:46 -07:00
c1eda615ba Fix model name included in responses (#24663)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-11 10:47:51 -07:00
4aa23892d6 [Bugfix] Fix platform-specific routing in CustomOp implementations (#24444)
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
2025-09-11 17:15:01 +00:00
1fdd5c42d7 [Kernels] Enable Torch Symmetric Memory All-Reduce By Default (#24111)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-11 09:45:31 -07:00
bcbe2a4d9e [VLM] Optimize GLM4.5-V-style video processing to only decode necessary frames (#24161)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-11 09:44:34 -07:00
51d41265ad [Docs] Fix typos in EP deployment doc (#24669)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-11 09:07:23 -07:00
4984a291d5 [Doc] Fix Markdown Pre-commit Error (#24670)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-11 09:05:59 -07:00
404c85ca72 [Docs] Add transcription support to model (#24664)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-09-11 07:39:01 -07:00
817beef7f3 [Bugifx] Fix qwen-next packed_modules_mapping (#24656)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-11 22:26:17 +08:00
4f6593b058 [HybridKVCache][Platform] Add support_hybrid_kv_cache for platform (#24646)
Signed-off-by: MengqingCao <cmq0113@163.com>
2025-09-11 21:47:58 +08:00
94e6b2d55f Allow users to specify kv cache memory size (#21489)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-11 13:41:07 +00:00
fd1ce98cdd [CI] Split mteb test from Language Models Test (#24634)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-11 06:37:51 -07:00
d11ec124a0 [Bench] Add qwen-next in benchmark_moe.py (#24661)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-11 21:29:43 +08:00
f510715882 [build] add torch to tool.uv no-build-isolation-package (#24303)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-11 13:19:44 +00:00
f946197473 [Docs] Fixes a typo in the qwen3next model name. (#24654)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
2025-09-11 19:35:14 +08:00
0cd72a7b72 [XPU] add missing dependency tblib for XPU CI (#24639)
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
2025-09-11 11:22:33 +00:00
5f5271f1ee Move LoRAConfig from config/__init__.py to config/lora.py (#24644)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-11 11:01:38 +00:00
d6249d0699 Fix typing for safetensors_load_strategy (#24641)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-11 10:41:39 +00:00
25bb9e8c65 [CI Failure] fix models/language/pooling/test_auto_prefix_cache_support.py (#24636)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-11 03:31:23 -07:00
a1213fae5f [Misc] Add @NickLucche to codeowners (#24647)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-09-11 17:18:09 +08:00
a8b0361c92 [CI] Split pooling from entrypoints Test (#24632)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-11 01:53:09 -07:00