|
|
9689be1e8e
|
[ROCm] Add skinny gemm bias support for dtypes fp16,bf16,fp8 (#24988)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
Signed-off-by: Hashem Hashemi <159079214+amd-hhashemi@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
65c4513ad8
|
[Core] Ensure LoRA linear respect the base_layer's tp_size and tp_rank (#25487)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
5acda4cc71
|
[Spec Decode][CI] Add e2e test for examples/spec_decode.py and prevent breaking Acceptance Length (#24531)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
78f892c373
|
[Misc] Reduce initialization time of auto_tune (#23682)
Signed-off-by: Weida Hong <wdhongtw@google.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
26da2c6244
|
[V1][Kernel] Add triton implementation for reshape_and_cache_flash (#24503)
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
0081c6956a
|
Use macro guard CUDA functions for back compatibility in grouped_topk_kernel.cu (#25346)
Signed-off-by: Ming Yang <minos.future@gmail.com>
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
Co-authored-by: Rahul Tuli <rtuli@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
6462feef65
|
[Log] Optimize kv cache memory log from Bytes to GiB (#25204)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
e9a74500e5
|
[BugFix] Fix UB in per_token_group_quant.cu (#24913)
Signed-off-by: Shreeasish Kumar <shreeasish@rivosinc.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
02a3ce2230
|
[Kernels] Support blocked fp8 quantization for compressed tensors MoE (#25219)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
9cae377a16
|
Add backward compatibility for GuidedDecodingParams (#25422)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
8c5c35c027
|
[Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support and Prefill support (#24845)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
f97da2c732
|
[V1] Remove V0 code paths for Hybrid models (#25400)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
02134245a9
|
[UX] Change kv-cache-memory log level to debug (#25479)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
2ab27b70f5
|
[XPU] Fix MOE DP accuracy issue on XPU (#25465)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
a500f7cc09
|
[Docs] NixlConnector quickstart guide (#24249)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Signed-off-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
1b75f784b8
|
[P/D] Support NIXL connector to disconnect during a clean shutdown (#24423)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
0eddd2b528
|
[BugFix] Register expert_map as named buffer for wake_up and sleep (#25458)
Signed-off-by: wuxibin <wuxibin@bytedance.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
030774abcf
|
[CI/Build] Fix disabled v1 attention backend selection test (#25471)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
77389d87b2
|
[docs] Benchmark Serving Incorrect Arg (#25474)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
59659b74c4
|
[Core] Optimize LoRA weight loading (#25403)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
3b96eafdb0
|
[Bugfix] Fix idefics3 tie_word_embeddings (#25454)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
fb64e67533
|
[Test]: Hermes tool parser stream output error in Qwen3 case (#25203)
Signed-off-by: Andreas Hartel <andreas.hartel@aleph-alpha.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
215da8510d
|
[Misc] Move DP for ViT code inside model executor dir (#25459)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
c4a15ee240
|
[Frontend] Add a new xml-based tool parser for qwen3-coder (#25028)
Signed-off-by: Zhikaiiii <1658973216@qq.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
3a640b8f74
|
Handle triton kernel import exception (#25319)
Signed-off-by: Ming Yang <minos.future@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
0a1397c7df
|
[Model] Enable DP for ViT in Qwen2-VL (#25445)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
921945c81e
|
[NIXL][OOT platform] support nixl_connector with oot platform and other nixl_backend (#25121)
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
675fc471bf
|
[DP/EP][GPTOSS] Use triton matmul-ogs kernels for GPTOSS DP/EP (#24588)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
b0ae0ad935
|
[Docs] Fix griffe warnings in vllm/lora/ops (#25369)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
e99b286f01
|
[Bugfix] Remove contiguous output req for context parallel MLA (#25414)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
23a7805022
|
[benchmarks]allow skip ready check for bench serve (#25420)
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
e3a3c738b0
|
[XPU] Fix compile_size is None case. (#25433)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
e41946ecdb
|
[feat] Support MRoPE + YaRN (#25384)
Signed-off-by: liuye.hj <liuye.hj@alibaba-inc.com>
Co-authored-by: liuye.hj <liuye.hj@alibaba-inc.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
f071a31ede
|
[Bug] Fix Long Context OOM Issue (#25290)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
1b30043f0d
|
[V0 deprecation] Remove _set_default_args_v0 function (#25409)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
a0b5617263
|
[V0 deprecation] Remove platform v1 controling interface (#25410)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
e6c22d2b2f
|
[Perf] Apply torch.compile for per_block_cast_to_fp8 (#24611)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
dbb029cfe1
|
[Performance] Remove input pads in cutlass_mla and optimize v_proj output handling (#25184)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
25dd155e60
|
[BugFix] [DP/EP] Fix slow execution when BS <= DP (#25407)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Chris Bamford <chrisbam4d@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
864bbe36f0
|
[Bugfix] Fix missing clear_connector_metadata (#25397)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
e97cf2e32b
|
[Core] Drop overly aggressive whisper assertion (#25408)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
d96a3fc653
|
[Bugfix] fix custom op test (#25429)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
aac85cc6d6
|
[Frontend] Responses API MCP tools for built in tools and to pass through headers (#24628)
Signed-off-by: Alec Solder <alecs@fb.com>
Signed-off-by: Alec S <10566873+alecsolder@users.noreply.github.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
f1e3d031e4
|
[TPU] update torch_xla dependency for PyPI compatibility (#25278)
Signed-off-by: Johnny Yang <johnnyyang@google.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
6e9229e919
|
[CI/Build] Skip Qwen3-VL initialization tests until models are actually released (#25394)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
ff54b6bfe3
|
[KV offload][5/N] Add CPUOffloadingSpec (#24251)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
6dbbecd5b2
|
[torch.compile] Cleanup compilation tests and custom passes, add debug utils, fix DCE bug (#23091), fix test (#24376), and prep for custom op matching (#24604) (#24542)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: luka <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
6850bfe15c
|
[misc] Remove RFC review hours reference (#25416)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
d988b84e8e
|
[DP] support torchrun external launcher with Data Parallelism (#24899)
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
7337ec6c9f
|
[CI Failure] Fix fp8 kv cache on <SM90 (#25396)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|