920db41128
[Quantization/NVFP4] Speed up TRTLLM NVFP4 MOE weight loading and fix K/V scale loading for MLA Attn ( #25968 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
9ea82ecd25
Fix V1 engine serialization error with Ray distributed executor ( #26148 )
...
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
13e211bbbc
Avoid division by zero in cache DS MLA kernel ( #26174 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
2d68bba3cd
Stop mergify from keeping stale PRs alive ( #26169 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
e45271b09c
[BugFix][QWEN-VL]fix wrong apply_rotary_emb_torch selection introduced by #24642 ( #26123 )
...
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com >
Signed-off-by: Chendi.Xue <chendi.xue@intel.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
84135b1489
Fix undefined symbol: cutlass_moe_mm_sm100 ( #26098 )
...
Signed-off-by: Jun Jiang <jasl9187@hotmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
611c23b68f
[Renderer] Move Processor out of LLMEngine ( #26165 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
c40c0d9c82
[Model] Fixed stream generator for gpt-oss + spec-decoding ( #26027 )
...
Signed-off-by: Aleksandr Samarin <astrlrd@nebius.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
d8b1f9ccc3
[CI/Build] do not enforce precompilation on tpu ci tests ( #25992 )
...
Signed-off-by: Xiang Si <sixiang@google.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
fac9b430ec
[Model] Supplement to PR 24862: Pass param prefix to LLMHead ( #25805 )
...
Signed-off-by: whx-sjtu <2952154980@qq.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
c6f384dafd
[backends][short_conv] CUDA graph piecewise edits ( #24215 )
...
Signed-off-by: Paul Pak <paulpak58@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
7faf51f1cc
[Bugfix] Re-enable prefill of max model length ( #24446 )
...
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
ff1daf6c8a
[Renderer] Move Processor out of AsyncLLM ( #24138 )
...
Signed-off-by: Yang <lymailforjob@gmail.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
f376868620
Quick fix for IMA with the Prefix Prefill kernel during graph capture ( #25983 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
564233d550
[Doc] Fixed shape description for fused_batched_moe.py ( #25668 )
...
Signed-off-by: Egor <e.a.krivov@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
2bcc745042
[Multi Modal] Configurable MM Profiling ( #25631 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
fa29d31f0d
[openai] Fix missing tool usage check (system message) ( #24768 )
...
Signed-off-by: kyt <eluban4532@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
2168fc8fae
[NIXL][Misc] Expose metrics from NIXL for logging to CLI ( #25388 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
8d332b3cf6
[CI] Fix distributed hybrid tests in CI ( #26155 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
c634415273
[test utils] correct wrong typing ( #26159 )
...
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
c81dc099a3
[Model] Use merge_by_field_config for MM models (InternVL family) ( #26153 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
edaae1825f
add(v1): RequestStatesStats to RequestOutput ( #24947 )
...
Signed-off-by: huijjj <huijong.jeong@squeezebits.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
5b80f22087
[Perf] Optimize reshape_and_cache CUDA Kernel ( #25955 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
Co-authored-by: Liu-congo <1502632128@qq.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
ae03f4c010
[Input] Remove unused prompt field ( #26097 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
7e4b1861c3
[Misc] Remove typing.List ( #26150 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
d628fa1e56
[BUG] Reorder model config creation ( #26124 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
6b12b2ee38
FusedMoE support for the Transformers backend ( #22650 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
bbeace233b
[Model] Use merge_by_field_config for MM models (G) ( #26117 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
09b1a5676d
[Bugfix] Fix import gemm_afp4wfp4 failure on AMD ( #26068 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
f35f896e3a
[ROCm] [VL] [Bugfix] Fix vit flash attn dispatcher logic for ROCm ( #26104 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
218349d760
[Build/CI] Revert back to Ubuntu 20.04, install python 3.12 with uv ( #26103 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Simon Mo <simon.mo@hey.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
79b2fe7f19
[gpt-oss] disable tool server initialization if no tool in request ( #25790 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
56d0073f2a
[Bug]: Limit num_reqs in dummy_run when max_num_seqs is small ( #26144 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
a06bb9bf36
[DeepSeek] Improve performance of DS MLA cache kernel ( #26132 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
173c8a9520
[CI/Build] Conditionally register cutlass_fp4_group_mm to fix building on Hopper ( #26138 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
2ea7d48656
[Attention] Move Backend enum into registry ( #25893 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
8db7b7f39c
[Bug][Benchmark] Fix duplicate req in oversampling ( #26140 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
587b30c571
[Log] Optimize DeepGEMM Missing Log ( #26106 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
0c76bb2de1
[Bugfix] Disable cascade attention with FlashInfer ( #26130 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
72c5dd0310
Fix MTP with deepep_low_latency ( #25904 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
abc55b1fe5
[Perf] Fix and reapply move apply w8a8 block fp8 linear to class ( #25696 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com >
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
d737c66b95
[Mamba][KVCacheManager] Simplify kv cache manage logic for mamba + MTP ( #25119 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
da3a188bdb
EAGLE 3: Fix preamble so that measured speedup over Eagle 1 becomes 32% instead of 5% on MTBench ( #25916 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:58 -07:00
77e958752b
[Deepseek v3.2] Support indexer prefill chunking ( #25999 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
c5880cfa4c
[Small] Prevent bypassing media domain restriction via HTTP redirects ( #26035 )
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
01888b5cbf
[BugFix] Fix FI accuracy issue when used for MLA prefill ( #26063 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
fa179abde3
[CI/Build] Replace vllm.entrypoints.openai.api_server entrypoint with vllm serve command ( #25967 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
5c8a4a2208
[CI] Add Blackwell DeepSeek FP8 FlashInfer MoE tests ( #26040 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
06d102ecc8
[Qwen][ROCm] Flash Attention Rotary Embeddings ( #24642 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
422f2cca4b
[Platform][CI] Added OOT platform interface e2e test that running on Ascend NPU ( #25470 )
...
Signed-off-by: leo-pony <nengjunma@outlook.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
3884dce376
[Model] Use merge_by_field_config for MM models (D-F) ( #26076 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
00c0b25e82
[Model] Use merge_by_field_config for MM models (A-C) ( #26073 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
0655b90d80
[FA/Chore] Bump vllm-flash-attention ( #25537 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
83fa298682
Change size of single CUDA graph for CI to 4 ( #26089 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
5a083ce2ea
Update base image to 22.04 (jammy) ( #26065 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
115019045d
Run:ai model streamer add GCS package support ( #24909 )
...
Signed-off-by: Peter Schuurman <psch@google.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
93d2be10b6
[Misc] Make handling of SamplingParams clearer in n>1 case ( #26032 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
91e10c725c
[ROCm][Bugfix] Add missing parameter to ROCm backend ( #26029 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
2ae74a80af
Support RL online quantization with torchao ( #23014 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
ac1598d166
[BugFix] ChunkedLocalAttention is currently not CG compatible ( #26034 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
ce8ee3d9e7
[Bug] Fix Negative Cuda Memory Usage ( #25683 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
d4a83e01bb
[ROCm][Build] Add support for AMD Ryzen AI MAX / AI 300 Series ( #25908 )
...
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
90529cec41
[BugFix][DP/EP] Fix CUTLASS MLA hang under load ( #26026 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
bba7623426
[CI] Tweaks to GPT-OSS Eval (Blackwell) for stability ( #26030 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
d2f544018f
Fix test_mamba_ssm_ssd.py due to missing _query_start_loc_to_chunk_indices_offsets ( #25995 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
ed7eb771a3
[NVIDIA] Blackwell Family ( #24673 )
...
Signed-off-by: Johnny <johnnynuca14@gmail.com >
Signed-off-by: johnnynunez <johnnynuca14@gmail.com >
Signed-off-by: Johnny <johnnync13@gmail.com >
Signed-off-by: Salvatore Cena <cena@cenas.it >
Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com >
Co-authored-by: Salvatore Cena <cena@cenas.it >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
0944358a90
[Bugfix] Apply same sampling parameters for both n=1 and n>1 ( #26005 )
...
Signed-off-by: Kenichi Maehashi <maehashi@preferred.jp >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
aeff0604bb
[Benchmark] Finish documented v0.11.0 deprecation of --endpoint-type ( #26007 )
...
Signed-off-by: Nathan Scott <nathans@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
a561b9832d
[MISC] Fix misleading batch_size_capture_list when cuda_graph_sizes < 4 ( #25829 )
...
Signed-off-by: billishyahao <bill.he@amd.com >
Co-authored-by: Luka Govedic <ProExpertProg@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
e8773e620f
[CI] Only capture a single CUDA graph size in CI by default ( #25951 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
63c56cbb25
[Misc] Factor out common _apply_feature_select_strategy ( #26003 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
25e5b9ccec
[BugFix][MM] Fix Nonetype error when video is cache in qwen2.5-omni-thinker ( #26004 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
b9ed8c9679
[Doc] updating torch.compile doc link ( #25989 )
...
Signed-off-by: nadathurv <work.vnadathur@gmail.com >
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com >
Co-authored-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
9506409fc6
[Misc]allow disable pynccl ( #25421 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
fda819837e
Update to Transformers v4.56.2 ( #24638 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
7c795fdf41
[BugFix] Fix default kv-cache-dtype default for DeepseekV3.2 ( #25988 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
6444f65a2b
[Bugfix] Fix __syncwarp on ROCM ( #25996 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
4c094b339e
[MM] Add text-only mode for Qwen3-VL ( #26000 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
cd0bbf5de2
Fix INT8 quantization error on Blackwell GPUs (SM100+) ( #25935 )
...
Signed-off-by: padg9912 <phone.and.desktop@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
2b6b859916
[Log] Optimize Log for FP8MOE ( #25709 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
04cb503fda
Update launch_bounds_utils.h for correct compile on Multiple Cuda Arch - PTXAS out of range Warning ( #25843 )
...
Signed-off-by: Salvatore Cena <cena@cenas.it >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
d437ba32fd
[Model] MTP fallback to eager for DeepSeek v32 ( #25982 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
e734a2a085
[Misc] Make EP kernels install script support uv ( #25785 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
fd56f2e644
[gpt-oss] use vLLM instead of openai types for streaming ( #25186 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
1690954497
[Docs] Remove API Reference from search index ( #25949 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
b3e1846da6
Add explicit pooling classes for the Transformers backend ( #25322 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
8328d39d40
[V1] [P/D] Add Support for KV Load Failure Recovery ( #19330 )
...
Signed-off-by: David Ben-David <davidb@pliops.com >
Co-authored-by: David Ben-David <davidb@pliops.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
ef318228e7
[Bench] Add DeepSeekV32 to MoE benchmark ( #25962 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
8ecccdd15f
[Llama4] [multimodal] Fix misplaced dtype cast of cos_sin_cache in Llama4VisionRotaryEmbedding ( #25889 )
...
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
bb2e04e41e
OffloadingConnector: Fix GPU block tracking bug ( #25856 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
6083b4d926
[Docs] Add moe kernel features doc ( #25297 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
493acdb7e2
[Doc] Improve MM Pooling model documentation ( #25966 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
3c75d3b00c
[Bug] Fix AttributeError: 'QKVParallelLinear' object has no attribute 'orig_dtype' ( #25958 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
206ab1f0df
[bugfix][deepseek] fix flashmla kernel selection ( #25956 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
e33579cd96
[Bugfix] Token type and position embeddings fail to be applied to inputs_embeds ( #25922 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
8c52fccb1a
[Bugfix] Fix accuracy issue of TRTLLM FP8 MOE and improve logging ( #25895 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
ea6144a019
[Bugfix][Model] Fix inference for Hunyuan dense models ( #25354 )
...
Signed-off-by: anion <1005128408@qq.com >
Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
b6ea29b721
Add Hugging Face Inference Endpoints guide to Deployment docs ( #25886 )
...
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
d9f8ded136
[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 ( #25858 )
...
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
02776c0386
[Fix] Improve CPU backend compatibility for RISC-V ( #25816 )
...
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn >
Signed-off-by: ihb2032 <1355790728@qq.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
8914d52869
[CI] Move applicable tests to CPU ( #24080 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
bf8bb7e250
[NIXL] Add support for MLA caches with different latent dim ( #25902 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
eea2536a35
[perf] Use CPU tensor to reduce GPU->CPU sync ( #25884 )
...
Signed-off-by: Lehua Ding <lehuading@tencent.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
a1898466a6
[Model] Move vision_feature_select_strategy into resolve_visual_encoder_outputs ( #25938 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:57 -07:00
9dce93e07c
[Bugfix][Model]fix ernie45 moe gate&bias dtype to float32 ( #25936 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
c0734fc51a
Updated TRL integration docs ( #25684 )
...
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
034f3a4980
[Doc] Add Cambricon MLU support ( #25942 )
...
Signed-off-by: a120092009 <zhaoty0121@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
0230cd0afb
[New Model] DeepSeek-V3.2 (Rebased to Main) ( #25896 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com >
Signed-off-by: Lucia Fang <fanglu@meta.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com >
Co-authored-by: Lucia Fang <fanglu@meta.com >
Co-authored-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com >
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
da71651386
[Bugfix]: Clean up chunked prefill logging when using whisper ( #25075 )
...
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
0da98ff2eb
[Model][Bugfix] Fix MiDashengLM audio encoder mask by removing incorrect logical_not ( #25925 )
...
Signed-off-by: zhoukz <me@zhoukz.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
db4a03e2e2
[BugFix] Pass config_format via try_get_generation_config ( #25912 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
e165f980d9
[BugFix] Fix DP/EP hang ( #25906 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
ea7cf8db35
Move VllmConfig from config/__init__.py to config/vllm.py ( #25271 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
1108ffb3e6
[Benchmark] Support benchmark throughput for external launcher DP ( #25913 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
0c7cc69e29
[Bug] Fix Weight Loading for Block FP8 Cutlass SM90 ( #25909 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
6941d53c0c
Test Prompt Embeds/LoRA compatibility and Enable LoRA Support for OPT Models ( #25717 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
97f1312f8c
[V0 Deprecation] Remove vllm.worker and update according imports ( #25901 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
09b01cd395
[NIXL] Increase default KV block eviction timeout on P ( #25897 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
4deb9c88ca
[Doc] Polish example for torchrun dp ( #25899 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
b7973eabe5
[Kernel] Chunk-aligned mamba2 ( #24683 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
e7203c2338
[Bugfix][ROCm] Fixing trying to import non-existent symbols from libnccl.so ( #25605 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
ae0c35923f
[Doc] Add documentation for vLLM continuous benchmarking and profiling ( #25819 )
...
Signed-off-by: Naman Lalit <nl2688@nyu.edu >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
c692506e10
[BugFix][torch.compile] KV scale calculation issues with FP8 quantization ( #25513 )
...
Signed-off-by: adabeyta <aabeyta@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
9555929e13
[Bugfix] Use correct key "ignore" for config.json non-quantized layers ( #25706 )
...
Signed-off-by: Lee Nau <lnau@nvidia.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
2405817748
[Model] Remove MotifForCausalLM ( #25866 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
616bce15ce
[CI/Build] Include Transformers backend test in nightly transformers test ( #25885 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
c33992154a
[Bugfix][Speculative Decoding] Fix Eagle3 quantization config issue ( #25883 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
f84b2a0dd0
[Nixl][P/D] Add cuda2cpu support (HD->DH transfer) ( #24690 )
...
Signed-off-by: Chenxi Yang <cxyang@fb.com >
Co-authored-by: Chenxi Yang <cxyang@fb.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
9f78b9ca84
[torch.compile] serialize cudagraph_mode as its enum name instead of value ( #25868 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
4e2774f5c3
[Model][Bugfix] Fix issues in MiDashengLM implementation for quantized models ( #25854 )
...
Signed-off-by: zhoukz <me@zhoukz.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
85d4306047
[Bugfix] Fix requirements paths in install instructions ( #25827 )
...
Signed-off-by: yingjun-mou <renzomou@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
770a2cf7ae
update to latest deepgemm for dsv3.2 ( #25871 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
ea55445b8d
[Misc] Remove more get_input_embeddings_v0 ( #25857 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
b765adccd7
[V0 Deprecation][Models] Remove all V0 condition for mm embeddings merge ( #25331 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: isotr0py <2037008807@qq.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
4079a63a86
[Bugfix] Fallback ViT attn backend to SDPA for blackwell ( #25851 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
00eba10dd1
[XPU] Fix xpu spec decoding UTs, avoid using cuda graph ( #25847 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
20d1d0e38b
Add Phi4FlashForCausalLM to _PREVIOUSLY_SUPPORTED_MODELS ( #25832 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
70ba2d1ec9
[P/D] NIXL Updates ( #25844 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Signed-off-by: simon-mo <simon.mo@hey.com >
Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Chenheli Hua <huachenheli@outlook.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
eb447aff56
[Misc] fix tests failure by using current_platform ( #25825 )
...
Signed-off-by: Juechen Liu <jueliu@meta.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
cf0a7912ca
Remove redundant cudagraph dispatcher warning ( #25841 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
0b343e3218
[Bugfix] fix Qwen3VLMoe load when pp > 1 ( #25838 )
...
Signed-off-by: liuye.hj <liuye.hj@alibaba-inc.com >
Co-authored-by: liuye.hj <liuye.hj@alibaba-inc.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
e40c12696a
Update GLM-4.5 Doc transformers version ( #25830 )
...
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
02ab3860a6
Fix random dataset mismatched token length with config. ( #24937 )
...
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
6dee906d2c
[VLM] Update Qwen3-VL max_num_video_tokens calculation for configurable video profiling ( #25557 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
495f368238
[Bugfix] Fix Qwen3-VL regression from #24982 ( #25814 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
02e87f1893
[MM] Optimize memory profiling for scattered multimodal embeddings ( #25810 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
32cb65b2b6
[Bugfix][NIXL] Fix Async Scheduler timeout issue ( #25808 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
04384cb9da
[Core] GC Debug callback ( #24829 )
...
Signed-off-by: Jialin Ouyang <jialino@meta.com >
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Co-authored-by: Jialin Ouyang <jialino@meta.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
942fba3823
[Bug]: Set LD_LIBRARY_PATH to include the 'standard' CUDA location ( #25766 )
...
Signed-off-by: Clayton Coleman <smarterclayton@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
d8fc00d623
[torch.compile]: Add VLLM_DEBUG_DUMP_PATH environment variable ( #25651 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
7b28ef2bc1
[Core] Refactor self.model() to call a helper for subclassing. ( #25084 )
...
Signed-off-by: Patrick Toulme <ptoulme@meta.com >
Signed-off-by: Patrick Toulme <pctoulme+1@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
9b4c752106
[env] default nixl side port conflicts with kv-event zmq port ( #25056 )
...
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
7d92e508b4
[docs] transcriptions API audio upload ( #25446 )
...
Signed-off-by: zxw <1020938856@qq.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
e94aabe03d
[Bugfix][WideEP] Apply TP Attn + EP MoE fix to other models ( #24982 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
1e5e5d757e
[Bugfix] Fix triton import precommit failure ( #25803 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
c7ae7edb33
Fix GPTQ model loading in Transformers backend ( #25770 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
1cb6005627
Add filtering for chat template kwargs ( #25794 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
3e7f33c801
Validate API tokens in constant time ( #25781 )
...
Signed-off-by: rentianyue-jk <rentianyue-jk@360shuke.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: rentianyue-jk <rentianyue-jk@360shuke.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
0b8166aa8f
[Bugfix] Merge MM embeddings by index instead of token IDs ( #16229 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
6970fa9937
[Bugfix] Add missing image_size for phi4_multimodal ( #25796 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
d7cf378359
[Misc] Update openai client example file for multimodal ( #25795 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
1171480d88
[Misc] Fix codeowners override for v1 sample and attention ( #25037 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
0f97a2e1db
[CI/Build] Reorganize root-level V1 tests ( #25767 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
a8913725a1
[CI/Build] Add timing to Model Executor Test ( #25799 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
0a4674c871
[CI/Build] Consolidate model loader tests and requirements ( #25765 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
1a893d188c
[Bugfix] Allow Only SDPA Backend for ViT on B200 for Qwen3-VL ( #25788 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
38c2df831a
[Multimodal][Speculative Decoding]Eagle Eagle3 mm support, enablement on qwen2.5vl ( #22872 )
...
Signed-off-by: Junhong <liujunhong11@huawei.com >
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com >
Co-authored-by: Junhong <liujunhong11@huawei.com >
Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
55971f85c9
Add flashinfer-build.sh and register precompiled cu128 wheel in Dockerfile ( #25782 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
dbb7782d5b
Add option to restrict media domains ( #25783 )
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Chenheli Hua <huachenheli@outlook.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
806b292c0e
[Core] Don't count preempted tokens in prefix cache hit rate ( #25787 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
93ba7648d0
[Spec decode] automatically disable mm for text-only draft models ( #25667 )
...
Signed-off-by: Jonas Kuebler <kuebj@amazon.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
e7cba8f6b1
[Bugfix] Optimize CpuGpuBuffer initialization ( #25447 )
...
Signed-off-by: Naman Lalit <nl2688@nyu.edu >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
c4b9864e22
Kernel-override Determinism [1/n] ( #25603 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
dbdea93f46
Reduce the Cuda Graph memory footprint when running with DBO ( #25779 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
1356ae0aa8
[spec decode] Consolidate speculative decode method name for MTP ( #25232 )
...
Signed-off-by: zixi-qi <qizixi@meta.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
dc191cc5d9
[CI] Fix FlashInfer AOT in release docker image ( #25730 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
ceb346015c
[V1] address post issues related to #20059 (part 1) ( #23046 )
...
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
b6f16d37b0
[CI] Add E2E Blackwell Quantized MoE Test ( #25723 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
5157781987
[Docs] Add Toronto Meetup ( #25773 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
f16c440c9f
[Bugfix] Improve GLM4 MoE Reasoning Parser's is_reasoning_end Condition ( #25355 )
...
Signed-off-by: frankwang28 <frank.wbb@hotmail.com >
Signed-off-by: Frank Wang <41319051+frankwang28@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:56 -07:00
8c1b61bd77
[Doc]: improve CPU(x86) build-wheel-from-source section ( #25617 )
...
Signed-off-by: Kosseila (CloudThrill) <klouddude@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
e0175fbf01
Eagle3 that supports the Minicpm3 model ( #24243 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: liudan <adan@minicpm.com >
Co-authored-by: liudan <liudan@qq.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
c72298213d
[Misc] fix unique_filepath ( #25732 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
41174e2803
[ray][metrics] Replace ':' with '_' for OpenTelemetry compatibility in Ray ( #25439 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com >
Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
6ca8d9753c
[BugFix] Fix using dbo_decode_token_threshold always (and ignoring dbo_prefill_token_threshold) ( #25622 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
d70c154975
[Quantization] Add field to skip unquantized modules for GPTQ config ( #25455 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
129a643b4c
[CI/Build] Fix some V1 tests not being run ( #25569 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
d3c732e985
[CI/Build] Split up Distributed Tests ( #25572 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
fb0eece290
[Bugfix] Properly abort pooling request. ( #25734 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
515e30b023
[CI] Fix test_shared_storage_connector_hashes ( #25748 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
62ae26c870
[Model] Mamba2 varlen refactor ( #21467 )
...
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com >
Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
87ee8535a6
[Doc] Update Batch-level DP docs ( #25757 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
ced693e845
Support LongCat-Flash-Chat tool call ( #24083 )
...
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
fa55373af1
[Bugfix] Fix Shared Expert/Zero expert code in FusedMoE.process_chunk ( #25698 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
c761b84d5f
[misc] refactor speculative config ( #25657 )
...
Signed-off-by: zxw <1020938856@qq.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
bc37468b3c
Remove cuda hard-code in compute_causal_conv1d_metadata ( #25555 )
...
Signed-off-by: Icey <1790571317@qq.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
067fe8b10e
[Qwen3-Next][GDN] fixes cuda graph capturing bug in GDN metadata and a stride bug in causal_conv_1d. ( #25743 )
...
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
0aea9348cc
fix: print output offline_inference/base/chat.py example ( #25744 )
...
Signed-off-by: Iceber Gu <caiwei95@hotmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
79586c5449
[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI300X ( #25703 )
...
Signed-off-by: xaguilar <Xavier.AguilarFruto@amd.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
b2d5d42337
perf: Avoid copying inputs_embeds tensors to GPU unless prompt_embeds is enabled ( #25739 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
74ea69f413
fix: revert cast to cpu in MsgpackEncoder._encode_tensor to avoid hidden performance regressions ( #25738 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
e82e3b55f6
[CI/Build] fix doc build warning: Failed to get 'name: description' pair ( #25733 )
...
Signed-off-by: yiting.jiang <yiting.jiang@daocloud.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
9e6628ccfc
EVS Support (Video tokens pruning) ( #22980 )
...
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
Signed-off-by: Eugene Khvedchenya <ekhvedchenya@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
6ada221271
[Misc] Remove unnecessary memoryviews in shm_broadcast.py ( #25721 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
ef160aa08e
[Core] Force PIECEWISE CUDAGraph mode for encoder-decoder ( #25701 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
c064c82674
Llama 3.1 405B fp4 changes upstreaming from 355_wip ( #25135 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Doug Lehr <douglehr@amd.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
6f97de4e47
[Misc] Don't log shm dequeue delay warning on worker side ( #25720 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
3a32aa8a6b
[Refactor] Remove DeepGEMM OP Register ( #25710 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
1d21080118
Fix routing_bias dtype ( #25711 )
...
Signed-off-by: Shu Wang. <shuw@nvidia.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
1d1436c3f7
[Model] rename NemotronH_Nano_VL -> NemotronH_Nano_VL_V2 ( #25708 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
37d836081a
[Core] Enable command line logging for LLMEngine ( #25610 )
...
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
f3a478b55e
[Spec Decode] Add Batch Parallel Ngram. Up to 8x lower overhead. ( #24986 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
b558c3a8b7
[Optimization] Use a cheaper cache key in get_model_architecture ( #25682 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
745b204ddc
[Optimization] Streamline InputPreprocessor ( #25702 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
b0e9f04bbd
[Misc] Simplify test_argsort_mm_positions ( #25690 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
80385959af
[V0 deprecation] Clean up LoRA ( #25686 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
a355561291
[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names ( #25489 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
9659b7e78f
[V0 deprecation] Clean up V0 fallback in compilation config ( #25675 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
34e6a31e40
[Model] Define merge_by_field_config MM interface ( #25676 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
c7ca3c5d2f
[Model] Add optional parameter to reasoning parser constructor ( #25554 )
...
Signed-off-by: taohui <taohui3@gmail.com >
Signed-off-by: Tao Hui <taohui3@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
fe6357a780
[BugFix] Fix DBO hang ( #25625 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
0cee734ab4
Revert "[Bug] Dynamo Unsupported due to BasevLLMParameter.torch_function calling disabled super()" ( #25681 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
252a0ff8c3
[BUGFIX] Fix crash in Eagle Speculative Decoding models when exceedin… ( #24662 )
...
Signed-off-by: AlonKejzman <alonkeizman@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
2655d7ab83
[Logging] Remove TORCH_NCCL_AVOID_RECORD_STREAMS to squash a warning ( #25532 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
91d4299774
[Misc] Remove cruft file in repo ( #25678 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
f7f76a8668
[Bugfix] Fix InternS1 video processing after Transformers v4.56 ( #25644 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
054c8b526f
[ux] Switch a warning to debug about a pytorch fallback ( #23750 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
2469b8291b
[CPU] update torch 2.8 and fix missing fields in TorchSDPAMetadata ( #25652 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
18c20257bf
[torch.compile] Make Query Quantization Fusable ( #24914 )
...
Signed-off-by: Jonas Kuebler <kuebj@amazon.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
a5fa821b96
[misc] log info messages by default for hanging / busy / idle ( #25627 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
af10a37c6c
[mypy] Fix wrong type annotations related to tuple ( #25660 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
a88371f84e
[Hardware][RISC-V] Add riscv64 support for vLLM with scalar ( #22112 )
...
Signed-off-by: chenlang <chen.lang5@zte.com.cn >
Co-authored-by: chenlang <10346245@zte.com.cn >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
d7f6489f50
[XPU][Triton]add xpu config in triton_reshape_and_cache_flash ( #25643 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
222411313d
[CI/Build] Fix flaky entrypoints test ( #25663 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
22114ffebb
Add backward compatibility for guided_... API ( #25615 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
f3d9099b44
[V0 deprecation] Remove unreachable model_config.supported_tasks ( #25642 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
3d940e2c3f
[Bugfix] Parse SpeculativeConfig Error ( #25142 )
...
Signed-off-by: zxw <1020938856@qq.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
686cfd91e3
[mypy] Further improve MM type annotations ( #25654 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
f17d37b006
[Bugfix] Fix Qwen3-VL max_num_video_tokens calculation for video profiling ( #25648 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
034c0152db
[Bugfix] Add triton.language.tensor placeholder ( #25649 )
...
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
fd28c58825
[Misc] Fix Qwen3-VL video_grid_thw typing ( #25646 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
5e16b8c552
[fix] Update torch version in cpu-build.txt for AArch64/ppc64le and Darwin ( #25579 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
6c6e553644
Revert "[Performance] Move apply_w8a8_block_fp8_linear to an op class… ( #25607 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
6a437a4178
typo: remove duplicate is ( #25641 )
...
Signed-off-by: nicole-lihui <nicole.li@daocloud.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
004eed39ff
Map CwmForCausalLM to llama and LlamaForCausalLM ( #25611 )
...
Signed-off-by: Jacob Kahn <jacobkahn1@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
8b17d2554c
[Misc] Simplify PoolerOutput and move to v1/outputs ( #25629 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
94b78f576c
[Bugfix] fix apply_temperature to avoid nan in probs ( #24734 )
...
Signed-off-by: courage17340 <courage17340@163.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
d8ffa3c5f4
optimize: eliminate duplicate split_enc_dec_inputs calls ( #25573 )
...
Signed-off-by: nicole-lihui <nicole.li@daocloud.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
c26e7b14d7
[Model] Add LongCat-Flash ( #23991 )
...
Signed-off-by: yangxurui <yangxurui@meituan.com >
Co-authored-by: yangxurui <yangxurui@meituan.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
12c21d28c1
Enable Fbgemm NVFP4 on Dense models ( #25609 )
...
Signed-off-by: Saman Keon <samanamp@outlook.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
517a857166
[Bug] Dynamo Unsupported due to BasevLLMParameter.torch_function calling disabled super() ( #25613 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
b839194931
[Kernel] Support DCP for Triton backend ( #25132 )
...
Signed-off-by: Wei Wei <wwei6@meta.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
1d6f767dc4
[Model] Improve DotsOCRForCausalLM ( #25466 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
b95429c920
[MISC] replace c10::optional with std::optional ( #25602 )
...
Signed-off-by: Shiyan Deng <dsy842974287@meta.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
7319686692
Improve --help for enhanced user experience ( #24903 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
b3fd4ed80c
[Refactor] Use DeepGEMM Col Major TMA Aligned Tensor ( #25517 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
461aa1463b
feat: BF16 FlashInfer Fused Cutlass MOE for Hopper and Blackwell Expert Parallel ( #25503 )
...
Signed-off-by: Duncan Moss <djm.moss@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
b4a80dad98
[Logging] Improve log for when DeepEP HT disables CUDA Graphs ( #25531 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
61a6443bc3
[V0 Deprecation] Remove unused classes in attention ( #25541 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
c8071faa5d
fix compile error
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
46ed215d6b
[Docs] Enable fail_on_warning for the docs build in CI ( #25580 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
0e0d51c9c6
Suppress benign cuBLAS warning when capturing cudagraphs with DBO ( #25596 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
72a5101c7a
Support mnnvl all2allv from Flashinfer ( #21003 )
...
Signed-off-by: Shu Wang <shuw@nvidia.com >
Signed-off-by: Shu Wang. <shuw@nvidia.com >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
7d9f44ad2a
[Bugfix] add cache model when from object storage get model ( #24764 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
984bfb4ba7
Fixes and updates to bench_per_token_quant_fp8 ( #25591 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
b1f9a1f46a
[ROCm][Build][Bugfix] Fix ROCm base docker whls installation order ( #25415 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
3331ced61b
[ROCm][Bugfix] Only enable +rms_norm based on aiter if not explicitly disabled ( #25275 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
b614e0f82b
[Misc] Improve type annotations for jsontree ( #25577 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
44d6701f70
Move DeviceConfig, ObservabilityConfig, SpeechToTextConfig to their own files ( #25564 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
71566e8afc
[Bugfix] Fix DeepSeekV31ToolParser to correctly parse multiple tools in non-streaming output ( #25405 )
...
Signed-off-by: taohui <taohui3@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
88d8c72d5f
[docs] fix nixl kv_connector_extra_config.backends key ( #25565 )
...
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io >
Signed-off-by: Peter Pan <peter.pan@daocloud.io >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:55 -07:00
0cb913b0a2
[Benchmark] Fix regression in structured output benchmark ( #25500 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
f98d4d38c0
[Bug] fix import and unit test ( #25558 )
...
Signed-off-by: Jonas M. Kübler <44084297+jmkuebler@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
d5c0f43b86
[Bugfix] Fix dummy video number of frames calculation ( #25553 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
54174c67f8
[misc] update the warning message ( #25566 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
d1e2d17b57
[BugFix] Potential Fix for FA3 full-cudagraph IMA ( #25490 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
9914857f2b
[V0 Deprecation] Remove max_seq_len_to_capture ( #25543 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
7441d07360
[CI/Build] add nightly prime-rl integration tests ( #25207 )
...
Signed-off-by: Jackmin801 <ongjackm@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
4ca175ea0b
[Misc] Move processing context to multimodal directory ( #25548 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
c39befcead
[CI/Build] Fix v1 OOT registration test ( #25547 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
c8ef8a50d2
[Bugfix][CPU] Skip unsupported custom op register on CPU ( #25534 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
fc90ce79f0
[Misc] Retry HF processing if "Already borrowed" error occurs ( #25535 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
5b4ba2e1e1
[TPU][Bugfix] fix the missing apply_model in tpu worker ( #25526 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
d7fb5a4ae8
[Bugfix] [Frontend] Cleanup gpt-oss non-streaming chat tool calls ( #25514 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
f52b991db6
[Perf] Fix jit compiles at runtime of fla gated delta rule ( #25432 )
...
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
177c37e960
[Spec Decode] Enable FlashInfer Spec Decoding ( #25196 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
Co-authored-by: lhsjohn <huashuoli@tencent.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
0e54bbe108
[KV sharing] Re-land Gemma3n model changes from #22628 ( #24357 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
6b87ce2ecd
[fix]: add Arm 4bit fused moe support ( #23809 )
...
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
a986f17028
[BugFix] Fix MLA assert with CUTLASS MLA ( #25478 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
faa58fa791
[Compile] Fix AMD Compile Error ( #25518 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
4ed6b67da3
[Core] Support weight_loader_v2 for UnquantizedLinearMethod ( #23036 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
cb825af948
[Bugfix] Use a separate FlashInfer workspace buffer for trtllm-gen ( #25520 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
342d17fb7f
[V1][Metrics] Add per-request TPOT histogram ( #24015 )
...
Signed-off-by: baxingpiaochong <771405853@qq.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
3c62d28bb9
[Model] Support SeedOss Reason Parser ( #24263 )
...
Signed-off-by: Yan Lu <luyan@nvidia.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
9596fbd6e5
[BUG] Allows for RunAI Streamer and Torch.compile cache to be used together ( #24922 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
03585bc79d
[Bug] Fix AttributeError: 'FusedMoE' object has no attribute 'w13_weight_scale'. Did you mean: 'w13_weight_scale_inv' ( #25519 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
770cb2e1f8
Add CUTLASS FP8 MOE benchmark scripts and kernel config ( #25302 )
...
Signed-off-by: Chenxi Yang <cxyang@fb.com >
Co-authored-by: Chenxi Yang <cxyang@fb.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
b50fa00537
Improve output when failing json.loads() on structured output test ( #25483 )
...
Signed-off-by: dougbtv <dosmith@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
8e6a5e7dd4
[BugFix] AssertionError: Do not capture num_reqs > max_num_reqs for uniform batch ( #25505 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
faae7a7eab
[Bugfix] [B200] cutlass_mla - ensure kv_split == 1 for batch size > 1 ( #25509 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
d562c2ea09
[Perf] Increase default max splits for FA3 full cudagraphs ( #25495 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
81ee45298d
[ROCm] Small functional changes for gptoss ( #25201 )
...
Signed-off-by: jpvillam <jpvillam@amd.com >
Co-authored-by: jpvillam <jpvillam@amd.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
d12433adfc
[Kernel] [Mamba] Remove BLOCK_H=1 from list of tuneable configurations for _chunk_cumsum_fwd_kernel ( #25197 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Co-authored-by: Chih-Chieh-Yang <chih.chieh.yang@ibm.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
4ebc513fc1
Add VLLM_NVTX_SCOPES_FOR_PROFILING=1 to enable nvtx.annotate scopes ( #25501 )
...
Signed-off-by: Corey Lowman <clowman1993@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
7a8f0a3548
[BugFix] Fix OOM in vLLM replicas by ensuring consistent NCCL memory accounting ( #25359 )
...
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
907bbca7b7
Remove redundant mutates_args and dispatch_key for direct_register_custom_op ( #25512 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
eb1f43bc82
[gpt-oss][bugfix] remove logic to require resp_ in ResponseAPI ( #25428 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
99eaeebe66
Fix triton_reshape_and_cache_flash.py triton import ( #25522 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
715e24e1b3
Add VLLM_ENABLE_INDUCTOR_MAX_AUTOTUNE & VLLM_ENABLE_INDUCTOR_COORDINA… ( #25493 )
...
Signed-off-by: rouchenzi <ruochenwen@gmail.com >
Signed-off-by: rouchenzi <40842833+rouchenzi@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
cf0e250200
[V0 Deprecation] Remove placeholder attn ( #25510 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
0c11617ff1
[Core] Use KVCacheBlock as much as possible instead of dict[block_id, KVCacheBlock] ( #24830 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
930e691c65
[CI/Build] Fix and re-enable v1 PP test on CI ( #25496 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
c0f11557e1
[Bugfix] Fix for the import error from #24588 ( #25481 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
0438c65376
[Build] Update Xgrammar to 0.1.25 ( #25467 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
d8fda7420a
[Bugfix] gpt-oss container tool output bug ( #25485 )
...
Signed-off-by: Alec Solder <alecs@fb.com >
Co-authored-by: Alec Solder <alecs@fb.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
86e5b73d71
[CI] Fix Pre-commit Issue ( #25497 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
e49561cd91
Enable symmetric memory all reduce by default only enabling for TP ( #25070 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
0e30643147
[Bugfix] Lower gpt-oss max cudagraph size to 992 to be compatible with FA3 ( #25508 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
8ba3b17cc1
[Speculators][Speculative Decoding] Fix gpt-oss eagle3 accuracy issue ( #25406 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
8222e2651d
[Perf] Change default CUDAGraphMode from PIECEWISE to FULL_AND_PIECEWISE ( #25444 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
b672b8c3b8
[Performance] Move apply_w8a8_block_fp8_linear to an op class ( #24666 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com >
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
56201cfb01
[core] add nccl symmetric memory for all reduce ( #24532 )
...
Signed-off-by: Amir Samani <asamani@nvidia.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
9689be1e8e
[ROCm] Add skinny gemm bias support for dtypes fp16,bf16,fp8 ( #24988 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
Signed-off-by: Hashem Hashemi <159079214+amd-hhashemi@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
65c4513ad8
[Core] Ensure LoRA linear respect the base_layer's tp_size and tp_rank ( #25487 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
5acda4cc71
[Spec Decode][CI] Add e2e test for examples/spec_decode.py and prevent breaking Acceptance Length ( #24531 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
78f892c373
[Misc] Reduce initialization time of auto_tune ( #23682 )
...
Signed-off-by: Weida Hong <wdhongtw@google.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
26da2c6244
[V1][Kernel] Add triton implementation for reshape_and_cache_flash ( #24503 )
...
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com >
Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
0081c6956a
Use macro guard CUDA functions for back compatibility in grouped_topk_kernel.cu ( #25346 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com >
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
Co-authored-by: Rahul Tuli <rtuli@redhat.com >
Co-authored-by: Claude <noreply@anthropic.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
6462feef65
[Log] Optimize kv cache memory log from Bytes to GiB ( #25204 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
e9a74500e5
[BugFix] Fix UB in per_token_group_quant.cu ( #24913 )
...
Signed-off-by: Shreeasish Kumar <shreeasish@rivosinc.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
02a3ce2230
[Kernels] Support blocked fp8 quantization for compressed tensors MoE ( #25219 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
9cae377a16
Add backward compatibility for GuidedDecodingParams ( #25422 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
8c5c35c027
[Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support and Prefill support ( #24845 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
f97da2c732
[V1] Remove V0 code paths for Hybrid models ( #25400 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
02134245a9
[UX] Change kv-cache-memory log level to debug ( #25479 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
2ab27b70f5
[XPU] Fix MOE DP accuracy issue on XPU ( #25465 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
a500f7cc09
[Docs] NixlConnector quickstart guide ( #24249 )
...
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io >
Signed-off-by: Peter Pan <peter.pan@daocloud.io >
Signed-off-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
1b75f784b8
[P/D] Support NIXL connector to disconnect during a clean shutdown ( #24423 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
0eddd2b528
[BugFix] Register expert_map as named buffer for wake_up and sleep ( #25458 )
...
Signed-off-by: wuxibin <wuxibin@bytedance.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
030774abcf
[CI/Build] Fix disabled v1 attention backend selection test ( #25471 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
77389d87b2
[docs] Benchmark Serving Incorrect Arg ( #25474 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
59659b74c4
[Core] Optimize LoRA weight loading ( #25403 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
3b96eafdb0
[Bugfix] Fix idefics3 tie_word_embeddings ( #25454 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
fb64e67533
[Test]: Hermes tool parser stream output error in Qwen3 case ( #25203 )
...
Signed-off-by: Andreas Hartel <andreas.hartel@aleph-alpha.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
215da8510d
[Misc] Move DP for ViT code inside model executor dir ( #25459 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
c4a15ee240
[Frontend] Add a new xml-based tool parser for qwen3-coder ( #25028 )
...
Signed-off-by: Zhikaiiii <1658973216@qq.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
3a640b8f74
Handle triton kernel import exception ( #25319 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
0a1397c7df
[Model] Enable DP for ViT in Qwen2-VL ( #25445 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
921945c81e
[NIXL][OOT platform] support nixl_connector with oot platform and other nixl_backend ( #25121 )
...
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
675fc471bf
[DP/EP][GPTOSS] Use triton matmul-ogs kernels for GPTOSS DP/EP ( #24588 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
b0ae0ad935
[Docs] Fix griffe warnings in vllm/lora/ops ( #25369 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
e99b286f01
[Bugfix] Remove contiguous output req for context parallel MLA ( #25414 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
23a7805022
[benchmarks]allow skip ready check for bench serve ( #25420 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com >
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
e3a3c738b0
[XPU] Fix compile_size is None case. ( #25433 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
e41946ecdb
[feat] Support MRoPE + YaRN ( #25384 )
...
Signed-off-by: liuye.hj <liuye.hj@alibaba-inc.com >
Co-authored-by: liuye.hj <liuye.hj@alibaba-inc.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
f071a31ede
[Bug] Fix Long Context OOM Issue ( #25290 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
1b30043f0d
[V0 deprecation] Remove _set_default_args_v0 function ( #25409 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
a0b5617263
[V0 deprecation] Remove platform v1 controlling interface ( #25410 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
e6c22d2b2f
[Perf] Apply torch.compile for per_block_cast_to_fp8 ( #24611 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
dbb029cfe1
[Performance] Remove input pads in cutlass_mla and optimize v_proj output handling ( #25184 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
25dd155e60
[BugFix] [DP/EP] Fix slow execution when BS <= DP ( #25407 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Chris Bamford <chrisbam4d@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
864bbe36f0
[Bugfix] Fix missing clear_connector_metadata ( #25397 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
e97cf2e32b
[Core] Drop overly aggressive whisper assertion ( #25408 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
d96a3fc653
[Bugfix] fix custom op test ( #25429 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:54 -07:00
aac85cc6d6
[Frontend] Responses API MCP tools for built in tools and to pass through headers ( #24628 )
...
Signed-off-by: Alec Solder <alecs@fb.com >
Signed-off-by: Alec S <10566873+alecsolder@users.noreply.github.com >
Co-authored-by: Alec Solder <alecs@fb.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
f1e3d031e4
[TPU] update torch_xla dependency for PyPI compatibility ( #25278 )
...
Signed-off-by: Johnny Yang <johnnyyang@google.com >
Co-authored-by: Chengji Yao <chengjiyao@google.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
6e9229e919
[CI/Build] Skip Qwen3-VL initialization tests until models are actually released ( #25394 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
ff54b6bfe3
[KV offload][5/N] Add CPUOffloadingSpec ( #24251 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
6dbbecd5b2
[torch.compile] Cleanup compilation tests and custom passes, add debug utils, fix DCE bug ( #23091 ), fix test ( #24376 ), and prep for custom op matching ( #24604 ) ( #24542 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: luka <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
6850bfe15c
[misc] Remove RFC review hours reference ( #25416 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
d988b84e8e
[DP] support torchrun external launcher with Data Parallelism ( #24899 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
7337ec6c9f
[CI Failure] Fix fp8 kv cache on <SM90 ( #25396 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
90ba32a0bf
[Compiler] Disable Inductor standalone compile by default ( #25391 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
2a8bd2b93b
[CLI env var] Add VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH in env variables ( #25274 )
...
Signed-off-by: qqma <qqma@amazon.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: qqma <qqma@amazon.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
3968ae72ed
[EPLB] Reduce EPLB Inference Overhead ( #24573 )
...
Signed-off-by: Bowen Wang <abmfy@icloud.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
e55ffe3595
[V1][Attention] Split triton_attn in triton-only and rocm specific backends ( #24648 )
...
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
4057e2b162
[Bugfix] Fix several issues with p2p xPyD in GET type ( #23993 )
...
Signed-off-by: Csrayz <jover@cmbchina.com >
Signed-off-by: ivyilike <pww123@cmbchina.com >
Co-authored-by: ivyilike <pww123@cmbchina.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
cc494282a9
[Kernel] MI-300X triton moe configs ( #23445 )
...
Signed-off-by: Sara Kokkila Schumacher <saraks@ibm.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
44be2b7349
Make mypy behave like a proper pre-commit hook ( #25313 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
104e62fbc8
Make pickle import check fast ( #25379 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
ddf4e1f56f
[Misc] Remove unused encoder-decoder error strings ( #25374 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
cbba9bd0b0
refactor: abstract graph mode support into platform interface ( #25161 )
...
Signed-off-by: Yizhou Liu <liu_yizhou@outlook.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
4bc6b5d2c3
[TPU] Deprecate `xm.mark_step` in favor of `torch_xla.sync` ( #25254 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
8d8de42790
[TPU][Bugfix][CI] Fix broken tests/build dependency ( #25255 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
ef85a438da
Enable Eagle3 speculative decoding for GPT-OSS model ( #25246 )
...
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
2f237d3df4
[V0 Deprecation] Remove MultiModalPlaceholderMap ( #25366 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
243c358fa8
[V0 Deprecation] Remove V0-only methods in multi-modal registry ( #25362 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
1b3aa0f297
[Bugfix] Fix hermes tool parser handling of non-string argument types ( #22002 )
...
Signed-off-by: wangzi <3220100013@zju.edu.cn >
Signed-off-by: David Chen <530634352@qq.com >
Co-authored-by: wangzi <3220100013@zju.edu.cn >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
dba6db9937
[Docs] GSM8K Accuracy Evaluation doc update ( #25360 )
...
Signed-off-by: David Chen <530634352@qq.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
5322390f1d
[Model] Support Dots OCR ( #24645 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: yinz-aizip <yinz@aizip.ai >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
5f6a36054a
Multimodal - audio tests ( #25285 )
...
Signed-off-by: Debolina Roy <debroy@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
e348e1027c
[Bugfix][V0 Deprecation][CI] use async mock and await for async method ( #25325 )
...
Signed-off-by: Yang <lymailforjob@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
a815d820ee
Remove V0 attention backends ( #25351 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
319966a678
[Perf] Further optimization for Qwen3-VL fast_pos_embed_interpolate ( #25347 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
b81364a7cd
[V0 Deprecation] Remove V0 sampling metadata ( #25345 )
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
791089df20
feat: Enable engine-level arguments with speculators models ( #25250 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
Co-authored-by: Claude <noreply@anthropic.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
71f2b5ddea
[V0 Deprecation] Remove async_output_proc, preemption mode, delay factor ( #25334 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
81e17a1e26
[V0 Deprecation] Remove V0 Sequence class & Sampler ( #25332 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
ed84bda7a5
fix cub helpers
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
c7b1c0cf8b
fix cub_helpers
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
a31d353b71
[Optimization] Cache chat template result when processor fails to be loaded ( #25341 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
80cad257da
[Bugfix] Typos in error message for missing model config file ( #25339 )
...
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
5fd95c77af
[MM][Perf] Minor Optimization on Qwen3-VL fast_pos_embed_interpolate ( #25337 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
f6278e3065
[V1] Add sliding window support to Flex Attention backend ( #24089 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
9e9b3b4ff9
[V0 Deprecation] Remove V0 MP executor ( #25329 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
20235c1822
[V0 Deprecation] Remove from_seq_group methods ( #25330 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
059a13a3bc
[Multi Modal][Performance] Fused Q,K's apply_rope in more models ( #25005 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
a6cf307fa8
[V0 Deprecation] Remove V0 model runner base & simplify worker base ( #25328 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
b18dde7478
[Doc] improve test-pipeline.yaml documentation ( #25305 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
7cdd90211b
[V0 Deprecation] Remove V0 core ( #25321 )
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
86fdd686be
[CI] Skip tests failing on main ( #25326 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
171592330b
[Chore] Remove unused sampler in models ( #25324 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
4bb2eb42d4
[V0 Deprecation] Remove V0 Output Processor ( #25320 )
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
32d43a5a9e
[V0 Deprecation] Remove LLMEngine ( #25033 )
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
d9ba479eee
[Docs] Fix warnings in vllm/profiler and vllm/transformers_utils ( #25220 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
9cfa7697c1
[V0 Deprecation] Enable the remaining multimodal tests in V1 ( #25307 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
9fc86d2802
[Core] Enable sharded state loader for V1 engine and enhance test coverage ( #25308 )
...
Signed-off-by: pengdrumli <pengdrumli@tencent.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
bc76128565
[Model] Cleanup InternViT's data parallel implementation ( #25306 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
af4dedf6d3
Generate _ModelInfo properties file when loading to improve loading speed ( #23558 )
...
Signed-off-by: Manoel Marques <manoel.marques@ibm.com >
Signed-off-by: Manoel Marques <manoelmrqs@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
dad5f4d16d
[Docs] Fix warnings in mkdocs build (continued) ( #25042 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
c2fdc71c91
[CI Failure] Disable FlashInfer RoPE to unblock CI ( #25299 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
e33af1e0c2
[V1] Support LLM.apply_model ( #18465 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
0ac65d171b
[Bugfix] Fix Qwen3-VL-MoE weight loading for EP ( #25300 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
267b4421b7
[Hybrid Allocator] Support full attention with different hidden size ( #25101 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
8f3edbd93f
[Optimization] Avoid repeated model architecture conversion for pooling models ( #25261 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
239aef5c9f
[Bugfix] fix tool call arguments is empty ( #25223 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Co-authored-by: xin.li <xin.li@daocloud.io >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
9d70c103aa
[BUG FIX][NON-CUDA]quick fix to avoid call cudagraph_unsafe in attention ( #25298 )
...
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
d897924b45
[BugFix] Exclude self when checking for port collision ( #25286 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
b7c986673d
[BUGFIX] GPTQ quantization compatibility for Qwen3 Next MOE models (AutoGPTQ and AutoRound-GPTQ) ( #25268 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
14e1e9b09a
Improve weight loading for encoder models in Transformers backend ( #25289 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
ea01b17b6f
[Misc] Support more collective_rpc return types ( #25294 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
123e7ad492
[BugFix] Ensure appropriate guards in destructors ( #25284 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
ce65ce2d61
[torch.compile] CUDAGraph Inductor partition integration ( #24281 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
Signed-off-by: Boyuan Feng <fby.1994@gmail.com >
Signed-off-by: boyuanfeng <boyuan@meta.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
d4006bd84d
[docs] Prompt Embedding feature support ( #25288 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
7493472a9b
test: Remove vestigial skip for prompt embeds tests after landing v1 Prompt Embeds support ( #25291 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
937ab7e85e
Don't skip special tokens with hermes-style tool calling ( #25281 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
bc997c18ca
[Bugfix] Remove VLLM_TEST_DYNAMO_FULLGRAPH_CAPTURE #2969 ( #25090 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
d55c6010ac
[BugFix] Fix async scheduling CPU tensor race take 2 ( #25279 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
5051270200
allow disable flashinfer prefill ( #25276 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
6e94161f94
Enable modelopt gemma3 nvfp4/fp8, make workflow more robust ( #22771 )
...
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
e54a476058
[Compile] Fix Compile Warning for Ignoring MIN_BLOCK_PER_SM ( #25193 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
8da7b98366
[Frontend] Responses API messages out, just harmony for now ( #24985 )
...
Signed-off-by: Alec Solder <alecs@fb.com >
Co-authored-by: Alec Solder <alecs@fb.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
9da51c77a9
Fix: Correct FusedMoE layer reference in auto_round quantization ( #24818 )
...
Signed-off-by: David-Wen <18927700430@163.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
d0a1364188
[BugFix] Make FlashInferMetadataBuilder non-blocking ( #25040 )
...
Signed-off-by: Julien Lin <jullin@nvidia.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
2c3ba7362f
[Perf] Use FlashInfer RoPE for RotaryEmbedding.forward_cuda when available ( #21126 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:53 -07:00
bfd32678e6
Specify platform in pip-compile pre-commit hook so it runs on MacOS ( #25273 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:52 -07:00
e29f599d30
[Bugfix] Fix chunked a2_scales in modular kernels ( #25264 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:52 -07:00
b6724e95f8
[Bugfix] GPT OSS Attribute error on H100 ( #25228 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:52 -07:00
17b9f3a83d
Optimize triton unified attention performance for sliding window attention ( #24390 )
...
Signed-off-by: zixi-qi <qizixi@meta.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:52 -07:00
378c68bead
[KV offload][4/N] Offloading KV connector ( #22595 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:52 -07:00
67f0418b1d
[bugfix] fix structured outputs key missing issue from #24929 ( #25195 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:52 -07:00
779ed75310
[Docs] add __init__.py to vllm/model_executor/layers/quantization/compressed_tensors/transform ( #24974 )
...
Signed-off-by: samzong <samzong.lu@gmail.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:52 -07:00
abb448b457
Update vllm/model_executor/layers/quantization/kernels/scaled_mm/cutlass.py
...
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:52 -07:00
ae36150ec2
test
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-03 13:35:52 -07:00