|
|
d628fa1e56
|
[BUG] Reorder model config creation (#26124)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:58 -07:00 |
|
|
|
6b12b2ee38
|
FusedMoE support for the Transformers backend (#22650)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:58 -07:00 |
|
|
|
bbeace233b
|
[Model] Use merge_by_field_config for MM models (G) (#26117)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:58 -07:00 |
|
|
|
09b1a5676d
|
[Bugfix] Fix import gemm_afp4wfp4 failure on AMD (#26068)
Signed-off-by: zhewenli <zhewenli@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:58 -07:00 |
|
|
|
f35f896e3a
|
[ROCm] [VL] [Bugfix] Fix vit flash attn dispatcher logic for ROCm (#26104)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:58 -07:00 |
|
|
|
218349d760
|
[Build/CI] Revert back to Ubuntu 20.04, install python 3.12 with uv (#26103)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:58 -07:00 |
|
|
|
79b2fe7f19
|
[gpt-oss] disable tool server initialization if no tool in request (#25790)
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:58 -07:00 |
|
|
|
56d0073f2a
|
[Bug]: Limit num_reqs in dummy_run when max_num_seqs is small (#26144)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:58 -07:00 |
|
|
|
a06bb9bf36
|
[DeepSeek] Improve performance of DS MLA cache kernel (#26132)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:58 -07:00 |
|
|
|
173c8a9520
|
[CI/Build] Conditionally register cutlass_fp4_group_mm to fix building on Hopper (#26138)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:58 -07:00 |
|
|
|
2ea7d48656
|
[Attention] Move Backend enum into registry (#25893)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:58 -07:00 |
|
|
|
8db7b7f39c
|
[Bug][Benchmark] Fix duplicate req in oversampling (#26140)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:58 -07:00 |
|
|
|
587b30c571
|
[Log] Optimize DeepGEMM Missing Log (#26106)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:58 -07:00 |
|
|
|
0c76bb2de1
|
[Bugfix] Disable cascade attention with FlashInfer (#26130)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:58 -07:00 |
|
|
|
72c5dd0310
|
Fix MTP with deepep_low_latency (#25904)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:58 -07:00 |
|
|
|
abc55b1fe5
|
[Perf] Fix and reapply move apply w8a8 block fp8 linear to class (#25696)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:58 -07:00 |
|
|
|
d737c66b95
|
[Mamba][KVCacheManager] Simplify kv cache manage logic for mamba + MTP (#25119)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:58 -07:00 |
|
|
|
da3a188bdb
|
EAGLE 3: Fix preamble so that measured speedup over Eagle 1 becomes 32% instead of 5% on MTBench (#25916)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:58 -07:00 |
|
|
|
77e958752b
|
[Deepseek v3.2] Support indexer prefill chunking (#25999)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
c5880cfa4c
|
[Small] Prevent bypassing media domain restriction via HTTP redirects (#26035)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
01888b5cbf
|
[BugFix] Fix FI accuracy issue when used for MLA prefill (#26063)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
fa179abde3
|
[CI/Build] Replace vllm.entrypoints.openai.api_server entrypoint with vllm serve command (#25967)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
5c8a4a2208
|
[CI] Add Blackwell DeepSeek FP8 FlashInfer MoE tests (#26040)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
06d102ecc8
|
[Qwen][ROCm] Flash Attention Rotary Embeddings (#24642)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
422f2cca4b
|
[Platform][CI] Added OOT platform interface e2e test that running on Ascend NPU (#25470)
Signed-off-by: leo-pony <nengjunma@outlook.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
3884dce376
|
[Model] Use merge_by_field_config for MM models (D-F) (#26076)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
00c0b25e82
|
[Model] Use merge_by_field_config for MM models (A-C) (#26073)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
0655b90d80
|
[FA/Chore] Bump vllm-flash-attention (#25537)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
83fa298682
|
Change size of single CUDA graph for CI to 4 (#26089)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
5a083ce2ea
|
Update base image to 22.04 (jammy) (#26065)
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
115019045d
|
Run:ai model streamer add GCS package support (#24909)
Signed-off-by: Peter Schuurman <psch@google.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
93d2be10b6
|
[Misc] Make handling of SamplingParams clearer in n>1 case (#26032)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
91e10c725c
|
[ROCm][Bugfix] Add missing parameter to ROCm backend (#26029)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
2ae74a80af
|
Support RL online quantization with torchao (#23014)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
ac1598d166
|
[BugFix] ChunkedLocalAttention is currently not CG compatible (#26034)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
ce8ee3d9e7
|
[Bug] Fix Negative Cuda Memory Usage (#25683)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
d4a83e01bb
|
[ROCm][Build] Add support for AMD Ryzen AI MAX / AI 300 Series (#25908)
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
90529cec41
|
[BugFix][DP/EP] Fix CUTLASS MLA hang under load (#26026)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
bba7623426
|
[CI] Tweaks to GPT-OSS Eval (Blackwell) for stability (#26030)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
d2f544018f
|
Fix test_mamba_ssm_ssd.py due to missing _query_start_loc_to_chunk_indices_offsets (#25995)
Signed-off-by: Huamin Li <3ericli@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
ed7eb771a3
|
[NVIDIA] Blackwell Family (#24673)
Signed-off-by: Johnny <johnnynuca14@gmail.com>
Signed-off-by: johnnynunez <johnnynuca14@gmail.com>
Signed-off-by: Johnny <johnnync13@gmail.com>
Signed-off-by: Salvatore Cena <cena@cenas.it>
Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com>
Co-authored-by: Salvatore Cena <cena@cenas.it>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
0944358a90
|
[Bugfix] Apply same sampling parameters for both n=1 and n>1 (#26005)
Signed-off-by: Kenichi Maehashi <maehashi@preferred.jp>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
aeff0604bb
|
[Benchmark] Finish documented v0.11.0 deprecation of --endpoint-type (#26007)
Signed-off-by: Nathan Scott <nathans@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
a561b9832d
|
[MISC] Fix misleading batch_size_capture_list when cuda_graph_sizes < 4 (#25829)
Signed-off-by: billishyahao <bill.he@amd.com>
Co-authored-by: Luka Govedic <ProExpertProg@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
e8773e620f
|
[CI] Only capture a single CUDA graph size in CI by default (#25951)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
63c56cbb25
|
[Misc] Factor out common _apply_feature_select_strategy (#26003)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
25e5b9ccec
|
[BugFix][MM] Fix Nonetype error when video is cache in qwen2.5-omni-thinker (#26004)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
b9ed8c9679
|
[Doc] updating torch.compile doc link (#25989)
Signed-off-by: nadathurv <work.vnadathur@gmail.com>
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com>
Co-authored-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
9506409fc6
|
[Misc]allow disable pynccl (#25421)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|
|
|
fda819837e
|
Update to Transformers v4.56.2 (#24638)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:57 -07:00 |
|