|
|
c7021f1270
|
AITER MHA off by default
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-10-22 14:49:01 -07:00 |
|
|
|
2072fdc044
|
update base image for RC
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-10-22 14:49:01 -07:00 |
|
|
|
6eefda507a
|
RC specific config changes and docker changes
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-10-22 14:49:00 -07:00 |
|
|
|
a0003b56b0
|
[Chore] Separate out system utilities from vllm.utils (#27201)
Signed-off-by: dongbo910220 <1275604947@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-10-22 20:25:25 +00:00 |
|
|
|
5beacce2ea
|
[BugFix] bugfix for Flash Attention MLA with full cuda graph IMA following pr-25490 (#27128)
Signed-off-by: qqma <qqma@amazon.com>
Co-authored-by: qqma <qqma@amazon.com>
|
2025-10-22 19:36:39 +00:00 |
|
|
|
8669c69afa
|
[Feature] publisher default set zmq in kv_event config (#26915)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-10-22 19:19:33 +00:00 |
|
|
|
1651003c35
|
[Prefix Cache] Use LoRA name for consistent KV-cache block hashing (#27211)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
|
2025-10-22 18:13:03 +00:00 |
|
|
|
1cb8c6c5fe
|
[Doc] Fix numbering sequence in prefix caching (#27357)
Signed-off-by: William Song <jinwook@umich.edu>
|
2025-10-22 17:35:47 +00:00 |
|
|
|
e05a6754a8
|
[Model] Revert PR #26715: Restore custom PaliGemma and Gemma3-MM impl… (#27309)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
|
2025-10-22 10:05:34 -07:00 |
|
|
|
084a9dae80
|
[Bugfix] Disable FlexAttention direct block mask building for encoder-only models (#27344)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-22 16:39:08 +00:00 |
|
|
|
c9461e05a4
|
Support Anthropic API /v1/messages Endpoint (#22627)
Signed-off-by: liuli <ll407707@alibaba-inc.com>
Co-authored-by: liuli <ll407707@alibaba-inc.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Tag: v0.11.1rc2
|
2025-10-22 09:13:18 -07:00 |
|
|
|
4dfdb821c8
|
[P/D] Dynamic kv_output_aggregator collect size (#26734)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-10-22 18:07:58 +02:00 |
|
|
|
58fab50d82
|
[Frontend] Require flag for loading text and image embeds (#27204)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-22 15:52:02 +00:00 |
|
|
|
db6f28d898
|
[Bugfix] Fix HF format InternVL large variants video processing (#27330)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-22 08:39:23 -07:00 |
|
|
|
14e2f1231e
|
[Bugfix] Make get_mrope_input_positions instance methods (#27342)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-22 08:38:34 -07:00 |
|
|
|
7c4767f1eb
|
[NIXL] use Host buffer to support TP_ratio > 1 for XPU (#27140)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
|
2025-10-22 15:28:13 +00:00 |
|
|
|
9771e0b432
|
[Bugfix] Add missing 'is_internal_router' attribute to FusedMoEWithLoRA (#27351)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-22 08:19:12 -07:00 |
|
|
|
980de31ca0
|
[bugfix] remove unused parameters to reduce unnecessary vram usage (#26789)
Signed-off-by: Reinforce-II <fate@eastal.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-10-22 08:16:09 -07:00 |
|
|
|
1c160841ea
|
[Bug] Fix DeepSeek-V2.5-1210-FP8 issue (#27267)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-22 11:00:10 -04:00 |
|
|
|
4ca13a8667
|
[NIXL] Terminate handshake listener thread in shutdown (#26404)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-10-22 16:59:53 +02:00 |
|
|
|
675aa2ec64
|
[Model] Upstream Deepseek-OCR model (#27247)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-10-22 07:59:15 -07:00 |
|
|
|
3ae082c373
|
[Chore] Separate out optional dependency checks from vllm.utils (#27207)
Signed-off-by: dongbo910220 <1275604947@qq.com>
Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-10-22 10:44:21 -04:00 |
|
|
|
49c00fe304
|
Mirroring changes in test-pipeline.yaml into test-amd.yaml (#27242)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2025-10-22 09:59:45 -04:00 |
|
|
|
141d3b9fc5
|
[docs] Update v1 metrics design doc (#27332)
Signed-off-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: atalhens <sneh.lata@nutanix.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: atalhens <sneh.lata@nutanix.com>
|
2025-10-22 06:29:15 -07:00 |
|
|
|
abf3db40ef
|
[Core] Handle MoE LoRA edge cases (#27335)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-22 13:14:33 +00:00 |
|
|
|
8e4ca4d14e
|
Bugfix - pass 'max_num_tokens_padded' into 'moe_lora_align_block_size' (#27311)
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-22 12:23:57 +00:00 |
|
|
|
1a0f4defb7
|
[Log] Add Warning for LLM(data_parallel_size=k) single-process DP Usage (#27282)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-22 12:12:21 +00:00 |
|
|
|
843af7f7fc
|
[Bugfix][CPU] Disable dual stream execution for experts on CPU (#27320)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-10-22 11:02:27 +00:00 |
|
|
|
1f633b8632
|
[Frontend][3/N] Improve all pooling task | Support binary embedding response (#27066)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-10-22 18:38:57 +08:00 |
|
|
|
a4c29e6e82
|
fixed reasoning streaming with tool_choice="required" (#24108)
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com>
Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-10-22 09:42:55 +00:00 |
|
|
|
8f18feb191
|
Remove last level references not removed in #26355 (#27260)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-22 09:18:17 +00:00 |
|
|
|
ed540d6d4c
|
Update release pipeline for PyTorch 2.9.0 (#27303)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2025-10-22 09:18:01 +00:00 |
|
|
|
f6027b2855
|
[1/N][Platform] Cleanup useless function (#26982)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-10-22 09:04:57 +00:00 |
|
|
|
ab3e80042e
|
[torch.compile] Enable silu_mul_fp8_quant fusion without custom ops enabled (#27146)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-10-22 00:22:39 -04:00 |
|
|
|
ceacedc1f9
|
[Benchmark] Add plot utility for parameter sweep (#27168)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-21 20:30:03 -07:00 |
|
|
|
bfa59be8f1
|
[CI] Nixl integration tests DP-EP (#27199)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-10-22 11:17:48 +08:00 |
|
|
|
265ecb05fb
|
[DOC] [ROCm] Add ROCm quickstart guide (#26505)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-10-22 03:10:48 +00:00 |
|
|
|
09a7e6f617
|
[Deepseek v3.2] Remove extra logics in indexer (#26465)
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Lain <siyuanf@nvidia.com>
Co-authored-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
|
2025-10-21 23:34:03 +00:00 |
|
|
|
6c2eef5a5d
|
[P/D] KVConnector for decode benchmarking (#25986)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-10-21 16:30:47 -07:00 |
|
|
|
19748806f0
|
[Bugfix] skip cuda graph for drafter when running with eager (#26821)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-10-21 15:39:09 -07:00 |
|
|
|
4a8a567e16
|
Updated xgrammar backend to not deny supported string formats (#27253)
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com>
Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-21 22:25:23 +00:00 |
|
|
|
344a0017c0
|
[Performance] Dual stream execution of "shared_experts" and "selected_experts" inside FusedMoE (#26440)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-10-21 21:38:29 +00:00 |
|
|
|
becb7de40b
|
Update PyTorch to 2.9.0+cu129 (#24994)
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-21 17:20:18 -04:00 |
|
|
|
250fb1b8ea
|
[Bugfix] fixes the decoding metadata of dense mla's fp8 kvcache. (#27144)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-10-21 18:27:03 +00:00 |
|
|
|
647214f3d5
|
[V0 Deprecation] Remove V0 executors (#27142)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-21 11:09:37 -07:00 |
|
|
|
ddeec11ba9
|
[Bugfix][P/D] Reduce num_threads used by nixl ucx backend (#27196)
Signed-off-by: David Whyte-Gray <40244437+dagrayvid@users.noreply.github.com>
|
2025-10-21 13:41:52 -04:00 |
|
|
|
86ed77022d
|
[Feature] Batch Invariant for R1 TP 8 on Blackwell (#27229)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-21 10:25:55 -07:00 |
|
|
|
aa1356ec53
|
[ROCm] Update Triton, Torch, and AITER branches for ROCm base Dockerfile (#27206)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-10-21 12:01:23 -04:00 |
|
|
|
ecc3c0940a
|
Add @pavanimajety to .github/codeowners for Flashinfer, ModelOpt related code (#27213)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-10-21 22:59:53 +08:00 |
|
|
|
ba09652de2
|
[ROCM] Enable CompressedTensorsWNA16 (#27187)
Signed-off-by: JartX <sagformas@epdcenter.es>
|
2025-10-21 10:43:23 -04:00 |
|