|
|
eef921f45e
|
AOT Compilation for torch.compile (Bundled) (#24274)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2025-10-10 19:02:11 -04:00 |
|
|
|
e317414ce1
|
Cache the environment variable check for batch invariance (#26510)
Signed-off-by: Bram Wasti <bwasti@meta.com>
|
2025-10-10 22:47:34 +00:00 |
|
|
|
949cb0170d
|
[BugFix] Fix async scheduling + request preemption (#26385)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-10 20:29:57 +00:00 |
|
|
|
e94cfd51da
|
[BUG] Qwen3-next MTP. Fix attn metadata build bug (#26564)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2025-10-10 14:59:03 -04:00 |
|
|
|
7c12763b24
|
Fix some typing issues found by mypy==1.18.2 (#26596)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-10 18:21:25 +00:00 |
|
|
|
3b780a4bbb
|
Update CUDA architecture list in build pipeline for 12.9.1 wheels (#26592)
Signed-off-by: Will Eaton <wseaton@users.noreply.github.com>
|
2025-10-10 11:15:27 -07:00 |
|
|
|
30f78af147
|
Update pre-commit hook versions (#26591)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-10 17:03:44 +00:00 |
|
|
|
19a9b169bf
|
Add Qwen3-Omni moe thinker (#25550)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Xiong Wang <feizi.wx@alibaba-inc.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-10 17:00:56 +00:00 |
|
|
|
96ad65b7fe
|
[Transform] [Quantization] Add QuTLASS support to vLLM (#24440)
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Signed-off-by: Andrei Panferov <andrei@panferov.org>
Co-authored-by: Andrei Panferov <andrei@panferov.org>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-10-10 09:43:40 -07:00 |
|
|
|
8d2b8c0ff2
|
[Model] Add FlexOlmo model implementation (#24923)
Signed-off-by: Shane A <shanea@allenai.org>
|
2025-10-10 09:43:15 -07:00 |
|
|
|
b2155ed317
|
[Model][Qwen3VL] Compute cu_seqlens on CPU to remove (#26496)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-10 09:42:17 -07:00 |
|
|
|
910abdbd08
|
[Bugfix] fixed top_logprobs: -1 does not appear to work as intended (#26470)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-10-11 00:41:17 +08:00 |
|
|
|
cddce79fda
|
[torch.compile] Make inductor partition rules respect splitting_ops #25691 (#25845)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-10 16:35:28 +00:00 |
|
|
|
e519281920
|
[Metrics] Add test for multi-modal cache stats logging (#26588)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-10-10 16:00:50 +00:00 |
|
|
|
7b03584de8
|
Silu v2 (#25074)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: elvircrn <elvircrn@gmail.com>
Signed-off-by: Elvir Crnčević <elvircrn@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
|
2025-10-10 15:19:53 +00:00 |
|
|
|
ae9d0e7da5
|
[Bugfix] Make DP padding optional in coordinate_batch_across_dp (#26375)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-10-10 10:53:33 -04:00 |
|
|
|
0e67102d93
|
Added test_top_k_per_row to test-pipeline.yaml. (#26569)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
|
2025-10-10 10:48:33 -04:00 |
|
|
|
f4ba2061cf
|
[BugFix][torch.compile] Fix fused_scaled_matmul_reduce_scatter signature for PyTorch 2.8 (#26038)
Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-10 07:42:13 -07:00 |
|
|
|
1e6848a65d
|
[CI] fix test_run_batch.py::test_completions - AssertionError (#26578)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-10-10 22:16:28 +08:00 |
|
|
|
67661375fa
|
[BugFix] Fix noop elimination edge case (#26394)
Signed-off-by: Andy Lo <andy@mistral.ai>
|
2025-10-10 13:33:04 +00:00 |
|
|
|
213b64452a
|
[Bugfix] Convert untraceable GroupShape to list for AMD impl (#26535)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2025-10-10 13:32:29 +00:00 |
|
|
|
784c231151
|
[NIXL] Ignore abort on already-finished request (#25067)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-10-10 12:21:56 +02:00 |
|
|
|
606b00e80f
|
[bugfix][DCP] fix block_size of hash in DCP prefix caching (#26296)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-10-10 03:02:49 -07:00 |
|
|
|
720d3cd0f0
|
[CI] fix ruff format (#26579)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-10-10 03:02:12 -07:00 |
|
|
|
ab196edefb
|
Remove LoRA bias support (#25807)
Signed-off-by: Ashwin Phadke <ashwinphadke12@rediffmail.com>
Signed-off-by: Ashwin Phadke <23502062+ashwin-phadke@users.noreply.github.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-10 09:50:33 +00:00 |
|
|
|
3ee202ea1e
|
[GPT-OSS] Add support for arrays at tool message content (#25593)
Signed-off-by: Luis Tomas Bolivar <ltomasbo@redhat.com>
|
2025-10-10 09:00:45 +00:00 |
|
|
|
ad430a67ca
|
[Metrics] Log multi-modal cache stats and fix reset (#26285)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-10 01:45:55 -07:00 |
|
|
|
6f0f570c43
|
[deepseek] kernel block size for UniformTypeKVCacheSpecs (#26559)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-10-10 16:40:41 +08:00 |
|
|
|
b545a0b207
|
fix test_simple_inductor_graph_partition (#26522)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-10-10 06:39:19 +00:00 |
|
|
|
29255cfc3b
|
[Spec-Decode] Support piecewise cudagraphs for Eagle head (#25109)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
|
2025-10-10 01:20:31 -04:00 |
|
|
|
da4455609d
|
[Chore]: One pythonic tool parser test uses the wrong parser (#26515)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
|
2025-10-10 04:03:55 +00:00 |
|
|
|
aafb99a4d4
|
[Core] Small simplification in GPUModelRunner._update_states() (#26508)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-10 10:53:58 +08:00 |
|
|
|
757fa4a4da
|
[DP][ray] Support different VLLM_RAY_DP_PACK_STRATEGY (#23849)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-10-09 19:53:43 -07:00 |
|
|
|
c6187f55f7
|
Refactor MistralTokenizer (#26358)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
|
2025-10-09 22:48:58 +00:00 |
|
|
|
8983e0216f
|
[CI] Fix Pre-commit Issue Cannot determine type of "rank" and "world_size" (#26448)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-09 15:16:48 -07:00 |
|
|
|
1ee35382cb
|
[Bug] Fix modular_kernel: ZeroDivisionError: integer division or modulo by zero (#26528)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-09 15:13:27 -07:00 |
|
|
|
6e783bc54b
|
[Bugfix] Fix CUDA graph selection bug in FlashInfer at high concurrency (#26499)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-10-09 17:12:34 -04:00 |
|
|
|
c9d33c60dc
|
[UX] Add FlashInfer as default CUDA dependency (#26443)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-10-09 14:10:02 -07:00 |
|
|
|
2e54db4d2b
|
[Core] Remove unused prev_sampled_token_ids_invalid_indices input batch field (#26514)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-09 20:22:14 +00:00 |
|
|
|
44f633dba1
|
[Flashinfer][gpt-oss] Support FP8-qkv Flashinfer TRTLLM Sinks Attention (#25674)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-10-09 16:13:39 -04:00 |
|
|
|
a462331e36
|
[Bugfix] Disable moe inplace for torch >= 2.9 (#26497)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-10-09 18:07:38 +00:00 |
|
|
|
4069db3f2e
|
[Bugfix] Enable padded FP4 quantization (#25947)
Signed-off-by: Roi Koren <roik@nvidia.com>
|
2025-10-09 10:59:41 -07:00 |
|
|
|
0d37450eb7
|
[BUGFIX] Add cu_tokens_across_sp to DPMetadata (#26457)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-10-09 17:13:56 +00:00 |
|
|
|
47e66c24e2
|
[Model] Apply shared experts overlap optimization to all models with shared experts (#26145)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-10-09 11:31:04 -04:00 |
|
|
|
3b736e1c38
|
[Attention][DCP] Support DCP with query length > 1 (MTP) with FA3 (#25049)
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-10-09 08:06:29 -07:00 |
|
|
|
2c1c7dfb35
|
[Models][Qwen] Replace pad with cat for better performance (#26486)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2025-10-09 14:51:26 +00:00 |
|
|
|
e246ad6f0c
|
Upgrade Pydantic to v2.12.0 and remove hack for Python 3.13 (#26481)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-09 06:02:40 -07:00 |
|
|
|
5728da11ea
|
Revert #26113 "[Frontend] CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops" (#26472)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-10-09 05:43:55 -07:00 |
|
|
|
92be3f3517
|
[Feature] Use pydantic validation in parallel.py config (#26417)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-09 12:41:31 +00:00 |
|
|
|
d1ddf340c8
|
[V0 deprecation] Remove QKVCrossParallelLinear implementation (#26475)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-09 10:52:27 +00:00 |
|