1c5c866559
uint64
2025-10-30 16:54:10 -07:00
5c8049d990
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-10-30 16:40:09 -07:00
5666a25efb
fix
2025-10-30 16:38:16 -07:00
09e4b2f6eb
update
2025-10-30 16:30:06 -07:00
110770170f
Merge branch 'main' into woosuk/model-runner-v2
2025-10-30 22:19:50 +00:00
e7acb20076
[Feature] Batch invariant torch.compile ( #27660 )
...
Signed-off-by: PaulZhang12 <paulzhan@fb.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-30 13:11:29 -07:00
4b68c4a55b
[Core][Perf] Only invoke save_new_computed_blocks when computed blocks are not empty ( #27799 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-30 19:47:30 +00:00
a8141fa649
[Refactor] Remove VLLM_DEEPEP_LOW_LATENCY_ALLOW_NVLINK ( #27750 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-30 15:32:39 -04:00
4917002523
[Fix] Skip record_sleep_state logic in PrometheusStatsLogger if not in dev mode ( #27789 )
...
Signed-off-by: SumanthRH <sumanthrh99@gmail.com >
2025-10-30 19:26:27 +00:00
a2981c4272
[EP/DP][API Server] Enable DP-aware routing in OpenAI API requests ( #24945 )
...
Co-authored-by: Cong Chen <prowindy@gmail.com >
2025-10-30 12:10:16 -07:00
4574d48bab
[Core][Bookkeeping] Update cu_num_accepted_tokens for all req_index ( #27629 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-30 11:52:36 -07:00
ab98f6556f
[Bugfix] Fix 2 precommit issues - (mamba_block_size, kv_cache_config) ( #27811 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-10-30 11:52:18 -07:00
2918c1b49c
[Model] Use the same fused_moe configs for all H200 devices ( #23642 )
...
Signed-off-by: Roger Meier <r.meier@siemens.com >
2025-10-30 17:36:56 +00:00
1004205795
[MTP] Refactor mtp predictor to avoid d2h operation ( #27643 )
...
Signed-off-by: MengqingCao <cmq0113@163.com >
2025-10-30 17:27:39 +00:00
ba33e8830d
Reapply "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" ( #27768 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-10-30 10:22:30 -07:00
33a0ea5f32
[Docs] add Shanghai Meetup - 2025/10 ( #27545 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
Signed-off-by: esmeetu <jasonailu87@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: esmeetu <jasonailu87@gmail.com >
2025-10-31 00:33:13 +08:00
60f76baa66
[Misc] Replace CUDA_VISIBLE_DEVICES in DP with torch.cuda.set_device for device selection on cuda-like devices ( #27564 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-10-30 11:41:44 -04:00
e5e076cad7
[BugFix] Stopgap - Flashinfer Autotuner + GPT-OSS + DP/TP ( #27762 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-30 08:24:31 -07:00
eebf00cb0c
[Bugfix][CPU] Fix MRoPE dispatch on the CPU backend ( #27800 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-10-30 15:12:05 +00:00
9956aae4ea
[Model][Ouro] Support Ouro Model ( #27794 )
...
Signed-off-by: yinfan.1024 <yinfan.1024@bytedance.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: yinfan.1024 <yinfan.1024@bytedance.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-30 22:34:41 +08:00
0fe0140408
[KV offload] Enable CPU KV offload on CUDA alike Platforms ( #27770 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-30 22:10:29 +08:00
4e68cc9b6a
[Model] Introduce Kimi Linear to vLLM ( #27809 )
...
Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn >
Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com >
2025-10-30 21:02:27 +08:00
1994de99ea
[CI Failure] Fix test_kv_cache_model_load_and_run ( #27717 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-30 12:27:53 +00:00
4464723f22
[Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. ( #25524 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-30 12:13:05 +00:00
74374386e2
[Bugfix] Improve GPU validation logging in Ray fallback scenarios ( #25775 )
...
Signed-off-by: Sairam Pillai <sairam.pillai61@gmail.com >
2025-10-30 11:57:59 +00:00
c01f6e525f
[CI] Fix mypy for vllm/v1/core and vllm/v1/engine ( #27108 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-30 11:32:17 +00:00
c7d2a554ba
[CI Failure] fix test_default_mm_loras ( #27795 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-30 18:13:03 +08:00
af826e0820
[V0 deprecation] Remove VLLM_USE_V1 usage in config module ( #27784 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-30 09:42:49 +00:00
e806178d2a
[BugFix][VL] Fix FA selection on Qwen2.5-VL ( #27790 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-30 07:54:44 +00:00
5be1bed790
[CI/Build]Add eval config for Qwen3-235B-A22B-Instruct-2507-FP8 ( #27113 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-30 07:50:56 +00:00
31b55ffc62
use stringData in secret yaml to store huggingface token ( #25685 )
...
Signed-off-by: yiting.jiang <yiting.jiang@daocloud.io >
2025-10-30 00:47:36 -07:00
ded8ada86a
Add more dims for batch invariant shims ( #27489 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
Signed-off-by: Bram Wasti <bwasti@fb.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-30 05:28:45 +00:00
8bff831f0a
[Benchmark] Cleanup deprecated nightly benchmark and adjust the docstring for performance benchmark ( #25786 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-10-30 04:43:37 +00:00
b5d70751d8
[BugFix] Reordering extend logic fix ( #27739 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-29 21:39:34 -07:00
b8c48c5d72
kernels/moe test pruning ( #27053 )
...
Signed-off-by: Fardin Hoque <kfhfar@amazon.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-30 12:10:34 +08:00
17d055f527
[Feat] Adds runai distributed streamer ( #27230 )
...
Signed-off-by: bbartels <benjamin@bartels.dev >
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev >
Co-authored-by: omer-dayan <omdayan@nvidia.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-29 21:09:10 -07:00
2ce5c5d3d6
[BugFix] Handle unscheduled requests properly when async scheduling ( #27756 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-29 21:04:25 -07:00
b5bae42f91
[XPU] Update latest IPEX 2.8 release ( #27735 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-10-30 11:17:13 +08:00
d7fb10c574
[Bugfix] mamba-block-size is set for vision language model ( #27773 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-10-29 19:39:57 -07:00
b798e39f93
[XPU][bugfix] fix rope for llama4 and deepseek ( #25145 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
2025-10-30 09:43:13 +08:00
48eb8eba58
[Temp fix] Disable torch.compile for Qwen2.5 VL's VisionBlock temporarily. ( #27760 )
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-29 23:17:48 +00:00
b5d90f7400
[Bug] Fix DBO IMA issue for DeepEPHT ( #27666 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-29 16:28:27 -04:00
d4aa144343
[BugFix] Fix handling of resumed reqs in SharedStorageConnector ( #27719 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-29 20:16:52 +00:00
fcb1d570bb
[Bug] Fix DeepEP low latency assert self.batched_router_logits.size(-1) == full_router_logits.size(-1) Bug ( #27682 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-29 14:50:39 -04:00
accb8fab07
[KVConnector] Add metrics to Prometheus-Grafana dashboard ( #26811 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
2025-10-29 18:44:49 +00:00
5b0448104f
[Bug] Raise error explicitly if using incompatible backend ( #27424 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-29 13:29:20 -04:00
f7a6682872
[CI/Build] Test torchrun with 8 cards ( #27548 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-10-29 10:26:06 -07:00
a9fe0793f2
use_aot_compile should respect VLLM_DISABLE_COMPILE_CACHE ( #27698 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-29 17:08:54 +00:00
7568a282b9
[FIXBUG] Qwen3VL hallucinations without Contiguous on Torch.SDPA ( #27744 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-29 16:55:35 +00:00
1da3309ace
[Core] Exposing engine sleep & wake_up state as prometheus metrics ( #24176 )
...
Signed-off-by: Braulio Dumba <Braulio.Dumba@ibm.com >
2025-10-29 09:32:01 -07:00
5522fb274b
[Chore] Optimize P2PNCCLEngine http_address ( #27488 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-30 00:05:09 +08:00
0f95a1c3f2
[CI] Fix flaky test_two_responses_with_same_prev_id test ( #27745 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-29 15:10:35 +00:00
ded24e3e54
[ROCm][Platform] Add MI308X device id in _ROCM_DEVICE_ID_NAME_MAP ( #27623 )
...
Signed-off-by: Xiake Sun <xiake.sun@amd.com >
2025-10-29 14:44:03 +00:00
d6704dd099
Fix MiniMax-M2 rmsnorm precision and remove useless code ( #27627 )
...
Signed-off-by: xuebi <xuebi@minimaxi.com >
Co-authored-by: xuebi <xuebi@minimaxi.com >
2025-10-29 21:01:05 +08:00
ecca3fee76
[Frontend] Add vllm bench sweep to CLI ( #27639 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-29 05:59:48 -07:00
9a0d2f0d92
[CI/Build] Skip cpu offloading test on AMD ( #27690 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-29 12:55:51 +00:00
ad3ec89532
[VLM] Add Qwen3-VL generation test ( #25185 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-29 12:19:37 +00:00
3481e40743
[chore] Remove models weight on S3 logic ( #27725 )
...
Signed-off-by: kevin <kevin@anyscale.com >
2025-10-29 10:29:49 +00:00
5e72216d17
Feature/video support in random mm dataset ( #25963 )
...
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
Signed-off-by: Eugene Khvedchenya <ekhvedchenia@nvidia.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-29 18:24:52 +08:00
1a33aacf82
[Misc] Raise error for missing video metadata in MultiModalDataParser ( #27664 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-29 10:06:42 +00:00
7ba6aa8f56
[Fix] import get_kv_cache_torch_dtype error in LMCacheConnector integration ( #27670 )
...
Signed-off-by: KevinCheung2259 <2651309292@qq.com >
2025-10-29 10:03:54 +00:00
ab2eb27b74
[Frontend] [gpt-oss] Mcp type bug ( #27689 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Signed-off-by: Alec Solder <alecs@fb.com >
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Co-authored-by: Alec Solder <alecs@fb.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-29 10:01:32 +00:00
3c7fefdeba
[Frontend] [gpt-oss] Tool json call parsing error retry ( #27675 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Signed-off-by: Alec Solder <alecs@fb.com >
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
Co-authored-by: Alec Solder <alecs@fb.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-29 09:42:44 +00:00
1891cf605a
[Bugfix] Fix modular kernel tests ( #27707 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-10-29 16:14:33 +08:00
8df98c2161
[perf] Enable concurrent execution of "shared_experts" and "selected_experts" in qwen3-next ( #27578 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-29 08:12:54 +00:00
4fb8771cc0
[CI/Build] Move pre-commit only scripts to tools/pre_commit ( #27657 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-29 08:04:33 +00:00
413ef7a3b4
[Speculators] Move tests + fix integration ( #27308 )
...
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com >
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
Signed-off-by: rahul-tuli <rtuli@redhat.com >
Co-authored-by: Rahul Tuli <rtuli@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2025-10-29 00:54:21 -07:00
8b62495076
[Bugfix] Fix non-contiguous tensor error in rocm_unquantized_gemm_impl ( #27605 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-29 00:00:15 -07:00
83fd49b1fc
[CI/Build][Bugfix]Fix Quantized Models Test on AMD ( #27712 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-29 06:27:30 +00:00
a4a4f0f617
[KV Connector] Update lmcache connector with latest compatibility ( #27681 )
...
Signed-off-by: Samuel Shen <slshen@uchicago.edu >
Co-authored-by: Samuel Shen <slshen@uchicago.edu >
2025-10-29 05:38:37 +00:00
0d8161b075
[Model] Fix Qwen3VL and Qwen3Omni after torch.compile changes ( #27705 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-29 05:28:20 +00:00
d2c33c397a
[NIXL][XPU] update name of nixl wheel ( #27631 )
...
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com >
2025-10-29 12:43:29 +08:00
f6d5f5888c
[Build] Revert triton_kernels requirements ( #27659 )
2025-10-28 21:07:09 -07:00
9007bf57e6
Revert "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" ( #27714 )
2025-10-28 20:58:01 -07:00
f257544709
Install pre-built xformers-0.0.32.post2 built with pt-2.9.0 ( #27598 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-28 19:39:15 -07:00
0b51c9bd8b
[Core] Early return in SlidingWindowManager.remove_skipped_blocks ( #27673 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-29 01:32:33 +00:00
d3ab240f39
[Bug] Fix deepep low latency use nvlink by default ( #27677 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-28 23:53:12 +00:00
94666612a9
[Misc][qwen2_5_vl][torch.compile] Enable supports_torch_compile on generic nn.Module and demonstrate speedup on Qwen Vision model ( #23207 )
...
Signed-off-by: Lucas Kabela <lucaskabela@meta.com >
Signed-off-by: Lucas Kabela <lucasakabela@gmail.com >
2025-10-28 22:36:43 +00:00
4fe5895361
[AsyncScheduling] Make async overlap work with logprobs ( #27615 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-28 22:35:54 +00:00
111faf1118
[Core] Scheduler: Publish connector events after output ( #25875 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com >
2025-10-28 21:01:33 +00:00
6afc28a9ba
[Test] Batch Invariant: Unit test using parameterized backend ( #27478 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-28 13:51:35 -07:00
141e6a0505
[Misc] Make reorder batch also separate extends ( #27367 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-28 10:55:10 -07:00
130aa8cbcf
Add load pattern configuration guide to benchmarks ( #26886 )
...
Signed-off-by: Matvei Pashkovskii <mpashkov@amd.com >
Signed-off-by: Matvei Pashkovskii <matvei.pashkovskii@amd.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-28 10:49:15 -07:00
e3d8186666
[compile] Add fallback path to AOT compile when serialization fails. ( #27350 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-28 12:54:26 -04:00
f5710ef02a
[Misc] Make LayerBlockType a Literal instead of Enum ( #27658 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-28 16:23:35 +00:00
a8c02fb5bf
[Bugfix][CI] Fix v1 attention backend tests and add CI coverage ( #26597 )
...
Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu >
Signed-off-by: Mohammad Miadh Angkad <mangkad.bsdsba2027@aim.edu >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-28 11:42:05 -04:00
02af36df36
[Bugfix] Fix allocation & free logic of SingleWriterShmRingBuffer ( #27117 )
...
Signed-off-by: Kero Liang <kerorek@outlook.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: donglu <donglu@cohere.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-28 15:01:24 +00:00
e88bdd60d9
[FLA] Introduce Kimi Delta Attention(KDA) to VLLM ( #27654 )
...
Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn >
2025-10-28 22:56:28 +08:00
05e034f085
[nit]: Fix import for the lmcache integration ( #27600 )
...
Signed-off-by: Samuel Shen <slshen@uchicago.edu >
Co-authored-by: Samuel Shen <slshen@uchicago.edu >
2025-10-28 14:40:55 +00:00
936643a868
[BugFix] Also consider RAY_EXPERIMENTAL_NOSET_* when storing compilation cache ( #27294 )
...
Signed-off-by: Hollow Man <hollowman@opensuse.org >
2025-10-28 10:22:28 -04:00
b186149e8e
[Bugfix][Frontend] validate arg priority in frontend LLM class before add request ( #27596 )
...
Signed-off-by: Junpu Fan <junpufan@gmail.com >
2025-10-28 14:02:43 +00:00
2abbd351ef
[Core] Enable async scheduling for external_launcher mode ( #27394 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com >
2025-10-28 13:52:47 +00:00
446912d1cb
fix: allow HuggingFace standard chat template params via **kwargs ( #27622 )
...
Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local >
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com >
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-28 21:12:34 +08:00
a00d6254e9
[compile] Disable dynamo guards check for AOT compilation. ( #27288 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-28 12:58:12 +00:00
05181cc57f
[Hybrid] Add mamba_block_size to Engine Args ( #27289 )
...
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com >
2025-10-28 12:54:24 +00:00
259504e147
[compile] Add enable_prompt_embeds to compile hash. ( #27285 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-28 20:46:03 +08:00
0484b64248
[Bug] Fix shape issue for eplb expert weights ( #27589 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-28 20:44:05 +08:00
f58d9b6404
[Misc] Separate out utils.counter and move utils.Device to engine ( #27588 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-28 12:20:46 +00:00
44b5ce956d
[Bugfix] In LongRoPE, decide short vs long based on max_model_len ( #27431 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-28 12:00:56 +00:00
7a865f2325
[V0 Deprecation] Remove vestigial V0 logits_processors.py file ( #27601 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-28 19:17:45 +08:00
2fa90bda27
Fix a robust parsing issue in KimiK2ToolParser that causes IndexError ( #27565 )
...
Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local >
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local >
2025-10-28 11:11:50 +00:00
0291fbf65c
[CI/Build] Fix amd model executor test ( #27612 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-28 08:58:11 +00:00
b46e4a06f1
[Core][Bookkeeping Optimization] Update against numpy view of is_token_ids tensor ( #27618 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-28 08:13:10 +00:00
d34f5fe939
[Bugfix][CPU] Fallback oneDNN linear to torch linear to fix half gemm support on legacy platforms ( #27526 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-27 23:25:44 -07:00
bdb01a38fe
[Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.6 for MI300X ( #27323 )
...
Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com >
2025-10-27 22:58:06 -07:00
5b3c35a68e
[ROCm] [Doc] Update ROCm installation docs ( #27327 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-10-28 13:00:50 +08:00
61fbfe5274
[Bugfix] fixed inconsistent finish_reason handling between V0 and V1 engines ( #27555 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-28 02:18:08 +00:00
255e34ca50
[Stability fix] turn off HMA allocator when connector is set ( #27592 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
Signed-off-by: Kuntai Du <kuntai@uchicago.edu >
2025-10-27 18:32:23 -07:00
a8d2e326ec
[Bugfix][CI] Fix config resolving logic with remote models ( #27610 )
2025-10-28 00:48:32 +00:00
53a56e658b
[gpt-oss][2/N] Support input_messages in responsesRequest ( #26962 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-10-27 23:15:49 +00:00
69f064062b
Code quality improvements: version update, type annotation enhancement, and enum usage simplification ( #27581 )
...
Signed-off-by: Bradley <bradley.b.pitt@gmail.com >
2025-10-27 17:50:22 +00:00
921e78f4bb
[ROCm] Update AITER branch for ROCm base docker ( #27586 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-10-27 17:22:33 +00:00
6ebffafbb6
[Misc] Clean up more utils ( #27567 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-27 15:30:38 +00:00
3b96f85c36
[Chore]: Stream tokens vs characters in tool call parser tests ( #26513 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
2025-10-27 23:06:25 +08:00
23ad820553
fixing mm placeholder replacement issue with gemma3 ( #27538 )
...
Signed-off-by: tingtingtang1992 <streamttt@gmail.com >
2025-10-27 14:34:01 +00:00
5d3be3ba4c
[Bugfix][LoRA][FusedMoE] Select MxFP4 Backend based on LoRA Enablement ( #27487 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-27 07:32:50 -07:00
4f882be4a0
[Model] Siglip2 Model Support ( #27566 )
...
Signed-off-by: piood <2477084691@qq.com >
2025-10-27 06:57:37 -07:00
9273754222
[Hybrid] Added supports_mamba_prefix_caching Protocol ( #27339 )
...
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com >
2025-10-27 13:05:20 +00:00
f4e8154076
[Kernel] Enable moe LoRA kernel support FP16 ( #27468 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-27 19:48:37 +08:00
a663f6ae64
[cpu][perf] Fix low CPU utilization with VLLM_CPU_OMP_THREADS_BIND on AArch64 ( #27415 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-10-27 11:14:55 +00:00
a4fc21895e
[Bugfix] Fixed when return_token_ids=False, the first event still contains prompt_token_ids. ( #27561 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-27 11:06:43 +00:00
a3e8611da5
[Bugfix] Limit the default value of max_model_len when it is not specified by users ( #27556 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-10-27 10:16:20 +00:00
7c2bdb83dc
[Misc] Clean up utils ( #27552 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-27 09:05:40 +00:00
9932ed6a83
[Kernel] Adding split_K implementation for fused_moe_lora ( #27291 )
...
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com >
Signed-off-by: Danielle Robinson <dcmaddix@gmail.com >
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-27 02:05:24 -07:00
2d631d28c6
[Doc] Slight improvement to M2 and beyond ( #27554 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-27 09:02:10 +00:00
b368382964
[Model] Deprecate merge_by_field_config=False ( #27551 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-27 16:43:00 +08:00
a806c14cc7
[Performance][LoRA] add context varying params to 'do_not_specialize' in fused moe lora ( #27445 )
...
Signed-off-by: gnovack <gnovack@amazon.com >
2025-10-27 06:31:55 +00:00
181bf5bbde
[Docs] remove the incorrect enable_reasoning parameter ( #27550 )
...
Signed-off-by: zxw <1020938856@qq.com >
2025-10-26 23:17:19 -07:00
cbd5e07a51
[Model] Use merge_by_field_config for MM models (Qwen series) ( #27546 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-27 05:38:05 +00:00
63b22e0dbb
[Model][Bugfix] fix ernie45 moe 300B SharedFusedMoE output tuple ( #27316 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-10-26 20:53:31 -07:00
5980604c44
Fix MiniMax-M2 copyright ( #27537 )
...
Signed-off-by: xuebi <xuebi@minimaxi.com >
Co-authored-by: xuebi <xuebi@minimaxi.com >
2025-10-27 03:29:51 +00:00
361a7463d3
fix m2 test ( #27536 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-10-27 01:04:36 +08:00
720af6ab79
[Model][MiniMax-M2] Support MiniMax-M2 Model ( #27535 )
...
Signed-off-by: xuebi <xuebi@minimaxi.com >
Co-authored-by: xuebi <xuebi@minimaxi.com >
2025-10-27 00:59:11 +08:00
55cba4a05c
[CI/Build] Update causal-conv1d installation ( #27529 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-26 22:14:22 +08:00
c7abff2990
Revert "[CI/Build] Use CPU for mm processing test on CI ( #27522 )" ( #27531 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-26 04:44:27 -07:00
71b1c8b667
[Chore]:Extract math and argparse utilities to separate modules ( #27188 )
...
Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com >
Signed-off-by: Yeshwanth N <yeshsurya@gmail.com >
Signed-off-by: yeshsurya <yeshsurya@gmail.com >
2025-10-26 04:03:32 -07:00
8fb7b2fab9
[Doc] Fix links to GH projects ( #27530 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-26 17:55:51 +08:00
be7b55a83d
[Doc] Remove Molmo warning ( #27527 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-26 16:22:52 +08:00
315b860abe
[bugfix]fix empty prompts for async-engine mode in benchmark throughput ( #27494 )
...
Signed-off-by: Lucia Fang <fanglu@fb.com >
2025-10-26 08:16:35 +00:00
87c41c26ad
[Bugfix] Fix processor initialization for model from modelscope instead of HF ( #27461 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-26 07:44:31 +00:00
65d2cf9511
[BUGFIX][ROCM] ViT FlashAttention on ROCm (no GFX9) and contiguous on qwen3vl ROCm TORCH_SDPA ( #27190 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-10-26 15:08:52 +08:00
d63cd9ff10
[CI/Build] Use CPU for mm processing test on CI ( #27522 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-26 13:09:18 +08:00
66a168a197
[CI/Build] Refactor processing tests ( #27470 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-25 16:14:30 +00:00
a99564ac5b
[Attention] Add missing kv cache scale setup ( #27490 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-25 00:12:49 -07:00
4c5f632165
[Misc] Simplify max tokens in multimodal registry ( #27500 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-24 23:56:01 -07:00
b853540388
[Core][Hybrid allocator + kv connector 1/n] Enable hybrid allocator + KV cache connector ( #25712 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
Signed-off-by: Kuntai Du <kuntai@uchicago.edu >
2025-10-24 23:34:18 -07:00
56ed7609a9
Revert "[Misc] Remove use of CUDA_VISIBLE_DEVICES for device selectio… ( #27502 )
2025-10-25 05:31:43 +00:00
29c9cb8007
[CI] Add tests for cudagraph ( #27391 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-25 02:37:33 +00:00
83f478bb19
[KVConnector] Migrate the LMCache integration code to be vLLM native ( #25542 )
...
Signed-off-by: ApostaC <yihua98@uchicago.edu >
2025-10-25 00:23:53 +00:00
269c4db0a4
[Misc][DP] Guard mxfp4 implementation selection ( #27484 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-24 23:29:24 +00:00
52efc34ebf
[Log] Optimize Startup Log ( #26740 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-24 19:27:04 -04:00
d95d0f4b98
[Distributed] Basic set of configuration for large EP deployment on GB200 ( #27328 )
...
Signed-off-by: Pengchao Wang <wpc@fb.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2025-10-24 14:16:44 -07:00
0402428200
[Perf][Async Scheduling] Remove CPU->GPU sync in dummy_run ( #27455 )
...
Signed-off-by: Lehua Ding <lehuading@tencent.com >
2025-10-24 20:45:36 +00:00
17af6aa0da
[Document] Add ms-swift library to rlhf.md ( #27469 )
2025-10-24 20:31:50 +00:00
fc168c33f3
[CI/Build] Fix test_torch_utils in AMD CI ( #27317 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-24 12:26:00 -07:00
acc78aeb88
[Bugfix] Fix interns1-vit qk norm code path ( #27480 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-24 17:43:45 +00:00
0f67d4d962
[Attention] Add MLA prefill backend: trtllm_ragged_attention_deepseek ( #26397 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com >
2025-10-24 10:24:08 -07:00
7e1d697b56
[Bugfix] Fix MultiConnector stats reconstruction across process boundaries ( #27366 )
...
Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2025-10-24 17:08:05 +00:00
699d62e6cf
[NIXL][BUGFIX] delay done_recving queue cleanup to bottom of get_finished ( #27297 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-24 17:01:41 +00:00
cd390b609d
[compile] Turn standalone_compile back on ( #27460 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2025-10-24 16:30:27 +00:00
2080b05099
[cpu][fix] Fix onednn_mm crash on consecutive matmuls with same M,K,N and different dtype ( #27472 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-10-24 15:57:48 +00:00
6454afec90
[Doc] Fix minor issues in docs/design/metrics.md ( #27436 )
...
Signed-off-by: Lifan Shen <lifans@meta.com >
2025-10-24 05:40:54 -07:00
41a62564a7
Fix test named tool use ( #27458 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-24 20:27:45 +08:00
284cc92275
[MISC] cudagraph_capture_sizes related improvements ( #26016 )
...
Signed-off-by: fhl <2410591650@qq.com >
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-24 05:11:05 -07:00
435be10db9
Fix AArch64 CPU Docker pipeline ( #27331 )
...
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com >
2025-10-24 05:11:01 -07:00
b7030d962b
[Benchmark] Enable benchmark to run with encoding_format="bytes" ( #27467 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-24 11:16:50 +00:00
3567816932
[Refactor] move tool parsing logic from protocol.py to the tool parser ( #27383 )
...
Co-authored-by: Aaron Pham <contact@aarnphm.xyz >
2025-10-24 09:53:23 +00:00
e0ef8a2920
[BugFix] Fix torchrun DP with LLM class ( #27395 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-10-24 08:11:37 +00:00
42efe609ba
[MM][Bugfix] Replace PatchEmbed's conv3d to linear layer ( #27418 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-24 07:32:47 +00:00
88d3141ec6
[Docs] remove v1 column for embedding models ( #27446 )
...
Signed-off-by: piood <2477084691@qq.com >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-23 23:55:03 -07:00
09a6a49eaf
[Misc] Avoid "PyTorch non-writable tensors" warning in RayPPCommunicator ( #27443 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-10-24 14:53:09 +08:00
074475541a
[Bugfix] Fix Pydantic union resolution for ResponseFunctionToolCall in Responses API ( #26706 )
...
Signed-off-by: Shai Trinczer <strinczer@icloud.com >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-23 22:53:42 -07:00
d4c574c39f
[Chore] remove structural tags logging lines ( #27451 )
2025-10-24 05:35:45 +00:00
c528b9006a
Fix EventPublisherFactory logic for disabled KV cache events ( #27419 )
...
Signed-off-by: Bradley <bradley.b.pitt@gmail.com >
2025-10-24 05:00:01 +00:00
85fee74b33
[Bugfix][CI] Move resolving cudagraph_mode before initializing attn_metadata_builder ( #27427 )
...
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com >
2025-10-23 20:31:14 -07:00
8dbe0c527f
[Misc] Add TPU usage report when using tpu_inference. ( #27423 )
...
Signed-off-by: Hongmin Fan <fanhongmin@google.com >
2025-10-23 20:29:37 -07:00
5cc6bddb6e
[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm ( #26092 )
2025-10-23 23:26:13 -04:00
1f9460c4c1
Fix pooling adapters for Transformers backend ( #27338 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-23 20:23:55 -07:00
70022ffc00
Granite 4.0 quark quantization support ( #26944 )
...
Signed-off-by: Xiao YU <Xiao.YU@xilinx.com >
Signed-off-by: Xiao Yu <xiao.yu.dc@outlook.com >
Co-authored-by: Xiao YU <Xiao.YU@xilinx.com >
2025-10-24 02:14:03 +00:00
f417746ad7
[Hardware][POWERPC] Disable oneDNN path in vllm/model_executor/layers/utils.py for Powerpc ( #27422 )
...
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
2025-10-23 21:21:36 +00:00
0552cfb195
[Model] Siglip Embedding Support ( #27324 )
...
Signed-off-by: piood <2477084691@qq.com >
2025-10-23 20:19:48 +00:00
51dd14ac2b
[Bugfix][DP] Fix creating too many DP Placement Groups ( #26880 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
Co-authored-by: Rui Qiao <ruisearch42@gmail.com >
2025-10-23 20:16:51 +00:00
dbfbf9f324
[Attention] Fix FlashMLA metadata builder arguments for q_len > 1 ( #27368 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-23 15:58:15 -04:00
ca76486a16
[Chore] Separate out vllm.utils.platform_utils.py ( #27374 )
...
Signed-off-by: Jonathan <chenleejonathan@gmail.com >
2025-10-23 19:08:06 +00:00
a9f55dc588
[Misc] Add triton_kernels dependency ( #27370 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-23 12:04:14 -07:00
81d5bb765a
[Bugfix] Fix AWQ marlin layer skipping ( #27416 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-23 18:30:28 +00:00
0825197bee
[Bugfix][ROCm][DeepSeek] Fix for forward_hip in rope for DeepSeek ( #27373 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-10-23 17:43:53 +00:00
9ef3d5b875
[Bugfix] Fix dp_chunking enablement logic in FusedMoE layer ( #27220 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-10-24 00:03:14 +08:00
295c7f0267
Mirroring the test definitions (2025-10-22) ( #27362 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-10-24 00:02:26 +08:00
3fa2c12185
[Frontend][4/N] Improve all pooling task | Add plugin pooling task ( #26973 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
Signed-off-by: Christian Pinto <christian.pinto@ibm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Christian Pinto <christian.pinto@ibm.com >
2025-10-23 14:46:18 +00:00
fe2016de2d
[CI/Build] Remove unnecessary flags from test registry ( #27353 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-23 14:42:40 +00:00
237cf6d32a
[Misc] Remove use of CUDA_VISIBLE_DEVICES for device selection (fix DP slow startup time &c) ( #26709 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-10-23 20:58:39 +08:00
faee3ccdc2
[Feature] Pydantic validation for speculative.py ( #27156 )
...
Signed-off-by: Navya Srivastava <navya.srivastava1707@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-23 12:19:33 +00:00
570c3e1cd4
[Bugfix] Honor --mm_encoder_attn_backend when used ( #27124 )
...
Co-authored-by: Bradley D <4551889+bradleyhd@users.noreply.github.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-23 20:09:52 +08:00
3a4255c7c4
Run mypy on the lowest supported Python version instead of system Python ( #27048 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-23 05:07:44 -07:00
61089465a6
[Model] Add MoE support for NemotronH ( #25863 )
...
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com >
2025-10-23 10:27:23 +00:00
88afa11010
[Metrics] [KVConnector] Add connector prefix cache hit rate stats ( #26245 )
...
Signed-off-by: tovam <tovam@pliops.com >
2025-10-23 12:21:08 +02:00
d00ce29d89
[CI] Reorganize entrypoints tests ( #27403 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-23 10:10:06 +00:00
3b7bdf983b
add SLA information into comparison graph for vLLM Benchmark Suite ( #25525 )
...
Signed-off-by: Tsai, Louie <louie.tsai@intel.com >
Signed-off-by: louie-tsai <louie.tsai@intel.com >
Signed-off-by: Louie Tsai <louie.tsai@intel.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-23 08:04:59 +00:00
50b788a17a
[CI/Build] Fix AMD CI: test_cpu_gpu.py ( #27388 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-23 07:55:00 +00:00
fc059c7061
[Bugfix] Fix args settings for guided decoding args ( #27375 )
...
Signed-off-by: Lucia Fang <fanglu@fb.com >
2025-10-23 07:34:06 +00:00
bfb240cc49
[CI/Build] Fix Prithvi plugin test ( #27393 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-23 07:30:44 +00:00
e255d92990
[Chore] Remove duplicate has_ functions in vllm.utils ( #27372 )
...
Signed-off-by: Jonathan <chenleejonathan@gmail.com >
2025-10-23 06:11:59 +00:00
3729ed00ba
[Model] Add num_cached_tokens for PoolingRequestOutput ( #27378 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-23 14:03:42 +08:00
6644796bf4
[V1][spec decode] return logprobs for spec decoding ( #26060 )
...
Signed-off-by: Giancarlo Delfin <gdelfin@meta.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-10-22 22:59:59 -07:00
ff93cc8c84
[CORE] Support Prefix Caching with Prompt Embeds ( #27219 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
2025-10-22 22:18:07 -07:00
243ed7d32e
[Bugfix][Core] running queue index leakage exception ( #26754 )
...
Signed-off-by: CLFutureX <chenyongqyl@163.com >
2025-10-22 21:40:12 -07:00
7e0941055f
[Bugfix] Fix incorrect kv cache metrics in grafana.json ( #27133 )
...
Signed-off-by: Fangping Shi <fangping_shi@apple.com >
Co-authored-by: Fangping Shi <fangping_shi@apple.com >
2025-10-22 20:58:36 -07:00
6738e4a093
[Bugfix] Fix SLA tuner initialization ( #27355 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-22 20:43:04 -07:00
2566dca2a9
[Bugfix] Fix deepseek-ocr multi-image inference and add merge_by_field_config=True with tensor schema support ( #27361 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-22 17:15:38 -07:00
b4fda58a2d
[MLA] Bump FlashMLA ( #27354 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-22 15:48:37 -07:00
a0003b56b0
[Chore] Separate out system utilities from vllm.utils ( #27201 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-22 20:25:25 +00:00
5beacce2ea
[BugFix] bugfix for Flash Attention MLA with full cuda graph IMA following pr-25490 ( #27128 )
...
Signed-off-by: qqma <qqma@amazon.com >
Co-authored-by: qqma <qqma@amazon.com >
2025-10-22 19:36:39 +00:00
8669c69afa
[Feature] publisher default set zmq in kv_event config ( #26915 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-22 19:19:33 +00:00
1651003c35
[Prefix Cache] Use LoRA name for consistent KV-cache block hashing ( #27211 )
...
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com >
2025-10-22 18:13:03 +00:00
1cb8c6c5fe
[Doc] Fix numbering sequence in prefix caching ( #27357 )
...
Signed-off-by: William Song <jinwook@umich.edu >
2025-10-22 17:35:47 +00:00
e05a6754a8
[Model] Revert PR #26715 : Restore custom PaliGemma and Gemma3-MM impl… ( #27309 )
...
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com >
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com >
2025-10-22 10:05:34 -07:00
084a9dae80
[Bugfix] Disable FlexAttention direct block mask building for encoder-only models ( #27344 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-22 16:39:08 +00:00
c9461e05a4
Support Anthropic API /v1/messages Endpoint ( #22627 )
...
Signed-off-by: liuli <ll407707@alibaba-inc.com >
Co-authored-by: liuli <ll407707@alibaba-inc.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-22 09:13:18 -07:00
4dfdb821c8
[P/D] Dynamic kv_output_aggregator collect size ( #26734 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-22 18:07:58 +02:00
58fab50d82
[Frontend] Require flag for loading text and image embeds ( #27204 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-22 15:52:02 +00:00
db6f28d898
[Bugfix] Fix HF format InternVL large variants video processing ( #27330 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-22 08:39:23 -07:00
14e2f1231e
[Bugfix] Make get_mrope_input_positions instance methods ( #27342 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-22 08:38:34 -07:00
7c4767f1eb
[NIXL] use Host buffer to support TP_ratio > 1 for XPU ( #27140 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
Signed-off-by: Chendi.Xue <chendi.xue@intel.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
2025-10-22 15:28:13 +00:00
9771e0b432
[Bugfix] Add missing 'is_internal_router' attribute to FusedMoEWithLoRA ( #27351 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-22 08:19:12 -07:00
980de31ca0
[bugfix] remove unused parameters to reduce unnecessary vram usage ( #26789 )
...
Signed-off-by: Reinforce-II <fate@eastal.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-22 08:16:09 -07:00
1c160841ea
[Bug] Fix DeepSeek-V2.5-1210-FP8 issue ( #27267 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-22 11:00:10 -04:00
4ca13a8667
[NIXL] Terminate handshake listener thread in shutdown ( #26404 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-10-22 16:59:53 +02:00
675aa2ec64
[Model] Upstream Deepseek-OCR model ( #27247 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-22 07:59:15 -07:00
3ae082c373
[Chore] Separate out optional dependency checks from vllm.utils ( #27207 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
Signed-off-by: dongbo910220 <32610838+dongbo910220@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-22 10:44:21 -04:00
49c00fe304
Mirroring changes in test-pipeline.yaml into test-amd.yaml ( #27242 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-10-22 09:59:45 -04:00
141d3b9fc5
[docs] Update v1 metrics design doc ( #27332 )
...
Signed-off-by: Simon Mo <simon.mo@hey.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Signed-off-by: atalhens <sneh.lata@nutanix.com >
Co-authored-by: Simon Mo <simon.mo@hey.com >
Co-authored-by: atalhens <sneh.lata@nutanix.com >
2025-10-22 06:29:15 -07:00
abf3db40ef
[Core] Handle MoE LoRA edge cases ( #27335 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-22 13:14:33 +00:00
8e4ca4d14e
Bugfix - pass 'max_num_tokens_padded' into 'moe_lora_align_block_size' ( #27311 )
...
Signed-off-by: gnovack <gnovack@amazon.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-22 12:23:57 +00:00
1a0f4defb7
[Log] Add Warning for LLM(data_parallel_size=k) single-process DP Usage ( #27282 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-22 12:12:21 +00:00
843af7f7fc
[Bugfix][CPU] Disable dual stream execution for experts on CPU ( #27320 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-10-22 11:02:27 +00:00
1f633b8632
[Frontend][3/N] Improve all pooling task | Support binary embedding response ( #27066 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-22 18:38:57 +08:00
a4c29e6e82
fixed reasoning streaming with tool_choice="required" ( #24108 )
...
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr >
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com >
Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr >
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-10-22 09:42:55 +00:00
8f18feb191
Remove last level references not removed in #26355 ( #27260 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-22 09:18:17 +00:00
ed540d6d4c
Update release pipeline for PyTorch 2.9.0 ( #27303 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-10-22 09:18:01 +00:00
f6027b2855
[1/N][Platform] Cleanup useless function ( #26982 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-22 09:04:57 +00:00
ab3e80042e
[torch.compile] Enable silu_mul_fp8_quant fusion without custom ops enabled ( #27146 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-22 00:22:39 -04:00
ceacedc1f9
[Benchmark] Add plot utility for parameter sweep ( #27168 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-21 20:30:03 -07:00
bfa59be8f1
[CI] Nixl integration tests DP-EP ( #27199 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-22 11:17:48 +08:00
265ecb05fb
[DOC] [ROCm] Add ROCm quickstart guide ( #26505 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-10-22 03:10:48 +00:00
09a7e6f617
[Deepseek v3.2] Remove extra logics in indexer ( #26465 )
...
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com >
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com >
Signed-off-by: Lain <siyuanf@nvidia.com >
Co-authored-by: Daniel Campora <961215+dcampora@users.noreply.github.com >
2025-10-21 23:34:03 +00:00
6c2eef5a5d
[P/D] KVConnector for decode benchmarking ( #25986 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-10-21 16:30:47 -07:00
19748806f0
[Bugfix] skip cuda graph for drafter when running with eager ( #26821 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-10-21 15:39:09 -07:00
4a8a567e16
Updated xgrammar backend to not deny supported string formats ( #27253 )
...
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr >
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com >
Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-21 22:25:23 +00:00
344a0017c0
[Performance] Dual stream execution of "shared_experts" and "selected_experts" inside FusedMoE ( #26440 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-10-21 21:38:29 +00:00
becb7de40b
Update PyTorch to 2.9.0+cu129 ( #24994 )
...
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-21 17:20:18 -04:00
250fb1b8ea
[Bugfix] fixes the decoding metadata of dense mla's fp8 kvcache. ( #27144 )
...
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-21 18:27:03 +00:00
647214f3d5
[V0 Deprecation] Remove V0 executors ( #27142 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-21 11:09:37 -07:00
ddeec11ba9
[Bugfix][P/D] Reduce num_threads used by nixl ucx backend ( #27196 )
...
Signed-off-by: David Whyte-Gray <40244437+dagrayvid@users.noreply.github.com >
2025-10-21 13:41:52 -04:00
86ed77022d
[Feature] Batch Invariant for R1 TP 8 on Blackwell ( #27229 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-21 10:25:55 -07:00
aa1356ec53
[ROCm] Update Triton, Torch, and AITER branches for ROCm base Dockerfile ( #27206 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2025-10-21 12:01:23 -04:00
ecc3c0940a
Add @pavanimajety to .github/codeowners for Flashinfer, ModelOpt related code ( #27213 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2025-10-21 22:59:53 +08:00
ba09652de2
[ROCM] Enable CompressedTensorsWNA16 ( #27187 )
...
Signed-off-by: JartX <sagformas@epdcenter.es >
2025-10-21 10:43:23 -04:00
bd66b8529b
[CI] Install pre-release version of apache-tvm-ffi for flashinfer ( #27262 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-21 14:23:56 +00:00
6c728f7771
[Chore] Separate out NCCL utilities from vllm.utils ( #27197 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
2025-10-21 06:18:23 -07:00
80e9452984
[Deepseek v3.2] Optimize top_k_per_row ( #26763 )
...
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com >
2025-10-21 08:30:07 +00:00
c3a2c6ac5f
[MM][Core] Decouple ViT backend from LM backend ( #27061 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-10-21 00:30:10 -07:00
72f431e709
[Nixl] Minor refactor to handshake related metadata ( #26410 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-21 09:07:47 +02:00
be4445072c
[Fix][Spec Decode] Fix llama4 draft loading with different quantization ( #27136 )
...
Signed-off-by: linzebing <linzebing1995@gmail.com >
2025-10-20 23:19:00 -07:00
f381cf2302
[Bugfix] Fix broken MTP weight loading for FP8 KV Scales ( #27227 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2025-10-20 22:51:44 -07:00
5ff5d94e77
[Bugfix] Fix gpt-oss w4a8 DP/EP on B200 ( #26729 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-21 01:51:14 -04:00
f95da13c3d
[ModelOpt] Load w13/w2_input_scale for all experts, nvfp4 ( #26135 )
...
Signed-off-by: Shu Wang <shuw@nvidia.com >
Signed-off-by: Shu Wang. <shuw@nvidia.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-21 01:50:31 -04:00
aef368aa08
[BugFix] GPT-OSS Attention DP + MoE TP weight loading issue ( #24032 )
...
Signed-off-by: Po-Han Huang <pohanh@nvidia.com >
2025-10-21 04:03:47 +00:00
5f6cbf60d6
[Feature][Kernel]FusedMoE LoRA ( #21229 )
...
Signed-off-by: wuchen <cntryroa@gmail.com >
Signed-off-by: banjuede <lmklhc@163.com >
Signed-off-by: Chen Wu <cntryroa@gmail.com >
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com >
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Signed-off-by: bk-201 <joy25810@foxmail.com >
Co-authored-by: wuchen <wuchen@zetyun.com >
Co-authored-by: Nathan Van Gheem <vangheem@gmail.com >
Co-authored-by: banjuede <lmklhc@163.com >
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: bk-201 <joy25810@foxmail.com >
2025-10-21 03:01:37 +00:00
3ada34f9cb
[Frontend] Enforce tokenize=False when applying chat template ( #27205 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-21 02:57:34 +00:00
0eb8f2b880
create is_in_the_same_node on cpu ( #26832 )
...
Co-authored-by: Lunwen He <lunwenh@meta.com >
2025-10-21 02:04:14 +00:00
163965d183
[cpu] Dispatch un-quantized linear to oneDNN/ACL by default for AArch64 ( #27183 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
Co-authored-by: Michael Yang <Michael.Yang@arm.com >
2025-10-21 02:02:58 +00:00
a03cf9bc70
[V0 Deprecation] Remove V0 metrics code ( #27215 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-21 02:02:10 +00:00
352c0c8a28
[Quantization] Automatically infer AWQ modules_to_not_convert field ( #26909 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-21 01:49:28 +00:00
bfe0b4bd2a
[ez] add uv lock to gitignore ( #27212 )
...
Signed-off-by: Andrew Xia <axia@fb.com >
Co-authored-by: Andrew Xia <axia@fb.com >
2025-10-21 00:37:44 +00:00
58fbbcb2f5
[ROCm] enable some tests in entrypoints test groups on AMD ( #26725 )
...
Signed-off-by: Yida <yida.wu@amd.com >
2025-10-21 00:37:16 +00:00
87778d5f00
[Feature][Quantization] auto_round support for mixed bits quantization ( #23812 )
...
Signed-off-by: n1ck-guo <heng.guo@intel.com >
Signed-off-by: Heng Guo <heng.guo@intel.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-20 22:23:30 +00:00
f9e7ad5400
[Bugfix][CI] Fix Distributed Tests (4 GPUs) async_sched+ray test ( #27195 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-20 16:34:54 +00:00
4d0f266113
[Kernel][Model] Tune fused_moe Triton configs for Qwen3-30B A3/A3B on H100 (FP8/BF16) ( #26268 )
...
Signed-off-by: Shivam <shivampr.dev@gmail.com >
2025-10-20 07:48:01 -07:00
e93ff6c8b9
Nemotron Nano V2 VL + EVS Video Support ( #27107 )
...
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com >
Signed-off-by: Natan Bagrov <nbagrov@nvidia.com >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Natan Bagrov <nbagrov@nvidia.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-20 22:19:11 +08:00
1c691f4a71
AArch64 CPU Docker pipeline ( #26931 )
2025-10-20 07:09:40 -04:00
9fce7bee74
[Kernel] Accelerate solve_tril with TMA ( #26746 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2025-10-20 05:39:02 +00:00
b63f2143f8
[LoRA] LoRA cuda graph specialization ( #25914 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-20 04:21:09 +00:00
f32bf7582e
[Model][VLM] Support Bee-8B Model ( #27012 )
...
Signed-off-by: uyzhang <yi.zhang.4096@gmail.com >
Signed-off-by: Yi Zhang <zhangyi970819@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-10-20 02:31:26 +00:00
8a81d776ce
Fix typo in ValueError message: use kv_role instead of kv_disagg_role ( #27166 )
...
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com >
2025-10-19 19:47:19 +00:00
f6fdacd82c
[Bugfix] Fix error with penalties when speculative decoding and structural output are enabled ( #26586 )
...
Signed-off-by: southfreebird <yvorott@gmail.com >
2025-10-19 19:24:46 +00:00
d31f7844f8
[Misc] Move utils to avoid conflicts with stdlib, and move tests ( #27169 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-19 05:20:55 -07:00
7a6c8c3fa1
[Chore] Separate out vllm.utils.network_utils ( #27164 )
...
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com >
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com >
2025-10-19 03:06:32 -07:00
221bf72577
output type conversion fix ( #27159 )
2025-10-19 08:10:07 +00:00
b3aba04e5a
[Benchmark] Convenience script for multiple parameter combinations ( #27085 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-18 23:57:01 -07:00
8a297115e2
[Chore] Separate out hashing utilities from vllm.utils ( #27151 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
2025-10-19 11:09:38 +08:00
191eed0bb9
[BugFix] Fix lazy imports involving outlines_core ( #27158 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-10-19 02:35:32 +00:00
fb860670da
[Minor] Remove unused env variable ( #27161 )
2025-10-18 18:48:35 -07:00
83e760c57d
[V1][Metrics][Plugin] Add plugin support for custom StatLoggerBase implementations ( #22456 )
...
Signed-off-by: tovam <tovam@pliops.com >
2025-10-18 15:12:46 -07:00
c2bba69065
[BugFix] Disable fp8 kv-cache by default for DeepSeek V3.2 ( #27121 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-18 22:05:23 +00:00
e133d6d218
[BugFix] fix graph partition signature ( #27139 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-18 17:34:36 -04:00
a1946c9f61
[Chore] Separate out profiling utilities from vllm.utils ( #27150 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
2025-10-18 19:12:01 +00:00
9f020f4f31
[BugFix] Fix failing gemma-3-1b-it test: test_lm_eval_accuracy_v1_engine[google/gemma-3-1b-it] ( #27111 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-10-18 12:44:39 -06:00
3b45075206
[Minor] Add some clarifying comments to recent changes ( #27130 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-18 09:52:45 -07:00
168e578efc
Fix incorrect string formatting in barrier timeout exceptions ( #27149 )
...
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com >
2025-10-18 09:51:57 -07:00
6ac5e06f7c
[Chore] Clean up pytorch helper functions in vllm.utils ( #26908 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: isotr0py <2037008807@qq.com >
2025-10-18 09:48:22 -07:00
5c2acb270a
[Models][QwenVL] Remove unnecessary .contiguous() calls ( #27106 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-18 07:05:05 -07:00
b26b70bec4
[Misc] Refactor get_kv_cache_spec into AttentionLayerBase ( #26587 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-18 13:51:21 +00:00
ab4be40fc5
[fix][cpu] fix prefill attention in CPU attention backend ( #27035 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com >
2025-10-18 13:30:21 +00:00
245e4f2c01
[Feature] Batch Invariant: Support DeepGEMM and Blackwell ( #27127 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-18 09:28:05 -04:00
1d165d6d85
[Chore] Separate out vllm.utils.mem_utils ( #27143 )
...
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com >
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com >
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-18 10:06:59 +00:00
83004020fd
[Test] Add test for /health endpoint on engine failure ( #26074 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com >
2025-10-18 09:59:05 +00:00
12e21701e7
[DOC][FEATURES][CPU]update cpu feature for v1 ( #27135 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-18 01:10:45 -07:00
30a33b92ee
[Misc] Rev DeepEP ( #27122 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-18 14:54:29 +08:00
7c572544e4
[GPT-OSS] Structure_Tag support for gpt-oss tool-call in cot ( #25515 )
...
Signed-off-by: Hanchenli <lihanc2002@gmail.com >
Signed-off-by: Hanchenli <61769611+Hanchenli@users.noreply.github.com >
Signed-off-by: Wei Wei <wwei6@meta.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Wei Wei <wwei6@meta.com >
Co-authored-by: Wei Wei <weiweinpu@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-10-17 21:55:54 -07:00
c312320764
[CI/Build] tests(v1): feed Triton attention the (num_blocks, 2, …) KV cache layout in backend-correctness tests ( #26663 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-17 21:11:26 -07:00
c981f0ea78
[Perf] Add H100 fused MoE config ( #25398 )
...
Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com >
2025-10-18 02:21:27 +00:00
6367bde739
[BugFix][Core] Fix error when enable async-scheduling in multi-node env ( #25887 )
...
Signed-off-by: Lehua Ding <lehuading@tencent.com >
Signed-off-by: Lehua Ding <lehuading@qq.com >
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com >
2025-10-17 22:16:18 +00:00
f50cc221ea
[Test] Make test_failure more stable for batch invariance ( #27054 )
2025-10-17 16:59:08 -04:00
acedc74b1a
[V1][Spec Decode] Fix greedy temperature detection after sampler refactor ( #27077 )
...
Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com >
Co-authored-by: Pradyun Ramadorai <pradyunr@amazon.com >
2025-10-17 13:27:47 -07:00
d29483b58a
[Minor] Remove unnecessary error message ( #27115 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com >
2025-10-17 20:02:12 +00:00
950cf9e58e
[Bugfix] Use PIECEWISE cudagraphs on Blackwell if max_model_len > 131072 ( #27114 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-17 19:47:18 +00:00
3125d79950
[Chore] Remove unused PolyNorm layer ( #27110 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-17 19:03:43 +00:00
e33ee23ee3
[Bugfix] [AITER] [ROCm] Fix Quark MoE Quant Config and AITER Fused MoE quant type logic ( #27029 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-10-17 12:51:10 -06:00
b10c64c834
[ROCm][Bugfix][Model] Fix illegal memory access when running qwen3_moe models with rms_norm (Qwen3-235B-A22B, Qwen3-30B-A3B, etc.) ( #26192 )
...
Signed-off-by: Randall Smith <ransmith@amd.com >
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
Signed-off-by: rasmith <Randall.Smith@amd.com >
Co-authored-by: Randall Smith <ransmith@amd.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-17 14:17:18 -04:00
0925b28a8e
[ROCM] MoE fp4 CK kernel ( #26545 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
2025-10-17 14:06:33 -04:00
99722d5f0e
[CI] Remove forbidden slash ( #27112 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-17 09:38:00 -07:00
4c91a28e30
[bugfix] Qwen3-VL fix video incorrect timestamp calculations while do_sample_frames=True ( #27104 )
...
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com >
2025-10-17 16:26:33 +00:00
b038d9c40c
[Data-parallel] Allow DP>1 for world_size > num_gpus on node (8) ( #26367 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com >
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Rui Qiao <ruisearch42@gmail.com >
2025-10-17 08:24:42 -07:00
2ba60ec7fe
[CI] Nixl integration tests ( #27010 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-10-17 07:13:31 -07:00
bd7157a071
[torch.compile] Enable attention and allreduce fusion without custom ops enabled ( #24604 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-17 08:10:23 -06:00
be429d0cfd
Fix incorrect docstring for stop_profile() method ( #27101 )
...
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com >
2025-10-17 06:30:23 -07:00
c253745eb8
[Harware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI350 and MI355 ( #25586 )
...
Signed-off-by: Reima Karhila <reima.karhila@amd.com >
Signed-off-by: xaguilar <Xavier.AguilarFruto@amd.com >
Co-authored-by: xaguilar <Xavier.AguilarFruto@amd.com >
2025-10-17 04:56:12 -07:00
daec4d2624
[Model]Improve Qwen3VLMoeForConditionalGeneration packed_modules_mapping ( #27096 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-17 04:47:00 -07:00
6c9fdbf725
[Docs] Replace rst style double-backtick with md single-backtick ( #27091 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-17 02:47:34 -07:00
483ea64611
[Docs] Replace all explicit anchors with real links ( #27087 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-17 02:22:06 -07:00
e20eba753b
[VLM][Refactor] Remove useless func get_input_positions in MRotaryEmbedding ( #27088 )
...
Signed-off-by: MengqingCao <cmq0113@163.com >
2025-10-17 02:00:30 -07:00
bbc1b29665
Update troubleshooting.md and remind VLLM_TRACE_FUNCTION usage ( #27069 )
...
Signed-off-by: cong-meta <prowindy@hotmail.com >
2025-10-17 01:53:06 -07:00
acb1bfa601
[CI] fix docs build failed ( #27082 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-17 07:53:40 +00:00
75c7ad9918
[Kernel][Performance] Fuse float cast and renormalize to topk softmax kernel ( #26717 )
...
Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com >
Signed-off-by: izhuhaoran <izhuhaoran@qq.com >
2025-10-17 07:30:35 +00:00
5550ff9c25
[CI/Build] Update compressed tensor test path to fix CPU CI ( #27068 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-10-16 22:34:56 -07:00
3aeb19a39e
[Model] Add support for LightOnOCR ( #26916 )
...
Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com >
Signed-off-by: Said Taghadouini <84044788+staghado@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-17 05:05:24 +00:00
8c017b3490
[Model] Always use Transformers backend for PaliGemma and Gemma3-MM ( #26715 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-17 05:03:35 +00:00
9c2c2287a0
[CI/Build] Update Llama4 eval yaml ( #27070 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-17 04:59:47 +00:00
fec2b341ad
[Kernel] Lazy import FlashInfer ( #26977 )
2025-10-17 04:48:18 +00:00
87bc0c492f
[Bugfix] Fix ReplicatedLinearWithLoRA ( #27065 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-17 04:43:16 +00:00
fe3b9372ad
[Core] Change execute_model_with_error_logging() to be a ctx manager ( #27060 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-17 11:45:32 +08:00
bde9e2272a
[Bugfix][Qwen] fixes the weights dtype in qwen3_next: it is actually a bfloat16 ( #27030 )
...
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com >
2025-10-17 03:37:52 +00:00
08405609cc
disable graph partition in custom op ( #26952 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
Signed-off-by: Boyuan Feng <fby.1994@gmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-17 11:08:47 +08:00
ab81379ea6
[Perf] Exploit out-of-band buffers in shm_broadcast ( #26961 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-16 20:08:03 -07:00
4ffd6e8942
[Docs] Reduce custom syntax used in docs ( #27009 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-16 20:05:34 -07:00
965c5f4914
vllm bench serve shows num of failed requests ( #26478 )
...
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com >
2025-10-16 19:55:09 -07:00
4d055ef465
Remove unused imports ( #26972 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-16 19:51:17 -07:00
17c540a993
[torch.compile] fix simple inductor graph partition test ( #27050 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-16 21:09:36 -04:00
4d4d6bad19
[Chore] Separate out vllm.utils.importlib ( #27022 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-17 00:48:59 +00:00
11ae016bd7
[torch.compile] Passing only necessary compilation config to inductor pass config ( #27041 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com >
2025-10-17 00:01:52 +00:00
41d3071918
[NVIDIA] [Perf] Update to leverage flashinfer trtllm FP4 MOE throughput kernel ( #26714 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-16 16:20:25 -07:00
fb5e10d3fb
Refactor Transformers backend to use mixins ( #26906 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-16 21:50:39 +00:00
b2f78cbad4
[small][batch invariance] Rename the env and internal flags to simplify usage ( #26855 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
2025-10-16 21:40:25 +00:00
23583ee28c
[Bug] Add Assertion for random-input-len / random-output-len ( #26834 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-16 21:36:39 +00:00
01c977e96d
[CI] Prune Quantization Tests and skip compilation ( #27038 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-16 17:26:35 -04:00
b3dda72c23
[Feature] Migrate DeepGEMM API from get_m_alignment_for_contiguous_layout to get_mk_alignment_for_contiguous_layout ( #26935 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-16 16:46:48 -04:00
fb0571b077
[GPTOSS][DP/EP][Marlin] Enable GPTOSS Batched DP/EP using Marlin kernels ( #25997 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-16 12:53:11 -07:00
2ed8b6b3d0
[Bug] Fix batch invariant test has to is ( #27032 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-16 19:45:14 +00:00
013abde6ef
Adding Warmup to Benchmark Serving ( #26943 )
...
Signed-off-by: Kimbo Chen <chentenghung@gmail.com >
2025-10-16 12:44:32 -07:00
a5464dcf92
[Compressed Tensors] Always clone output for compile robustness ( #26849 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-16 19:29:59 +00:00
ac3ed5a815
Support block size of 256 used by Intel HPU ( #26883 )
...
Signed-off-by: mandy-li <mandy.j.li@intel.com >
2025-10-16 15:10:57 -04:00
e6ba2000ae
[gpt-oss][1/N] EZ: refactor serving_responses for modularity ( #26948 )
...
Signed-off-by: Andrew Xia <axia@meta.com >
2025-10-16 18:44:06 +00:00
aa255ff55a
Support set in the CLI generation ( #27031 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-16 18:07:18 +00:00
7bb736d00e
Fix Qwen2.5 VL image grid docstring ( #27033 )
...
Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com >
2025-10-16 09:57:36 -07:00
9f4e30904b
[Model] Fix Qwen3VL mm mapping ( #27027 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-16 09:45:59 -07:00
5afd3276df
[Feature] Add process_weights_after_loading to AttentionImpl ( #26870 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-10-16 08:02:30 -07:00
43721bc67f
[CI] Replace large models with tiny alternatives in tests ( #24057 )
...
Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-16 15:51:27 +01:00
02d709a6f1
[docs] standardize Hugging Face env var to HF_TOKEN (deprecates HUGGING_FACE_HUB_TOKEN) ( #27020 )
...
Signed-off-by: Kay Yan <kay.yan@daocloud.io >
2025-10-16 15:31:02 +01:00
4a510ab487
[NIXL] Improve request_finished() debug logs ( #25665 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-10-16 15:55:17 +02:00
314fa8abbf
[Attention] Tune CUTLASS MLA num_splits ( #26846 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-16 06:36:09 -07:00
334535b6fb
[Benchmark] Show E2EL by default for pooling models ( #27014 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 12:47:09 +00:00
dcbb3f1871
[Bugfix] Correct LayerNorm epsilon parameter in modernbert.py ( #27008 )
...
Signed-off-by: bogdanm <152898065+bogdan01m@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-16 12:27:44 +00:00
00417f4e44
[MISC] fix import violations for re and triton modules ( #26654 )
...
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com >
Co-authored-by: Mengqing Cao <cmq0113@163.com >
2025-10-16 03:38:27 -07:00
ed344f4116
Cleanup code after Python 3.10 upgrade ( #26520 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-10-16 03:38:23 -07:00
e51928793e
[Model][Bugfix] fix ernie45 vl run failed from shared experts optimization ( #26885 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-10-16 03:37:35 -07:00
d2740fafbf
[Chore] Separate out vllm.utils.collections ( #26990 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 08:35:35 +00:00
17838e50ef
[Benchmark] Use truncation by default for pooling benchmarks ( #26992 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 16:02:39 +08:00
44c8555621
[CI/Build] Fix AMD import failures in CI ( #26841 )
...
Signed-off-by: zhewenli <zhewenli@meta.com >
2025-10-16 07:28:20 +00:00
f7d318de2b
[Hardware][CPU][PowerPC]Disable torch.compile() in toptopk sampling ( #26987 )
...
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
2025-10-15 22:36:59 -07:00
76f0d05bc6
[CI/Build] Update expected beam search output for Phi3V ( #26978 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 05:12:44 +00:00
7d8975de84
Deepseek-v3 Batch Invariant on 8xH100 ( #26609 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-15 22:06:02 -07:00
785d8b6410
[PERF] Qwen3-next MTP speedup (change bool mask indexing to index_select / index_copy to reduce d2h) ( #26437 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2025-10-16 12:18:31 +08:00
f6cdc9a02f
[Chore] Rename utils submodules ( #26920 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-16 03:58:13 +00:00
509cdc0370
[DOC][XPU]update feature parity with Intel GPU ( #26954 )
...
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com >
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-15 20:07:10 -07:00
9b6504c307
[BugFix] Work around graph partition x torch.compile cache issue ( #26956 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2025-10-15 20:06:11 -07:00
e19b16dde6
[bugfix] Fix SP + PP without specifying compile size ( #26955 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-15 20:05:33 -07:00
582f2c6be7
[BUG] Allow runai_streamer_sharded in config check ( #26958 )
...
Signed-off-by: ahao-anyscale <ahao@anyscale.com >
2025-10-15 20:05:14 -07:00
f8a0acbdbe
[CI] Enable Blackwell Llama4 MoE tests ( #26731 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-15 21:02:57 -06:00
1317034379
[ROCm][FEAT] Fuse DeepSeek shared experts into AITER fused_moe ops ( #24097 )
...
Signed-off-by: chenjun <junchen2@amd.com >
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com >
Co-authored-by: valarLip <103567126+valarLip@users.noreply.github.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2025-10-16 10:41:34 +08:00
0ecc553ee6
[Bugfix] reasoning_parser parameter handling in run_batch.py ( #26225 )
...
Signed-off-by: inc-jeong <inc.jeong@navercorp.com >
Signed-off-by: InChang Jeong <inc.jeong@navercorp.com >
Co-authored-by: USER <user@AL02367916.local >
2025-10-16 10:24:05 +08:00
f96bc3649c
[Qwen3-Next] Add tuned MoE config for Qwen3-Next FP8 on H100 tp2 ( #26887 )
...
Signed-off-by: Felix Zhu <felixzhu555@gmail.com >
2025-10-15 18:55:05 -07:00
938c43ea7f
[ci] Adjusting AMD test composition 2025-10-14 ( #26852 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-10-15 23:52:13 +00:00
0a9ef0cfce
Move query quantization to attention layer for Flashinfer & Triton. ( #26534 )
...
Signed-off-by: adabeyta <aabeyta@redhat.com >
Signed-off-by: Adrian Abeyta <aabeyta@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-15 19:01:38 -04:00
e5b438a247
[Bug] Temporarily Disable VLLM_ALLREDUCE_USE_SYMM_MEM by Default ( #26925 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-15 16:18:50 -04:00
0b99f5d302
support flashinfer_fp4 moe for 5090 gpu ( #26669 )
...
Signed-off-by: XiaobingSuper <xiaobingzhangupc@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-15 15:06:47 -04:00
1f491aa0c8
Vectorize RMS norm variance using vectorize_read_with_alignment ( #26234 )
...
Signed-off-by: Benji Beck <benjibeck@meta.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-15 11:54:41 -07:00
de92d916fe
[NVIDIA] Add support for cudnn fp4 gemm via flashinfer ( #26107 )
...
Signed-off-by: kaixih <kaixih@nvidia.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-10-15 13:53:00 -04:00
a1063628a4
[Chore] Clean up CODEOWNERS ( #26923 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-10-15 10:52:54 -07:00
d796375258
[ModelOpt] Remove NVFP4 MoE K%16==0 constraint ( #26891 )
...
Signed-off-by: XiaobingSuper <xiaobingzhangupc@gmail.com >
2025-10-15 13:06:17 -04:00
14f8456344
[Feature]: Use pydantic validation in observability.py config ( #26637 )
...
Signed-off-by: Samuel Wu <cernunnos1710@gmail.com >
Signed-off-by: Sam/Samuel <57896620+cern1710@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-15 16:44:03 +00:00
4794c2bd92
Olmo 3 tool parser and tests ( #26143 )
...
Signed-off-by: Pradeep Dasigi <pradeepd@allenai.org >
2025-10-15 16:36:12 +00:00
d3cbaa08dc
Lower severity of log when model info cache misses due to exception ( #26917 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-15 09:01:09 -07:00
828523ad8e
[Chore] Separate out vllm.utils.async_utils ( #26913 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 15:33:00 +00:00
136a17fe6e
[Chore] Separate out vllm.utils.func ( #26904 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 13:03:58 +00:00
f57438338d
[BugFix] Patch inductor memory plan logic ( #26878 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-15 12:51:45 +00:00
5d598680e3
chore: remove unused marker ( #26890 )
...
Signed-off-by: Max Wittig <max.wittig@siemens.com >
2025-10-15 05:40:33 -07:00
8f4b313c37
[Misc] rename torch_dtype to dtype ( #26695 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-15 12:11:48 +00:00
f93e348010
[Misc] Remove isort and yapf ignores ( #26888 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 12:09:03 +00:00
f54f85129e
[Model][2/N] Improve all pooling task | Support multi-vector retrieval ( #25370 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-15 11:14:41 +00:00
d4d1a6024f
[Lora]Load tuned multi-lora kernel configs from json files ( #26319 )
...
Signed-off-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com >
Signed-off-by: Haipeng Li <li2haipeng@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-15 09:45:14 +00:00
db1764e4e0
[Platform] allow platform to init dp group ( #22243 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-15 02:32:17 -07:00
7f83b4ee8e
[Easy] Get rid of unnecessary parentheses in kv_cache_manager ( #26842 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-15 09:17:43 +00:00
5c3bae1a6a
[Fix] Remove divisibility requirement between num_kv_heads and tp_size in bailing_moe ( #26876 )
...
Signed-off-by: vito.yy <vito.yy@antgroup.com >
2025-10-15 16:44:04 +08:00
5210dc3940
[Misc] Update TritonLanguagePlaceholder to have attributes that are used by Flash Linear Attention ops. ( #26853 )
...
Co-authored-by: Xudong Ma <mxd@meta.com >
2025-10-15 08:37:49 +00:00
650b51f9f9
[doc] add Context Parallel Deployment doc ( #26877 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-10-15 16:33:52 +08:00
6256697997
[Doc] ruff format remaining Python examples ( #26795 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 01:25:49 -07:00
71557a5f7c
[CI] Fix mypy for vllm/executor ( #26845 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-15 01:23:33 -07:00
f3c378ffa7
[CI/Build] Add Qwen2.5-VL-7B-Instruct ChartQA Accuracy Tests in CI ( #21810 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Signed-off-by: zhewenli <zhewenli@meta.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Ye (Charlotte) Qi <ye.charlotte.qi@gmail.com >
2025-10-15 08:09:56 +00:00
f5ed68ef63
[Deepseek-V3.2][Kernel] Integrate cuda indexer k cache gather ( #26456 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com >
2025-10-15 16:05:01 +08:00
efdef57b1f
[bugfix] Lazy import cv2 ( #26869 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-15 07:47:50 +00:00
b8a4572157
[Misc] Use helper function to generate dummy messages in OpenAI MM tests ( #26875 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-15 07:17:37 +00:00
302ef403a2
[DSA][MLA] Tiny refactor on DeepSeek to make it reusable for different backends ( #26656 )
...
Signed-off-by: MengqingCao <cmq0113@163.com >
2025-10-15 00:16:44 -07:00
8865da157b
[Bugfix][Multi Modal] Fix incorrect Molmo token processing ( #26873 )
...
Signed-off-by: sanghol <sanghol@allenai.org >
2025-10-15 07:13:59 +00:00
f0862eae43
[Graph Partition] pass tests for decorator ( #26831 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-15 06:39:48 +00:00
8c851f6d04
[Bugfix] Fix qwen3-omni audio truncation issue ( #26815 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-15 05:38:36 +00:00
7cfa420f49
[BugFix] Patch inductor partitioning logic ( #26735 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-15 05:04:32 +00:00
a27b288e4a
[Feature] default --extra-body param to disable thinking in vllm bench serve ( #26784 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-10-15 04:23:44 +00:00
e471d7ca7e
[CI/Build][Bugfix] fix qutlass cmake error when set QUTLASS_SRC_DIR ( #26773 )
...
Signed-off-by: izhuhaoran <izhuhaoran@qq.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-10-15 04:09:44 +00:00
c43ca8259e
[Docs] Move build.inc into arm.inc ( #26862 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-10-14 20:35:08 -07:00
85a65e7f51
[Model] Add DeepSeek-V3.1 reasoning parser (split from PR #24972 ) ( #25589 )
...
Signed-off-by: taohui <taohui3@gmail.com >
Signed-off-by: Tao Hui <taohui3@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com >
2025-10-15 11:09:52 +08:00
a2986b3e33
[Bugfix] Fixes prefix-repetition benchmark script ( #26828 )
...
Signed-off-by: Kourosh Hakhamaneshi <Kourosh@anyscale.com >
2025-10-15 02:54:43 +00:00
96b9aa5aa0
[Frontend][torch.compile] CompilationConfig Overhaul ( #20283 ): name change compilation level to compilation mode, deprecation compilation level ( #26355 )
...
Signed-off-by: morrison-turnansky <mturnans@redhat.com >
Signed-off-by: Morrison Turnansky <mturnans@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-10-15 02:51:16 +00:00
e66d787bce
Disable FlashInfer sampler by default ( #26859 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-15 02:35:18 +00:00
bfad142e25
[BUGFIX][NIXL] quick fix for 'assert self.connector_worker is not None' in get_kv_connector_stats ( #26851 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-15 02:33:25 +00:00
9354660036
[Bugfix]fix Qwen3 xml tool parser ( #26345 )
...
Signed-off-by: Zhikaiiii <1658973216@qq.com >
2025-10-15 09:50:30 +08:00
07ca70af8d
[Core][Easy] Use envs.__getattr__ for all Unify to environment variable access ( #26810 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-15 01:41:18 +00:00
2dcd12d357
[torch.compile] Fix tests for torch==2.9 inductor partition ( #26116 )
...
Signed-off-by: ProExpertProg <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
2025-10-14 19:55:02 -04:00
579d2e5458
[WideEP][P/D] Add usage stats for DP+EP and KV Connector ( #26836 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
2025-10-14 23:51:54 +00:00
0512c04aee
[frontend][gptoss] Add per turn stats into Harmony Context ( #25061 )
...
Signed-off-by: lacora <hyelacora@gmail.com >
Co-authored-by: Ye Hu <yehu@fb.com >
2025-10-14 16:48:13 -07:00
7e0ef4084a
[CI Failure] Fix torchao dep failure for Quantization Test ( #26824 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-14 16:41:43 -07:00
4aed506b65
[Core] Streamline some structured output related code ( #26737 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-14 23:27:44 +00:00
a86b4c58e8
remove attn output view kernel ( #26680 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
Signed-off-by: Boyuan Feng <fby.1994@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-14 22:53:10 +00:00
ff4810ba73
[Minor] Group async_scheduling related fields in model runner init ( #26736 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-10-14 14:46:37 -07:00
9d6964926e
fix: response_format for completion ( #23212 )
...
Signed-off-by: Nan2018 <qinnanjoshua@gmail.com >
2025-10-14 21:23:22 +00:00
0e65818910
Added MoE configs for llama 4, H200 device with tp=4/8 tuning ( #26837 )
...
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com >
2025-10-14 14:21:03 -07:00
380f17527c
[Perf] Cache vllm.env.__getattr__ result to avoid recomputation ( #26146 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-14 17:03:21 -04:00
b92ab3deda
Notice for deprecation of AutoAWQ ( #26820 )
...
Signed-off-by: HDCharles <39544797+HDCharles@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-14 13:39:59 -07:00
acaa2c0a4a
[Core] Reuse empty block lists whenever possible in KVCacheBlocks to mitigate GC costs ( #24964 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-14 12:58:43 -07:00
82af928c41
[Attention][Spec Decode] FlashMLA spec decode support ( #26541 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-10-14 19:38:20 +00:00
87efc681db
llama4_vision_rope: add HIP override to accept (q, k) and avoid (positions, q, k) mismatch ( #26790 )
...
Signed-off-by: Huamin Li <3ericli@gmail.com >
2025-10-14 11:54:12 -07:00
c3a722fcb2
[CI Failure] Fix tests with missing TinyLlama-1.1B-Chat-v1.0-FP8-e2e ( #26816 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-14 18:38:59 +00:00
aba48f7db1
[Kernel][MoE] Add MoE tunings for GLM 4.6-FP8 and GLM 4.5 Air on NVidia B200 ( #26818 )
2025-10-14 11:20:39 -07:00
04b5f9802d
[CI] Raise VLLM_MAX_SIZE_MB to 500 due to failing Build wheel - CUDA 12.9 ( #26722 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-14 10:52:05 -07:00
efc8f7d814
Update coveragerc and add codecov.yml for path fixes ( #26435 )
...
Signed-off-by: Reza Barazesh <rezabarazesh@meta.com >
2025-10-14 09:45:06 -07:00
6d87a2838c
[Config] Remove Unused Environment Variable VLLM_DISABLE_PAD_FOR_CUDAGRAPH ( #26743 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-14 11:47:49 -04:00
e6cdbd6792
Revert "[issues template] Encourage the author implement their own ideas" ( #26814 )
2025-10-14 08:37:34 -07:00
df850c4912
[Feature][Responses API] Stream Function Call - harmony ( #24317 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-14 08:31:43 -07:00
720394de43
[KVConnector][Metrics] Aggregate scheduler-side KVConnectorStats ( #26046 )
...
Signed-off-by: Qier Li <kevin44036@gmail.com >
2025-10-14 14:38:07 +00:00
88a49745af
[issues template] Encourage the author implement their own ideas ( #26671 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-14 22:32:36 +08:00
ca683a2a72
use combo kernel to fuse qk-norm and qk-rope ( #26682 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-10-14 09:40:59 -04:00
e9f1b8c9e9
Adjusted the model order of the model registration file ( #26798 )
...
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com >
2025-10-14 13:26:11 +00:00
ea97940d6c
[DCP] Support Decode Context Parallel (DCP) for GQA with FlashAttention ( #24864 )
...
Signed-off-by: yuanyongjie.yyj <yuanyongjie.yyj@antgroup.com >
Signed-off-by: FENP <32334296+FENP@users.noreply.github.com >
Signed-off-by: Jaya Yuan <yuanyongjie.yyj@antgroup.com >
2025-10-14 13:07:50 +00:00
fdd32750f0
[CI/Build] Cleanup LoRA test ( #26752 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-14 12:06:35 +00:00
c715ba3735
[Feature] Change vllm.py with pydantic validation ( #26726 )
...
Signed-off-by: Vladislav <vladislav.bronzov@gmail.com >
Signed-off-by: Vladislav Bronzov <58587565+VladOS95-cyber@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-14 12:00:54 +00:00
9c4cb68339
[Chore] Remove SupportsV0Only interface and update supported models docs ( #26783 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-14 04:55:10 -07:00
780eb03d9b
[CI] Fix test_tool_id_kimi_k2 ( #26787 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-10-14 10:27:07 +00:00
ef9676a1f1
[Doc] ruff format some Python examples ( #26767 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-14 03:21:53 -07:00
70b1b330e1
Don't allow typos to fix by default ( #26785 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-14 03:05:15 -07:00
d1d063a588
[Chore] Use max_transformers_version for Qwen-VL test ( #26792 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-14 03:03:46 -07:00
7e6edb1469
[NIXL][HeteroTP] Enable KV transfer from HND prefill to NHD decode ( #26556 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-10-14 09:46:05 +00:00
74704d4553
[Model] Use merge_by_field_config for MM models (O-P) ( #26776 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-14 09:42:45 +00:00
d2f816d6ff
[Bugfix] Standardize merging multimodal embeddings ( #26771 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-14 09:36:21 +00:00
577d498212
[Plugin] Make plugin group clear ( #26757 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-10-14 07:49:59 +00:00
fd85c9f426
[Bugfix][FE]: Always include usage with --enable-force-include-usage ( #20983 )
...
Signed-off-by: Max Wittig <max.wittig@siemens.com >
Signed-off-by: Antoine Auger <antoineauger@users.noreply.github.com >
Co-authored-by: Antoine Auger <antoineauger@users.noreply.github.com >
2025-10-14 09:17:39 +02:00
d32c611f45
[CI/Build] Use 127.0.0.1 instead of localhost in utils ( #26750 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-10-14 07:04:00 +00:00
01ad27faff
[Model][Bugfix]fix ernie45 load failed due to ernie45 eplb code ( #26684 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-10-14 06:55:23 +00:00
481545b397
scheduler.py: Update the name of the default scheduler. ( #26758 )
...
Signed-off-by: Ryan Li <ryanli@ryanli.org >
2025-10-14 06:52:21 +00:00
d3cc8427c0
[ci] Adding the test-amd.yaml for test definitions for the AMD backend. (alternative PR) ( #26718 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-10-13 23:10:23 -07:00
4821ac1b4d
[CI] [ROCm] Automate CC list for ROCm related issue ( #26753 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-10-14 13:57:26 +08:00
4497c8f821
Fix lora tests failure in TPU CI due to the removal of LoRA bias ( #26723 )
...
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com >
2025-10-14 13:04:23 +08:00
2e36cdbe2b
[Docs] Add a start tag to build.inc.md ( #26747 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-10-13 21:51:55 -07:00
fe3edb4cf0
Add support for the /rerank endpoint in vllm bench serve ( #26602 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
2025-10-14 04:25:43 +00:00
29350922c6
[Feature][Quantization] auto_round format add support for regex ( #24024 )
...
Signed-off-by: n1ck-guo <heng.guo@intel.com >
Signed-off-by: Heng Guo <heng.guo@intel.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-10-14 03:03:16 +00:00
8ae169286f
[torch.compile] Unwrap fused_marlin_moe custom op ( #26739 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-14 02:22:16 +00:00
8a0af6a561
[build][torch.compile] upgrade depyf version ( #26702 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-10-14 10:12:09 +08:00
cfded80793
[Easy] Fix env type check errors from VLLM_DEBUG_LOG_API_SERVER_RESPONSE ( #26742 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-14 01:46:44 +00:00
b59dd19b55
[compile] Enable sequence parallelism for full cuda graph without specifying compile sizes ( #26681 )
...
Signed-off-by: angelayi <yiangela7@gmail.com >
2025-10-13 18:15:34 -07:00
3e051bda82
[UX] Replace VLLM_ALL2ALL_BACKEND with --all2all-backend ( #26732 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-10-13 18:12:52 -07:00
8317f72354
[Misc][DP] support customized aggregated logger for dp ( #24354 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
2025-10-13 17:45:59 -07:00
d8bebb008a
Add tests for chunked prefill and prefix cache with causal pooling models ( #26526 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
Co-authored-by: Ayush Singh <ayush1009208@gmail.com >
2025-10-14 07:45:04 +08:00
35bc22f23c
[ResponseAPI] Further polish message serialization and unit tests ( #26728 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-13 23:31:35 +00:00
fa96fb9c70
Pruning kernel Core Tests ( #26727 )
...
Signed-off-by: Fardin Hoque <kfhfar@amazon.com >
2025-10-13 23:08:18 +00:00
e3fdb627d9
[FrontEnd] UNREVERT CompilationConfig overhaul ( #20283 ): deprecate use_inductor in favor of backend, simplify custom_ops ( #26502 )
...
Signed-off-by: morrison-turnansky <mturnans@redhat.com >
Signed-off-by: Morrison Turnansky <mturnans@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Jiangyun Zhu <riverclouds.zhu@qq.com >
2025-10-13 22:47:16 +00:00
7200a21cd1
[Bug] Fix Assertion error DeepEP/csrc/kernels/intranode.cu:928: 'false and Unsupported type' ( #26532 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-13 18:26:37 -04:00
577c72a227
[CI Perf]Prune Tests in kernel/mamba ( #26538 )
...
Signed-off-by: Fardin Hoque <kfhfar@amazon.com >
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2025-10-13 18:22:31 -04:00
314285d4f2
[CI] Fix mypy for vllm/distributed ( #26593 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-13 16:02:24 -04:00
d2a7938582
[Frontend][1/N] Improve all pooling task | Support FP16 Embedding Base64 (Still uses fp32 by default). ( #26414 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-10-13 19:06:43 +00:00
89342ce4c0
[Quantization] [Performance] Enable Marlin GEMM kernels for the calibration-free RTN-based quantization ( #26051 )
...
Signed-off-by: Alex Kogan <alex.kogan@oracle.com >
Signed-off-by: Alex Kogan <82225080+sakogan@users.noreply.github.com >
2025-10-13 18:52:54 +00:00
f89f599395
[CI][Release][Arm64]: Build arm64 release for gpu arch 8.9 ( #26698 )
2025-10-13 18:42:12 +00:00
e251e457c5
[Log] Optimize Startup Log ( #26601 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2025-10-14 02:06:57 +08:00
afc47e4de7
[Model] Use merge_by_field_config for MM models (M-N) ( #26710 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-10-14 01:27:01 +08:00
e3b90c1ba2
[Bugfix][Speculative Decoding] Extend Eagle quantization config fix to llama_eagle.py ( #26590 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
2025-10-13 17:17:13 +00:00
134f70b3ed
[Bugfix][Rocm] fix qr error when different inp shape ( #25892 )
...
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: ilmarkov <markovilya197@gmail.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-10-13 10:04:21 -07:00
a1b2d658ee
[CI/Build] upgrade compressed-tensors to 0.12.2 to address LGPLv3 ( #26501 )
...
Signed-off-by: Sangyeon Cho <josang1204@gmail.com >
2025-10-13 12:58:33 -04:00
5c7fe25491
[Misc] Separate prompt logging to debug ( #26713 )
...
Signed-off-by: Aleksei Tsvetkov <aitsvet@ya.ru >
2025-10-13 09:04:18 -07:00
53c9a7cee2
[P/D] [NixlConnector] kv load recovery integration ( #26171 )
...
Signed-off-by: Will Eaton <weaton@redhat.com >
2025-10-13 08:48:04 -07:00
0d21b9b51e
[UX] Speedup DeepGEMM warmup with heuristics ( #25619 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com >
2025-10-13 07:59:27 -07:00
10214b6935
[FEATURE]: Use pydantic validation in multimodal.py config ( #26629 )
...
Signed-off-by: Anand Roy <86306690+andycandy@users.noreply.github.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-13 07:56:59 -07:00
4a61950f4d
[Hardware][CPU] Disable torch.compile for RISC-V to prevent APIError ( #26693 )
...
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn >
Signed-off-by: ihb2032 <1355790728@qq.com >
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn >
2025-10-13 07:56:01 -07:00
3263799056
[unrevert] Add batch invariant kernel override for FlashInfer backend [2/n] ( #26373 )
...
Signed-off-by: Bram Wasti <bwasti@meta.com >
Signed-off-by: Bram Wasti <bwasti@fb.com >
2025-10-13 10:24:53 -04:00
8e67b2557a
[Bugfix] Fix out of bound index issue for Jina-embedding-v3 RoPE with cuda graph ( #26687 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-10-13 03:21:48 -07:00
4073c82c4e
[ResponseAPI] Simplify input/output message serialization ( #26620 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com >
2025-10-13 09:59:15 +00:00
767c3ab869
[Model][0/N] Improve all pooling task | clean up ( #25817 )
...
Signed-off-by: wang.yuqi <noooop@126.com >
2025-10-13 16:44:50 +08:00
4f207c7174
Ignore large reformatting PRs in git blame ( #26690 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-10-13 01:20:47 -07:00
782505ed8e
[Model] Add reasoning_parser and tool_parser for Ernie45 thinking ( #25027 )
...
Signed-off-by: wangyafeng <wangyafeng@baidu.com >
2025-10-13 15:55:20 +08:00
98f30b8cba
[Model] Fix Skywork R1V mlp ( #26673 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-10-12 22:42:17 -07:00
3cd36660f7
docs: wrong command in structured_outputs README ( #26677 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-10-12 20:59:01 -07:00
46ad73955a
[FIX] Throwing an exception when the model does not support pool tasks ( #25840 ) ( #25855 )
...
Signed-off-by: zxw <1020938856@qq.com >
Co-authored-by: wang.yuqi <noooop@126.com >
2025-10-12 20:56:21 -07:00
41f3884438
[Bugfix][Core]Fix block table out-of-range issue in priority scheduling ( #26661 )
...
Signed-off-by: quanliu <18646313696@163.com >
2025-10-13 01:25:42 +00:00
60e419c1ee
[Misc] cache result of disable_inplace ( #26666 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-10-13 00:17:50 +00:00
866eef50ca
minor
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-24 15:29:27 +00:00
ad2cf805ad
Merge branch 'main' into woosuk/model-runner-v2
2025-09-24 08:19:25 -07:00
704def253c
Merge branch 'main' into woosuk/model-runner-v2
2025-09-23 21:08:15 +00:00
42f99150c1
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-23 09:23:21 -07:00
17c2c106b1
Merge branch 'main' into woosuk/model-runner-v2
2025-09-23 09:22:58 -07:00
72f0a71939
assert
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-21 19:37:18 -07:00
fe5472dc03
Merge branch 'main' into woosuk/model-runner-v2
2025-09-21 18:56:48 -07:00
bc73f674bb
compute_logits
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-21 11:26:33 -07:00
631b5b47c1
Merge branch 'main' into woosuk/model-runner-v2
2025-09-21 11:25:18 -07:00
42ffdd9179
wip
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-20 22:15:07 +00:00
8aee6e97e6
64-bit for gumbel seed
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-20 11:43:01 +00:00
913b8e9569
Merge branch 'main' into woosuk/model-runner-v2
2025-09-20 11:18:35 +00:00
158a46888e
random uuid
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-20 11:17:45 +00:00
98ef239486
minor
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-19 23:55:46 +00:00
a66aa37f40
minor:
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-19 23:47:20 +00:00
6f038fc4fb
Merge branch 'main' into woosuk/model-runner-v2
2025-09-19 20:30:04 +00:00
010e39ec7d
minor
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-19 19:07:46 +00:00
396bbe67d3
Merge branch 'main' into woosuk/model-runner-v2
2025-09-19 18:53:18 +00:00
c7f3e84b34
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-19 09:49:40 -07:00
a8e7071924
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-19 08:33:47 -07:00
4be2c66e37
fix
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-19 09:35:38 +00:00
d30c0d50a6
refactor
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-19 07:17:53 +00:00
9c75d896a8
minor
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-19 07:11:37 +00:00
37478c18cf
async output
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-19 07:10:42 +00:00
33672774f5
Merge branch 'main' into woosuk/model-runner-v2
2025-09-19 06:52:46 +00:00
0d3de9e082
fix
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-19 06:50:56 +00:00
b405d78c07
DP sampler
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-19 06:46:46 +00:00
8af87986aa
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 18:37:30 -07:00
af65838d1f
dummy run
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 18:29:18 -07:00
52ca2f517a
sample
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 17:39:43 -07:00
8deedfa42b
-inf
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 17:24:00 -07:00
b9c74487d2
logprobs
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 17:23:02 -07:00
31619ff412
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 16:38:56 -07:00
d2be62378b
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 16:33:18 -07:00
86dade710d
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 16:32:00 -07:00
efda08481b
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 16:31:01 -07:00
82da219ff9
Implement topk_logprobs
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 16:29:38 -07:00
323a05b3c5
update
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 15:51:36 -07:00
a98eff0762
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 15:21:30 -07:00
67d8c0c21b
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 15:15:31 -07:00
2bb2cb13f4
revert
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 14:54:19 -07:00
e171e5bb67
merge
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 14:53:32 -07:00
8407fa02ed
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 14:52:23 -07:00
82e591f7eb
remove
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 14:35:25 -07:00
330058f9b8
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 14:30:29 -07:00
aabfaa08cf
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 14:14:03 -07:00
bc6463ac97
hash
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 13:49:52 -07:00
a4962833f9
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 13:20:37 -07:00
3f50030cc8
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 13:11:46 -07:00
cbdb47dc01
working
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 13:10:35 -07:00
92f337faeb
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 12:44:21 -07:00
9050087250
update
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 12:37:29 -07:00
c1d83f2bae
merge
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-18 12:13:56 -07:00
91510260b2
task
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-16 01:06:10 -07:00
c320a33c59
skip warmup
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-16 07:21:25 +00:00
83d11373a4
wip
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-16 07:21:25 +00:00
dfc84b11a9
wip
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-16 07:21:25 +00:00
9f2becd3e6
merge
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-16 00:16:42 -07:00
e107680d8a
wip
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-15 21:19:18 +00:00
f1981db101
minor
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-15 19:53:58 +00:00
69b17891a3
chunked prefilling
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-15 19:41:17 +00:00
67852c1036
minor
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai >
2025-09-15 19:23:54 +00:00
8b3c13c485
wip
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-15 11:17:54 -07:00
9a6fcca030
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-14 15:56:42 -07:00
633f9f006d
Merge branch 'main' into woosuk/input-prep
2025-09-14 08:03:28 -07:00
eb3742c72a
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-13 19:19:40 -07:00
e47bb9970b
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-13 19:19:07 -07:00
5c133fc860
reorder
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-13 19:17:40 -07:00
caf963f2e9
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-13 19:13:08 -07:00
9314a83b56
Merge branch 'main' into woosuk/input-prep
2025-09-14 00:44:56 +00:00
7a50a54390
Merge branch 'main' into woosuk/input-prep
2025-09-13 21:33:54 +00:00
787e59629c
wip
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-08 16:42:26 -07:00
5f95309a6d
rename
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-07 12:01:45 -07:00
286eeb91e8
merge
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-07 11:16:37 -07:00
6283995a6c
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-06 21:18:16 -07:00
0c56069c7e
merge
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-06 16:35:45 -07:00
8e6cb9aa4a
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-06 12:23:02 -07:00
ead95fe5dc
merge
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-06 10:56:27 -07:00
23eae07ea5
merge
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-04 20:19:22 -07:00
b16e2d9602
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-01 02:10:48 -07:00
4c2a337e67
merge
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-01 01:45:29 -07:00
cc340e26af
top_p top_k
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-01 01:30:08 -07:00
01bf16ede4
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-01 01:16:26 -07:00
af7b6c5dd4
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-31 23:50:20 -07:00
62d23b3006
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-31 21:00:16 -07:00
ba1a58f51b
MAX_SPEC_LEN
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-31 20:43:25 -07:00
22771e5d83
work
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-31 20:41:38 -07:00
c11d1e6781
optimize spec
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-31 16:40:54 -07:00
e696f78e05
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-31 13:29:58 -07:00
efcb786d52
merge
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-31 10:44:36 -07:00
9ee9d0e274
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-28 15:02:07 -07:00
405578121c
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-28 13:19:10 -07:00
19c0dfc469
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-28 13:08:07 -07:00
e451045a66
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-28 12:55:13 -07:00
efba25e21a
minor
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-28 12:39:15 -07:00
b21393cd98
Merge branch 'main' into woosuk/input-prep
2025-08-28 09:58:08 -07:00
d6d719fb24
Merge branch 'main' into woosuk/input-prep
2025-08-28 09:57:49 -07:00
e570b0a4de
merge
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-27 21:45:11 -07:00
a851aaa0fc
simplify
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-25 09:23:05 -07:00
b1d52734f7
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-25 08:55:12 -07:00
65f93694be
merge
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-25 08:54:32 -07:00
7b4b72e551
fix
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-24 18:49:23 -07:00
da9cd26c78
Merge branch 'main' into woosuk/input-prep
2025-08-24 18:36:33 -07:00
a1e3745150
wip
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-24 18:36:18 -07:00
48bca9a109
merge
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-23 11:30:29 -07:00
64c8cced18
rename
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-22 01:48:35 -07:00
79e5eb3643
wip
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-22 01:37:43 -07:00
c472982746
merge
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-21 21:40:44 -07:00
699bd7928e
Merge branch 'main' into woosuk/input-prep
2025-08-17 19:28:38 -07:00
33a3a26ca5
wip
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-08-17 14:38:24 -07:00