youngkingdom/vllm

Author	SHA1	Message	Date
yewentao256	40464dbf34	rename Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-29 15:19:17 -07:00
yewentao256	981cc5fdbf	Merge branch 'main' into wentao-batch-invariance-dp	2025-10-29 15:18:33 -07:00
Roger Young	d6704dd099	Fix MiniMax-M2 rmsnorm precision and remove useless code (#27627) Signed-off-by: xuebi <xuebi@minimaxi.com> Co-authored-by: xuebi <xuebi@minimaxi.com>	2025-10-29 21:01:05 +08:00
Cyrus Leung	ecca3fee76	[Frontend] Add `vllm bench sweep` to CLI (#27639) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-29 05:59:48 -07:00
Zhewen Li	9a0d2f0d92	[CI/Build] Skip cpu offloading test on AMD (#27690) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-10-29 12:55:51 +00:00
Isotr0py	ad3ec89532	[VLM] Add Qwen3-VL generation test (#25185) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-10-29 12:19:37 +00:00
Kevin H. Luu	3481e40743	[chore] Remove models weight on S3 logic (#27725) Signed-off-by: kevin <kevin@anyscale.com>	2025-10-29 10:29:49 +00:00
Eugene Khvedchenya	5e72216d17	Feature/video support in random mm dataset (#25963) Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com> Signed-off-by: Eugene Khvedchenya <ekhvedchenia@nvidia.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-10-29 18:24:52 +08:00
Isotr0py	1a33aacf82	[Misc] Raise error for missing video metadata in `MultiModalDataParser` (#27664) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-29 10:06:42 +00:00
Yue Zhang	7ba6aa8f56	[Fix] import get_kv_cache_torch_dtype error in LMCacheConnector integration (#27670) Signed-off-by: KevinCheung2259 <2651309292@qq.com>	2025-10-29 10:03:54 +00:00
Alec S	ab2eb27b74	[Frontend] [gpt-oss] Mcp type bug (#27689) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: Alec Solder <alecs@fb.com> Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Co-authored-by: Alec Solder <alecs@fb.com> Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-10-29 10:01:32 +00:00
Alec S	3c7fefdeba	[Frontend] [gpt-oss] Tool json call parsing error retry (#27675) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Signed-off-by: Alec Solder <alecs@fb.com> Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com> Co-authored-by: Alec Solder <alecs@fb.com> Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-10-29 09:42:44 +00:00
bnellnm	1891cf605a	[Bugfix] Fix modular kernel tests (#27707) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-10-29 16:14:33 +08:00
Jiangyun Zhu	8df98c2161	[perf] Enable concurrent execution of "shared_experts" and "selected_experts" in qwen3-next (#27578) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-10-29 08:12:54 +00:00
Cyrus Leung	4fb8771cc0	[CI/Build] Move pre-commit only scripts to `tools/pre_commit` (#27657) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-29 08:04:33 +00:00
Dipika Sikka	413ef7a3b4	[Speculators] Move tests + fix integration (#27308) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Rahul Tuli <rtuli@redhat.com> Signed-off-by: rahul-tuli <rtuli@redhat.com> Co-authored-by: Rahul Tuli <rtuli@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-10-29 00:54:21 -07:00
Zhewen Li	8b62495076	[Bugfix] Fix non-contiguous tensor error in `rocm_unquantized_gemm_impl` (#27605) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-10-29 00:00:15 -07:00
Zhewen Li	83fd49b1fc	[CI/Build][Bugfix]Fix Quantized Models Test on AMD (#27712) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-10-29 06:27:30 +00:00
Shaoting	a4a4f0f617	[KV Connector] Update lmcache connector with latest compatibility (#27681) Signed-off-by: Samuel Shen <slshen@uchicago.edu> Co-authored-by: Samuel Shen <slshen@uchicago.edu>	2025-10-29 05:38:37 +00:00
Lukas Geiger	0d8161b075	[Model] Fix Qwen3VL and Qwen3Omni after torch.compile changes (#27705) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-10-29 05:28:20 +00:00
liuzhenwei	d2c33c397a	[NIXL][XPU] update name of nixl wheel (#27631) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>	2025-10-29 12:43:29 +08:00
Varun Sundar Rabindranath	f6d5f5888c	[Build] Revert triton_kernels requirements (#27659)	2025-10-28 21:07:09 -07:00
Simon Mo	9007bf57e6	Revert "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" (#27714)	2025-10-28 20:58:01 -07:00
Huy Do	f257544709	Install pre-built xformers-0.0.32.post2 built with pt-2.9.0 (#27598) Signed-off-by: Huy Do <huydhn@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io> (tag: v0.11.1rc4)	2025-10-28 19:39:15 -07:00
Jialin Ouyang	0b51c9bd8b	[Core] Early return in SlidingWindowManager.remove_skipped_blocks (#27673) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-10-29 01:32:33 +00:00
Wentao Ye	d3ab240f39	[Bug] Fix deepep low latency use nvlink by default (#27677) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-28 23:53:12 +00:00
Lucas Kabela	94666612a9	[Misc][qwen2_5_vl][torch.compile] Enable `supports_torch_compile` on generic nn.Module and demonstrate speedup on Qwen Vision model (#23207) Signed-off-by: Lucas Kabela <lucaskabela@meta.com> Signed-off-by: Lucas Kabela <lucasakabela@gmail.com>	2025-10-28 22:36:43 +00:00
Nick Hill	4fe5895361	[AsyncScheduling] Make async overlap work with logprobs (#27615) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-28 22:35:54 +00:00
yewentao256	b53a65fa46	update using skip if server is not up Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-28 14:58:07 -07:00
Or Ozeri	111faf1118	[Core] Scheduler: Publish connector events after output (#25875) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-10-28 21:01:33 +00:00
yewentao256	4f2a8d9d7f	Merge branch 'main' into wentao-batch-invariance-dp Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-28 14:00:34 -07:00
Wentao Ye	6afc28a9ba	[Test] Batch Invariant: Unit test using parameterized backend (#27478) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-28 13:51:35 -07:00
Lucas Wilkinson	141e6a0505	[Misc] Make reorder batch also separate extends (#27367) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-10-28 10:55:10 -07:00
Matvei Pashkovskii	130aa8cbcf	Add load pattern configuration guide to benchmarks (#26886) Signed-off-by: Matvei Pashkovskii <mpashkov@amd.com> Signed-off-by: Matvei Pashkovskii <matvei.pashkovskii@amd.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-28 10:49:15 -07:00
Zhengxu Chen	e3d8186666	[compile] Add fallback path to AOT compile when serialization fails. (#27350) Signed-off-by: zhxchen17 <zhxchen17@fb.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-28 12:54:26 -04:00
Cyrus Leung	f5710ef02a	[Misc] Make `LayerBlockType` a `Literal` instead of `Enum` (#27658) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-28 16:23:35 +00:00
Mohammad Miadh Angkad	a8c02fb5bf	[Bugfix][CI] Fix v1 attention backend tests and add CI coverage (#26597) Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu> Signed-off-by: Mohammad Miadh Angkad <mangkad.bsdsba2027@aim.edu> Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-10-28 11:42:05 -04:00
Kero Liang	02af36df36	[Bugfix] Fix allocation & free logic of SingleWriterShmRingBuffer (#27117) Signed-off-by: Kero Liang <kerorek@outlook.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: donglu <donglu@cohere.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-10-28 15:01:24 +00:00
Zhiyuan Li	e88bdd60d9	[FLA] Introduce Kimi Delta Attention(KDA) to VLLM (#27654) Signed-off-by: lizhiyuan <lizhiyuan@moonshot.cn>	2025-10-28 22:56:28 +08:00
Samuel Shen	05e034f085	[nit]: Fix import for the lmcache integration (#27600) Signed-off-by: Samuel Shen <slshen@uchicago.edu> Co-authored-by: Samuel Shen <slshen@uchicago.edu>	2025-10-28 14:40:55 +00:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	936643a868	[BugFix] Also consider RAY_EXPERIMENTAL_NOSET_* when storing compilation cache (#27294) Signed-off-by: Hollow Man <hollowman@opensuse.org>	2025-10-28 10:22:28 -04:00
Junpu Fan	b186149e8e	[Bugfix][Frontend] validate arg priority in frontend LLM class before add request (#27596) Signed-off-by: Junpu Fan <junpufan@gmail.com>	2025-10-28 14:02:43 +00:00
22quinn	2abbd351ef	[Core] Enable async scheduling for external_launcher mode (#27394) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2025-10-28 13:52:47 +00:00
wangln19	446912d1cb	fix: allow HuggingFace standard chat template params via **kwargs (#27622) Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local> Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com> Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-10-28 21:12:34 +08:00
Zhengxu Chen	a00d6254e9	[compile] Disable dynamo guards check for AOT compilation. (#27288) Signed-off-by: zhxchen17 <zhxchen17@fb.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-28 12:58:12 +00:00
Asaf Joseph Gardin	05181cc57f	[Hybrid] Add mamba_block_size to Engine Args (#27289) Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>	2025-10-28 12:54:24 +00:00
Zhengxu Chen	259504e147	[compile] Add enable_prompt_embeds to compile hash. (#27285) Signed-off-by: zhxchen17 <zhxchen17@fb.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-28 20:46:03 +08:00
Wentao Ye	0484b64248	[Bug] Fix shape issue for eplb expert weights (#27589) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-28 20:44:05 +08:00
Cyrus Leung	f58d9b6404	[Misc] Separate out `utils.counter` and move `utils.Device` to engine (#27588) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-28 12:20:46 +00:00
Matthew Bonanni	44b5ce956d	[Bugfix] In LongRoPE, decide short vs long based on max_model_len (#27431) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-28 12:00:56 +00:00

10838 Commits