youngkingdom/vllm - vllm

Author	SHA1	Message	Date
Woosuk Kwon	1c3ffdbecc	[V0 Deprecation] Remove V0 sampling metadata (#25345 ) Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-09-21 10:37:11 -07:00
Rahul Tuli	c438b2951c	feat: Enable engine-level arguments with speculators models (#25250 ) Signed-off-by: Rahul Tuli <rtuli@redhat.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-09-21 11:04:45 -06:00
Woosuk Kwon	0ff8ebb2d7	[V0 Deprecation] Remove async_output_proc, preemption mode, delay factor (#25334 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-21 08:52:32 -07:00
Woosuk Kwon	26e673fe93	[V0 Deprecation] Remove V0 Sequence class & Sampler (#25332 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-09-21 08:52:15 -07:00
Cyrus Leung	65a5910ce3	[Optimization] Cache chat template result when processor fails to be loaded (#25341 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-21 19:41:02 +08:00
Simon Danielsson	9aea7373ff	[Bugfix] Typos in error message for missing model config file (#25339 ) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>	2025-09-21 04:36:47 -07:00
Roger Wang	30d08911f7	[MM][Perf] Minor Optimization on Qwen3-VL `fast_pos_embed_interpolate` (#25337 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-09-21 11:05:20 +00:00
Isotr0py	cf56cf78b4	[V1] Add sliding window support to Flex Attention backend (#24089 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-21 05:08:07 +00:00
Woosuk Kwon	7ed82d1974	[V0 Deprecation] Remove V0 MP executor (#25329 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-20 21:26:35 -07:00
Woosuk Kwon	12dbd834cf	[V0 Deprecation] Remove from_seq_group methods (#25330 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-20 21:10:48 -07:00
Wenlong Wang	035fd2bd2c	[Multi Modal][Performance] Fused Q,K's apply_rope in more models (#25005 ) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-09-21 03:55:10 +00:00
Woosuk Kwon	1cd885bd54	[V0 Deprecation] Remove V0 model runner base & simplify worker base (#25328 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-20 20:49:09 -07:00
Huamin Li	62b38dc832	[Doc] improve test-pipeline.yaml documentation (#25305 ) Signed-off-by: Huamin Li <3ericli@gmail.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2025-09-20 20:29:12 -07:00
Woosuk Kwon	c99db8c8dd	[V0 Deprecation] Remove V0 core (#25321 ) Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-20 19:58:26 -07:00
Woosuk Kwon	72dd1595b4	[CI] Skip tests failing on main (#25326 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-20 19:57:46 -07:00
Woosuk Kwon	572ddf83ce	[Chore] Remove unused sampler in models (#25324 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-20 19:53:20 -07:00
Woosuk Kwon	86647d1cd0	[V0 Deprecation] Remove V0 Output Processor (#25320 ) Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-20 17:57:20 -07:00
Woosuk Kwon	52c2a8d4ad	[V0 Deprecation] Remove LLMEngine (#25033 ) Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-20 17:56:30 -07:00
Michael Yao	367a480bd3	[Docs] Fix warnings in vllm/profiler and vllm/transformers_utils (#25220 ) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-09-20 16:39:47 -07:00
Cyrus Leung	bef180f009	[V0 Deprecation] Enable the remaining multimodal tests in V1 (#25307 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-20 17:50:58 +00:00
lirong	d88918e4c2	[Core] Enable sharded state loader for V1 engine and enhance test coverage (#25308 ) Signed-off-by: pengdrumli <pengdrumli@tencent.com>	2025-09-20 21:15:22 +08:00
Isotr0py	3c713a9711	[Model] Cleanup InternViT's data parallel implementation (#25306 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-20 05:46:24 -07:00
Manoel Marques	bf8b26cad1	Generate _ModelInfo properties file when loading to improve loading speed (#23558 ) Signed-off-by: Manoel Marques <manoel.marques@ibm.com> Signed-off-by: Manoel Marques <manoelmrqs@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-20 11:51:13 +00:00
Wenlong Wang	032d661d27	[Docs] Fix warnings in mkdocs build (continued) (#25042 ) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-09-20 11:45:18 +00:00
Michael Goin	e08a3a3fdb	[CI Failure] Disable FlashInfer RoPE to unblock CI (#25299 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-20 08:16:56 +00:00
Cyrus Leung	3d9a1d2de5	[V1] Support `LLM.apply_model` (#18465 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-20 07:14:35 +00:00
Roger Wang	be874c0201	[Bugfix] Fix Qwen3-VL-MoE weight loading for EP (#25300 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-09-20 00:04:05 -07:00
Chen Zhang	9607d5eb44	[Hybrid Allocator] Support full attention with different hidden size (#25101 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-09-19 23:43:59 -07:00
Cyrus Leung	c60e6137f0	[Optimization] Avoid repeated model architecture conversion for pooling models (#25261 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-20 13:30:22 +08:00
Chauncey	f91480b2d4	[Bugfix] fix tool call arguments is empty (#25223 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com> Co-authored-by: xin.li <xin.li@daocloud.io>	2025-09-20 13:29:54 +08:00
Chendi.Xue	6c5f82e5aa	[BUG FIX][NON-CUDA]quick fix to avoid call cudagraph_unsafe in attention (#25298 ) Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>	2025-09-20 04:41:23 +00:00
Nick Hill	b7f186bbb3	[BugFix] Exclude self when checking for port collision (#25286 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-20 12:28:31 +08:00
JartX	3642909617	[BUGFIX] GPTQ quantization compatibility for Qwen3 Next MOE models (AutoGPTQ and AutoRound-GPTQ) (#25268 ) Signed-off-by: JartX <sagformas@epdcenter.es>	2025-09-20 11:18:13 +08:00
Harry Mellor	c308501cb6	Improve weight loading for encoder models in Transformers backend (#25289 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-20 03:11:03 +00:00
Nick Hill	535d80056b	[Misc] Support more collective_rpc return types (#25294 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-20 02:02:38 +00:00
Nick Hill	a25ade5d47	[BugFix] Ensure appropriate guards in destructors (#25284 ) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-20 09:06:34 +08:00
Boyuan Feng	8945b001db	[torch.compile] CUDAGraph Inductor partition integration (#24281 ) Signed-off-by: Boyuan Feng <boyuan@meta.com> Signed-off-by: Boyuan Feng <fby.1994@gmail.com> Signed-off-by: boyuanfeng <boyuan@meta.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-20 01:02:15 +00:00
Andrew Sansom	b8a287a0a8	[docs] Prompt Embedding feature support (#25288 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai>	2025-09-19 17:46:23 -07:00
Andrew Sansom	c7e713616a	test: Remove vestigial skip for prompt embeds tests after landing v1 Prompt Embeds support (#25291 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai>	2025-09-19 17:33:40 -07:00
Maximilien de Bayser	a36c675817	Don't skip special tokens with hermes-style tool calling (#25281 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-09-19 17:33:25 -07:00
Lucas Kabela	3da17c2cc2	[Bugfix] Remove VLLM_TEST_DYNAMO_FULLGRAPH_CAPTURE #2969 (#25090 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2025-09-19 20:27:21 -04:00
Nick Hill	14c1432789	[BugFix] Fix async scheduling CPU tensor race take 2 (#25279 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-19 16:34:07 -07:00
Lucia Fang	ee7a66dd9a	allow disable flashinfer prefill (#25276 ) Signed-off-by: Lu Fang <fanglu@fb.com>	2025-09-19 22:59:41 +00:00
Zhiyu	431535b522	Enable modelopt gemma3 nvfp4/fp8, make workflow more robust (#22771 ) Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-19 22:40:33 +00:00
Wentao Ye	711e912946	[Compile] Fix Compile Warning for Ignoring `MIN_BLOCK_PER_SM` (#25193 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-19 16:23:19 -06:00
Alec S	e69e0b8b5f	[Frontend] Responses API messages out, just harmony for now (#24985 ) Signed-off-by: Alec Solder <alecs@fb.com> Co-authored-by: Alec Solder <alecs@fb.com> Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-09-19 21:40:16 +00:00
David-Wen	ddc9048394	Fix: Correct FusedMoE layer reference in auto_round quantization (#24818 ) Signed-off-by: David-Wen <18927700430@163.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-19 20:44:24 +00:00
nvjullin	b1a63d1b3b	[BugFix] Make FlashInferMetadataBuilder non-blocking (#25040 ) Signed-off-by: Julien Lin <jullin@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-19 20:36:34 +00:00
Michael Goin	48ecb4438b	[Perf] Use FlashInfer RoPE for RotaryEmbedding.forward_cuda when available (#21126 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-19 14:06:49 -06:00
Harry Mellor	e57fc15971	Specify platform in `pip-compile` `pre-commit` hook so it runs on MacOS (#25273 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-19 12:43:33 -07:00

9718 Commits