youngkingdom/vllm - vllm

Author	SHA1	Message	Date
Zhengxu Chen	eef921f45e	AOT Compilation for torch.compile (Bundled) (#24274 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2025-10-10 19:02:11 -04:00
Bram Wasti	e317414ce1	Cache the environment variable check for batch invariance (#26510 ) Signed-off-by: Bram Wasti <bwasti@meta.com>	2025-10-10 22:47:34 +00:00
Nick Hill	949cb0170d	[BugFix] Fix async scheduling + request preemption (#26385 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-10 20:29:57 +00:00
Vadim Gimpelson	e94cfd51da	[BUG] Qwen3-next MTP. Fix attn metadata build bug (#26564 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-10-10 14:59:03 -04:00
Harry Mellor	7c12763b24	Fix some typing issues found by `mypy==1.18.2` (#26596 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-10 18:21:25 +00:00
Will Eaton	3b780a4bbb	Update CUDA architecture list in build pipeline for 12.9.1 wheels (#26592 ) Signed-off-by: Will Eaton <wseaton@users.noreply.github.com>	2025-10-10 11:15:27 -07:00
Harry Mellor	30f78af147	Update `pre-commit` hook versions (#26591 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-10 17:03:44 +00:00
Xiong Wang	19a9b169bf	Add Qwen3-Omni moe thinker (#25550 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Xiong Wang <feizi.wx@alibaba-inc.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-10 17:00:56 +00:00
Roberto L. Castro	96ad65b7fe	[Transform] [Quantization] Add QuTLASS support to vLLM (#24440 ) Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com> Signed-off-by: Andrei Panferov <andrei@panferov.org> Co-authored-by: Andrei Panferov <andrei@panferov.org> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-10-10 09:43:40 -07:00
Shane A	8d2b8c0ff2	[Model] Add FlexOlmo model implementation (#24923 ) Signed-off-by: Shane A <shanea@allenai.org>	2025-10-10 09:43:15 -07:00
Lukas Geiger	b2155ed317	[Model][Qwen3VL] Compute `cu_seqlens` on CPU to remove (#26496 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-10 09:42:17 -07:00
Chauncey	910abdbd08	[Bugfix] fixed top_logprobs: -1 does not appear to work as intended (#26470 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-10-11 00:41:17 +08:00
baonudesifeizhai	cddce79fda	[torch.compile] Make inductor partition rules respect splitting_ops #25691 (#25845 ) Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com> Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-10 16:35:28 +00:00
Mark McLoughlin	e519281920	[Metrics] Add test for multi-modal cache stats logging (#26588 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-10-10 16:00:50 +00:00
Elvir Crnčević	7b03584de8	Silu v2 (#25074 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: elvircrn <elvircrn@gmail.com> Signed-off-by: Elvir Crnčević <elvircrn@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>	2025-10-10 15:19:53 +00:00
Sage Moore	ae9d0e7da5	[Bugfix] Make DP padding optional in coordinate_batch_across_dp (#26375 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-10-10 10:53:33 -04:00
Daniel Cámpora	0e67102d93	Added test_top_k_per_row to test-pipeline.yaml. (#26569 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-10-10 10:48:33 -04:00
Jason Li	f4ba2061cf	[BugFix][torch.compile] Fix fused_scaled_matmul_reduce_scatter signature for PyTorch 2.8 (#26038 ) Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com> Signed-off-by: <> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-10 07:42:13 -07:00
Chauncey	1e6848a65d	[CI] fix test_run_batch.py::test_completions - AssertionError (#26578 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-10-10 22:16:28 +08:00
Andy Lo	67661375fa	[BugFix] Fix noop elimination edge case (#26394 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2025-10-10 13:33:04 +00:00
Lucas Kabela	213b64452a	[Bugfix] Convert untraceable GroupShape to list for AMD impl (#26535 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2025-10-10 13:32:29 +00:00
Mark McLoughlin	784c231151	[NIXL] Ignore abort on already-finished request (#25067 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-10-10 12:21:56 +02:00
Chen Zhang	606b00e80f	[bugfix][DCP] fix block_size of hash in DCP prefix caching (#26296 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-10-10 03:02:49 -07:00
Chauncey	720d3cd0f0	[CI] fix ruff format (#26579 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-10-10 03:02:12 -07:00
Ashwin Phadke	ab196edefb	Remove LoRA bias support (#25807 ) Signed-off-by: Ashwin Phadke <ashwinphadke12@rediffmail.com> Signed-off-by: Ashwin Phadke <23502062+ashwin-phadke@users.noreply.github.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-10-10 09:50:33 +00:00
Luis Tomas Bolivar	3ee202ea1e	[GPT-OSS] Add support for arrays at tool message content (#25593 ) Signed-off-by: Luis Tomas Bolivar <ltomasbo@redhat.com>	2025-10-10 09:00:45 +00:00
Cyrus Leung	ad430a67ca	[Metrics] Log multi-modal cache stats and fix reset (#26285 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-10 01:45:55 -07:00
Chen Zhang	6f0f570c43	[deepseek] kernel block size for UniformTypeKVCacheSpecs (#26559 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-10-10 16:40:41 +08:00
Boyuan Feng	b545a0b207	fix test_simple_inductor_graph_partition (#26522 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-10-10 06:39:19 +00:00
Lucas Wilkinson	29255cfc3b	[Spec-Decode] Support piecewise cudagraphs for Eagle head (#25109 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-10-10 01:20:31 -04:00
Ben Browning	da4455609d	[Chore]: One pythonic tool parser test uses the wrong parser (#26515 ) Signed-off-by: Ben Browning <bbrownin@redhat.com>	2025-10-10 04:03:55 +00:00
Nick Hill	aafb99a4d4	[Core] Small simplification in `GPUModelRunner._update_states()` (#26508 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-10 10:53:58 +08:00
Rui Qiao	757fa4a4da	[DP][ray] Support different VLLM_RAY_DP_PACK_STRATEGY (#23849 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-10-09 19:53:43 -07:00
Julien Denize	c6187f55f7	Refactor MistralTokenizer (#26358 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai>	2025-10-09 22:48:58 +00:00
Wentao Ye	8983e0216f	[CI] Fix Pre-commit Issue Cannot determine type of "rank" and "world_size" (#26448 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-09 15:16:48 -07:00
Wentao Ye	1ee35382cb	[Bug] Fix modular_kernel: ZeroDivisionError: integer division or modulo by zero (#26528 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-09 15:13:27 -07:00
Benjamin Chislett	6e783bc54b	[Bugfix] Fix CUDA graph selection bug in FlashInfer at high concurrency (#26499 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-10-09 17:12:34 -04:00
Michael Goin	c9d33c60dc	[UX] Add FlashInfer as default CUDA dependency (#26443 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-10-09 14:10:02 -07:00
Nick Hill	2e54db4d2b	[Core] Remove unused `prev_sampled_token_ids_invalid_indices` input batch field (#26514 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-09 20:22:14 +00:00
elvischenv	44f633dba1	[Flashinfer][gpt-oss] Support FP8-qkv Flashinfer TRTLLM Sinks Attention (#25674 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-10-09 16:13:39 -04:00
bnellnm	a462331e36	[Bugfix] Disable moe inplace for torch >= 2.9 (#26497 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-10-09 18:07:38 +00:00
roikoren755	4069db3f2e	[Bugfix] Enable padded FP4 quantization (#25947 ) Signed-off-by: Roi Koren <roik@nvidia.com>	2025-10-09 10:59:41 -07:00
Sage Moore	0d37450eb7	[BUGFIX] Add cu_tokens_across_sp to DPMetadata (#26457 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-10-09 17:13:56 +00:00
bnellnm	47e66c24e2	[Model] Apply shared experts overlap optimization to all models with shared experts (#26145 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-10-09 11:31:04 -04:00
Ming Yang	3b736e1c38	[Attention][DCP] Support DCP with query length > 1 (MTP) with FA3 (#25049 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-10-09 08:06:29 -07:00
Lukas Geiger	2c1c7dfb35	[Models][Qwen] Replace `pad` with `cat` for better performance (#26486 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-10-09 14:51:26 +00:00
Harry Mellor	e246ad6f0c	Upgrade Pydantic to v2.12.0 and remove hack for Python 3.13 (#26481 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-09 06:02:40 -07:00
Jiangyun Zhu	5728da11ea	Revert #26113 "[Frontend] CompilationConfig overhaul (#20283 ): deprecate use_inductor in favor of backend, simplify custom_ops" (#26472 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-10-09 05:43:55 -07:00
Simon Danielsson	92be3f3517	[Feature] Use pydantic validation in parallel.py config (#26417 ) Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-09 12:41:31 +00:00
Isotr0py	d1ddf340c8	[V0 deprecation] Remove `QKVCrossParallelLinear` implementation (#26475 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-09 10:52:27 +00:00

10333 Commits