youngkingdom/vllm - vllm

Author	SHA1	Message	Date
Isotr0py	00a4e56d8d	[Bugfix] Fix broken deepseek fp8 TP weights loading (#24367 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-06 09:23:12 -07:00
mohankku	0eadaeff7e	[Bugfix] Avoid uninitialized usage of azp_val when AZP is false. (#24335 ) Signed-off-by: Mohan Kumar Kumar <mohan.cbein@gmail.com> Signed-off-by: mohankku <mohan.cbein@gmail.com>	2025-09-06 08:17:03 -07:00
Benjamin Chislett	0077c8634e	Add @benchislett to codeowner for spec decode and structured outputs (#24362 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-09-06 22:03:35 +08:00
Roger Wang	b121ca22ad	[CI] Disable flaky structured output test from CI (#24366 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-09-06 13:31:56 +00:00
Roger Wang	eddaafc1c7	[Multimodal] Improve max video embedding length estimation in V1 (#24312 ) Signed-off-by: Roger Wang <hey@rogerw.me> Co-authored-by: Roger Wang <hey@rogerw.me>	2025-09-06 02:33:19 -07:00
Andrew Sansom	305a1cc0d2	refactor: Turn GPUModelRunner.inputs_embeds to a CpuGpuBuffer (#24345 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai>	2025-09-05 23:01:23 -07:00
wang.yuqi	6d6c6b05d3	[New Model]: google/embeddinggemma-300m (#24318 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-09-05 22:58:36 -07:00
Isotr0py	53b19ccdd5	[Core] Allow disabling TP sharding for parallel Linear layer (#23024 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-05 22:53:58 -07:00
Nick Hill	6432739ef1	[Bugfix] Catch and log invalid token ids in detokenizer (#24351 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-05 22:30:22 -07:00
yzds	ac201a0eaf	[Feature] Support Decode Context Parallel (DCP) for MLA (#23734 ) Signed-off-by: hongchao <hongchao@msh.team> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: hongchao <hongchao@msh.team> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-09-06 13:24:05 +08:00
Yong Hoon Shin	3c529fc994	[KV Sharing] Raise error if using eagle with fast prefill (#24350 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-09-05 20:22:40 -07:00
Didier Durand	35bf193864	[Doc]: fix typos in Python comments (#24294 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-05 19:41:12 -07:00
22quinn	35efa70297	Add @22quinn as code reviewer for RL related components (#24346 )	2025-09-06 01:56:15 +00:00
Benjamin Chislett	cee182b297	[Perf][V1] Fully overlap model execution (#23569 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-09-05 18:20:17 -07:00
Rafael Vasquez	c954c6629c	[CI] Add timeouts to tests (#24260 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-09-05 17:26:22 -07:00
Shiyan Deng	9dfbeb41e5	[RFC] allow cancelation after shutdown in blocking collective_rpc (#23390 ) Signed-off-by: Shiyan Deng <dsy842974287@meta.com>	2025-09-05 14:14:18 -07:00
elvischenv	eedb2a2a10	[Bugfix] Fix silu_mul+quant fusion test (#24341 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-09-05 20:13:42 +00:00
Chauncey	23a6c5280e	[gpt-oss][Bugfix]Fix streamableparser for missing handling of certain token_ids (#24306 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-09-05 10:26:00 -07:00
youkaichao	7812bcf278	[docs] add shenzhen meetup (#24326 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-09-05 22:48:42 +08:00
Louie Tsai	006e7a34ae	Adding int4 and int8 models for CPU benchmarking (#23709 ) Signed-off-by: Tsai, Louie <louie.tsai@intel.com>	2025-09-05 20:08:50 +08:00
liuzhenwei	e599e2c65e	[XPU][P/D] Add XPU support in NixlConnector (#22436 ) Signed-off-by: zhenwei <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2025-09-04 21:03:12 -07:00
Aaron Pham	c29fb540ff	[gpt-oss] tool parser supports for /chat/completions [1/n] (#22386 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-09-04 20:39:12 -07:00
Nicolò Lucchesi	65e038931d	[Frontend] Skip unnecessary detokenization when token_id is requested (#24236 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-09-04 23:04:12 +00:00
Zhuohan Li	886ccbe5ba	[CI/Build] Reduce the number of redundant cases to test for LoRA (#24276 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-09-04 21:58:44 +00:00
elvischenv	adc3ddb430	[Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files (#23727 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-04 14:25:45 -07:00
Seiji Eicher	60b755cbcb	[Misc] Have AsyncLLM `custom_stat_loggers` extend default logger list (#20952 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com> Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-09-04 14:25:30 -07:00
Saman A. Pour	482e52f56c	QWEN3 Coder Fused MoE kernels Optimization configs (#24266 ) Signed-off-by: Saman Keon <samanamp@outlook.com>	2025-09-04 20:33:43 +00:00
Po-Han Huang (NVIDIA)	78336a0c3e	Upgrade FlashInfer to v0.3.0 (#24086 ) Signed-off-by: Po-Han Huang <pohanh@nvidia.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-09-04 09:49:20 -07:00
Jee Jee Li	94866d7c93	[Misc] Slight improve deepgemm print (#24085 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-04 16:06:51 +00:00
Didier Durand	83609ca91d	[Doc]: fix typos in Python comments (#24173 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-04 08:52:17 -07:00
Nick Hill	e41a0fa377	[Perf] Freeze core engine proc heap after init (#24008 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-04 22:55:23 +08:00
nvjullin	37241077d5	[Misc] Removed force_fp8_e4m3fnuz from FP8LinearOp (#23725 ) Signed-off-by: Julien Lin <jullin@nvidia.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-04 09:25:40 -04:00
Yash Pratap Singh	c9f7081f9c	[LoRA]: Add lora support to qwen-2.5-omni (#24231 )	2025-09-04 05:50:50 -07:00
Kunshang Ji	16ded21eeb	[XPU] support Triton Attention backend on Intel GPU (#24149 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-09-04 20:41:08 +08:00
nopperl	2b30afa442	Use hidden_size_per_head as head_size fallback (#24221 ) Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>	2025-09-04 12:59:16 +01:00
Jiangyun Zhu	eafa8dcde6	[Model] Add pp support for hunyuan (#24212 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-09-04 03:58:26 -07:00
TJian	6c7af8110a	[Doc] Update vLLM Singapore Meetup info (#24234 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-09-04 02:58:18 -07:00
Kebe	8f423e5f43	[Feature][Response API] Add streaming support for non-harmony (#23741 ) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-09-04 17:49:06 +08:00
Ignacio Sica	369a079568	[Hardware][Apple-CPU] Disable OneDNN build for Apple Silicon (#24200 ) Signed-off-by: ignaciosica <mignacio.sica@gmail.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2025-09-04 02:48:25 -07:00
Lucas Wilkinson	402759d472	[Attention] FlashAttn MLA (#14258 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2025-09-04 02:47:59 -07:00
Fanli Lin	2c301ee2eb	[Bugfix] Fix Incremental Detokenization with `tokenizers == 0.22.0` (#24159 ) Signed-off-by: Fanli Lin <fanli.lin@intel.com> Signed-off-by: Fanli Lin <fanli0116@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-04 02:47:08 -07:00
whx	3efb9f4d95	[Attention][Platform] Refactor MLA to support Custom Op (#23332 ) Signed-off-by: whx-sjtu <2952154980@qq.com>	2025-09-04 02:46:37 -07:00
anthonsu	04f3c35cff	Improve flexibility of auto_tune.sh execution. (#23766 ) Signed-off-by: Anthony Su <50185138+anthonsu@users.noreply.github.com> Signed-off-by: anthonsu <50185138+anthonsu@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-04 09:41:41 +00:00
mgazz	51d5e9be7d	[Core][Model] Terratorch backend integration (#23513 ) Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com> Signed-off-by: Christian Pinto <christian.pinto@ibm.com> Co-authored-by: Christian Pinto <christian.pinto@ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-09-04 00:22:41 -07:00
bingchen-mi	e7fc70016f	[Model] Add MiDashengLM model support (#23652 ) Signed-off-by: chenbing8 <chenbing8@xiaomi.com> Signed-off-by: bingchen-mi <chenbing8@xiaomi.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-04 00:08:09 -07:00
Weida Hong	12e1e63cc5	[Misc] Enhance output readability of helper script (#24214 ) Signed-off-by: Weida Hong <wdhongtw@google.com>	2025-09-04 06:38:26 +00:00
Li, Jiang	57b1ce94f7	[CPU] Refactor CPU unquantized linear (#24150 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-09-04 14:28:45 +08:00
Benji Beck	cb55ad86fe	Migrate ultravox inputs to TensorSchema (#23503 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-09-04 06:09:11 +00:00
Flora Feng	712b273f65	[Refactor] Introduce basic Renderer for completion-style request (#24010 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2025-09-04 05:21:12 +00:00
Qiming Zhang	e919d6f549	[Kernel][Bugfix] Fix grouped topk cu (#24146 ) Signed-off-by: mayuyuace <qiming1.zhang@intel.com>	2025-09-04 12:37:37 +08:00

9184 Commits