|
|
00a4e56d8d
|
[Bugfix] Fix broken deepseek fp8 TP weights loading (#24367)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-06 09:23:12 -07:00 |
|
|
|
0eadaeff7e
|
[Bugfix] Avoid uninitialized usage of azp_val when AZP is false. (#24335)
Signed-off-by: Mohan Kumar Kumar <mohan.cbein@gmail.com>
Signed-off-by: mohankku <mohan.cbein@gmail.com>
|
2025-09-06 08:17:03 -07:00 |
|
|
|
0077c8634e
|
Add @benchislett to codeowner for spec decode and structured outputs (#24362)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
|
2025-09-06 22:03:35 +08:00 |
|
|
|
b121ca22ad
|
[CI] Disable flaky structured output test from CI (#24366)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-09-06 13:31:56 +00:00 |
|
|
|
eddaafc1c7
|
[Multimodal] Improve max video embedding length estimation in V1 (#24312)
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-09-06 02:33:19 -07:00 |
|
|
|
305a1cc0d2
|
refactor: Turn GPUModelRunner.inputs_embeds to a CpuGpuBuffer (#24345)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
|
2025-09-05 23:01:23 -07:00 |
|
|
|
6d6c6b05d3
|
[New Model]: google/embeddinggemma-300m (#24318)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-09-05 22:58:36 -07:00 |
|
|
|
53b19ccdd5
|
[Core] Allow disabling TP sharding for parallel Linear layer (#23024)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-05 22:53:58 -07:00 |
|
|
|
6432739ef1
|
[Bugfix] Catch and log invalid token ids in detokenizer (#24351)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-09-05 22:30:22 -07:00 |
|
|
|
ac201a0eaf
|
[Feature] Support Decode Context Parallel (DCP) for MLA (#23734)
Signed-off-by: hongchao <hongchao@msh.team>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: hongchao <hongchao@msh.team>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-09-06 13:24:05 +08:00 |
|
|
|
3c529fc994
|
[KV Sharing] Raise error if using eagle with fast prefill (#24350)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-09-05 20:22:40 -07:00 |
|
|
|
35bf193864
|
[Doc]: fix typos in Python comments (#24294)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-05 19:41:12 -07:00 |
|
|
|
35efa70297
|
Add @22quinn as code reviewer for RL related components (#24346)
|
2025-09-06 01:56:15 +00:00 |
|
|
|
cee182b297
|
[Perf][V1] Fully overlap model execution (#23569)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
|
2025-09-05 18:20:17 -07:00 |
|
|
|
c954c6629c
|
[CI] Add timeouts to tests (#24260)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-09-05 17:26:22 -07:00 |
|
|
|
9dfbeb41e5
|
[RFC] allow cancelation after shutdown in blocking collective_rpc (#23390)
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
|
2025-09-05 14:14:18 -07:00 |
|
|
|
eedb2a2a10
|
[Bugfix] Fix silu_mul+quant fusion test (#24341)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-09-05 20:13:42 +00:00 |
|
|
|
23a6c5280e
|
[gpt-oss][Bugfix]Fix streamableparser for missing handling of certain token_ids (#24306)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-09-05 10:26:00 -07:00 |
|
|
|
7812bcf278
|
[docs] add shenzhen meetup (#24326)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-09-05 22:48:42 +08:00 |
|
|
|
006e7a34ae
|
Adding int4 and int8 models for CPU benchmarking (#23709)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
|
2025-09-05 20:08:50 +08:00 |
|
|
|
e599e2c65e
|
[XPU][P/D] Add XPU support in NixlConnector (#22436)
Signed-off-by: zhenwei <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-09-04 21:03:12 -07:00 |
|
|
|
c29fb540ff
|
[gpt-oss] tool parser supports for /chat/completions [1/n] (#22386)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-09-04 20:39:12 -07:00 |
|
|
|
65e038931d
|
[Frontend] Skip unnecessary detokenization when token_id is requested (#24236)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-09-04 23:04:12 +00:00 |
|
|
|
886ccbe5ba
|
[CI/Build] Reduce the number of redundant cases to test for LoRA (#24276)
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
|
2025-09-04 21:58:44 +00:00 |
|
|
|
adc3ddb430
|
[Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files (#23727)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-04 14:25:45 -07:00 |
|
|
|
60b755cbcb
|
[Misc] Have AsyncLLM custom_stat_loggers extend default logger list (#20952)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-09-04 14:25:30 -07:00 |
|
|
|
482e52f56c
|
QWEN3 Coder Fused MoE kernels Optimization configs (#24266)
Signed-off-by: Saman Keon <samanamp@outlook.com>
|
2025-09-04 20:33:43 +00:00 |
|
|
|
78336a0c3e
|
Upgrade FlashInfer to v0.3.0 (#24086)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-09-04 09:49:20 -07:00 |
|
|
|
94866d7c93
|
[Misc] Slight improve deepgemm print (#24085)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-04 16:06:51 +00:00 |
|
|
|
83609ca91d
|
[Doc]: fix typos in Python comments (#24173)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-04 08:52:17 -07:00 |
|
|
|
e41a0fa377
|
[Perf] Freeze core engine proc heap after init (#24008)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-09-04 22:55:23 +08:00 |
|
|
|
37241077d5
|
[Misc] Removed force_fp8_e4m3fnuz from FP8LinearOp (#23725)
Signed-off-by: Julien Lin <jullin@nvidia.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-04 09:25:40 -04:00 |
|
|
|
c9f7081f9c
|
[LoRA]: Add lora support to qwen-2.5-omni (#24231)
|
2025-09-04 05:50:50 -07:00 |
|
|
|
16ded21eeb
|
[XPU] support Triton Attention backend on Intel GPU (#24149)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-09-04 20:41:08 +08:00 |
|
|
|
2b30afa442
|
Use hidden_size_per_head as head_size fallback (#24221)
Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>
|
2025-09-04 12:59:16 +01:00 |
|
|
|
eafa8dcde6
|
[Model] Add pp support for hunyuan (#24212)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-09-04 03:58:26 -07:00 |
|
|
|
6c7af8110a
|
[Doc] Update vLLM Singapore Meetup info (#24234)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-09-04 02:58:18 -07:00 |
|
|
|
8f423e5f43
|
[Feature][Response API] Add streaming support for non-harmony (#23741)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2025-09-04 17:49:06 +08:00 |
|
|
|
369a079568
|
[Hardware][Apple-CPU] Disable OneDNN build for Apple Silicon (#24200)
Signed-off-by: ignaciosica <mignacio.sica@gmail.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2025-09-04 02:48:25 -07:00 |
|
|
|
402759d472
|
[Attention] FlashAttn MLA (#14258)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-09-04 02:47:59 -07:00 |
|
|
|
2c301ee2eb
|
[Bugfix] Fix Incremental Detokenization with tokenizers == 0.22.0 (#24159)
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
Signed-off-by: Fanli Lin <fanli0116@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-04 02:47:08 -07:00 |
|
|
|
3efb9f4d95
|
[Attention][Platform] Refactor MLA to support Custom Op (#23332)
Signed-off-by: whx-sjtu <2952154980@qq.com>
|
2025-09-04 02:46:37 -07:00 |
|
|
|
04f3c35cff
|
Improve flexibility of auto_tune.sh execution. (#23766)
Signed-off-by: Anthony Su <50185138+anthonsu@users.noreply.github.com>
Signed-off-by: anthonsu <50185138+anthonsu@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-04 09:41:41 +00:00 |
|
|
|
51d5e9be7d
|
[Core][Model] Terratorch backend integration (#23513)
Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-09-04 00:22:41 -07:00 |
|
|
|
e7fc70016f
|
[Model] Add MiDashengLM model support (#23652)
Signed-off-by: chenbing8 <chenbing8@xiaomi.com>
Signed-off-by: bingchen-mi <chenbing8@xiaomi.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-04 00:08:09 -07:00 |
|
|
|
12e1e63cc5
|
[Misc] Enhance output readability of helper script (#24214)
Signed-off-by: Weida Hong <wdhongtw@google.com>
|
2025-09-04 06:38:26 +00:00 |
|
|
|
57b1ce94f7
|
[CPU] Refactor CPU unquantized linear (#24150)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-09-04 14:28:45 +08:00 |
|
|
|
cb55ad86fe
|
Migrate ultravox inputs to TensorSchema (#23503)
Signed-off-by: Benji Beck <benjibeck@meta.com>
|
2025-09-04 06:09:11 +00:00 |
|
|
|
712b273f65
|
[Refactor] Introduce basic Renderer for completion-style request (#24010)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2025-09-04 05:21:12 +00:00 |
|
|
|
e919d6f549
|
[Kernel][Bugfix] Fix grouped topk cu (#24146)
Signed-off-by: mayuyuace <qiming1.zhang@intel.com>
|
2025-09-04 12:37:37 +08:00 |
|