youngkingdom/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
bnellnm	c1909e7e8c	[Kernels] MoE refactor (#19636) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: ElizaWszola <ewszola@redhat.com> Co-authored-by: ElizaWszola <ewszola@redhat.com>	2025-07-02 06:08:27 -07:00
cronoik-inceptionai	b95877509b	Documentation update tool_calling: mapping back to function from response (#20373)	2025-07-02 05:55:49 -07:00
zichongli5	706ff13224	[Model] Adds support for SlimMoE models Phi-tiny-MoE-instruct (#20286) Signed-off-by: Zichong Li <t-lizichong@microsoft.com@Reasoning-H100-VM3.drbuo4tcjzruhloch3eo0b25ef.cx.internal.cloudapp.net> Co-authored-by: Zichong Li <t-lizichong@microsoft.com@Reasoning-H100-VM3.drbuo4tcjzruhloch3eo0b25ef.cx.internal.cloudapp.net> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-07-02 12:54:12 +00:00
WangHuaqiang	ccbfb1d1c9	[Bugfix] Fix the max_seq_len limit of 16384 for DeepSeek models (#20322) Signed-off-by: Wang Huaqiang <huaqiang.wang@intel.com>	2025-07-02 12:53:36 +00:00
Joonchen Liau	9e5552aa13	[NVIDIA] Support Cutlass w8a8 FP8 for Blackwell Geforce GPUs (sm120) (#17280) Signed-off-by: kaln27 <liaojuncheng123@foxmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-02 06:47:19 -06:00
Lu Fang	0c600b9ab6	[Build/CI] Automatically tag DeepSeek related PRs (#20370) Signed-off-by: Lu Fang <lufang@fb.com>	2025-07-02 04:02:43 -07:00
CSWYF3634076	e303dcf523	[Model] Add Ernie4.5 and Ernie4.5MoE Model Support (#20220) Signed-off-by: wangyafeng <wangyafeng@baidu.com>	2025-07-02 03:37:01 -07:00
Michael Yao	ae9c4d416f	[Docs] Make TPU ref prettier in google_tpu.md (#20356) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-07-02 02:04:08 -07:00
Michael Yao	d853520b3e	[Docs] Fix indentations for 2-level items in deprecation_policy.md (#20352) Signed-off-by: windsonsea <haifeng.yao@daocloud.io>	2025-07-01 23:50:31 -07:00
Cyrus Leung	ba51aea65e	[Bugfix] Keye-VL compatibility with `tok_kwargs` (#20058) (#20353) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-01 23:46:59 -07:00
Kwai-Keye	8452946c06	[Model][VLM] Support Keye-VL-8B-Preview (#20126) Signed-off-by: Kwai-Keye <Keye@kuaishou.com>	2025-07-01 23:35:04 -07:00
Chenheli Hua	2e7cbf2d7d	[Frontend] Support configurable mm placeholder strings & flexible video sampling policies via CLI flags. (#20105) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-07-01 23:34:03 -07:00
Chengji Yao	7da296be04	[TPU] kv cache update kernel supports dynamic grid (#20235) Signed-off-by: Chengji Yao <chengjiyao@google.com>	2025-07-02 06:33:37 +00:00
QiliangCui	b205e8467d	[Doc][TPU] Add models and features supporting matrix. (#20230) Signed-off-by: Qiliang Cui <cuiq@google.com>	2025-07-02 06:33:20 +00:00
yyzxw	be0cfb2b68	fix[Docs]: link anchor is incorrect #20309 (#20315) Signed-off-by: zxw <1020938856@qq.com>	2025-07-02 06:32:34 +00:00
Cyrus Leung	1a03dd496b	[Bugfix] Fix dynamic rotary embedding (#20343) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-02 06:31:26 +00:00
Kunshang Ji	27b8017636	[FIX][Intel GPU]fix ipex flash_attn_varlen_func api missing parameter (#20348) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-07-01 22:26:40 -07:00
Lifans	9ec1e3065a	[Misc][Doc] Add missing comment for LLM (#20285) Signed-off-by: Lifan Shen <lifans@meta.com>	2025-07-01 19:04:24 -07:00
Wentao Ye	9dae7d46bf	[Refactor] Remove Unused Env `VLLM_ENABLE_MOE_ALIGN_BLOCK_SIZE_TRITON` (#20334) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-01 19:03:43 -07:00
Wentao Ye	7058d7dd5d	[Refactor] Remove duplicate `find_free_port` (#20333) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-01 19:03:07 -07:00
Liangliang Ma	a0389e0554	[UT][intel GPU] use current_platform instead of device hardcode in v1 tests (#20169) Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>	2025-07-02 09:06:04 +08:00
Tyler Michael Smith	3be8d312a2	[Kernel][Bugfix] Fixup some warnings in nvfp4_blockwise_moe when CUDA < 12.8 (#20324) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-07-01 18:05:47 -07:00
czhu-cohere	3abfe22154	Enable group size 64 for Machete (#20290) Signed-off-by: czhu-cohere <conway.zhu@cohere.com>	2025-07-01 18:05:44 -07:00
Wentao Ye	e81fbefe8a	[Refactor] Refactor import utils (#20269) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-01 18:05:42 -07:00
周周周	9290de5667	remove unused variables in marlin_template.h (#20236)	2025-07-02 00:51:52 +00:00
Woosuk Kwon	7f280d69c9	[Optimization] Cache sampled token ids in model runner (#20291) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-01 11:01:31 -07:00
TJian	02cabff207	[V1] [ROCm] Enable EP with AITER Fused MoE (#20270) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-07-01 16:48:30 +00:00
Shintarou Okada	3d19d47d91	[Frontend] Expand tools even if tool_choice="none" (#17177) Signed-off-by: okada shintarou <okada@preferred.jp>	2025-07-01 12:47:38 -04:00
Woosuk Kwon	8acb4badee	[CUDA graphs] Enable full cuda graphs with FA3 AoT scheduling (#20301) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-01 09:07:36 -07:00
Nicolò Lucchesi	314af8617c	[Docs] Update transcriptions API to use openai client with `stream=True` (#20271) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-07-01 15:47:13 +00:00
Woosuk Kwon	0e96cc9b7e	[Misc] Minor refactoring for scheduler (#20299) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-01 07:55:32 -07:00
aiyiwang2025	ecad851cbd	[Model]Add Tencent HunYuanMoEV1 Model Support (#20114) Signed-off-by: aiyiwang <aiyiwang@tencent.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: quinnrong <quinnrong@tencent.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-01 07:28:13 -07:00
Yuxuan Zhang	ed70f3c64f	Add GLM4.1V model (Draft) (#19331) Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-07-01 12:48:26 +00:00
Nicolò Lucchesi	650d5dbd04	[Misc] Minor refactor of NIXL background handshake (#20068) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-07-01 12:40:14 +01:00
Kyle Sayers	9025a9a705	[Quant] [Bugfix] Fix quantization config matching with `hf_to_vllm_mapper` (#20046)	2025-07-01 19:20:34 +09:00
Lionel Villard	c05596f1a3	[Perf] Validate @config in pre-commit instead of dynamically (#20200) Signed-off-by: Lionel Villard <villard@us.ibm.com>	2025-07-01 05:10:28 -04:00
Reid	787b13389e	[doc] fix the incorrect logo in dark mode (#20289) Signed-off-by: reidliu41 <reid201711@gmail.com>	2025-07-01 08:18:09 +00:00
TY-AMD	96453cfa83	[BugFix][V1][ROCm] Triton MLA uses V0 backend on V1 engine (#19067) Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com>	2025-07-01 16:12:19 +08:00
Kebe	b1c1fe35a5	[Misc] remove redundant char (#20287) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-07-01 15:33:22 +08:00
Varun Sundar Rabindranath	08d81f1014	[Bugfix] Fix deepep tests (#20288) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-07-01 15:29:08 +08:00
Li, Jiang	6cc1e7d96d	[CPU] Update custom ops for the CPU backend (#20255) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-07-01 07:25:03 +00:00
czhu-cohere	9909726d2a	Enable ZP Support for Machete (#20268) Signed-off-by: czhu-cohere <conway.zhu@cohere.com>	2025-07-01 07:12:20 +00:00
Prashant Gupta	22e9d42040	[Misc] add xgrammar for arm64 (#18359) Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>	2025-07-01 07:02:20 +00:00
Richard Barnes	86debab54c	Fix `numel()` downcast in vllm/csrc/moe/moe_align_sum_kernels.cu +2 (#17082) Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-01 06:48:10 +00:00
Michael Goin	be250bbc67	[V1] Only print cudagraph tqdm on rank 0 with `is_global_first_rank` (#19516) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-01 06:02:09 +00:00
Alex Kogan	27949354fa	[Feature] A calibration-free RTN-based quantization for accurate and accelerated INT4/INT8 inference (#18768) Signed-off-by: Alex Kogan <alex.kogan@oracle.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-07-01 05:44:38 +00:00
Ernest Wong	bd5038af07	[Doc] add config and troubleshooting guide for NCCL & GPUDirect RDMA (#15897) Signed-off-by: Ernest Wong <chwong719@gmail.com>	2025-06-30 21:44:39 -07:00
Chendi.Xue	a2f14dc8f9	[CI][Intel Gaudi][vllm-Plugin]Add CI for hpu-plugin-v1-test (#20196) Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2025-07-01 04:17:07 +00:00
Kuntai Du	92ee7baaf9	[Example] add one-click runnable example for P2P NCCL XpYd (#20246) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2025-06-30 21:03:55 -07:00
Woosuk Kwon	7151f92241	[Misc] Fix spec decode example (#20296) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-06-30 21:01:48 -07:00


7431 Commits