youngkingdom/vllm - vllm

Author	SHA1	Message	Date
Thomas Parnell	bd875d2eb7	[Bugfix] Update FA commit hash (#22546 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-08-08 16:10:25 -07:00
Varun Sundar Rabindranath	f703b923f3	[Misc] DeepGEMM : Avoid JIT generation in the hot-path (#22215 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-08-08 16:09:59 -07:00
Lucas Wilkinson	cd9b9de1fb	[BugFix] Fix IMA FlashMLA full cuda-graph and DP + Update FlashMLA (#21691 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-08-08 16:09:42 -07:00
Chen Zhang	fe6d8257a1	[gpt-oss] Support tool call and implement MCP tool server (#22427 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-08 15:06:37 -07:00
Ricardo Decal	e290594072	[Docs] Rename “Distributed inference and serving” to “Parallelism & Scaling” (#22466 ) Signed-off-by: Ricardo Decal <rdecal@anyscale.com>	2025-08-08 19:26:21 +00:00
Yongye Zhu	f756a682d9	[gpt-oss] guard import when triton kernel is not installed (#22529 ) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-08 11:18:33 -07:00
Daniel Serebrenik	f0964e29cb	[Benchmark] Add benchmark tool for multi turn conversations (#20267 )	2025-08-08 10:28:50 -07:00
Yongye Zhu	e789cad6b8	[gpt-oss] triton kernel mxfp4 (#22421 ) Signed-off-by: <zyy1102000@gmail.com> Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>	2025-08-08 08:24:07 -07:00
Harry Mellor	e5ebeeba53	Remove exception for Python 3.8 typing from linter (#22506 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-08 03:06:46 -07:00
Harry Mellor	7be7f3824a	[Docs] Improve API docs (+small tweaks) (#22459 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-08 03:02:51 -07:00
Nick Hill	ccdae737a0	[BugFix] Don't cancel asyncio tasks directly from destructors (#22476 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-08 01:13:18 -07:00
rongfu.leng	904063907c	[Misc] fix openai version (#22485 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-08-08 01:12:54 -07:00
Cyrus Leung	43c4f3d77c	[Misc] Begin deprecation of `get_tensor_model_*_group` (#22494 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-08 01:11:54 -07:00
Cyrus Leung	1712543df6	[CI/Build] Fix multimodal tests (#22491 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-08 00:31:19 -07:00
lkchen	808a7b69df	[bench] Fix benchmark/serve.py to ignore unavailable results (#22382 ) Signed-off-by: Linkun <github@lkchen.net>	2025-08-07 23:15:50 -07:00
iAmir97	099c046463	[Doc] Sleep mode documentation (#22310 ) Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com> Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com> Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Hong Hanh <hanh.usth@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-08-08 12:25:18 +08:00
Po-Han Huang (NVIDIA)	af473f0a85	[bugfix] Fix Llama3/4 issues caused by FlashInfer 0.2.10 (#22426 ) Signed-off-by: Po-Han Huang <pohanh@nvidia.com>	2025-08-07 20:25:01 -07:00
Cyrus Leung	157f9c1368	Fix pre-commit (#22487 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-07 20:21:54 -07:00
ZiTian Zhao	6f287915d8	Optimize MiniCPMO mask creation with vectorized implementation (#22464 ) Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com> Signed-off-by: zitian zhao <zitian.zhao@tencentmusic.com>	2025-08-07 20:18:50 -07:00
Yuxuan Zhang	c152e2a8a0	not tie_word_embeddings for glm-4.5 and glm-4.5v (#22460 ) Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>	2025-08-07 19:37:23 -07:00
Chauncey	17eaaef595	[Bugfix] Fix RuntimeError: Index put requires the source and destination dtypes match (#22065 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-08-07 19:20:21 -07:00
Junhao Li	3303f134e0	[Kernel] Add support for block FP8 on SM120 (NVIDIA 5090 and RTX PRO 6000) (#22131 ) Signed-off-by: Junhao Li <junhao@ubicloud.com>	2025-08-07 19:18:28 -07:00
Shu Wang	b2c8ce57c6	Fix Flashinfer CUTLASS MOE Allgather (#21963 ) Signed-off-by: Shu Wang <shuw@nvidia.com>	2025-08-07 19:18:25 -07:00
Shu Wang	a3b9c17b56	Support Tensorrt-LLM MoE fp4 for low-latency (#21331 ) Signed-off-by: Shu Wang <shuw@nvidia.com> Signed-off-by: Po-Han Huang <pohanh@nvidia.com> Signed-off-by: Shu Wang. <shuw@nvidia.com> Signed-off-by: XIn Li <xinli@nvidia.com> Co-authored-by: XIn Li <xinli@nvidia.com>	2025-08-07 19:18:22 -07:00
Zhiyu	d57dc2364e	Add ModelOpt Qwen3 nvfp4 support (#20101 ) Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>	2025-08-07 19:18:19 -07:00
Andrew Sansom	e2c8f1edec	[PERF] Use pybase64 to more quickly decode prompt embeddings (#22469 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai>	2025-08-07 19:15:32 -07:00
TJian	1ee5ead5f8	[ROCm] [V1] [SpecDec] Enable Speculative Decoding on ROCm V1 Engine (#21496 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-08-07 19:13:17 -07:00
Ning Xie	acf8aeb79e	[Misc] normalize multiprocessing Queue usage (#22371 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-08-08 01:57:27 +00:00
Harry Mellor	7e3a8dc906	Remove `from_dict` from `SpeculativeConfig` (#22451 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-07 10:13:04 -07:00
Cyrus Leung	139d155781	[Frontend] Use engine argument to control MM cache size (#22441 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-07 09:47:10 -07:00
Cyrus Leung	8c9da6be22	[Core] Simplify mm processing cache (#22457 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-07 09:47:07 -07:00
Woosuk Kwon	399d2a10e2	Fix pre-commit error in main (#22462 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-07 08:54:39 -07:00
Chen Zhang	4815b00f54	[gpt-oss] Generate ResponseOutputItem from Harmony Message (#22410 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-07 08:33:25 -07:00
Chen Zhang	4da8bf20d0	[Tool] Fix auto tool call (#22434 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-07 07:03:38 -07:00
fxmarty-amd	7e0b121812	[Bugfix] Add missing `packed_modules_mapping` to `DeepseekV2ForCausalLM` (#22352 ) Signed-off-by: Felix Marty <Felix.Marty@amd.com>	2025-08-07 06:30:48 -07:00
Cyrus Leung	766bc8162c	[Core] Store only the keys for multi-modal data in P0 (#22198 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-07 01:45:04 -07:00
WeiQing Chen	289b18e670	[Docs] Update features/disagg_prefill, add v1 examples and development (#22165 ) Signed-off-by: David Chen <530634352@qq.com>	2025-08-07 00:59:23 -07:00
Andrew Chan	35171b1172	[Doc] update docs for nightly benchmarks (#12022 ) Signed-off-by: Andrew Chan <andrewkchan.akc@gmail.com>	2025-08-07 00:29:45 -07:00
Ricardo Decal	a2c6696bfe	[Docs] Factor out troubleshooting to its own guide; add section for Ray Observability (#21578 ) Signed-off-by: Ricardo Decal <rdecal@anyscale.com>	2025-08-07 00:29:13 -07:00
Yong Hoon Shin	5e8398805e	[Doc] Fix link to prefix caching design (#22384 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-08-07 00:28:15 -07:00
Woosuk Kwon	136825de75	[Misc] Enhance code formatting in mxfp4.py (#22423 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-07 00:26:24 -07:00
JaceyShao	c2dba2dba8	Add H20-3e fused MoE kernel tuning configs for GLM-4.5 (#22433 ) Signed-off-by: shaojunqi <shaojunqi.sjq@alibaba-inc.com> Co-authored-by: shaojunqi <shaojunqi.sjq@alibaba-inc.com>	2025-08-07 00:24:47 -07:00
Harry Mellor	434d2f3f7a	[Docs] Add missing dependency for docs build (#22435 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-07 00:22:07 -07:00
Adrián García García	8e8e0b6af1	feat: Add --enable-log-outputs flag for logging model generations (#20707 ) Signed-off-by: Adrian Garcia <adrian.garcia@inceptionai.ai>	2025-08-06 23:10:13 -07:00
Ming Yang	82216dc21f	[Misc] Support routing logic simulation (#21990 ) Signed-off-by: Ming Yang <minos.future@gmail.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-06 23:06:20 -07:00
Moritz Sanft	370661856b	[Frontend] Update OpenAI error response to upstream format (#22099 ) Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>	2025-08-06 23:06:00 -07:00
vllmellm	cbc8457b26	[Model] Switch to Fused RMS norm in Qwen2.5_VL model. (#22184 ) Signed-off-by: kf <kuanfu.liu@embeddedllm.com> Signed-off-by: tjtanaavllm <tunjian.tan@amd.com> Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: kf <kuanfu.liu@embeddedllm.com>	2025-08-06 23:05:24 -07:00
lkchen	4d4297e8fe	[Bench] Split serve.py:main into async/async versions (#22405 ) Signed-off-by: Linkun <github@lkchen.net>	2025-08-06 23:05:07 -07:00
wang.yuqi	2a4c825523	[CI] Skip the pooling models that do not support transformers v4.55 (#22411 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-08-06 23:05:03 -07:00
WeiQing Chen	4be02a3776	[Bugfix] EPLB load statistics problem (#22167 ) Signed-off-by: ycyaw66 <497410282@qq.com> Signed-off-by: David Chen <530634352@qq.com> Co-authored-by: ycyaw66 <497410282@qq.com>	2025-08-07 04:07:54 +00:00

8400 Commits