youngkingdom/vllm - vllm

Author	SHA1	Message	Date
mgoin	728c365e4d	Use uv to install python in Dockerfile Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-02 11:05:47 -04:00
Thomas Parnell	be8921fbba	Change size of single CUDA graph for CI to 4 (#26089 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-10-02 14:14:28 +00:00
Huy Do	d4e7a1152d	Update base image to 22.04 (jammy) (#26065 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-10-02 05:48:04 -07:00
pwschuurman	be22bb6f3d	Run:ai model streamer add GCS package support (#24909 ) Signed-off-by: Peter Schuurman <psch@google.com>	2025-10-01 20:59:13 -07:00
Nick Hill	169313b9f8	[Misc] Make handling of SamplingParams clearer in n>1 case (#26032 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-01 19:31:39 -07:00
Gregory Shtrasberg	0b018d8baf	[ROCm][Bugfix] Add missing parameter to ROCm backend (#26029 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-10-01 19:23:14 -07:00
Jerry Zhang	c31246800c	Support RL online quantization with torchao (#23014 ) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-10-01 16:39:29 -07:00
Lucas Wilkinson	4134312b35	[BugFix] ChunkedLocalAttention is currently not CG compatible (#26034 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-10-01 16:28:00 -07:00
Wentao Ye	da554f932e	[Bug] Fix Negative Cuda Memory Usage (#25683 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-01 18:16:26 -04:00
Hosang	aac622e0cd	[ROCm][Build] Add support for AMD Ryzen AI MAX / AI 300 Series (#25908 ) Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>	2025-10-01 21:39:49 +00:00
Lucas Wilkinson	1726e93ef1	[BugFix][DP/EP] Fix CUTLASS MLA hang under load (#26026 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2025-10-01 12:30:00 -07:00
Michael Goin	ee04c0cd04	[CI] Tweaks to GPT-OSS Eval (Blackwell) for stability (#26030 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-01 12:02:17 -07:00
Huamin Li	c36f0aa300	Fix test_mamba_ssm_ssd.py due to missing _query_start_loc_to_chunk_indices_offsets (#25995 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-10-01 18:18:36 +00:00
Johnny	5234dc7451	[NVIDIA] Blackwell Family (#24673 ) Signed-off-by: Johnny <johnnynuca14@gmail.com> Signed-off-by: johnnynunez <johnnynuca14@gmail.com> Signed-off-by: Johnny <johnnync13@gmail.com> Signed-off-by: Salvatore Cena <cena@cenas.it> Co-authored-by: Aidyn-A <31858918+Aidyn-A@users.noreply.github.com> Co-authored-by: Salvatore Cena <cena@cenas.it>	2025-10-01 10:50:54 -07:00
Kenichi Maehashi	3b7c20a6b5	[Bugfix] Apply same sampling parameters for both `n=1` and `n>1` (#26005 ) Signed-off-by: Kenichi Maehashi <maehashi@preferred.jp>	2025-10-01 14:37:35 +00:00
Nathan Scott	f9e714813a	[Benchmark] Finish documented v0.11.0 deprecation of --endpoint-type (#26007 ) Signed-off-by: Nathan Scott <nathans@redhat.com>	2025-10-01 12:41:57 +00:00
billishyahao	2518230d3e	[MISC] Fix misleading batch_size_capture_list when cuda_graph_sizes < 4 (#25829 ) Signed-off-by: billishyahao <bill.he@amd.com> Co-authored-by: Luka Govedic <ProExpertProg@users.noreply.github.com>	2025-10-01 08:39:45 -04:00
Harry Mellor	a332b84578	[CI] Only capture a single CUDA graph size in CI by default (#25951 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-01 10:03:44 +01:00
Cyrus Leung	1405f0c7ba	[Misc] Factor out common `_apply_feature_select_strategy` (#26003 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-01 01:31:03 -07:00
Wenlong Wang	84d57342b6	[BugFix][MM] Fix Nonetype error when video is cache in qwen2.5-omni-thinker (#26004 ) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-10-01 08:03:25 +00:00
nadathurv	57b46d769e	[Doc] updating torch.compile doc link (#25989 ) Signed-off-by: nadathurv <work.vnadathur@gmail.com> Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com> Co-authored-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com>	2025-10-01 07:04:56 +00:00
Lucia Fang	f48b6a03ba	[Misc]allow disable pynccl (#25421 ) Signed-off-by: Lu Fang <fanglu@fb.com> Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>	2025-10-01 06:04:13 +00:00
Harry Mellor	2a69ab4899	Update to Transformers `v4.56.2` (#24638 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-30 22:07:07 -07:00
Lucas Wilkinson	8d7da92fd7	[BugFix] Fix default kv-cache-dtype default for DeepseekV3.2 (#25988 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-09-30 21:58:31 -07:00
Zhewen Li	e952eee698	[Bugfix] Fix `__syncwarp` on ROCM (#25996 )	2025-09-30 21:15:11 -07:00
Roger Wang	66bca9b8bd	[MM] Add text-only mode for Qwen3-VL (#26000 )	2025-09-30 21:13:42 -07:00
Param	99028fda44	Fix INT8 quantization error on Blackwell GPUs (SM100+) (#25935 ) Signed-off-by: padg9912 <phone.and.desktop@gmail.com>	2025-09-30 19:19:53 -07:00
Wentao Ye	1244948885	[Log] Optimize Log for FP8MOE (#25709 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-30 19:18:43 -07:00
Salvatore Cena	a73f6491c8	Update launch_bounds_utils.h for correct compile on Multiple Cuda Arch - PTXAS out of range Warning (#25843 ) Signed-off-by: Salvatore Cena <cena@cenas.it> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-30 19:18:19 -07:00
Lucia Fang	001e50c92c	[Model] MTP fallback to eager for DeepSeek v32 (#25982 ) Signed-off-by: Lu Fang <fanglu@fb.com>	2025-10-01 01:53:22 +00:00
Lucas Wilkinson	96ebcaa3ad	[Misc] Make EP kernels install script support uv (#25785 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-09-30 23:38:34 +00:00
Andrew Xia	5db1870bb9	[gpt-oss] use vLLM instead of openai types for streaming (#25186 ) Signed-off-by: Andrew Xia <axia@meta.com> Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-09-30 22:47:07 +00:00
Harry Mellor	2ce26b9b5d	[Docs] Remove API Reference from search index (#25949 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-30 22:10:02 +00:00
Harry Mellor	a388252ac4	Add explicit pooling classes for the Transformers backend (#25322 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-30 23:07:06 +01:00
David Ben-David	9a9f48dff7	[V1] [P/D] Add Support for KV Load Failure Recovery (#19330 ) Signed-off-by: David Ben-David <davidb@pliops.com> Co-authored-by: David Ben-David <davidb@pliops.com>	2025-09-30 14:57:08 -07:00
Jee Jee Li	67f3fb0844	[Bench] Add DeepSeekV32 to MoE benchmark (#25962 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-30 14:13:48 -07:00
cjackal	43b752c325	[Llama4] [multimodal] Fix misplaced dtype cast of `cos_sin_cache` in `Llama4VisionRotaryEmbedding` (#25889 ) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2025-09-30 20:35:15 +00:00
Or Ozeri	cfd302db9b	OffloadingConnector: Fix GPU block tracking bug (#25856 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-09-30 19:53:04 +00:00
bnellnm	fb610ae684	[Docs] Add moe kernel features doc (#25297 ) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-30 19:03:15 +00:00
Cyrus Leung	2f652e6cdf	[Doc] Improve MM Pooling model documentation (#25966 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-30 18:58:29 +00:00
Wentao Ye	e6a226efba	[Bug] Fix AttributeError: 'QKVParallelLinear' object has no attribute 'orig_dtype' (#25958 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-30 11:13:03 -07:00
youkaichao	a2e6fa7e03	[bugfix][deepseek] fix flashmla kernel selection (#25956 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-10-01 00:30:36 +08:00
Cyrus Leung	9f1c4ecaf2	[Bugfix] Token type and position embeddings fail to be applied to `inputs_embeds` (#25922 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-01 00:23:12 +08:00
Pavani Majety	ef283548f7	[Bugfix] Fix accuracy issue of TRTLLM FP8 MOE and improve logging (#25895 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-09-30 10:51:31 -04:00
Anion	f4db5e6de1	[Bugfix][Model] Fix inference for Hunyuan dense models (#25354 ) Signed-off-by: anion <1005128408@qq.com> Signed-off-by: Anion <123177548+Anionex@users.noreply.github.com>	2025-09-30 14:38:07 +00:00
Sergio Paniego Blanco	099aaee536	Add Hugging Face Inference Endpoints guide to Deployment docs (#25886 ) Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-30 14:35:06 +00:00
Asaf Joseph Gardin	35fe398c7c	[Kernel][Moe Configs] Add more tuned triton configs for ExpertsInt8 and FP8 (#25858 ) Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>	2025-09-30 07:30:44 -07:00
ihb2032	bb6d43047e	[Fix] Improve CPU backend compatibility for RISC-V (#25816 ) Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn> Signed-off-by: ihb2032 <1355790728@qq.com>	2025-09-30 13:48:07 +00:00
Reza Barazesh	bc546f76a1	[CI] Move applicable tests to CPU (#24080 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-30 14:45:20 +01:00
Nicolò Lucchesi	80608ba5af	[NIXL] Add support for MLA caches with different latent dim (#25902 ) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2025-09-30 12:18:29 +00:00

10059 Commits