youngkingdom/vllm

Author	SHA1	Message	Date
Wentao Ye	245e4f2c01	[Feature] Batch Invariant: Support DeepGEMM and Blackwell (#27127 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-18 09:28:05 -04:00
iAmir97	1d165d6d85	[Chore] Separate out `vllm.utils.mem_utils` (#27143 ) Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com> Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com> Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-18 10:06:59 +00:00
dongbo910220	83004020fd	[Test] Add test for /health endpoint on engine failure (#26074 ) Signed-off-by: dongbo910220 <1275604947@qq.com>	2025-10-18 09:59:05 +00:00
Chendi.Xue	12e21701e7	[DOC][FEATURES][CPU]update cpu feature for v1 (#27135 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2025-10-18 01:10:45 -07:00
Varun Sundar Rabindranath	30a33b92ee	[Misc] Rev DeepEP (#27122 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-18 14:54:29 +08:00
Hanchenli	7c572544e4	[GPT-OSS] Structure_Tag support for gpt-oss tool-call in cot (#25515 ) Signed-off-by: Hanchenli <lihanc2002@gmail.com> Signed-off-by: Hanchenli <61769611+Hanchenli@users.noreply.github.com> Signed-off-by: Wei Wei <wwei6@meta.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Wei Wei <wwei6@meta.com> Co-authored-by: Wei Wei <weiweinpu@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-17 21:55:54 -07:00
Huamin Li	c312320764	[CI/Build] tests(v1): feed Triton attention the (num_blocks, 2, …) KV cache layout in backend-correctness tests (#26663 ) Signed-off-by: Huamin Li <3ericli@gmail.com> Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-10-17 21:11:26 -07:00
ZiTian Zhao	c981f0ea78	[Perf] Add H100 fused MoE config (#25398 ) Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>	2025-10-18 02:21:27 +00:00
Lehua Ding	6367bde739	[BugFix][Core] Fix error when enable async-scheduling in multi-node env (#25887 ) Signed-off-by: Lehua Ding <lehuading@tencent.com> Signed-off-by: Lehua Ding <lehuading@qq.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-10-17 22:16:18 +00:00
Wentao Ye	f50cc221ea	[Test] Make `test_failure` more stable for batch invariance (#27054 )	2025-10-17 16:59:08 -04:00
Pradyun92	acedc74b1a	[V1][Spec Decode] Fix greedy temperature detection after sampler refactor (#27077 ) Signed-off-by: Pradyun Ramadorai <pradyunr@amazon.com> Co-authored-by: Pradyun Ramadorai <pradyunr@amazon.com>	2025-10-17 13:27:47 -07:00
Zhuohan Li	d29483b58a	[Minor] Remove unnecessary error message (#27115 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-10-17 20:02:12 +00:00
Michael Goin	950cf9e58e	[Bugfix] Use PIECEWISE cudagraphs on Blackwell if max_model_len > 131072 (#27114 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-17 19:47:18 +00:00
Isotr0py	3125d79950	[Chore] Remove unused `PolyNorm` layer (#27110 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-17 19:03:43 +00:00
vllmellm	e33ee23ee3	[Bugfix] [AITER] [ROCm] Fix Quark MoE Quant Config and AITER Fused MoE quant type logic (#27029 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-10-17 12:51:10 -06:00
rasmith	b10c64c834	[ROCm][Bugfix][Model] Fix illegal memory access when running qwen3_moe models with rms_norm (Qwen3-235B-A22B, Qwen3-30B-A3B, etc.) (#26192 ) Signed-off-by: Randall Smith <ransmith@amd.com> Signed-off-by: Randall Smith <Randall.Smith@amd.com> Signed-off-by: rasmith <Randall.Smith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-17 14:17:18 -04:00
Aleksandr Malyshev	0925b28a8e	[ROCM] MoE fp4 CK kernel (#26545 ) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2025-10-17 14:06:33 -04:00
Nicolò Lucchesi	99722d5f0e	[CI] Remove forbidden slash (#27112 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-10-17 09:38:00 -07:00
燃	4c91a28e30	[bugfix] Qwen3-VL fix video incorrect timestamp calculations while do_sample_frames=True (#27104 ) Co-authored-by: 松灵 <wpf272043@alibaba-inc.com>	2025-10-17 16:26:33 +00:00
Patrick von Platen	b038d9c40c	[Data-parallel] Allow DP>1 for world_size > num_gpus on node (8) (#26367 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Rui Qiao <ruisearch42@gmail.com>	2025-10-17 08:24:42 -07:00
Nicolò Lucchesi	2ba60ec7fe	[CI] Nixl integration tests (#27010 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-10-17 07:13:31 -07:00
Luka Govedič	bd7157a071	[torch.compile] Enable attention and allreduce fusion without custom ops enabled (#24604 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-17 08:10:23 -06:00
Yongtao Huang	be429d0cfd	Fix incorrect docstring for stop_profile() method (#27101 ) Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>	2025-10-17 06:30:23 -07:00
Reima Karhila (AMD)	c253745eb8	[Harware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI350 and MI355 (#25586 ) Signed-off-by: Reima Karhila <reima.karhila@amd.com> Signed-off-by: xaguilar <Xavier.AguilarFruto@amd.com> Co-authored-by: xaguilar <Xavier.AguilarFruto@amd.com>	2025-10-17 04:56:12 -07:00
Jee Jee Li	daec4d2624	[Model]Improve Qwen3VLMoeForConditionalGeneration packed_modules_mapping (#27096 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-10-17 04:47:00 -07:00
Harry Mellor	6c9fdbf725	[Docs] Replace `rst` style double-backtick with `md` single-backtick (#27091 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-17 02:47:34 -07:00
Harry Mellor	483ea64611	[Docs] Replace all explicit anchors with real links (#27087 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-17 02:22:06 -07:00
Mengqing Cao	e20eba753b	[VLM][Refactor] Remove useless func `get_input_positions` in `MRotaryEmbedding` (#27088 ) Signed-off-by: MengqingCao <cmq0113@163.com>	2025-10-17 02:00:30 -07:00
cong-meta	bbc1b29665	Update troubleshooting.md and remind VLLM_TRACE_FUNCTION usage (#27069 ) Signed-off-by: cong-meta <prowindy@hotmail.com>	2025-10-17 01:53:06 -07:00
Chauncey	acb1bfa601	[CI] fix docs build failed (#27082 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-10-17 07:53:40 +00:00
zhrrr	75c7ad9918	[Kernel][Performance] Fuse float cast and renormalize to topk softmax kernel (#26717 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com> Signed-off-by: izhuhaoran <izhuhaoran@qq.com>	2025-10-17 07:30:35 +00:00
Li, Jiang	5550ff9c25	[CI/Build] Update compressed tensor test path to fix CPU CI (#27068 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-10-16 22:34:56 -07:00
Said Taghadouini	3aeb19a39e	[Model] Add support for LightOnOCR (#26916 ) Signed-off-by: Said Taghadouini <taghadouinisaid@gmail.com> Signed-off-by: Said Taghadouini <84044788+staghado@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-10-17 05:05:24 +00:00
Cyrus Leung	8c017b3490	[Model] Always use Transformers backend for PaliGemma and Gemma3-MM (#26715 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-17 05:03:35 +00:00
Zhewen Li	9c2c2287a0	[CI/Build] Update Llama4 eval yaml (#27070 ) Signed-off-by: zhewenli <zhewenli@meta.com>	2025-10-17 04:59:47 +00:00
Jee Jee Li	fec2b341ad	[Kernel] Lazy import FlashInfer (#26977 )	2025-10-17 04:48:18 +00:00
Jee Jee Li	87bc0c492f	[Bugfix] Fix ReplicatedLinearWithLoRA (#27065 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-10-17 04:43:16 +00:00
Nick Hill	fe3b9372ad	[Core] Change `execute_model_with_error_logging()` to be a ctx manager (#27060 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-17 11:45:32 +08:00
Tao He	bde9e2272a	[Bugfix][Qwen] fixes the weights dtype in qwen3_next: it is actually a bfloat16 (#27030 ) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>	2025-10-17 03:37:52 +00:00
Boyuan Feng	08405609cc	disable graph partition in custom op (#26952 ) Signed-off-by: Boyuan Feng <boyuan@meta.com> Signed-off-by: Boyuan Feng <fby.1994@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-17 11:08:47 +08:00
Nick Hill	ab81379ea6	[Perf] Exploit out-of-band buffers in shm_broadcast (#26961 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-10-16 20:08:03 -07:00
Harry Mellor	4ffd6e8942	[Docs] Reduce custom syntax used in docs (#27009 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-16 20:05:34 -07:00
Tomas Ruiz	965c5f4914	vllm bench serve shows num of failed requests (#26478 ) Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>	2025-10-16 19:55:09 -07:00
Lukas Geiger	4d055ef465	Remove unused imports (#26972 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-10-16 19:51:17 -07:00
Boyuan Feng	17c540a993	[torch.compile] fix simple inductor graph partition test (#27050 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-10-16 21:09:36 -04:00
Cyrus Leung	4d4d6bad19	[Chore] Separate out `vllm.utils.importlib` (#27022 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-17 00:48:59 +00:00
Lucia Fang	11ae016bd7	[torch.compile] Passing only necessary compilation config to inductor pass config (#27041 ) Signed-off-by: Lu Fang <fanglu@fb.com> Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>	2025-10-17 00:01:52 +00:00
jiahanc	41d3071918	[NVIDIA] [Perf] Update to leverage flashinfer trtllm FP4 MOE throughput kernel (#26714 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-10-16 16:20:25 -07:00
Harry Mellor	fb5e10d3fb	Refactor Transformers backend to use mixins (#26906 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-16 21:50:39 +00:00
Bram Wasti	b2f78cbad4	[small][batch invariance] Rename the env and internal flags to simplify usage (#26855 ) Signed-off-by: Bram Wasti <bwasti@meta.com>	2025-10-16 21:40:25 +00:00

10579 Commits