youngkingdom/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
ZiTian.Zhao	e27d25a0dc	[fix] fix correct assertion syntax error in attention utils. (#22154 ) Signed-off-by: zitian.zhao <zitian.zhao@tencentmusic.com>	2025-08-03 19:24:02 -07:00
Seiji Eicher	6f5478298d	Use `aiohttp` connection pool for benchmarking (#21981 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-08-03 19:23:32 -07:00
Isotr0py	6a39ba85fe	[Bugfix] Fix failing multimodal standard test (#22153 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-03 19:04:38 +00:00
Yuxuan Zhang	d3c18c9cb0	fuse fp32 for GLM-4.5 e_score_correction_bias (#22143 ) Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>	2025-08-03 09:04:54 -07:00
TankNee	83f7bbb318	Add chat doc in quick start (#21213 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-08-03 07:47:55 -07:00
Li, Jiang	b5dfb94fa0	[CI/Build][Bugfix] Fix Qwen2.5 tests in CPU CI via fallback silu_and_mul to torch native implementation (#22145 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-08-03 05:34:04 -07:00
Woosuk Kwon	6d98843b31	[Responses API] Disable response store by default (#22137 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-03 04:04:21 -07:00
David Ben-David	aefeea0fde	[V1] [P/D] Refactor KV Connector Path (#21980 ) Signed-off-by: David Ben-David <davidb@pliops.com> Co-authored-by: David Ben-David <davidb@pliops.com>	2025-08-03 04:03:40 -07:00
H	24d1dffbeb	[executor] feat: add supports_pp attr to executors (#21786 ) Signed-off-by: Haibin Lin <haibin.lin@bytedance.com>	2025-08-03 18:04:45 +08:00
Ning Xie	7de45db9a5	[Misc] update doc comment for send (#22026 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-08-03 00:55:20 -07:00
Roberto L. Castro	789562c28c	Support CUTLASS NVFP4 (w4a4) for Blackwell Geforce GPUs (SM120) (#21309 ) Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>	2025-08-03 00:54:22 -07:00
Ye (Charlotte) Qi	3f36c325fa	[Benchmark] Support ready check timeout in `vllm bench serve` (#21696 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Co-authored-by: Roger Wang <hey@rogerw.me>	2025-08-03 00:52:38 -07:00
Isotr0py	3dddbf1f25	[Misc] Add tensor schema test coverage for multimodal models (#21754 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-08-03 00:52:14 -07:00
jiahanc	337eb23bcc	[Fix] Fix llama4 modelopt weight loading error (#22107 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-08-03 00:50:34 -07:00
Rui Qiao	2ff46b8826	[Misc] Bump ray to 2.48.0 (#22123 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-08-02 19:42:00 -07:00
Xiao	554df8a6a2	Revert "[compile][startup] Disable C++ compilation of symbolic shapes" (#22122 ) Signed-off-by: Xiao Liu <xiszishu@gmail.com>	2025-08-02 09:03:30 -07:00
Yan Ma	73e1b9b1d4	[xpu]support moe models on XPU platform (#21643 ) Signed-off-by: yan <yan.ma@intel.com> Signed-off-by: Yan Ma <yan.ma@intel.com>	2025-08-02 07:49:08 -07:00
Thomas Parnell	4abfd8796f	[V1] [Hybrid] Validate compatibility of attention backend batch reordering at init time (#21557 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-08-02 05:29:40 -07:00
Cyrus Leung	f5d0f4784f	[Frontend] Improve error message for too many mm items (#22114 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-02 02:20:38 -07:00
Chih-Chieh Yang	b690e34824	[Model] Mamba2 preallocate SSM output tensor to avoid d2d copy overhead (#21075 ) Signed-off-by: Chih-Chieh Yang <7364402+cyang49@users.noreply.github.com> Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>	2025-08-02 01:59:34 -07:00
Yuxuan Zhang	25373b6c6c	for glm-4.1V update (#22000 ) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-08-02 01:46:57 -07:00
Vadim Gimpelson	58eee5f2e0	[PERF] Use faster way of decode in tokenizer: avoid useless list-to-list conversion (#20000 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai>	2025-08-02 01:43:52 -07:00
Roger Wang	067c34a155	docs: remove deprecated disable-log-requests flag (#22113 ) Signed-off-by: Roger Wang <hey@rogerw.me>	2025-08-02 00:19:48 -07:00
Chih-Chieh Yang	c64861d63c	[Bugfix] Mamba2 remove bugged initial state condition in chunk scan (#22034 ) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>	2025-08-01 23:55:57 -07:00
Yong Hoon Shin	8564dc9448	Fix test_kv_sharing_fast_prefill flakiness (#22038 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-08-01 23:55:34 -07:00
Rui Qiao	4ac8437352	[Misc] Getting and passing ray runtime_env to workers (#22040 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-08-01 23:54:40 -07:00
vllmellm	d3a6f2120b	[FEAT][ROCm] Enable running Flash Attention as ViT attn backend for Qwen-VL models on ROCm platform. (#22069 ) Signed-off-by: tjtanaavllm <tunjian.tan@amd.com> Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: tjtanaavllm <tunjian.tan@amd.com>	2025-08-01 23:53:18 -07:00
Sage Moore	0edaf752d7	[Attention][DBO] Add support for "splitting" the CommonAttentionMetadata (#21153 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-08-01 19:47:53 -07:00
Wentao Ye	6e8d8c4afb	[Test] Add Unit Test for Batched DeepGEMM (#21559 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-02 10:45:46 +08:00
Nick Hill	8d524ce79f	[BugFix] Improve internal DP load balancing (#21617 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-01 19:45:27 -07:00
Dipika Sikka	9f9c38c392	[Speculators][Speculative Decoding] Add Qwen Eagle3 Support (#21835 ) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>	2025-08-01 19:43:37 -07:00
Varun Sundar Rabindranath	a65f46be5e	[Misc] DeepGemmExperts : Avoid JIT generation in the hot-path (#21955 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-08-01 19:42:03 -07:00
Nicolò Lucchesi	57393715e8	[Misc] `VLLM_TARGET_DEVICE.lower()` (#22101 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-01 19:41:40 -07:00
vllmellm	ee2eb6ecd8	[Model] Qwen2.5 VL SiLU-and-Mul (#22066 ) Signed-off-by: kf <kuanfu.liu@embeddedllm.com> Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: kf <kuanfu.liu@embeddedllm.com>	2025-08-01 19:34:37 -07:00
fhl2000	23322431c8	[V1][CUDA] Full cudagraph support for FlashInfer (#21367 )	2025-08-01 21:49:34 -04:00
JartX	3654847db5	feat: Add Support GPTQ Quantization MOE on ROCM vllm serve (#21733 )	2025-08-01 21:12:19 -04:00
Wentao Ye	eefbf4a68b	[Perf] Optimize `reshape_and_cache_flash` CUDA Kernel (#22036 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-01 19:18:51 -04:00
Michael Goin	88faa466d7	[CI] Initial tests for SM100 Blackwell runner (#21877 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-01 16:18:38 -07:00
Nick Hill	881e1af43a	[BugFix] Harden distributed DP startup (#21538 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-01 21:40:45 +00:00
XiongfeiWei	d84b97a3e3	Add lora test for tp>1 case for TPU. (#21970 ) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-08-01 18:56:08 +00:00
Rui Qiao	d331759488	Introduce RayPPCommunicator for ray-based PP (#21660 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-08-01 11:50:58 -07:00
Animesh Jain	9659bc7f27	[compile][startup] Disable C++ compilation of symbolic shapes (#20836 ) Signed-off-by: Animesh Jain <anijain@umich.edu>	2025-08-01 10:38:52 -07:00
Michael Goin	3277e8f9e1	Fix pre-commit failure for SECURTIY.md (#22102 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-01 10:36:07 -07:00
Jee Jee Li	8d705996df	[Misc] Minor enhancement of benchmark_moe (#22068 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-02 01:35:30 +08:00
Harry Mellor	38c8bce8b6	Enable headless models for pooling in the Transformers backend (#21767 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-01 10:31:29 -07:00
Varun Sundar Rabindranath	ac45c44d98	[Bugfix] [Performance] DeepEPHighThroughput + DeepSeek : Quant before Dispatch (#21837 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-08-01 10:14:38 -07:00
Huzaifa Sidhpurwala	d6664664b4	security policy: take 1 (#21119 ) Signed-off-by: Huzaifa Sidhpurwala <huzaifas@redhat.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-08-01 10:09:49 -07:00
rongfu.leng	b879ecd6e2	[Bugfix] fix when skip tokenizer init (#21922 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-08-01 10:09:36 -07:00
Isotr0py	3f8e952179	[Bugfix] Fix glm4.1v video inference issue (#22067 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-08-01 09:33:30 -07:00
Harry Mellor	326a1b001d	Improve documentation of `ModelConfig.try_get_generation_config` to prevent future confusion (#21526 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-01 09:32:27 -07:00

8258 Commits