youngkingdom/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Sage Moore	6d76bd034a	revert kv connector fix Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-08-13 14:45:35 -04:00
yewentao256	9e16220e4e	fix ubatch datatype issue Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-13 10:52:56 -07:00
yewentao256	5215c80a49	Merge commit '6e8d8c4afbddf725b34ef938616701869f5b3462' into sage/dbo-full-cudagraphsh Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-13 10:15:08 -07:00
yewentao256	dd2a94fd9d	fix assert error num_tokens_across_dp is None or num_tokens_across_dp[dp_rank] == batchsize Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-11 13:31:27 -07:00
Sage Moore	e526b1c091	fix num_tokens_across_dp sizing issue Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-08-11 15:27:12 +00:00
yewentao256	44ead56ad5	fix set forward context error Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-11 14:07:29 +00:00
yewentao256	28e7c30b01	Fix pre-commit error Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-11 14:06:25 +00:00
Sage Moore	2cf200c5b8	remove debug logging Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-08-08 19:07:49 +00:00
Sage Moore	5bbfd95bdb	add support for multiple builders in the model runner Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-08-08 19:01:20 +00:00
Sage Moore	6b0c303ab4	misc fixes Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-08-05 19:23:23 +00:00
Sage Moore	4819bb8715	fix eager mode Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-08-05 18:01:25 +00:00
Sage Moore	0edaf752d7	[Attention][DBO] Add support for "splitting" the CommonAttentionMetadata (#21153 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-08-01 19:47:53 -07:00
Wentao Ye	6e8d8c4afb	[Test] Add Unit Test for Batched DeepGEMM (#21559 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-02 10:45:46 +08:00
Nick Hill	8d524ce79f	[BugFix] Improve internal DP load balancing (#21617 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-01 19:45:27 -07:00
Dipika Sikka	9f9c38c392	[Speculators][Speculative Decoding] Add Qwen Eagle3 Support (#21835 ) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>	2025-08-01 19:43:37 -07:00
Varun Sundar Rabindranath	a65f46be5e	[Misc] DeepGemmExperts : Avoid JIT generation in the hot-path (#21955 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-08-01 19:42:03 -07:00
Nicolò Lucchesi	57393715e8	[Misc] `VLLM_TARGET_DEVICE.lower()` (#22101 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-01 19:41:40 -07:00
vllmellm	ee2eb6ecd8	[Model] Qwen2.5 VL SiLU-and-Mul (#22066 ) Signed-off-by: kf <kuanfu.liu@embeddedllm.com> Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: kf <kuanfu.liu@embeddedllm.com>	2025-08-01 19:34:37 -07:00
fhl2000	23322431c8	[V1][CUDA] Full cudagraph support for FlashInfer (#21367 )	2025-08-01 21:49:34 -04:00
JartX	3654847db5	feat: Add Support GPTQ Quantization MOE on ROCM vllm serve (#21733 )	2025-08-01 21:12:19 -04:00
Wentao Ye	eefbf4a68b	[Perf] Optimize `reshape_and_cache_flash` CUDA Kernel (#22036 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-01 19:18:51 -04:00
Michael Goin	88faa466d7	[CI] Initial tests for SM100 Blackwell runner (#21877 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-01 16:18:38 -07:00
Nick Hill	881e1af43a	[BugFix] Harden distributed DP startup (#21538 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-01 21:40:45 +00:00
XiongfeiWei	d84b97a3e3	Add lora test for tp>1 case for TPU. (#21970 ) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2025-08-01 18:56:08 +00:00
Rui Qiao	d331759488	Introduce RayPPCommunicator for ray-based PP (#21660 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-08-01 11:50:58 -07:00
Animesh Jain	9659bc7f27	[compile][startup] Disable C++ compilation of symbolic shapes (#20836 ) Signed-off-by: Animesh Jain <anijain@umich.edu>	2025-08-01 10:38:52 -07:00
Michael Goin	3277e8f9e1	Fix pre-commit failure for SECURTIY.md (#22102 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-01 10:36:07 -07:00
Jee Jee Li	8d705996df	[Misc] Minor enhancement of benchmark_moe (#22068 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-02 01:35:30 +08:00
Harry Mellor	38c8bce8b6	Enable headless models for pooling in the Transformers backend (#21767 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-01 10:31:29 -07:00
Varun Sundar Rabindranath	ac45c44d98	[Bugfix] [Performance] DeepEPHighThroughput + DeepSeek : Quant before Dispatch (#21837 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-08-01 10:14:38 -07:00
Huzaifa Sidhpurwala	d6664664b4	security policy: take 1 (#21119 ) Signed-off-by: Huzaifa Sidhpurwala <huzaifas@redhat.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-08-01 10:09:49 -07:00
rongfu.leng	b879ecd6e2	[Bugfix] fix when skip tokenizer init (#21922 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-08-01 10:09:36 -07:00
Isotr0py	3f8e952179	[Bugfix] Fix glm4.1v video inference issue (#22067 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-08-01 09:33:30 -07:00
Harry Mellor	326a1b001d	Improve documentation of `ModelConfig.try_get_generation_config` to prevent future confusion (#21526 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-01 09:32:27 -07:00
Harry Mellor	2d7b09b998	Deprecate `--disable-log-requests` and replace with `--enable-log-requests` (#21739 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-01 17:16:37 +01:00
David Xia	97608dc276	[Docs] use `uv` in CPU installation docs (#22089 ) Signed-off-by: David Xia <david@davidxia.com>	2025-08-01 07:55:55 -07:00
Nick Hill	3146519add	[BugFix] Don't change title of top-level process (#22032 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-01 07:37:55 -07:00
Richard Zou	8026a335a1	[BugFix] Update AttnFusionPass cache key (#21947 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2025-08-01 07:11:29 -07:00
Wentao Ye	a59cd9d9f7	[Refactor] Fix Compile Warning #1444-D (#21462 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-01 06:10:30 -07:00
Abirdcfly	5c54d9759d	[Bugfix][PD] set max_completion_tokens=1 if req has this value (#21841 ) Signed-off-by: Abirdcfly <fp544037857@gmail.com>	2025-08-01 06:08:45 -07:00
Gamhang	0a6d305e0f	feat(multimodal): Add customizable background color for RGBA to RGB conversion (#22052 ) Signed-off-by: Jinheng Li <ahengljh@gmail.com> Co-authored-by: Jinheng Li <ahengljh@gmail.com>	2025-08-01 06:07:33 -07:00
Michael Goin	f81c1bb055	[Bugfix] Check NVIDIA artifactory is accessible before using flashinfer cubin kernels (#21893 )	2025-08-01 08:28:45 -04:00
Harry Mellor	fb0e0d46fc	Fix `get_kwargs` for case where type hint is `list[Union[str, type]]` (#22016 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-01 05:26:42 -07:00
TJian	26b5f7bd2a	[BUG] [ROCm] Fix import bug on ROCm (#22083 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-08-01 05:25:20 -07:00
Dipika Sikka	dfbc1f8880	[Speculative Decoding] Add `speculators` config support (#21345 )	2025-08-01 08:25:18 -04:00
Harry Mellor	87c94bc879	Revert "Update sampling_metadata.py (#21937 )" (#22088 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-01 05:24:46 -07:00
Jee Jee Li	28b18cc741	[Quantization] Enable BNB support for InternS1 (#21953 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-01 11:09:54 +00:00
WeiQing Chen	4931486988	[Doc] Added warning of speculating with draft model (#22047 ) Signed-off-by: Dilute-l <dilu2333@163.com> Co-authored-by: Dilute-l <dilu2333@163.com>	2025-08-01 02:11:56 -07:00
Woosuk Kwon	0f81b310db	[Misc] Remove upper bound in openai package version (#22060 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-08-01 02:11:40 -07:00
wuhang	e6680f9e25	[Bugfix] Add log prefix in non-dp mode engine core (#21889 ) Signed-off-by: wuhang <wuhang6@huawei.com>	2025-08-01 09:04:16 +00:00

8370 Commits