youngkingdom/vllm - vllm

Author	SHA1	Message	Date
Kunshang Ji	fce10dbed5	[XPU] Add xpu torch.compile support (#22609 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-08-27 05:33:27 +00:00
Dipika Sikka	d272415e57	[Quantization] Expand compressed-tensors MoE matching logic to support NFP4 + FP8 MoEs (#22674 ) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com> Signed-off-by: Dipika <dipikasikka1@gmail.com>	2025-08-27 05:00:21 +00:00
Chen Zhang	142ac08030	[Frontend] Optimize beam search performance by limiting concurrency (#23599 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-27 04:59:14 +00:00
Chen Zhang	3210264421	[Frontend] Add --log-error-stack to print stack trace for error response (#22960 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-27 04:58:59 +00:00
CSWYF3634076	644d57d531	[Model] Add Ernie4.5 VL Model Support (#22514 ) Signed-off-by: wangyafeng <wangyafeng@baidu.com>	2025-08-26 21:02:55 -07:00
Chenheli Hua	c905684cfe	[Core] Asynchronous h2d in merge_multimodal_embeddings via pinned memory. (#23686 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-08-26 20:05:34 -07:00
Yiheng Xu	786835807b	[Bugfix]: Qwen3 Coder Tool Parser (#23099 ) Signed-off-by: Yiheng Xu <charlesyihengxu@gmail.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2025-08-26 19:58:32 -07:00
Wei	fecbb7c782	[Bugfix][gpt-oss] passing the cache config in gpt-oss (#23613 ) Signed-off-by: Wei Wei <wwei6@meta.com>	2025-08-27 02:54:23 +00:00
Harry Mellor	6dab89b8ec	[Docs] Fix math rendering in docs (#23676 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-26 18:47:08 -07:00
Michael Goin	de02b07db4	[Bugfix] Lazy import gpt_oss_triton_kernels_moe for mxfp4 (#23678 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-27 09:34:57 +08:00
Chen Zhang	eb1995167e	[gpt-oss] Enable unit test for response API harmony integration (#23533 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-26 18:23:26 -07:00
czhu-cohere	2c2b140ae8	[quantization] use channel scales for w4a8 + misc fixes (#23570 ) Signed-off-by: czhu-cohere <conway.zhu@cohere.com>	2025-08-26 18:23:23 -07:00
yzds	c7c80af084	fix pynccl reduce_scatter (#23648 ) Co-authored-by: hongchao <hongchao@msh.team>	2025-08-26 18:21:11 -07:00
wuhang	6891205b16	[Feature][Responses API] Support MCP tool in background mode (#23494 ) Signed-off-by: wuhang <wuhang6@huawei.com>	2025-08-27 01:06:58 +00:00
zixuanzhang226	b1625dbe9c	feat: add triton fused moe config for GLM-4.5-Air-FP8 on B200 (#23695 ) Signed-off-by: Zixuan Zhang <zixuanzhang@bytedance.com>	2025-08-26 18:06:10 -07:00
Federico	585e0bde36	[Bugfix] UnboundLocalError when GptOss reasoning specified (#23054 ) Signed-off-by: Federico <65908512+coval3nte@users.noreply.github.com>	2025-08-27 00:29:52 +00:00
Wentao Ye	714872f1a9	[Compile] Fix Cmake Warning (#23689 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-26 23:48:32 +00:00
Thomas Parnell	5f1af97f86	[V1] [Hybrid] Enable Full CUDA graph by default for hybrid models in V1 (#22594 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-08-26 23:28:55 +00:00
Zhonghua Deng	c3b0fd1ee6	[V1][P/D]P2pNcclConnector supports flashinfer (#23536 ) Signed-off-by: Abatom <abzhonghua@gmail.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-08-26 22:56:16 +00:00
Harry Mellor	6421b66bf4	[Docs] Move quant supported hardware table to README (#23663 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-26 22:26:46 +00:00
Huzaifa Sidhpurwala	2f13319f47	Enhance the pre-notification policy (#23532 ) Signed-off-by: Huzaifa Sidhpurwala <huzaifas@redhat.com>	2025-08-26 20:41:36 +00:00
Chen Zhang	d696f86e7b	[doc] Hybrid KV Cache Manager design doc (#22688 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-26 20:19:05 +00:00
Isotr0py	9816b81f5f	[Model] Enable video support for InternVL3.5 models (#23658 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-26 19:46:52 +00:00
Jiangyun Zhu	c37c0af990	[Misc] Fix comments in `tests/kernels/quantization` (#23675 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-08-26 19:31:20 +00:00
Cyrus Leung	9715f7bb0f	[Bugfix] Fix incorrect original shape in hashing (#23672 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-08-26 19:01:25 +00:00
Russell Bryant	98aa16ff41	[v1] Add cross-attention KV cache support for encoder-decoder models (#23664 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-08-26 18:49:06 +00:00
Thomas Parnell	227e231b55	[Docs] [V1] [Hybrid] Update docs to remove FlashInfer constraint for hybrid models (#23665 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-08-26 18:33:16 +00:00
Hyogeun Oh (오효근)	730d0ac8b9	[Docs] Fix warnings in `mkdocs build` (#23649 ) Signed-off-by: Zerohertz <ohg3417@gmail.com> Signed-off-by: Hyogeun Oh (오효근) <ohg3417@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-26 18:19:23 +00:00
Li, Jiang	9b0187003e	[Bugfix] Fix cuda event usage with CPU model runner (#23643 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-08-26 17:10:42 +00:00
vllmellm	44ac25eae2	[CI] [Doc]: Add GH Action for auto labeling issues with `rocm` tag (#20988 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-08-26 16:20:13 +00:00
nvjullin	7ea22e42d5	[Misc] Add override for allreduce fusion thresholds (#23639 ) Signed-off-by: Julien Lin <jullin@nvidia.com>	2025-08-26 15:53:04 +00:00
Yuekai Zhang	9d4183dd2e	[model] support qwen2audio embedding input (#23625 ) Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-26 23:48:08 +08:00
Yuekai Zhang	513298f1b4	[Bugfix] fix bf16 multimodal model hash (#23623 ) Signed-off-by: Yuekai Zhang <zhangyuekai@foxmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-08-26 23:47:50 +08:00
Harry Mellor	379f828fba	[Docs] Reduce requirements for docs build (#23651 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-26 15:43:28 +00:00
Hongxia Yang	1fdc732419	[ROCm] Starting to add AMD code reviewers for ROCm components (#23496 ) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>	2025-08-26 07:32:37 -07:00
TianyuLi0	f58675bfb3	[CPU] add cpu fused moe pytorch native implementation (#23146 ) Signed-off-by: Tianyu Li <tianyu.li@arm.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2025-08-26 14:09:17 +00:00
Didier Durand	7c04779afa	[Doc]: fix various spelling issues in multiple files (#23636 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-08-26 14:05:29 +00:00
nvjullin	f66673a39d	[Kernel] Added flashinfer fp8 per-tensor gemms (#22895 ) Signed-off-by: Julien Lin <jullin@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-08-26 06:54:04 -07:00
En Ouyang	b78bed1bc5	[Hardware][Mac] Fix the installation fail for Apple Silicon (CPU) (#23565 ) Signed-off-by: oye93 <en.ouyang93@outlook.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2025-08-26 13:04:25 +00:00
Harry Mellor	164b2273c8	[Docs] Fix broken links to `docs/api/summary.md` (#23637 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-26 13:00:18 +00:00
Chen Zhang	2b4fc9bd9b	Support FlashAttention Backend for Hybrid SSM Models (#23299 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-26 12:41:52 +00:00
Guillaume Calmettes	ebd5a77bb5	feat: add usage to TranscriptionResponse (text and json response_format) (#23576 ) Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>	2025-08-26 05:26:26 -07:00
Matúš Námešný	384dd1b0a8	[Bugfix] Add missing enable_log_outputs parameter to init_app_state function (#23634 ) Signed-off-by: Matúš Námešný <matus.namesny@ameria.com>	2025-08-26 12:13:15 +00:00
Jee Jee Li	fdeb3dac13	[Model] fix DeepSeek e_score_correction_bias dtype to fp32 (#23640 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-26 20:09:47 +08:00
Michael Goin	d52358c1e0	[Perf] Remove duplicated NVFP4 blockscales to save memory (#23379 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-26 19:16:33 +08:00
Huy Do	6ace2f72b0	Fix writing benchmark results with tuple keys (#23633 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-08-26 19:16:09 +08:00
Harry Mellor	b00e69f8ca	Fix nits from #20059 (#23548 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-26 03:27:20 -07:00
Cyrus Leung	50fede6634	[V1] Enable V1 for compute capability < 8.0 + FP32 (#23614 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-26 03:00:18 -07:00
Roger Wang	b5d34af328	[Bugfix] Fix scheduling when repeated images in one request (#23544 ) Signed-off-by: Roger Wang <hey@rogerw.me> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.me> Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>	2025-08-26 09:46:28 +00:00
Jee Jee Li	9b5f64238f	[Bugfix] Fix Qwen25VL packed_modules_mapping (#23604 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-08-26 01:09:14 -07:00

8936 Commits