|
|
0a6d305e0f
|
feat(multimodal): Add customizable background color for RGBA to RGB conversion (#22052)
Signed-off-by: Jinheng Li <ahengljh@gmail.com>
Co-authored-by: Jinheng Li <ahengljh@gmail.com>
|
2025-08-01 06:07:33 -07:00 |
|
|
|
fb0e0d46fc
|
Fix get_kwargs for case where type hint is list[Union[str, type]] (#22016)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-01 05:26:42 -07:00 |
|
|
|
dfbc1f8880
|
[Speculative Decoding] Add speculators config support (#21345)
|
2025-08-01 08:25:18 -04:00 |
|
|
|
82de9b9d46
|
[Misc] Automatically resolve HF processor init kwargs (#22005)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-31 22:44:10 -07:00 |
|
|
|
ad57f23f6a
|
[Bugfix] Fix: Fix multi loras with tp >=2 and LRU cache (#20873)
Signed-off-by: charent <19562666+charent@users.noreply.github.com>
|
2025-07-31 19:48:13 -07:00 |
|
|
|
3700642013
|
[Refactor] Remove Duplicate per_block_cast_to_fp8, Remove Dependencies of DeepGEMM (#21787)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-08-01 01:13:27 +00:00 |
|
|
|
e360316ab9
|
Add DeepGEMM to Dockerfile in vllm-base image (#21533)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-31 18:01:55 -07:00 |
|
|
|
6e672daf62
|
Add FlashInfer allreduce RMSNorm Quant fusion (#21069)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-07-31 13:58:38 -07:00 |
|
|
|
71470bc4af
|
[Misc] Add unit tests for chunked local attention (#21692)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-07-31 11:39:16 -07:00 |
|
|
|
9e0726e5bf
|
[Meta] Official Eagle mm support, first enablement on llama4 (#20788)
Signed-off-by: morgendave <morgendave@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-07-31 10:35:07 -07:00 |
|
|
|
9484641616
|
[Model] Add step3 vl (#21998)
Signed-off-by: oliveryuan <yuansong@step.ai>
Co-authored-by: oliveryuan <yuansong@step.ai>
|
2025-07-31 23:19:06 +08:00 |
|
|
|
5daffe7cf6
|
[BugFix] Fix case where collective_rpc returns None (#22006)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-31 12:51:37 +00:00 |
|
|
|
2836dd73f1
|
[Model][CI] Let more pooling models support v1 (#21747)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-07-31 01:51:15 -07:00 |
|
|
|
3e36fcbee6
|
[Bugfix]: fix metadata file copy in test_sharded_state_loader (#21830)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-07-31 06:22:11 +00:00 |
|
|
|
055bd3978e
|
[CI Bugfix] Fix CI OOM for test_shared_storage_connector_hashes (#21973)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-31 11:45:29 +08:00 |
|
|
|
ca9e2be3ed
|
[Core] Move EngineCoreRequest to Request conversion out of EngineCore (#21627)
Signed-off-by: linzebing <linzebing1995@gmail.com>
|
2025-07-30 15:00:54 -07:00 |
|
|
|
287f527f54
|
[Feature] Add async tensor parallelism for scaled mm (#20155)
Signed-off-by: cascade812 <cascade812@outlook.com>
|
2025-07-30 17:23:41 -04:00 |
|
|
|
56bd537dde
|
[Misc] Support more collective_rpc return types (#21845)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-30 10:20:20 -07:00 |
|
|
|
f4135232b9
|
feat(distributed): add get_required_kvcache_layout class method to kv connector api (#20433)
Signed-off-by: wxsm <wxsms@foxmail.com>
|
2025-07-30 16:41:51 +00:00 |
|
|
|
4904e53c32
|
[Bugfix] SharedStorage Connector for V1 PD multimodal (#21611)
Signed-off-by: fake0fan <645327136@qq.com>
Signed-off-by: herotai214 <herotai214@gmail.com>
Co-authored-by: herotai214 <herotai214@gmail.com>
|
2025-07-30 09:18:37 -07:00 |
|
|
|
004203e953
|
[CI/Build] Fix registry tests (#21934)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-30 09:10:41 -07:00 |
|
|
|
5c765aec65
|
[Bugfix] Fix TypeError in scheduler when comparing mixed request_id types (#21816)
Signed-off-by: chiliu <chiliu@paypal.com>
Co-authored-by: chiliu <chiliu@paypal.com>
|
2025-07-30 08:54:44 -07:00 |
|
|
|
ad510309ee
|
Override attention metadata for fast prefill in some KV sharing setups (#21590)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-07-30 08:54:15 -07:00 |
|
|
|
6e599eebe8
|
[Bugfix] Fix OOM tests in initialization test (#21921)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-30 07:35:47 -07:00 |
|
|
|
8f4a1c9a04
|
[Misc] Improve code readability of KVCacheManager (#21673)
Signed-off-by: tanruixiang <tanruixiang0104@gmail.com>
Signed-off-by: Ruixiang Tan <819464715@qq.com>
Signed-off-by: GitHub <noreply@github.com>
|
2025-07-30 07:20:43 -07:00 |
|
|
|
0271c2ff2f
|
[Test] Add Benchmark and Unit Test for per_token_group_quant (#21860)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-30 07:15:02 -07:00 |
|
|
|
547795232d
|
[Tests] Fixing bug inside MultiModalProfiler. (#21842)
Signed-off-by: Varun Shenoy <varun.vinayak.shenoy@oracle.com>
|
2025-07-30 00:44:15 -07:00 |
|
|
|
65f311ce59
|
[Frontend] Add LLM.reward specific to reward models (#21720)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-07-29 20:56:03 -07:00 |
|
|
|
555e7225bc
|
[v1][attention] Support Hybrid Allocator + FlashInfer (#21412)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-07-30 01:45:29 +00:00 |
|
|
|
58b11b24a6
|
[Bugfix] Fix workspace buffer None issue for Flashinfer TRTLLM Backend (#21525)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-07-29 10:34:00 -04:00 |
|
|
|
04e38500ee
|
[Bugfix] VLLM_V1 supports passing other compilation levels (#19340)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-07-29 09:35:58 -04:00 |
|
|
|
755fa8b657
|
[KVCache] Make KVCacheSpec hashable (#21791)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-07-29 19:58:29 +08:00 |
|
|
|
37efc63b64
|
[V0 deprecation] Guided decoding (#21347)
Signed-off-by: Reza Barazesh <rezabarazesh@meta.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-29 03:15:30 -07:00 |
|
|
|
c6c9122d50
|
[Kernel] SM90 CUTLASS FP8 GEMM: add support for swap AB + kernel tuning (#20396)
Signed-off-by: Faqin Zhong <faqin.zhong@gmail.com>
Co-authored-by: Duncan Moss <djm.moss@gmail.com>
|
2025-07-28 23:13:58 +00:00 |
|
|
|
b18b417fbf
|
Revert "[V1] Exception Handling when Loading KV Cache from Remote Store" (#21778)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
|
2025-07-28 20:15:18 +00:00 |
|
|
|
94b71ae106
|
Use metavar to list the choices for a CLI arg when custom values are also accepted (#21760)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-28 19:31:10 +00:00 |
|
|
|
04fe61aa3d
|
[CI/Build] Fix plugin tests (#21758)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-28 15:08:05 +00:00 |
|
|
|
9ace2eaf35
|
[Bugfix] Improve JSON extraction in LlamaToolParser (#19024)
Signed-off-by: keru <keyang.ru@oracle.com>
Co-authored-by: keru <keyang.ru@oracle.com>
|
2025-07-28 12:36:58 +00:00 |
|
|
|
656c24f1b5
|
[Ernie 4.5] Name Change for Base 0.3B Model (#21735)
Signed-off-by: vasqu <antonprogamer@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-28 12:22:32 +00:00 |
|
|
|
a4ed731546
|
[Model] Prioritize Transformers fallback over suffix matching (#21719)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-28 02:15:31 -07:00 |
|
|
|
a6c050286a
|
[v1][mamba] Added mamba_type into MambaSpec (#21715)
Signed-off-by: asafg <asafg@ai21.com>
Co-authored-by: asafg <asafg@ai21.com>
|
2025-07-28 08:15:55 +00:00 |
|
|
|
7656cf4cf3
|
[Bugfix] [issue-21565] Fix the incompatibility issue with stream and named function calling when Thinking is disabled (#21573)
Signed-off-by: wangzi <3220100013@zju.edu.cn>
Co-authored-by: wangzi <3220100013@zju.edu.cn>
|
2025-07-27 22:43:50 -07:00 |
|
|
|
88e46c7c8d
|
Migrate Glm4vImageInputs, Glm4vVideoInputs to TensorSchema (#21678)
Signed-off-by: Benji Beck <benjibeck@meta.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-07-27 22:36:08 -07:00 |
|
|
|
c7ffe93d9c
|
[Model] Support TP/PP/mamba2 kernel for PLaMo2 (#19674)
Signed-off-by: Shinichi Hemmi <shemmi@preferred.jp>
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>
Co-authored-by: Calvin Metzger <metzger@preferred.jp>
Co-authored-by: Sixue Wang <cecilwang@preferred.jp>
|
2025-07-28 05:00:47 +00:00 |
|
|
|
15a72ac478
|
[V1] Exception Handling when Loading KV Cache from Remote Store (#21534)
Signed-off-by: liuyumoye <adeline_ly2023@outlook.com>
Co-authored-by: liuyumoye <adeline_ly2023@outlook.com>
|
2025-07-27 20:34:17 -07:00 |
|
|
|
86ae693f20
|
[Deprecation][2/N] Replace --task with --runner and --convert (#21470)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-27 19:42:40 -07:00 |
|
|
|
57c22e57f9
|
Fix CUDA permute/unpermute for use with DeepGemm Moe (#17934)
Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>
|
2025-07-27 07:08:00 -07:00 |
|
|
|
bda9d0535f
|
[Refactor] Refactor MOE NVFP4 Code Base: ModelOpt + Compressed Tensor (#21631)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-27 05:25:21 -07:00 |
|
|
|
3d847a3125
|
[VLM] Add video support for Intern-S1 (#21671)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-07-27 11:49:43 +00:00 |
|
|
|
eed2f463b2
|
[VLM] Support HF format Phi-4-MM model (#17121)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-26 20:07:57 -07:00 |
|