|
|
001e50c92c
|
[Model] MTP fallback to eager for DeepSeek v32 (#25982)
Signed-off-by: Lu Fang <fanglu@fb.com>
|
2025-10-01 01:53:22 +00:00 |
|
|
|
9a9f48dff7
|
[V1] [P/D] Add Support for KV Load Failure Recovery (#19330)
Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
|
2025-09-30 14:57:08 -07:00 |
|
|
|
bc546f76a1
|
[CI] Move applicable tests to CPU (#24080)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-30 14:45:20 +01:00 |
|
|
|
80608ba5af
|
[NIXL] Add support for MLA caches with different latent dim (#25902)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
|
2025-09-30 12:18:29 +00:00 |
|
|
|
fa7e254a7f
|
[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
|
2025-09-30 17:14:41 +08:00 |
|
|
|
e23cacda35
|
[Bugfix]: Clean up chunked prefill logging when using whisper (#25075)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
|
2025-09-30 08:17:49 +00:00 |
|
|
|
d0d138bc55
|
[Nixl][P/D] Add cuda2cpu support (HD->DH transfer) (#24690)
Signed-off-by: Chenxi Yang <cxyang@fb.com>
Co-authored-by: Chenxi Yang <cxyang@fb.com>
|
2025-09-29 14:31:51 +00:00 |
|
|
|
cd87bfbf37
|
[CI/Build] Reorganize root-level V1 tests (#25767)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-27 13:51:15 +08:00 |
|
|
|
f1d53d150c
|
[Multimodal][Speculative Decoding]Eagle Eagle3 mm support, enablement on qwen2.5vl (#22872)
Signed-off-by: Junhong <liujunhong11@huawei.com>
Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com>
Co-authored-by: Junhong <liujunhong11@huawei.com>
Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>
|
2025-09-27 03:35:47 +00:00 |
|
|
|
6f5c0931c1
|
[Spec decode] automatically disable mm for text-only draft models (#25667)
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
|
2025-09-27 08:10:21 +08:00 |
|
|
|
dc48ba0c75
|
Kernel-override Determinism [1/n] (#25603)
Signed-off-by: Bram Wasti <bwasti@meta.com>
|
2025-09-26 16:59:09 -07:00 |
|
|
|
c70ac4b8ff
|
[spec decode] Consolidate speculative decode method name for MTP (#25232)
Signed-off-by: zixi-qi <qizixi@meta.com>
|
2025-09-26 22:27:05 +00:00 |
|
|
|
f075693da7
|
[V1] address post issues related to #20059 (part 1) (#23046)
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-26 15:58:19 -04:00 |
|
|
|
8d52f2b3a7
|
[ray][metrics] Replace ':' with '_' for OpenTelemetry compatibility in Ray (#25439)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Signed-off-by: Seiji Eicher <58963096+eicherseiji@users.noreply.github.com>
Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com>
|
2025-09-26 09:43:30 -07:00 |
|
|
|
db1e42f627
|
[CI/Build] Fix some V1 tests not being run (#25569)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-26 20:52:36 +08:00 |
|
|
|
fe6b19c314
|
[Bugfix] Properly abort pooling request. (#25734)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-09-26 05:47:34 -07:00 |
|
|
|
2827b3f4a3
|
[CI] Fix test_shared_storage_connector_hashes (#25748)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-09-26 20:46:17 +08:00 |
|
|
|
e71b8e210d
|
[Spec Decode] Add Batch Parallel Ngram. Upto 8x lower overhead. (#24986)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-09-25 15:22:03 -07:00 |
|
|
|
3468f17ebe
|
[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
|
2025-09-25 17:37:50 +00:00 |
|
|
|
2f17117606
|
[mypy] Fix wrong type annotations related to tuple (#25660)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-25 13:00:45 +00:00 |
|
|
|
58c360d9be
|
[Bug] fix import and unit test (#25558)
Signed-off-by: Jonas M. Kübler <44084297+jmkuebler@users.noreply.github.com>
|
2025-09-24 10:17:59 +00:00 |
|
|
|
190c45a6af
|
[TPU][Bugfix] fix the missing apply_model in tpu worker (#25526)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-09-24 05:18:08 +00:00 |
|
|
|
c30b405b8f
|
[Spec Decode] Enable FlashInfer Spec Decoding (#25196)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Co-authored-by: lhsjohn <huashuoli@tencent.com>
|
2025-09-23 22:29:58 -04:00 |
|
|
|
7ad5e50adf
|
Improve output when failing json.loads() on structured output test (#25483)
Signed-off-by: dougbtv <dosmith@redhat.com>
|
2025-09-23 18:03:31 -06:00 |
|
|
|
abad204be6
|
[BugFix] Fix OOM in vLLM replicas by ensuring consistent NCCL memory accounting (#25359)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
|
2025-09-23 15:49:09 -07:00 |
|
|
|
4f8c4b890a
|
[Core] Use KVCacheBlock as much as possible instead of dict[block_id, KVCacheBlock] (#24830)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-09-23 15:11:14 -07:00 |
|
|
|
d5944d5146
|
[Speculators][Speculative Decoding] Fix gpt-oss eagle3 accuracy issue (#25406)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
|
2025-09-23 15:44:35 -04:00 |
|
|
|
875d6def90
|
Add backward compatibility for GuidedDecodingParams (#25422)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-23 17:07:30 +01:00 |
|
|
|
cc1dc7ed6d
|
[Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support and Prefill support (#24845)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-09-23 16:02:10 +00:00 |
|
|
|
da5e7e4329
|
[Docs] NixlConnector quickstart guide (#24249)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
|
2025-09-23 14:23:22 +00:00 |
|
|
|
f05a4f0e34
|
[P/D] Support NIXL connector to disconnect during a clean shutdown (#24423)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
|
2025-09-23 16:08:02 +02:00 |
|
|
|
5774b0a1da
|
[NIXL][OOT platform] support nixl_connector with oot platform and other nixl_backend (#25121)
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
|
2025-09-23 04:17:42 +00:00 |
|
|
|
6fa78d8f23
|
[V0 deprecation] Remove platform v1 controling interface (#25410)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-23 01:48:12 +00:00 |
|
|
|
090197034f
|
[Bugfix] Fix missing clear_connector_metadata (#25397)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-09-23 08:10:59 +08:00 |
|
|
|
8db2939289
|
[KV offload][5/N] Add CPUOffloadingSpec (#24251)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-09-22 12:30:36 -07:00 |
|
|
|
cfbee3d0e7
|
[CLI env var] Add VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH in env variables (#25274)
Signed-off-by: qqma <qqma@amazon.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: qqma <qqma@amazon.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-09-22 10:37:43 -07:00 |
|
|
|
4cf71cc88a
|
[TPU] Deprecate xm.mark_step in favor of `torch_xla.sync (#25254)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-09-22 10:12:57 +00:00 |
|
|
|
0ff8ebb2d7
|
[V0 Deprecation] Remove async_output_proc, preemption mode, delay factor (#25334)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-21 08:52:32 -07:00 |
|
|
|
26e673fe93
|
[V0 Deprecation] Remove V0 Sequence class & Sampler (#25332)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-09-21 08:52:15 -07:00 |
|
|
|
cf56cf78b4
|
[V1] Add sliding window support to Flex Attention backend (#24089)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-21 05:08:07 +00:00 |
|
|
|
9607d5eb44
|
[Hybrid Allocator] Support full attention with different hidden size (#25101)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-09-19 23:43:59 -07:00 |
|
|
|
535d80056b
|
[Misc] Support more collective_rpc return types (#25294)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-09-20 02:02:38 +00:00 |
|
|
|
c59a0eca42
|
[KV offload][4/N] Offloading KV connector (#22595)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-09-19 19:07:17 +00:00 |
|
|
|
2506ce5189
|
[Core][Prefix Hash] Fix prefix hash metrics sliding window maintainance (#24990)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-09-19 12:22:53 -06:00 |
|
|
|
6c117cff7d
|
[Frontend] Pass API server count to each process (#23717)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-20 01:15:19 +08:00 |
|
|
|
7ac67ea525
|
[KV offload][3/N] Add worker-side CPU support (#21448)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-09-19 09:53:45 -07:00 |
|
|
|
aed16879a9
|
Move ModelConfig from config/__init__.py to config/model.py (#25252)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-19 16:22:33 +00:00 |
|
|
|
a3d087adec
|
[P/D][Nixl] Introduce KVTransferMetrics and aggregation strategy (#22188)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-09-19 11:09:14 +00:00 |
|
|
|
9d1c50a5ac
|
[KV offload][2/N] Introduce LRU-based CPU offloading management (#20075)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-09-19 00:20:51 +00:00 |
|
|
|
a53ad626d6
|
[KV offload][1b/N] rename offloading to kv_offload (#25191)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-09-18 20:53:52 +00:00 |
|