|
|
bc546f76a1
|
[CI] Move applicable tests to CPU (#24080)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-30 14:45:20 +01:00 |
|
|
|
fa7e254a7f
|
[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
|
2025-09-30 17:14:41 +08:00 |
|
|
|
e23cacda35
|
[Bugfix]: Clean up chunked prefill logging when using whisper (#25075)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
|
2025-09-30 08:17:49 +00:00 |
|
|
|
cd87bfbf37
|
[CI/Build] Reorganize root-level V1 tests (#25767)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-27 13:51:15 +08:00 |
|
|
|
4f8c4b890a
|
[Core] Use KVCacheBlock as much as possible instead of dict[block_id, KVCacheBlock] (#24830)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-09-23 15:11:14 -07:00 |
|
|
|
9607d5eb44
|
[Hybrid Allocator] Support full attention with different hidden size (#25101)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-09-19 23:43:59 -07:00 |
|
|
|
2506ce5189
|
[Core][Prefix Hash] Fix prefix hash metrics sliding window maintainance (#24990)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-09-19 12:22:53 -06:00 |
|
|
|
29283e8976
|
[Chore] Cleanup guided namespace, move to structured outputs config (#22772)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-18 09:20:27 +00:00 |
|
|
|
45bfa49cb8
|
[Tests] fix initialization of kv hash in tests (#24273)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
|
2025-09-15 21:48:27 +00:00 |
|
|
|
bc0f6059a2
|
[UT] enhance free kv cache block queue popleft_n (#24220)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-09-15 10:04:37 +00:00 |
|
|
|
3f3313981c
|
[kv cache] update num_free_blocks in the end (#24228)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-09-15 05:15:12 +00:00 |
|
|
|
8e5cdcda4e
|
[Hybrid Allocator] Support Pipeline Parallel (#23974)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-09-14 15:55:17 -07:00 |
|
|
|
0377802c20
|
[Multimodal] Remove legacy multimodal fields in favor of MultiModalFeatureSpec (#24548)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2025-09-12 21:42:23 +08:00 |
|
|
|
82dfb12e52
|
[Core] Use sha256 bytes instead of BlockHash to reduce GC overhead (#23673)
Signed-off-by: linzebing <linzebing1995@gmail.com>
|
2025-09-08 21:34:37 -07:00 |
|
|
|
fad73be1a5
|
[Doc]: fix typos in Python comments (#24077)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-02 02:38:55 -07:00 |
|
|
|
5490d633ce
|
[UT] fix unify_kv_cache_configs when kv cache config needs sort (#23843)
|
2025-08-30 11:22:14 +00:00 |
|
|
|
69f46359dd
|
[Multimodal] Consolidate mm inputs into MultiModalFeatureSpec (#23779)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2025-08-29 18:36:57 +08:00 |
|
|
|
5da4f5d857
|
[Bugfix] Fix for V1 priority scheduling crashes at preemption (#23713)
Signed-off-by: Hanchenli <lihanc2002@gmail.com>
|
2025-08-28 00:44:52 +00:00 |
|
|
|
b5d34af328
|
[Bugfix] Fix scheduling when repeated images in one request (#23544)
Signed-off-by: Roger Wang <hey@rogerw.me>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
|
2025-08-26 09:46:28 +00:00 |
|
|
|
d765cf01fe
|
[Core][Multimodal] Track encode cache entries by mm_hash and enable embedding sharing between requests (#22711)
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-08-25 00:41:17 -07:00 |
|
|
|
c9b38be8aa
|
[Spec Decode] Make propose_draft_token_ids non-blocking for lower TTFT (#23041)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-18 17:20:38 -07:00 |
|
|
|
4dff91c93d
|
[Refactor] Allow optional MultiModalKwargsItem in IPC (#23022)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-16 11:30:49 +00:00 |
|
|
|
c280066f9d
|
[v1] Move block_hashes from KVCacheManager to Request.block_hashes (#19728)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-08-15 16:52:52 -07:00 |
|
|
|
19b927e52d
|
[Core] Use individual MM items in P0/P1 cache and model runner (#22570)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-13 07:18:07 -07:00 |
|
|
|
7175817637
|
Revert "[Bugfix] V1 Fix the cursor leakage issue during request scheduling." (#22223)
|
2025-08-04 18:37:06 -07:00 |
|
|
|
2dffac464c
|
[Bugfix] V1 Fix the cursor leakage issue during request scheduling. (#21173)
Signed-off-by: CLFutureX <775523362@qq.com>
|
2025-08-04 18:34:10 -07:00 |
|
|
|
8f4a1c9a04
|
[Misc] Improve code readability of KVCacheManager (#21673)
Signed-off-by: tanruixiang <tanruixiang0104@gmail.com>
Signed-off-by: Ruixiang Tan <819464715@qq.com>
Signed-off-by: GitHub <noreply@github.com>
|
2025-07-30 07:20:43 -07:00 |
|
|
|
755fa8b657
|
[KVCache] Make KVCacheSpec hashable (#21791)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-07-29 19:58:29 +08:00 |
|
|
|
86ae693f20
|
[Deprecation][2/N] Replace --task with --runner and --convert (#21470)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-27 19:42:40 -07:00 |
|
|
|
fc5f756db4
|
[v1][Core] Clean up usages of SpecializedManager (#21407)
Signed-off-by: Zhou Fang <fang.github@gmail.com>
|
2025-07-24 00:40:11 -07:00 |
|
|
|
a1f3610fc6
|
[Core] Add basic unit test for maybe_evict_cached_block (#21400)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-07-23 00:02:02 -07:00 |
|
|
|
ed25054577
|
[Core] Introduce popleft_n and append_n in FreeKVCacheBlockQueue to further optimize block_pool (#21222)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-07-22 06:17:47 -07:00 |
|
|
|
d97841078b
|
[Misc] unify variable for LLM instance (#20996)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-07-21 12:18:33 +01:00 |
|
|
|
9a9fda1423
|
[Core] Support Local Chunked Attention for Hybrid KV Cache (#19351)
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Lu Fang <fanglu@meta.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lu Fang <fanglu@meta.com>
|
2025-07-18 20:48:38 -07:00 |
|
|
|
0f199f197b
|
[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue (#21005)
Signed-off-by: Jialin Ouyang <jialino@meta.com>
|
2025-07-18 12:34:40 -07:00 |
|
|
|
d4d309409f
|
Implement Async Scheduling (#19970)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-14 23:01:46 -07:00 |
|
|
|
66f6fbd393
|
[Prefix Cache] Add reproducible prefix-cache block hashing using SHA-256 + CBOR (64bit) (#20511)
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
|
2025-07-14 02:45:31 +00:00 |
|
|
|
f45a332886
|
[Sched] Enhance the logic to remove stopped requests from queues (#20739)
|
2025-07-12 15:33:13 -07:00 |
|
|
|
4a98edff1f
|
[Structured Outputs][V1] Skipping with models doesn't contain tokenizers (#20365)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-07-04 15:05:49 +08:00 |
|
|
|
2863befce3
|
[Optimization] Use Shared CachedRequestData Instance Across All Requests (#20232)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-06-30 09:07:50 -07:00 |
|
|
|
4a0f7888a3
|
[Core] feat: Implement Priority Scheduling in V1 Engine (#19057)
Signed-off-by: amit <amit.man@gmail.com>
Co-authored-by: Roger Wang <Rogerw0108@gmail.com>
|
2025-06-22 20:18:08 -07:00 |
|
|
|
799397ee4f
|
Support embedding models in V1 (#16188)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-06-18 21:36:33 -07:00 |
|
|
|
c9280e6346
|
[Bugfix] Respect num-gpu-blocks-override in v1 (#19503)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
|
2025-06-12 11:00:23 +00:00 |
|
|
|
646d62f636
|
[Core] Use tuple for kv cache group block ids (#19175)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-06-10 07:01:17 +02:00 |
|
|
|
f8a1a2d108
|
[v1] Hybrid Memory Allocator (#17996)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-05 20:47:09 -07:00 |
|
|
|
a8da78eac9
|
[Bugfix] Max concurrency estimation and check_enough_kv_cache_memory for models with sliding window layers (#19029)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-04 00:14:06 +00:00 |
|
|
|
b5fd9506c1
|
[Bugfix] get_num_blocks_to_allocate with null_block (#19031)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-03 15:30:55 -07:00 |
|
|
|
02f0c7b220
|
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-06-03 11:20:17 -07:00 |
|
|
|
f32fcd9444
|
[v1][KVCacheManager] Rename BlockHashType to BlockHash (#19015)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-06-03 08:01:48 +00:00 |
|
|
|
2dbe8c0774
|
[Perf] API-server scaleout with many-to-many server-engine comms (#17546)
|
2025-05-30 08:17:00 -07:00 |
|