youngkingdom/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Woosuk Kwon	71683ca6f6	[V0 Deprecation] Remove multi-step scheduling (#22138 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-08-12 20:18:39 -07:00
Ning Xie	d97841078b	[Misc] unify variable for LLM instance (#20996 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-07-21 12:18:33 +01:00
Woosuk Kwon	dd572c0ab3	[V0 Deprecation] Remove V0 Spec Decode workers (#21152 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-07-18 21:47:50 -07:00
quanliu	92183b41f3	[Bugfix][Core] Prefix caching causes incorrect outputs due to outdated ComputedBlocksTracker (#18957 ) Signed-off-by: 刘全 <quan.liu2@dbappsecurity.com.cn> Co-authored-by: 刘全 <quan.liu2@dbappsecurity.com.cn>	2025-06-15 21:56:37 -07:00
Ning Xie	2f1c19b245	[CI] change spell checker from codespell to typos (#18711 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-06-11 19:57:10 -07:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Andrew Sansom	cc2a77d7f1	[Core] [Bugfix] Add Input Embeddings (#15428 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: 临景 <linjing.yx@alibaba-inc.com> Co-authored-by: Bryce1010 <bryceyx@gmail.com> Co-authored-by: Nan2018 <nan@protopia.ai> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 01:06:39 -07:00
Aleksandr Malyshev	bc7c4d206b	[Kernel][ROCM] Upstream prefix prefill speed up for vLLM V1 (#13305 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu> Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com> Signed-off-by: maleksan85 <maleksan@amd.com> Signed-off-by: <> Co-authored-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: qli88 <qiang.li2@amd.com> Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>	2025-04-22 19:11:56 -07:00
Chen Zhang	3a5f0afcd2	[V1] Implement sliding window attention in kv_cache_manager (#14097 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-04-01 00:33:17 -07:00
Robert Shaw	d4d93db2c5	[V1] V1 Enablement Oracle (#13726 ) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-03-14 22:02:20 -07:00
Vincent	a4f1ee35d6	Deprecate `best_of` Sampling Parameter in anticipation for vLLM V1 (#13997 ) Signed-off-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com> Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-05 20:22:43 +00:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971 )	2025-03-02 17:34:51 -08:00
Sage Moore	38acae6e97	[ROCm] Fix the Kernels, Core, and Prefix Caching AMD CI groups (#13970 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-02-27 20:31:47 +00:00
Jee Jee Li	5157338ed9	[Misc] Improve LoRA spelling (#13831 )	2025-02-25 23:43:01 -08:00
Joe Runde	3bcb8c75da	[Core] Reduce TTFT with concurrent partial prefills (#10235 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com> Co-authored-by: Prashant Gupta <prashantgupta@us.ibm.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>	2025-02-14 15:36:07 -08:00
Russell Bryant	73b35cca7f	[Core] Improve hash collision avoidance in prefix caching (#12621 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-03 16:28:20 -08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628 ) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Cody Yu	7206ce4ce1	[Core] Support `reset_prefix_cache` (#12284 )	2025-01-22 18:52:27 +00:00
Sungjae Lee	c31d4a57a6	[Core] support LoRA and prompt adapter in content-based hashing for Block Manager v2 prefix caching (#8240 )	2024-12-13 07:51:25 -08:00
Cyrus Leung	8f10d5e393	[Misc] Split up pooling tasks (#10820 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-11 01:28:00 -08:00
Murali Andoorveedu	db66e018ea	[Bugfix] Fix for Spec model TP + Chunked Prefill (#10232 ) Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com> Signed-off-by: Sourashis Roy <sroy@roblox.com> Co-authored-by: Sourashis Roy <sroy@roblox.com>	2024-11-26 09:11:16 -08:00
Ricky Xu	4634a89d18	Prefix Cache Aware Scheduling [1/n] (#10128 ) Signed-off-by: rickyx <rickyx@anyscale.com>	2024-11-22 21:15:55 -08:00
Pavani Majety	6c1208d083	[Core] Add Sliding Window Support with Flashinfer (#10462 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2024-11-20 19:56:47 -08:00
Aaron Pham	21063c11c7	[CI/Build] drop support for Python 3.8 EOL (#8464 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2024-11-06 07:11:55 +00:00
Cyrus Leung	bbc3619dc8	[Core] Make encoder-decoder inputs a nested structure to be more composable (#9604 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-11-05 10:07:31 +08:00
youkaichao	4fdc581f9e	[core] simplify seq group code (#9569 ) Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2024-10-24 00:16:44 -07:00
Cody Yu	d11bf435a0	[MISC] Consolidate cleanup() and refactor offline_inference_with_prefix.py (#9510 )	2024-10-18 14:30:55 -07:00
Cyrus Leung	051eaf6db3	[Model] Add user-configurable task for models that support both generation and embedding (#9424 )	2024-10-18 11:31:58 -07:00
Kuntai Du	81ede99ca4	[Core] Deprecating block manager v1 and make block manager v2 default (#8704 ) Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).	2024-10-17 11:38:15 -05:00
sroy745	f3a507f1d3	[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149 )	2024-10-10 14:17:17 +08:00
youkaichao	18b296fdb2	[core] remove beam search from the core (#9105 )	2024-10-07 05:47:04 +00:00
Varun Sundar Rabindranath	cb3b2b9ba4	[Bugfix] Fix incorrect updates to num_computed_tokens in multi-step scheduling (#9038 ) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-10-06 12:48:11 -07:00
sroy745	5bf8789b2a	[Bugfix] Block manager v2 with preemption and lookahead slots (#8824 )	2024-09-29 09:17:45 +08:00
sroy745	fc3afc20df	Fix tests in test_chunked_prefill_scheduler which fail with BlockManager V2 (#8752 )	2024-09-24 21:26:36 -07:00
sroy745	ee777d9c30	Fix test_schedule_swapped_simple in test_scheduler.py (#8780 )	2024-09-24 21:26:18 -07:00
sroy745	88577ac928	Fix tests in test_scheduler.py that fail with BlockManager V2 (#8728 )	2024-09-24 04:43:13 +00:00
Cody Yu	e3580537a4	[Performance] Enable chunked prefill and prefix caching together (#7753 )	2024-08-28 00:36:31 -07:00
Megha Agarwal	2eedede875	[Core] Asynchronous Output Processor (#7049 ) Co-authored-by: Alexander Matveev <alexm@neuralmagic.com>	2024-08-26 20:53:20 -07:00
Cody Yu	2deb029d11	[Performance][BlockManagerV2] Mark prefix cache block as computed after schedule (#7822 )	2024-08-26 11:24:53 -07:00
Cody Yu	3ac50b47d0	[MISC] Add prefix cache hit rate to metrics (#7606 )	2024-08-19 11:52:07 -07:00
SangBin Cho	ff7ec82c4d	[Core] Optimize SPMD architecture with delta + serialization optimization (#7109 )	2024-08-18 17:57:20 -07:00
Cade Daniel	baa240252e	[Core] Fix edge case in chunked prefill + block manager v2 (#7380 )	2024-08-09 23:48:49 +00:00
Zach Zheng	782e53ab59	[Bugfix][fast] Fix the get_num_blocks_touched logic (#6849 )	2024-08-08 10:43:30 -07:00
afeldman-nm	fd95e026e0	[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942 ) Co-authored-by: Andrew Feldman <afeld2012@gmail.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-08-06 16:51:47 -04:00
youkaichao	c8a7e93273	[core][scheduler] simplify and improve scheduler (#6867 )	2024-07-31 23:51:09 -07:00
Jiaxin Shan	42c7f66a38	[Core] Support dynamically loading Lora adapter from HuggingFace (#6234 ) Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2024-07-22 15:42:40 -07:00
Antoni Baum	9ed82e7074	[Misc] Small perf improvements (#6520 )	2024-07-19 12:10:56 -07:00
Alexander Matveev	3476ed0809	[Core] Optimize block_manager_v2 vs block_manager_v1 (to make V2 default) (#5602 )	2024-07-01 20:10:37 -07:00
Cyrus Leung	0e9164b40a	[mypy] Enable type checking for test directory (#5017 )	2024-06-15 04:45:31 +00:00
leiwen83	1b8a0d71cf	[Core][Bugfix]: fix prefix caching for blockv2 (#5364 ) Signed-off-by: Lei Wen <wenlei03@qiyi.com> Co-authored-by: Lei Wen <wenlei03@qiyi.com>	2024-06-14 17:23:56 -07:00

1 2

84 Commits