572ddf83ce
[Chore] Remove unused sampler in models ( #25324 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-09-20 19:53:20 -07:00
9de25c294b
[CI/Build] Remove redundant LoRA model tests ( #23706 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-08-27 05:51:50 +00:00
285178b3b8
[V0 Deprecation] Remove V0 LoRA test ( #23418 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-08-22 09:56:51 +00:00
1eaff27815
[V0 deprecation] Remove long context LoRA ( #21169 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-07-19 02:15:41 -07:00
906e05d840
[Misc] Remove the unused LoRA test code ( #20494 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-07-05 13:48:16 +08:00
95a6568b5c
[CI/Build] Fix LoRA test ( #19350 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-06-09 09:52:10 +00:00
02f0c7b220
[Misc] Add SPDX-FileCopyrightText ( #19100 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-06-03 11:20:17 -07:00
8132365b74
[Bugfix]: v1 engine - consider lora adapters in allowed_token_ids ( #17855 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
2025-05-11 00:53:58 -07:00
c20ef40fd0
[Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend ( #14238 )
...
Signed-off-by: Akshat Tripathi <akshat@krai.ai >
Signed-off-by: Chengji Yao <chengjiyao@google.com >
Co-authored-by: Chengji Yao <chengjiyao@google.com >
2025-05-07 16:28:47 -04:00
86c3369eb8
[CI/Build] Fix CI LoRA failure ( #16270 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-09 09:13:56 +08:00
4203926f10
[CI/Build] Further clean up LoRA tests ( #15920 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-02 01:39:09 -07:00
8095341a01
[misc] LoRA: Remove unused long context test data ( #15558 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-03-27 10:04:51 +08:00
e0fdfa1608
[CI/Build] Delete LoRA bias test ( #14849 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-14 22:09:25 -07:00
12c29a881f
[Bugfix] Further clean up LoRA test ( #14422 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-07 10:30:55 +00:00
ddd1ef66ec
[Bugfix] Fix JambaForCausalLM LoRA ( #14370 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-06 22:05:47 -08:00
cf069aa8aa
Update deprecated Python 3.8 typing ( #13971 )
2025-03-02 17:34:51 -08:00
cc5e8f6db8
[Model] Add LoRA support for TransformersModel ( #13770 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-02 09:17:34 +08:00
105b8ce4c0
[Misc] Reduce LoRA-related static variable ( #13166 )
2025-02-22 00:21:30 -08:00
512368e34a
[Misc] Qwen2.5 VL support LoRA ( #13261 )
2025-02-19 18:37:55 -08:00
467a96a541
[V1] LoRA Support ( #10957 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-02-06 09:32:51 -08:00
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files ( #12628 )
...
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**
commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com >
Date: Fri Jan 31 14:18:24 2025 -0500
Add SPDX license headers to python source files
This commit adds SPDX license headers to python source files as
recommended to
the project by the Linux Foundation. These headers provide a concise way
that is
both human and machine readable for communicating license information
for each
source file. It helps avoid any ambiguity about the license of the code
and can
also be easily used by tools to help manage license compliance.
The Linux Foundation runs license scans against the codebase to help
ensure
we are in compliance with the licenses of the code we use, including
dependencies. Having these headers in place helps that tool do its job.
More information can be found on the SPDX site:
- https://spdx.dev/learn/handling-license-info/
Signed-off-by: Russell Bryant <rbryant@redhat.com >
commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com >
Date: Fri Jan 31 14:36:32 2025 -0500
Check for SPDX headers using pre-commit
Signed-off-by: Russell Bryant <rbryant@redhat.com >
---------
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-02-02 11:58:18 -08:00
8bddb73512
[Hardware][CPU] Multi-LoRA implementation for the CPU backend ( #11100 )
...
Signed-off-by: Akshat Tripathi <akshat@krai.ai >
Signed-off-by: Oleg Mosalov <oleg@krai.ai >
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Oleg Mosalov <oleg@krai.ai >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
2025-01-12 13:01:52 +00:00
55509c2114
[MODEL] LoRA support for Jamba model ( #11209 )
...
Signed-off-by: Erez Schwartz <erezs@ai21.com >
2024-12-27 17:58:21 +00:00
b1b1038fbd
[Bugfix] Fix Qwen2-VL LoRA weight loading ( #11430 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2024-12-24 09:56:10 +00:00
8a06428c70
[LoRA] Adds support for bias in LoRA ( #5733 )
...
Signed-off-by: Umesh Deshpande <udeshpa@us.ibm.com >
Co-authored-by: Umesh Deshpande <udeshpa@us.ibm.com >
2024-11-12 11:08:40 -08:00
cea808f325
[3/N] model runner pass the whole config to model ( #9958 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2024-11-02 12:08:49 -07:00
d11bf435a0
[MISC] Consolidate cleanup() and refactor offline_inference_with_prefix.py ( #9510 )
2024-10-18 14:30:55 -07:00
36ea79079b
[Misc][LoRA] Support loading LoRA weights for target_modules in reg format ( #9275 )
2024-10-11 12:31:21 +00:00
9ade8bbc8d
[Model] add a bunch of supported lora modules for mixtral ( #9008 )
...
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com >
2024-10-04 16:24:40 +00:00
3d49776bbb
[Model][LoRA]LoRA support added for MiniCPMV2.5 ( #7199 )
2024-09-29 06:59:45 +00:00
9d104b5beb
[CI/Build] Update Ruff version ( #8469 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2024-09-18 11:00:56 +00:00
42c7f66a38
[Core] Support dynamically loading Lora adapter from HuggingFace ( #6234 )
...
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com >
2024-07-22 15:42:40 -07:00
f5e73c9f1b
[Lora] Use safetensor keys instead of adapter_config.json to find unexpected modules. ( #5909 )
...
Co-authored-by: sang <sangcho@anyscale.com >
2024-06-30 17:11:15 +00:00
0e9164b40a
[mypy] Enable type checking for test directory ( #5017 )
2024-06-15 04:45:31 +00:00
ea3890a5f0
[Core][Distributed] code deduplication in tp&pp with coordinator( #5293 )
...
[Core][Distributed] add coordinator to reduce code duplication in tp and pp (#5293 )
2024-06-12 17:27:08 -07:00
ccdc490dda
[Core] Change LoRA embedding sharding to support loading methods ( #5038 )
2024-06-06 19:07:57 -07:00
c74c913bfb
[misc] remove comments that were supposed to be removed ( #4977 )
2024-05-22 09:02:58 -04:00
f12c3b5b3d
[Model] Add Phi-2 LoRA support ( #4886 )
2024-05-21 14:24:17 +09:00
2e9a2227ec
[Lora] Support long context lora ( #4787 )
...
Currently we need to call rotary embedding kernel for each LoRA, which makes it hard to serve multiple long context length LoRA. Add batched rotary embedding kernel and pipe it through.
It replaces the rotary embedding layer to the one that is aware of multiple cos-sin-cache per scaling factors.
Follow up of https://github.com/vllm-project/vllm/pull/3095/files
2024-05-18 16:05:23 +09:00
d17c8477f1
[Bugfix] Fix LoRA loading check ( #4138 )
...
Co-authored-by: simon-mo <simon.mo@hey.com >
2024-04-19 00:59:54 -07:00
69e1d2fb69
[Core] Refactor model loading code ( #4097 )
2024-04-16 11:34:39 -07:00
1096717ae9
[Core] Support LoRA on quantized models ( #4012 )
2024-04-11 21:02:44 -07:00
63e7176f26
[Core][Refactor] move parallel_utils into vllm/distributed ( #3950 )
...
[WIP][Core][Refactor] move vllm/model_executor/parallel_utils into vllm/distributed and vllm/device_communicators (#3950 )
2024-04-10 15:33:30 -07:00
8af890a865
Enable more models to inference based on LoRA ( #3382 )
...
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com >
2024-03-25 18:09:31 -07:00
01bfb22b41
[CI] Try introducing isort. ( #3495 )
2024-03-25 07:59:47 -07:00
f1c0fc3919
Migrate logits computation and gather to model_runner ( #3233 )
2024-03-20 23:25:01 +00:00
4c922709b6
Add distributed model executor abstraction ( #3191 )
2024-03-11 11:03:45 -07:00
929b4f2973
Add LoRA support for Gemma ( #3050 )
2024-02-28 13:03:28 -08:00
3b7178cfa4
[Neuron] Support inference with transformers-neuronx ( #2569 )
2024-02-28 09:34:34 -08:00
2a543d6efe
Add LoRA support for Mixtral ( #2831 )
...
* add mixtral lora support
* formatting
* fix incorrectly ported logic
* polish tests
* minor fixes and refactoring
* minor fixes
* formatting
* rename and remove redundant logic
* refactoring
* refactoring
* minor fix
* minor refactoring
* fix code smell
2024-02-14 00:55:45 +01:00