273690a50a
[Core] Optimize LoRA weight loading ( #25403 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-23 18:19:45 +08:00
5f5271f1ee
Move LoRAConfig from config/__init__.py to config/lora.py ( #24644 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-11 11:01:38 +00:00
bb3eb80d92
[Core] Split LoRA layers ( #24574 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-10 07:47:51 -07:00
886ccbe5ba
[CI/Build] Reduce the number of redundant cases to test for LoRA ( #24276 )
...
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
2025-09-04 21:58:44 +00:00
e03940762b
[CI/Build] Reduce LoRA layer test cases ( #23721 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-27 10:59:35 +00:00
02f0c7b220
[Misc] Add SPDX-FileCopyrightText ( #19100 )
...
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
86c3369eb8
[CI/Build] Fix CI LoRA failure ( #16270 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-09 09:13:56 +08:00
4203926f10
[CI/Build] Further clean up LoRA tests ( #15920 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-02 01:39:09 -07:00
dfa82e2a3d
[CI/Build] Clean up LoRA tests ( #15867 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-01 16:28:50 +00:00
79455cf421
[Misc] Enable V1 LoRA by default ( #15320 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-04-01 16:53:56 +08:00
5ff0d32580
[V1] LoRA - Add triton kernels for V1 ( #13096 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-10 17:27:53 -04:00
ddd1ef66ec
[Bugfix] Fix JambaForCausalLM LoRA ( #14370 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-06 22:05:47 -08:00
cf069aa8aa
Update deprecated Python 3.8 typing ( #13971 )
2025-03-02 17:34:51 -08:00
5157338ed9
[Misc] Improve LoRA spelling ( #13831 )
2025-02-25 23:43:01 -08:00
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files ( #12628 )
...
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**
commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:18:24 2025 -0500
Add SPDX license headers to python source files
This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance.
The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job.
More information can be found on the SPDX site:
- https://spdx.dev/learn/handling-license-info/
Signed-off-by: Russell Bryant <rbryant@redhat.com>
commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:36:32 2025 -0500
Check for SPDX headers using pre-commit
Signed-off-by: Russell Bryant <rbryant@redhat.com>
---------
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
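The header this commit adds is a single machine-readable comment at the top of each source file, and the companion commit checks for it in pre-commit. A minimal sketch of such a check, assuming vLLM's Apache-2.0 license; `has_spdx_header` is an illustrative name, not the project's actual hook:

```python
# SPDX-License-Identifier: Apache-2.0

def has_spdx_header(source: str) -> bool:
    """Return True if the first non-blank line is an SPDX license tag."""
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # tolerate leading blank lines
        return stripped.startswith("# SPDX-License-Identifier:")
    return False

print(has_spdx_header("# SPDX-License-Identifier: Apache-2.0\nimport os\n"))  # True
print(has_spdx_header("import os\n"))  # False
```

A real pre-commit hook would run a check like this over every staged `.py` file and fail the commit when a header is missing.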
8bddb73512
[Hardware][CPU] Multi-LoRA implementation for the CPU backend ( #11100 )
...
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Oleg Mosalov <oleg@krai.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Oleg Mosalov <oleg@krai.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-01-12 13:01:52 +00:00
ca871491ed
[Misc][LoRA] Abstract PunicaWrapper ( #10955 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-09 12:54:44 -08:00
571da8fc43
[Misc][LoRA] Clean up the function interface of Punica ( #10917 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-05 13:22:28 +00:00
7f5edb5900
[Misc][LoRA] Replace hardcoded cuda device with configurable argument ( #10223 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-12 11:10:15 +08:00
622b7ab955
[Hardware] using current_platform.seed_everything ( #9785 )
...
Signed-off-by: wangshuai09 <391746016@qq.com>
2024-10-29 14:47:44 +00:00
7e7eae338d
[Misc] Standardize RoPE handling for Qwen2-VL ( #9250 )
2024-10-16 13:56:17 +08:00
6ffa3f314c
[CI/Build] Avoid CUDA initialization ( #8534 )
2024-09-18 10:38:11 +00:00
50b8d08dbd
[Misc/Testing] Use torch.testing.assert_close ( #7324 )
2024-08-16 04:24:04 +00:00
9118217f58
[LoRA] Relax LoRA condition ( #7146 )
2024-08-06 01:57:25 +00:00
99d7cabd7b
[LoRA] ReplicatedLinear support LoRA ( #7081 )
2024-08-02 22:40:19 -07:00
7ecee34321
[Kernel][RFC] Refactor the punica kernel based on Triton ( #5036 )
2024-07-31 17:12:24 -07:00
ee93f4f92a
[CORE] Quantized lm-head Framework ( #4442 )
...
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: ZX <zx@lbx.dev>
2024-07-02 22:25:17 +00:00
67005a07bc
[Bugfix] Add fully sharded layer for QKVParallelLinearWithLora ( #5665 )
...
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2024-06-21 04:46:28 +00:00
0e9164b40a
[mypy] Enable type checking for test directory ( #5017 )
2024-06-15 04:45:31 +00:00
ccdc490dda
[Core] Change LoRA embedding sharding to support loading methods ( #5038 )
2024-06-06 19:07:57 -07:00
2e9a2227ec
[Lora] Support long context lora ( #4787 )
...
Currently we need to call the rotary embedding kernel once per LoRA, which makes it hard to serve multiple long-context LoRAs. This change adds a batched rotary embedding kernel and pipes it through.
It replaces the rotary embedding layer with one that is aware of multiple cos-sin caches, one per scaling factor.
Follow up of https://github.com/vllm-project/vllm/pull/3095/files
2024-05-18 16:05:23 +09:00
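The mechanism this entry describes — one cos/sin cache per RoPE scaling factor, with a per-token lookup selecting the right cache so differently scaled LoRAs can share a batch — can be sketched in plain Python. This is an illustrative sketch assuming linear RoPE scaling; the names are hypothetical and vLLM's actual implementation is a batched GPU kernel:

```python
import math

def build_cos_sin_cache(max_pos, dim, base=10000.0, scaling=1.0):
    """One (cos, sin) table per scaling factor; linear RoPE scaling."""
    half = dim // 2
    inv_freq = [base ** (-2 * i / dim) for i in range(half)]
    cache = []
    for pos in range(max_pos):
        t = pos / scaling  # linear scaling stretches the position index
        cache.append(([math.cos(t * f) for f in inv_freq],
                      [math.sin(t * f) for f in inv_freq]))
    return cache

# One cache per scaling factor seen across the active LoRAs.
caches = {1.0: build_cos_sin_cache(32, 8, scaling=1.0),
          4.0: build_cos_sin_cache(32, 8, scaling=4.0)}

def batched_rope(x, positions, factors):
    """Apply RoPE per token, picking the cache for that token's LoRA."""
    out = []
    for vec, pos, sf in zip(x, positions, factors):
        cos, sin = caches[sf][pos]
        rotated = []
        for i in range(len(cos)):
            a, b = vec[2 * i], vec[2 * i + 1]
            rotated += [a * cos[i] - b * sin[i], a * sin[i] + b * cos[i]]
        out.append(rotated)
    return out
```

The key design point is that the cache index travels with each token, so a single kernel launch can serve requests whose LoRAs use different context-length scaling factors.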
eefeb16464
[Kernel] Full Tensor Parallelism for LoRA Layers ( #3524 )
...
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2024-04-27 00:03:48 -07:00
468d761b32
[Misc] Reduce supported Punica dtypes ( #4304 )
2024-04-23 18:54:33 -07:00
1e96c3341a
Add extra punica sizes to support bigger vocabs ( #4015 )
2024-04-11 22:18:57 +00:00
8af890a865
Enable more models to inference based on LoRA ( #3382 )
...
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2024-03-25 18:09:31 -07:00
01bfb22b41
[CI] Try introducing isort. ( #3495 )
2024-03-25 07:59:47 -07:00
f1c0fc3919
Migrate logits computation and gather to model_runner ( #3233 )
2024-03-20 23:25:01 +00:00
2f8844ba08
Re-enable the 80 char line width limit ( #3305 )
2024-03-10 19:49:14 -07:00
93dc5a2870
chore(vllm): codespell for spell checking ( #2820 )
2024-02-21 18:56:01 -08:00
96b6f475dd
Remove hardcoded device="cuda" to support more devices ( #2503 )
...
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2024-02-01 15:46:39 -08:00
9b945daaf1
[Experimental] Add multi-LoRA support ( #1804 )
...
Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Co-authored-by: Avnish Narayan <avnish@anyscale.com>
2024-01-23 15:26:37 -08:00