aa48e502fb
[MISC] Upgrade dependency to PyTorch 2.3.1 (#5327)
2024-07-12 12:04:26 -07:00
9d47f64eb6
[CI/Build] [3/3] Reorganize entrypoints tests (#5966)
2024-06-30 12:58:49 +08:00
4ad7b53e59
[CI/Build][Misc] Update Pytest Marker for VLMs (#5623)
2024-06-18 13:10:04 +00:00
89c920785f
[CI/Build] Update vision tests (#5307)
2024-06-06 05:17:18 -05:00
260d119e86
[Kernel] Refactor CUTLASS kernels to always take scales that reside on the GPU (#5137)
2024-06-01 06:45:32 +00:00
5ae5ed1e60
[Core] Consolidate prompt arguments to LLM engines (#4328)
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-28 13:29:31 -07:00
757b62c495
[CI/Build] Codespell ignore build/ directory (#4945)
2024-05-21 09:06:10 -07:00
2e9a2227ec
[Lora] Support long context lora (#4787)
...
Currently we need to call the rotary embedding kernel once per LoRA, which makes it hard to serve multiple long-context LoRAs. This change adds a batched rotary embedding kernel and pipes it through, replacing the rotary embedding layer with one that maintains a separate cos/sin cache per scaling factor.
Follow-up to https://github.com/vllm-project/vllm/pull/3095/files
2024-05-18 16:05:23 +09:00
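The batched rotary embedding described above can be sketched as follows. This is a minimal pure-Python illustration of the idea (one cos/sin cache per scaling factor, selected per token by its LoRA id), not vLLM's actual CUDA kernel; all names here are hypothetical:

```python
import math

def build_cos_sin_cache(max_pos, dim, base=10000.0, scaling_factor=1.0):
    """One cos/sin cache per RoPE scaling factor: positions are divided
    by the factor (linear scaling) before computing the angles."""
    cache = []
    for pos in range(max_pos):
        p = pos / scaling_factor
        row = []
        for i in range(0, dim, 2):
            inv_freq = 1.0 / (base ** (i / dim))
            row.append((math.cos(p * inv_freq), math.sin(p * inv_freq)))
        cache.append(row)
    return cache

def batched_rotary_embedding(x, positions, lora_ids, caches):
    """Each token looks up the cache of its own LoRA, so one batched call
    can serve requests that use different scaling factors."""
    out = []
    for vec, pos, lid in zip(x, positions, lora_ids):
        cache = caches[lid]
        rotated = []
        for j in range(0, len(vec), 2):
            c, s = cache[pos][j // 2]
            a, b = vec[j], vec[j + 1]
            rotated.extend([a * c - b * s, a * s + b * c])
        out.append(rotated)
    return out
```

The point of the batching is the `lora_ids` indirection: without it, tokens belonging to different LoRAs (and hence different scaling factors) would each need a separate kernel launch.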
d627a3d837
[Misc] Upgrade to torch==2.3.0 (#4454)
2024-04-29 20:05:47 -04:00
a88081bf76
[CI] Disable non-lazy string operation on logging (#4326)
...
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
2024-04-26 00:16:58 -07:00
0ae11f78ab
[Mypy] Part 3 fix typing for nested directories for most of directory (#4161)
2024-04-22 21:32:44 -07:00
09473ee41c
[mypy] Add mypy type annotation part 1 (#4006)
2024-04-12 14:35:50 -07:00
ca81ff5196
[Core] manage nccl via a pypi package & upgrade to pt 2.2.1 (#3805)
2024-04-04 10:26:19 -07:00
2ff767b513
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
...
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-03 14:15:55 -07:00
45b6ef6513
feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277)
2024-03-27 13:39:26 -07:00
01bfb22b41
[CI] Try introducing isort. (#3495)
2024-03-25 07:59:47 -07:00
9fdf3de346
Cmake based build system (#2830)
2024-03-18 15:38:33 -07:00
14e3f9a1b2
Replace lstrip() with removeprefix() to fix Ruff linter warning (#2958)
2024-03-15 21:01:30 -07:00
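The `lstrip()`/`removeprefix()` swap above fixes a classic pitfall that Ruff flags: `lstrip()` removes any leading characters drawn from the given *set*, while `removeprefix()` (Python 3.9+) removes the literal prefix string exactly once. A minimal illustration:

```python
s = "abcabc-rest"

# lstrip("abc") strips every leading character that is 'a', 'b', or 'c',
# so it eats both "abc" runs -- usually not what was intended.
assert s.lstrip("abc") == "-rest"

# removeprefix("abc") removes the exact prefix string, once.
assert s.removeprefix("abc") == "abc-rest"
```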
2f8844ba08
Re-enable the 80 char line width limit (#3305)
2024-03-10 19:49:14 -07:00
93dc5a2870
chore(vllm): codespell for spell checking (#2820)
2024-02-21 18:56:01 -08:00
b0a1d667b0
Pin PyTorch & xformers versions (#2155)
2023-12-17 01:46:54 -08:00
f3e024bece
[CI/CD] Upgrade PyTorch version to v2.1.1 (#2045)
2023-12-11 17:48:11 -08:00
f07c1ceaa5
[FIX] Fix docker build error (#1831) (#1832)
...
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2023-11-29 23:06:50 -08:00
5ffc0d13a2
Migrate linter from pylint to ruff (#1665)
2023-11-20 11:58:01 -08:00
06458a0b42
Upgrade to CUDA 12 (#1527)
...
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2023-11-08 14:17:49 -08:00
6a6119554c
lock torch version to 2.0.1 (#1290)
2023-10-10 09:21:57 -07:00
376725ce74
[PyPI] Packaging for PyPI distribution (#140)
2023-06-05 20:03:14 -07:00