|
|
b6d7392579
|
[Misc][CI/Build] Include cv2 via mistral_common[opencv] (#8951)
|
2024-09-30 04:28:26 +00:00 |
|
|
|
260024a374
|
[Bugfix][Intel] Fix XPU Dockerfile Build (#7824)
Signed-off-by: tylertitsworth <tyler.titsworth@intel.com>
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-09-27 23:45:50 -07:00 |
|
|
|
2467b642dd
|
[CI/Build] fix setuptools-scm usage (#8771)
|
2024-09-24 12:38:12 -07:00 |
|
|
|
ee5f34b1c2
|
[CI/Build] use setuptools-scm to set __version__ (#4738)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-09-23 09:44:26 -07:00 |
|
|
|
0e40ac9b7b
|
[ci][build] fix vllm-flash-attn (#8699)
|
2024-09-21 23:24:58 -07:00 |
|
|
|
71c60491f2
|
[Kernel] Build flash-attn from source (#8245)
|
2024-09-20 23:27:10 -07:00 |
|
|
|
5ce45eb54d
|
[misc] small qol fixes for release process (#8517)
|
2024-09-16 15:11:27 -07:00 |
|
|
|
1ef0d2efd0
|
[Kernel][Hardware][Amd]Custom paged attention kernel for rocm (#8310)
|
2024-09-13 17:01:11 -07:00 |
|
|
|
6a512a00df
|
[model] Support for Llava-Next-Video model (#7559)
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-09-10 22:21:36 -07:00 |
|
|
|
6234385f4a
|
[CI/Build] enable ccache/scccache for HIP builds (#8327)
|
2024-09-10 08:55:08 -07:00 |
|
|
|
c02638efb3
|
[CI/Build] make pip install vllm work in macos (for import only) (#8118)
|
2024-09-03 12:37:08 -07:00 |
|
|
|
5b86b19954
|
[Misc] Optional installation of audio related packages (#8063)
|
2024-09-01 14:46:57 -07:00 |
|
|
|
1b32e02648
|
[Bugfix] Pass PYTHONPATH from setup.py to CMake (#7730)
|
2024-08-21 11:17:48 -07:00 |
|
|
|
1a36287b89
|
[Bugfix] Fix xpu build (#7644)
|
2024-08-18 22:00:09 -07:00 |
|
|
|
386087970a
|
[CI/Build] build on empty device for better dev experience (#4773)
|
2024-08-11 13:09:44 -07:00 |
|
|
|
80cbe10c59
|
[OpenVINO] migrate to latest dependencies versions (#7251)
|
2024-08-07 09:49:10 -07:00 |
|
|
|
a8d604ca2a
|
[Misc] Disambiguate quantized types via a new ScalarType (#6396)
|
2024-08-02 13:51:58 -07:00 |
|
|
|
b482b9a5b1
|
[CI/Build] Add support for Python 3.12 (#7035)
|
2024-08-02 13:51:22 -07:00 |
|
|
|
7ecee34321
|
[Kernel][RFC] Refactor the punica kernel based on Triton (#5036)
|
2024-07-31 17:12:24 -07:00 |
|
|
|
dbfe254eda
|
[Feature] vLLM CLI (#5090)
Co-authored-by: simon-mo <simon.mo@hey.com>
|
2024-07-14 15:36:43 -07:00 |
|
|
|
ccd3c04571
|
[ci][build] fix commit id (#6420)
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2024-07-14 22:16:21 +08:00 |
|
|
|
111fc6e7ec
|
[Misc] Add generated git commit hash as vllm.__commit__ (#6386)
|
2024-07-12 22:52:15 +00:00 |
|
|
|
57f09a419c
|
[Hardware][Intel] OpenVINO vLLM backend (#5379)
|
2024-06-28 13:50:16 +00:00 |
|
|
|
728c4c8a06
|
[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814)
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com>
Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>
|
2024-06-17 11:01:25 -07:00 |
|
|
|
03dccc886e
|
[Misc] Add vLLM version getter to utils (#5098)
|
2024-06-13 11:21:39 -07:00 |
|
|
|
916d219d62
|
[ci] Use sccache to build images (#5419)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-06-12 17:58:12 -07:00 |
|
|
|
1a8bfd92d5
|
[Hardware] Initial TPU integration (#5292)
|
2024-06-12 11:53:03 -07:00 |
|
|
|
8bab4959be
|
[Misc] Remove VLLM_BUILD_WITH_NEURON env variable (#5389)
|
2024-06-11 00:37:56 -07:00 |
|
|
|
5467ac3196
|
[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)
|
2024-06-09 16:23:30 -04:00 |
|
|
|
a66cf40b20
|
[Kernel][ROCm][AMD] enable fused topk_softmax kernel for moe layer (#4927)
This PR enables the fused topk_softmax kernel used in moe layer for HIP
|
2024-06-02 14:13:26 -07:00 |
|
|
|
a360ff80bb
|
[CI/Build] CMakeLists: build all extensions' cmake targets at the same time (#5034)
|
2024-05-31 22:06:45 -06:00 |
|
|
|
5bd3c65072
|
[Core][Optimization] remove vllm-nccl (#5091)
|
2024-05-29 05:13:52 +00:00 |
|
|
|
8bc68e198c
|
[Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update tensorizer to version 2.9.0 (#4208)
|
2024-05-13 14:57:07 -07:00 |
|
|
|
ff5abcd746
|
[ROCm] Add support for Punica kernels on AMD GPUs (#3140)
Co-authored-by: miloice <jeffaw99@hotmail.com>
|
2024-05-09 09:19:50 -07:00 |
|
|
|
89579a201f
|
[Misc] Use vllm-flash-attn instead of flash-attn (#4686)
|
2024-05-08 13:15:34 -07:00 |
|
|
|
344bf7cd2d
|
[Misc] add installation time env vars (#4574)
|
2024-05-03 15:55:56 -07:00 |
|
|
|
5ad60b0cbd
|
[Misc] Exclude the tests directory from being packaged (#4552)
|
2024-05-02 10:50:25 -07:00 |
|
|
|
8b798eec75
|
[CI/Build][Bugfix] VLLM_USE_PRECOMPILED should skip compilation (#4534)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2024-05-01 18:01:50 +00:00 |
|
|
|
715c2d854d
|
[Frontend] [Core] Tensorizer: support dynamic num_readers, update version (#4467)
|
2024-04-30 16:32:13 -07:00 |
|
|
|
a88081bf76
|
[CI] Disable non-lazy string operation on logging (#4326)
Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>
|
2024-04-26 00:16:58 -07:00 |
|
|
|
cd2f63fb36
|
[CI/CD] add neuron docker and ci test scripts (#3571)
|
2024-04-18 15:26:01 -07:00 |
|
|
|
563c54f760
|
[BugFix] Fix tensorizer extra in setup.py (#4072)
|
2024-04-14 14:12:42 -07:00 |
|
|
|
711a000255
|
[Frontend] [Core] feat: Add model loading using tensorizer (#3476)
|
2024-04-13 17:13:01 -07:00 |
|
|
|
c2b4a1bce9
|
[Doc] Add typing hints / mypy types cleanup (#3816)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-04-11 17:17:21 -07:00 |
|
|
|
cfaf49a167
|
[Misc] Define common requirements (#3841)
|
2024-04-05 00:39:17 -07:00 |
|
|
|
ca81ff5196
|
[Core] manage nccl via a pypi package & upgrade to pt 2.2.1 (#3805)
|
2024-04-04 10:26:19 -07:00 |
|
|
|
0e3f06fe9c
|
[Hardware][Intel] Add CPU inference backend (#3634)
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
|
2024-04-01 22:07:30 -07:00 |
|
|
|
3492859b68
|
[CI/Build] update default number of jobs and nvcc threads to avoid overloading the system (#3675)
|
2024-03-28 00:18:54 -04:00 |
|
|
|
8f44facddd
|
[Core] remove cupy dependency (#3625)
|
2024-03-27 00:33:26 -07:00 |
|
|
|
01bfb22b41
|
[CI] Try introducing isort. (#3495)
|
2024-03-25 07:59:47 -07:00 |
|