15 Commits

SHA1 Message Date
96b6f475dd Remove hardcoded device="cuda" to support more devices (#2503)
Co-authored-by: Jiang Li <jiang1.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2024-02-01 15:46:39 -08:00
89efcf1ce5 [Minor] Fix test_cache.py CI test failure (#2684) 2024-01-31 10:12:11 -08:00
4f65af0e25 Add swap_blocks unit tests (#2616) 2024-01-30 09:30:50 -08:00
9090bf02e7 Support FP8-E5M2 KV Cache (#2279)
Co-authored-by: zhaoyang <zhao.yang16@zte.com.cn>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-01-28 16:43:54 -08:00
6e01e8c1c8 [CI] Add Buildkite (#2355) 2024-01-14 12:37:58 -08:00
941767127c Revert the changes in test_cache (#2335) 2024-01-03 17:32:05 -08:00
fd4ea8ef5c Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221) 2024-01-03 11:30:22 -08:00
77af974b40 [FIX] Support non-zero CUDA devices in custom kernels (#1959) 2024-01-02 19:09:59 -08:00
e0c6f556e8 [Build] Avoid building too many extensions (#1624) 2023-11-23 16:31:19 -08:00
0ce8647dc5 Fix integer overflows in attention & cache ops (#1514) 2023-10-31 15:19:30 -07:00
ba0bfd40e2 TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181) 2023-10-02 15:36:09 -07:00
fbd80ad409 Clean up kernel unit tests (#938) 2023-09-05 16:57:38 -07:00
d6fa1be3a8 [Quality] Add code formatter and linter (#326) 2023-07-03 11:31:55 -07:00
0b98ba15c7 Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00
825d8892b5 Use pytest format for unit tests (#107) 2023-05-17 17:11:23 -07:00