# CacheFlow
## Installation
```bash
pip install psutil numpy torch transformers
pip install flash-attn  # This may take up to 10 minutes.
pip install -e .
```
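After installing, it can be useful to confirm that the core dependencies are importable before starting the server. This is a minimal sanity-check sketch, not part of CacheFlow itself; it assumes the flash-attn package is importable as `flash_attn`.

```python
import importlib.util

# Packages installed in the steps above; flash_attn's import name is assumed.
required = ["psutil", "numpy", "torch", "transformers", "flash_attn"]
missing = [pkg for pkg in required if importlib.util.find_spec(pkg) is None]

if missing:
    print("missing packages:", ", ".join(missing))
else:
    print("all core dependencies found")
```

If any package is reported missing, rerun the corresponding `pip install` command before launching the server.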
## Run
```bash
python server.py
```
## Description

A high-throughput and memory-efficient inference and serving engine for LLMs.
## License

Apache-2.0