youngkingdom/vllm
Commit Graph (at 057daef778ec4e951841f44afda1cd0b1eb50ee4)

12 Commits

Author       SHA1        Message  Date
Woosuk Kwon  c3442c1f6f  Refactor system architecture (#109)  2023-05-20 13:06:59 -07:00
Woosuk Kwon  7addca5935  Specify python package dependencies in requirements.txt (#78)  2023-05-07 16:30:43 -07:00
Woosuk Kwon  c9d5b6d4a8  Replace FlashAttention with xformers (#70)  2023-05-05 02:01:08 -07:00
Woosuk Kwon  2c5cd0defe  Add ninja to dependency (#21)  2023-04-01 19:00:20 -07:00
Zhuohan Li   e3f00d191e  Modify README to include info on loading LLaMA (#18)  2023-04-01 01:07:57 +08:00
Woosuk Kwon  80a2f812f1  Implement LLaMA (#9) (Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>)  2023-03-30 12:25:32 +08:00
Zhuohan Li   721fa3df15  FastAPI-based working frontend (#10)  2023-03-29 14:48:56 +08:00
Zhuohan Li   2f49f15585  Support tensor parallel (#2)  2023-03-21 13:45:42 -07:00
Woosuk Kwon  e9d3f2ff77  Add memory analyzer & automatically configure KV cache size (#6)  2023-03-11 23:23:14 -08:00
Woosuk Kwon  3e9f991d6a  Use FlashAttention for multi_query_kv_attention (#4)  2023-03-01 21:13:08 -08:00
Woosuk Kwon  c84c708a1d  Add README  2023-02-24 12:04:49 +00:00
Woosuk Kwon  e7d9d9c08c  Initial commit  2023-02-09 11:24:15 +00:00