youngkingdom/vllm - vllm

Author	SHA1	Message	Date
Zhuohan Li	f756799b84	Use runtime profiling to replace manual memory analyzers (#81)	2023-05-19 11:35:44 -06:00
Woosuk Kwon	8d66a7b6d7	Rename variables and methods (#91)	2023-05-10 00:58:31 -07:00
Woosuk Kwon	7c041ab578	Refactor system architecture (#82)	2023-05-09 15:30:12 -07:00
Zhuohan Li	27f1410d06	New weight loader without np copy (#52)	2023-05-03 15:32:04 +08:00
Zhuohan Li	4858f3bb45	Add an option to launch cacheflow without ray (#51)	2023-04-30 15:42:17 +08:00
Woosuk Kwon	ee88a7e5f3	Add an option to use dummy model weights (#33)	2023-04-08 23:36:12 -07:00
Woosuk Kwon	12659a0bd7	Add CUDA graph-based all reduce launcher (#26)	2023-04-05 11:16:57 -07:00
Zhuohan Li	2f49f15585	Support tensor parallel (#2)	2023-03-21 13:45:42 -07:00
Woosuk Kwon	1a7eb7da61	Support beam search & parallel generation (#7)	2023-03-10 09:58:21 -08:00
Woosuk Kwon	1ce1333573	Set default dtype to half	2023-02-23 21:31:39 +00:00
Woosuk Kwon	1f6c7ef437	Add controller	2023-02-23 09:32:19 +00:00