youngkingdom/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Michael Goin	235366fe2e	[CI] Prune back the number of tests in tests/kernels/* (#9932 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-11-05 16:02:32 -05:00
wangshuai09	622b7ab955	[Hardware] using current_platform.seed_everything (#9785 ) Signed-off-by: wangshuai09 <391746016@qq.com>	2024-10-29 14:47:44 +00:00
Cyrus Leung	6ffa3f314c	[CI/Build] Avoid CUDA initialization (#8534 )	2024-09-18 10:38:11 +00:00
Luka Govedič	5d73ae49d6	[Kernel] AQ AZP 3/4: Asymmetric quantization kernels (#7270 )	2024-09-16 11:52:40 -07:00
bnellnm	73202dbe77	[Kernel][Misc] register ops to prevent graph breaks (#6917 ) Co-authored-by: Sage Moore <sage@neuralmagic.com>	2024-09-11 12:52:19 -07:00
jon-chuang	50b8d08dbd	[Misc/Testing] Use `torch.testing.assert_close` (#7324 )	2024-08-16 04:24:04 +00:00
Lucas Wilkinson	a8d604ca2a	[Misc] Disambiguate quantized types via a new ScalarType (#6396 )	2024-08-02 13:51:58 -07:00
Varun Sundar Rabindranath	b5241e41d9	[ Kernel ] FP8 Dynamic-Per-Token Quant Kernel (#6511 ) Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-07-18 01:38:35 +00:00
bnellnm	5467ac3196	[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047 )	2024-06-09 16:23:30 -04:00
Dipika Sikka	ca3ea51bde	[Kernel] Dynamic Per-Token Activation Quantization (#5037 ) Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-06-07 09:36:26 -07:00
Tyler Michael Smith	cbb2f59cc8	[Kernel] Pass a device pointer into the quantize kernel for the scales (#5159 )	2024-06-03 09:52:30 -07:00
Dipika Sikka	a1242324c9	[Kernel] Initial Activation Quantization Support (#4525 ) Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2024-05-23 21:29:18 +00:00

12 Commits