cutlass

Author	SHA1	Message	Date
Vijay Thakkar	7d49e6c7e2	Updates for CUTLASS 3.5.0 (#1468 )	2024-04-11 21:33:40 -04:00
Vijay Thakkar	629f4653c3	CUTLASS 3.5.0 (#1411 )	2024-03-19 17:51:04 -04:00
ANIKET SHIVAM	bbe579a9e3	Updates for CUTLASS 3.4.1 (#1346 ) * Updates for CUTLASS 3.4.1 * minor epi change	2024-02-15 15:48:34 -05:00
ANIKET SHIVAM	751eb9a885	Update license year (#1306 )	2024-01-16 14:37:22 -05:00
ANIKET SHIVAM	2f589ffa76	Updates for 3.4 release. (#1305 )	2024-01-16 13:42:51 -05:00
Pradeep Ramani	8236f30675	CUTLASS 3.4.0 (#1286 ) * CUTLASS 3.4.0 * Update CHANGELOG.md --------- Co-authored-by: Pradeep Ramani <prramani@nvidia.com>	2023-12-29 15:21:31 -05:00
Pradeep Ramani	e9e30c2304	Updates and Bug fixes to CUTLASS 3.3 (#1232 )	2023-12-05 09:50:49 -05:00
Manish Gupta	5ae8133cfa	Doc only change changelog 3.3 (#1180 )	2023-11-13 13:29:22 -05:00
Pradeep Ramani	c008b4aea8	CUTLASS 3.3.0 (#1167 ) * Release 3.3.0 Adds support for mixed precision GEMMs On Hopper and Ampere Adds support for < 16B aligned GEMMs on Hopper Enhancements to EVT Enhancements to Python interface Enhancements to Sub-byte type handling in CuTe Several other bug-fixes and performance improvements. * minor doc update	2023-11-02 11:09:05 -04:00
ANIKET SHIVAM	90d3b0fb18	CUTLASS 3.2.1 (#1113 ) * Updates for 3.2.1 release. * Minor fix in gemm op profiler for raster order. * Add scheduler mapping for raster order in the kernels.	2023-09-26 17:24:26 -04:00
ANIKET SHIVAM	4575443d44	CUTLASS 3.2 (#1024 ) * CUTLASS 3.2	2023-08-07 20:50:32 -04:00
Vijay Thakkar	fde824af21	Update Hopper performance plot for CUTLASS 3.1 + CTK 12.1 (#967 )	2023-06-01 14:52:40 -04:00
Haicheng Wu	6f47420213	Update README.md	2023-05-24 12:40:31 -04:00
ANIKET SHIVAM	f079619f5e	More updates for 3.1 (#958 ) * Updates for 3.1 * Minor change * doc link fix * Minor updates	2023-05-24 10:17:16 -04:00
ANIKET SHIVAM	d572cc1aab	CUTLASS 3.1 (#915 ) Co-authored-by: Aniket Shivam <ashivam@nvidia.com>	2023-04-14 23:19:34 -04:00
Alexander Pivovarov	7e370c9637	Fix typos 2 (#842 ) Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>	2023-03-09 23:22:56 -05:00
ANIKET SHIVAM	c4f6b8c6bc	Updates for 3.0 (#857 ) Co-authored-by: Aniket Shivam <ashivam@nvidia.com>	2023-03-09 15:27:40 -05:00
Vijay Thakkar	277bd6e537	CUTLASS 3.0.0 (#786 ) * CUTLASS 3.0.0	2023-01-23 20:55:28 -05:00
ANIKET SHIVAM	66d9cddc83	New updates for 2.11 (#775 ) * New updates. * Minor profiler updates Co-authored-by: Aniket Shivam <ashivam@nvidia.com>	2023-01-20 16:32:57 -05:00
Haicheng Wu	764b840d6f	streamk example and performance tuning (#760 ) * streamk example and performance tuning * one missing file Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2023-01-10 16:10:02 -05:00
Matthew Nicely	78b30d3191	Update README.md	2022-12-21 11:58:19 -05:00
Matthew Nicely	59de82688b	Update README.md	2022-12-21 11:57:55 -05:00
Haicheng Wu	9f1f37aa21	misc (#719 ) * misc * minor Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2022-12-05 12:07:20 -05:00
Aditya Atluri	c975e2ccbb	releaase 2.11 (#703 )	2022-11-19 09:02:15 -05:00
ANIKET SHIVAM	50ceed7154	Minor README fix (#623 ) * minor fix * Minor fix	2022-09-12 22:40:25 -04:00
ANIKET SHIVAM	e773429f7e	CUTLASS 2.10 updates (#622 ) Co-authored-by: Aniket Shivam <ashivam@nvidia.com>	2022-09-12 21:26:30 -04:00
Yujia Zhai	beae168f90	fix broken link (#620 ) Co-authored-by: yuzhai <yuzhai@nvidia.com>	2022-09-06 16:32:44 -04:00
ANIKET SHIVAM	b72cbf957d	CUTLASS 2.10 (#615 ) Co-authored-by: Aniket Shivam <ashivam@nvidia.com>	2022-09-03 18:48:46 -04:00
Haicheng Wu	ba18ea9c32	Update README.md	2022-06-27 23:25:26 -04:00
Haicheng Wu	cc2ea4c3fc	Update README.md	2022-04-28 10:50:11 -04:00
Jack Kosaian	167ac54c65	Fix link to Python example (#469 )	2022-04-23 15:37:38 -04:00
Andrew Kerr	12f4108ac2	CUTLASS 2.9 (#468 )	2022-04-23 15:02:38 -04:00
Andrew Kerr	4e666e1dfd	Updated README and added issue templates. (#382 )	2021-12-17 09:26:20 -05:00
Andrew Kerr	5fe09c2d67	Updated GEMM performance plot with CUTLASS 2.8 compiled with CUDA 11.5 Toolkit (#375 ) Updated GEMM performance plot with CUTLASS 2.8 compiled using CUDA 11.5 Toolkit. GPUs under test: NVIDIA A100 NVIDIA A2 NVIDIA TitanV NVIDIA GeForce 2080 Ti	2021-12-06 14:21:33 -05:00
Andrew Kerr	62e438f450	Listed Matthew Nicely as the CUTLASS product manager.. (#364 )	2021-11-19 17:51:21 -08:00
Manish Gupta	808c25337a	CUTLASS 2.8 (#363 ) CUTLASS 2.8	2021-11-19 13:26:35 -08:00
Manish Gupta	2e07c4cc2f	CUTLASS 2.7 (#318 ) CUTLASS 2.7 Mainloop fusion for GEMM: summation over A or B Strided DGRAD (optimized iterators) Half-precision GELU_taylor activation functions Use these when accumulation and epilogue compute types are all cutlass::half_t Tuning and bug fixes to fused GEMM + GEMM example Support for smaller than 128b aligned Convolutions: see examples Caching of results to accelerate Convolution unit tests Can be enabled or disabled by running cmake .. -DCUTLASS_TEST_ENABLE_CACHED_RESULTS=OFF Corrections and bug fixes reported by the CUTLASS community Thank you for filing these issues! authored-by: Haicheng Wu haichengw@nvidia.com, Manish Gupta manigupta@nvidia.com, Dustyn Blasig dblasig@nvidia.com, Andrew Kerr akerr@nvidia.com	2021-09-20 11:02:22 -07:00
Manish Gupta	6c2f8f2fb8	CUTLASS 2.6.1 - functional and performance enhancements to strided DGRAD, fixes, and tuning * cutlass 2.6 update * remove debug prints * cutlass 2.6.1 (minor update) * Updated CHANGELOG. * Minor edit to readme to indicate patch version. * Minor edit to readme. Co-authored-by: Haicheng Wu <haichengw@nvidia.com>, Andrew Kerr <akerr@nvidia.com>	2021-09-03 10:26:15 -07:00
Manish Gupta	1ac4559d12	Cutlass 2.6 Update 1 (#301 ) * cutlass 2.6 update * remove debug prints	2021-07-27 17:58:30 -07:00
Manish Gupta	e5d51840e8	CUTLASS 2.6 (#298 ) CUTLASS 2.6	2021-07-23 00:40:53 -04:00
Andrew Kerr	200a5a5146	Enabled reduction unit tests.	2021-02-26 15:46:57 -05:00
Andrew Kerr	abdf16a4d9	Updated release notes.	2021-02-26 13:55:04 -05:00
Andrew Kerr	0e13748649	CUTLASS 2.5	2021-02-26 09:58:26 -05:00
Manish Gupta	ccb697bac7	cutlass 2.4 documentation only update	2020-11-23 06:59:45 -06:00
Manish Gupta	6615010cd0	CUTLASS 2.4 (Implicit GEMM convolution) (#147 ) CUTLASS 2.4 (Implicit GEMM Convolution) Co-authored-by: Manish Gupta <manigupta@nvidia.com>, Haicheng Wu <haichengw@nvidia.com>, Dustyn Blasig <dblasig@nvidia.com>, Andrew Kerr <akerr@nvidia.com>	2020-11-19 21:25:25 -08:00
Andrew Kerr	c53f3339bb	CUTLASS 2.3 initial commit (#134 ) CUTLASS 2.3 adds GEMMs targeting Sparse Tensor Cores on the NVIDIA Ampere Architecture, fast SGEMM, and small matrix classes, bug fixes, and performance enhancements.	2020-09-23 14:00:58 -07:00
Andrew Kerr	1ab1027954	Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>. (#100 ) - Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>. - Enhancement to CUTLASS Utility Library's HostTensorPlanarComplex template to support copy-in and copy-out - Added test_examples target to build and test all CUTLASS examples - Minor edits to documentation to point to GTC 2020 webinar	2020-06-15 10:47:01 -07:00
Andrew Kerr	86931fef85	CUTLASS 2.2 (#96 ) Adds support for NVIDIA Ampere Architecture features. CUDA 11 Toolkit recommended.	2020-06-08 16:17:35 -07:00
Andrew Kerr	96dab34ad9	CUTLASS 2.1 (#83 ) CUTLASS 2.1 contributes: - BLAS-style host-side API added to CUTLASS Library - Planar Complex GEMM kernels targeting Volta and Turing Tensor Cores - Minor enhancements and bug fixes	2020-04-07 13:51:25 -07:00
Andrew Kerr	8aca98f9a7	Improved formatting, clarity, and content of several documents. (#64 ) * Improved formatting, clarity, and content of several documents.	2019-11-20 10:42:15 -08:00

1 2

83 Commits