cutlass

Author	SHA1	Message	Date
Kihiro Bando	f115c3f854	Release v4.0.0 (#2294)	2025-05-13 15:55:29 -04:00
Yujia Zhai	62750a2b75	v3.9 (#2185) * v3.8 update x * fix blackwell gg * doc change * doc change * doc change --------- Co-authored-by: yuzhai <yuzhai@nvidia.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com> Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>	2025-03-21 01:52:23 -04:00
ANIKET SHIVAM	9b3772dfa6	Hopper Grouped GEMM support for FP8 Accum (#2123) * Add support for fp8accum, with profiler extension * Update .gitignore * contri --------- Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2025-02-20 21:55:26 -05:00
Yujia Zhai	b84e9802d8	update 3.8 v2 (#2112) * update 3.8 v2 * update 3.8 --------- Co-authored-by: yuzhai <yuzhai@nvidia.com>	2025-02-19 22:03:14 -05:00
mihir-awatramani	389e493055	CUTLASS 3.8 Release (#2059) * CUTLASS 3.8 Release * update * Update README.md * Revert "Update README.md" This reverts commit `b353e36fe8`. * update * update --------- Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2025-01-25 02:44:06 -05:00
Vijay Thakkar	629f4653c3	CUTLASS 3.5.0 (#1411)	2024-03-19 17:51:04 -04:00
Pradeep Ramani	8236f30675	CUTLASS 3.4.0 (#1286) * CUTLASS 3.4.0 * Update CHANGELOG.md --------- Co-authored-by: Pradeep Ramani <prramani@nvidia.com>	2023-12-29 15:21:31 -05:00
Vijay Thakkar	277bd6e537	CUTLASS 3.0.0 (#786) * CUTLASS 3.0.0	2023-01-23 20:55:28 -05:00
Aditya Atluri	c975e2ccbb	releaase 2.11 (#703)	2022-11-19 09:02:15 -05:00
ANIKET SHIVAM	b72cbf957d	CUTLASS 2.10 (#615) Co-authored-by: Aniket Shivam <ashivam@nvidia.com>	2022-09-03 18:48:46 -04:00
Andrew Kerr	6b69c79ac3	Fixed contributor formatting. (#365)	2021-11-22 11:30:53 -08:00
Andrew Kerr	62e438f450	Listed Matthew Nicely as the CUTLASS product manager.. (#364)	2021-11-19 17:51:21 -08:00
Manikandan Ananth	c5f1ef4dff	update contributors	2021-06-02 10:11:42 -07:00
Andrew Kerr	c53f3339bb	CUTLASS 2.3 initial commit (#134) CUTLASS 2.3 adds GEMMs targeting Sparse Tensor Cores on the NVIDIA Ampere Architecture, fast SGEMM, and small matrix classes, bug fixes, and performance enhancements.	2020-09-23 14:00:58 -07:00
Andrew Kerr	86931fef85	CUTLASS 2.2 (#96) Adds support for NVIDIA Ampere Architecture features. CUDA 11 Toolkit recommended.	2020-06-08 16:17:35 -07:00
Andrew Kerr	8aca98f9a7	Improved formatting, clarity, and content of several documents. (#64) * Improved formatting, clarity, and content of several documents.	2019-11-20 10:42:15 -08:00
Andrew Kerr	fb335f6a5f	CUTLASS 2.0 (#62) CUTLASS 2.0 Substantially refactored for - Better performance, particularly for native Turing Tensor Cores - Robust and durable templates spanning the design space - Encapsulated functionality embodying modern C++11 programming techniques - Optimized containers and data types for efficient, generic, portable device code Updates to: - Quick start guide - Documentation - Utilities - CUTLASS Profiler Native Turing Tensor Cores - Efficient GEMM kernels targeting Turing Tensor Cores - Mixed-precision floating point, 8-bit integer, 4-bit integer, and binarized operands Coverage of existing CUTLASS functionality: - GEMM kernels targeting CUDA and Tensor Cores in NVIDIA GPUs - Volta Tensor Cores through native mma.sync and through WMMA API - Optimizations such as parallel reductions, threadblock rasterization, and intra-threadblock reductions - Batched GEMM operations - Complex-valued GEMMs Note: this commit and all that follow require a host compiler supporting C++11 or greater.	2019-11-19 16:55:34 -08:00

17 Commits