Commit Graph

25 Commits

Author SHA1 Message Date
a1aaf2300a v4.1 release 2025-07-03 08:07:53 -04:00
8bdbfca682 v4.0 update. (#2371) 2025-06-06 02:39:20 -04:00
331a1f5b3f cutlass 3.9 update (#2255)
* cutlass 3.9 update

* rebase

* fixes out of shared memory for blockwise Blackwell

* doc format

* fix issue 2253

* disable host ref by default

* fix sm120 smem capacity

---------

Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-04-24 15:42:40 -04:00
6f4921858b v3.9 update (#2203)
* v3.9 update

* voidD

---------

Co-authored-by: yuzhai <yuzhai@nvidia.com>
2025-04-02 15:11:18 -04:00
833f6990e0 v3.8.0 update (#2082)
* 3.8 update

* fix Markus' name

---------

Co-authored-by: yuzhai <yuzhai@nvidia.com>
2025-02-06 21:33:40 -05:00
b78588d163 CUTLASS 3.7 (#2045)
* CUTLASS 3.7

* clean up changelog

---------

Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-01-18 09:53:07 -05:00
3d261a5974 3.6.0 update (#2005)
* 3.6.0 update

* doc and swap stuff

---------

Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-12-25 01:34:40 -05:00
08101d9d0c Improve sm90 mixed dtype kernel (#1883) 2024-10-17 20:06:38 -04:00
cc3c29a81a CUTLASS 3.6.0 (#1850)
* v3.6

* update changelog

* update readme

* fix typo

* fixing typos

* hopper gemm with weight prefetch

---------

Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-10-09 15:33:27 -04:00
be60a0b272 CUTLASS 3.5.1 (#1623)
* CUTLASS 3.5.1

* updates, optimizations, fixes
2024-07-29 08:46:24 -04:00
7d49e6c7e2 Updates for CUTLASS 3.5.0 (#1468) 2024-04-11 21:33:40 -04:00
629f4653c3 CUTLASS 3.5.0 (#1411) 2024-03-19 17:51:04 -04:00
751eb9a885 Update license year (#1306) 2024-01-16 14:37:22 -05:00
2f589ffa76 Updates for 3.4 release. (#1305) 2024-01-16 13:42:51 -05:00
e9e30c2304 Updates and Bug fixes to CUTLASS 3.3 (#1232) 2023-12-05 09:50:49 -05:00
557be3ab0e Fix several typos (#1169)
Co-authored-by: isaacw <isaacw@nvidia.com>
2023-11-02 23:54:46 -04:00
c008b4aea8 CUTLASS 3.3.0 (#1167)
* Release 3.3.0

Adds support for mixed precision GEMMs On Hopper and Ampere
Adds support for < 16B aligned GEMMs on Hopper
Enhancements to EVT
Enhancements to Python interface
Enhancements to Sub-byte type handling in CuTe
Several other bug-fixes and performance improvements.

* minor doc update
2023-11-02 11:09:05 -04:00
922fb5108b clean the format (#1140) 2023-10-24 22:59:06 -04:00
fa8dfe631f fix missing return warning for repeat and axpby (#1124) 2023-10-12 00:05:45 -04:00
90d3b0fb18 CUTLASS 3.2.1 (#1113)
* Updates for 3.2.1 release.

* Minor fix in gemm op profiler for raster order.

* Add scheduler mapping for raster order in the kernels.
2023-09-26 17:24:26 -04:00
6673df0e48 fix typos (#1059) 2023-08-27 00:49:26 -04:00
4575443d44 CUTLASS 3.2 (#1024)
* CUTLASS 3.2
2023-08-07 20:50:32 -04:00
f079619f5e More updates for 3.1 (#958)
* Updates for 3.1

* Minor change

* doc link fix

* Minor updates
2023-05-24 10:17:16 -04:00
d572cc1aab CUTLASS 3.1 (#915)
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2023-04-14 23:19:34 -04:00
277bd6e537 CUTLASS 3.0.0 (#786)
* CUTLASS 3.0.0
2023-01-23 20:55:28 -05:00