a1aaf2300a
v4.1 release
2025-07-03 08:07:53 -04:00
8bdbfca682
v4.0 update. ( #2371 )
2025-06-06 02:39:20 -04:00
331a1f5b3f
cutlass 3.9 update ( #2255 )
...
* cutlass 3.9 update
* rebase
* fixes out of shared memory for blockwise Blackwell
* doc format
* fix issue 2253
* disable host ref by default
* fix sm120 smem capacity
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
2025-04-24 15:42:40 -04:00
62750a2b75
v3.9 ( #2185 )
...
* v3.8 update x
* fix blackwell gg
* doc change
* doc change
* doc change
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com >
2025-03-21 01:52:23 -04:00
b84e9802d8
update 3.8 v2 ( #2112 )
...
* update 3.8 v2
* update 3.8
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
2025-02-19 22:03:14 -05:00
b78588d163
CUTLASS 3.7 ( #2045 )
...
* CUTLASS 3.7
* clean up changelog
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
2025-01-18 09:53:07 -05:00
3d261a5974
3.6.0 update ( #2005 )
...
* 3.6.0 update
* doc and swap stuff
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
2024-12-25 01:34:40 -05:00
cc3c29a81a
CUTLASS 3.6.0 ( #1850 )
...
* v3.6
* update changelog
* update readme
* fix typo
* fixing typos
* hopper gemm with weight prefetch
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
2024-10-09 15:33:27 -04:00
be60a0b272
CUTLASS 3.5.1 ( #1623 )
...
* CUTLASS 3.5.1
* updates, optimizations, fixes
2024-07-29 08:46:24 -04:00
7d49e6c7e2
Updates for CUTLASS 3.5.0 ( #1468 )
2024-04-11 21:33:40 -04:00
629f4653c3
CUTLASS 3.5.0 ( #1411 )
2024-03-19 17:51:04 -04:00
751eb9a885
Update license year ( #1306 )
2024-01-16 14:37:22 -05:00
2f589ffa76
Updates for 3.4 release. ( #1305 )
2024-01-16 13:42:51 -05:00
8236f30675
CUTLASS 3.4.0 ( #1286 )
...
* CUTLASS 3.4.0
* Update CHANGELOG.md
---------
Co-authored-by: Pradeep Ramani <prramani@nvidia.com >
2023-12-29 15:21:31 -05:00
c008b4aea8
CUTLASS 3.3.0 ( #1167 )
...
* Release 3.3.0
Adds support for mixed precision GEMMs On Hopper and Ampere
Adds support for < 16B aligned GEMMs on Hopper
Enhancements to EVT
Enhancements to Python interface
Enhancements to Sub-byte type handling in CuTe
Several other bug-fixes and performance improvements.
* minor doc update
2023-11-02 11:09:05 -04:00
90d3b0fb18
CUTLASS 3.2.1 ( #1113 )
...
* Updates for 3.2.1 release.
* Minor fix in gemm op profiler for raster order.
* Add scheduler mapping for raster order in the kernels.
2023-09-26 17:24:26 -04:00
a88c41cf8d
Updates for 3.2 release ( #1065 )
2023-08-25 23:05:46 -04:00
4575443d44
CUTLASS 3.2 ( #1024 )
...
* CUTLASS 3.2
2023-08-07 20:50:32 -04:00
d572cc1aab
CUTLASS 3.1 ( #915 )
...
Co-authored-by: Aniket Shivam <ashivam@nvidia.com >
2023-04-14 23:19:34 -04:00
277bd6e537
CUTLASS 3.0.0 ( #786 )
...
* CUTLASS 3.0.0
2023-01-23 20:55:28 -05:00