b1d6e2c9b3
v4.3 update. (#2709)
...
* v4.3 update.
* Update the cute_dsl_api changelog's doc link
* Update version to 4.3.0
* Update the example link
* Update doc to encourage user to install DSL from requirements.txt
---------
Co-authored-by: Larry Wu <larwu@nvidia.com>
2025-10-21 14:26:30 -04:00
f874df19ac
4.2.1 update
2025-09-23 13:45:13 -07:00
57e3cfb47a
doc change for 4.2 (#2639)
...
* doc change
* fix broken links
* ragged gemm doc update
* move around texts about moe gemm
2025-09-15 22:02:45 -04:00
6a35b4d22f
v4.2 tag release. (#2638)
2025-09-15 12:21:53 -04:00
a49a78ffef
v4.2 release. (#2587)
...
* Fix default cluster callback values to 1 to avoid profiler failure when these values are not set in command line.
* v4.2 release.
2025-08-22 18:11:24 -04:00
6fb5e667c1
[Doc fix] incorrect compute cap. for Blackwell RTX (#2511)
...
Blackwell RTX is compute capability 12.0 (SM120) but incorrectly listed
as SM100 in the README.
2025-07-30 22:14:13 -04:00
fd6cfe1ed0
v4.1 release update v2. (#2481)
2025-07-21 22:03:55 -04:00
a1aaf2300a
v4.1 release
2025-07-03 08:07:53 -04:00
b995f93317
4.0 doc change (#2425)
2025-06-27 09:35:06 -04:00
c2ad7c5b20
fix link in readme (#2379)
2025-06-07 07:38:38 -04:00
5a287538c2
"Update CHANGELOG for 4.0 tagging" (#2374)
2025-06-06 10:07:36 -04:00
8bdbfca682
v4.0 update. (#2371)
2025-06-06 02:39:20 -04:00
6316b6f867
Fix typos (#2311)
...
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-05-23 08:30:10 -04:00
f115c3f854
Release v4.0.0 (#2294)
2025-05-13 15:55:29 -04:00
ad7b2f5e84
3.9.2 doc/version (#2279)
...
* 3.9.2 doc/version
* whitespace
2025-05-04 00:00:15 -04:00
f535c33634
3.9.1 doc/version change (#2273)
2025-05-01 00:27:00 -04:00
f02a7c2976
Update README.md for 3.9
2025-04-24 16:51:45 -04:00
331a1f5b3f
cutlass 3.9 update (#2255)
...
* cutlass 3.9 update
* rebase
* fixes out of shared memory for blockwise Blackwell
* doc format
* fix issue 2253
* disable host ref by default
* fix sm120 smem capacity
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-04-24 15:42:40 -04:00
79fc51f4b8
v3.9 update (#2213)
...
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2025-04-03 02:10:16 -04:00
6f4921858b
v3.9 update (#2203)
...
* v3.9 update
* voidD
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2025-04-02 15:11:18 -04:00
62750a2b75
v3.9 (#2185)
...
* v3.8 update x
* fix blackwell gg
* doc change
* doc change
* doc change
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2025-03-21 01:52:23 -04:00
b84e9802d8
update 3.8 v2 (#2112)
...
* update 3.8 v2
* update 3.8
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2025-02-19 22:03:14 -05:00
833f6990e0
v3.8.0 update (#2082)
...
* 3.8 update
* fix Markus' name
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2025-02-06 21:33:40 -05:00
bdd641790a
Update README.md
2025-01-28 18:08:13 -05:00
389e493055
CUTLASS 3.8 Release (#2059)
...
* CUTLASS 3.8 Release
* update
* Update README.md
* Revert "Update README.md"
This reverts commit b353e36fe8.
* update
* update
---------
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-01-25 02:44:06 -05:00
9eb01fa0b0
update 3.7 docs (#2051)
...
* update docs
* update docs
* update docs
* update docs
* update docs
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2025-01-23 15:13:50 -05:00
b78588d163
CUTLASS 3.7 (#2045)
...
* CUTLASS 3.7
* clean up changelog
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-01-18 09:53:07 -05:00
3d261a5974
3.6.0 update (#2005)
...
* 3.6.0 update
* doc and swap stuff
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-12-25 01:34:40 -05:00
e5f3caf145
Fix README (#1658)
...
* Fix README
* Improve README
---------
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2024-10-23 12:52:43 -04:00
cc3c29a81a
CUTLASS 3.6.0 (#1850)
...
* v3.6
* update changelog
* update readme
* fix typo
* fixing typos
* hopper gemm with weight prefetch
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-10-09 15:33:27 -04:00
8d8cfdf375
update 3.5.1 readme/changelog
2024-08-14 21:12:44 -07:00
4e5a8f6853
3.5.1 plots and updated readme (#1708)
...
Co-authored-by: dePaul Miller <23461061+depaulmillz@users.noreply.github.com>
2024-08-12 18:55:55 -04:00
be60a0b272
CUTLASS 3.5.1 (#1623)
...
* CUTLASS 3.5.1
* updates, optimizations, fixes
2024-07-29 08:46:24 -04:00
7d49e6c7e2
Updates for CUTLASS 3.5.0 (#1468)
2024-04-11 21:33:40 -04:00
629f4653c3
CUTLASS 3.5.0 (#1411)
2024-03-19 17:51:04 -04:00
bbe579a9e3
Updates for CUTLASS 3.4.1 (#1346)
...
* Updates for CUTLASS 3.4.1
* minor epi change
2024-02-15 15:48:34 -05:00
751eb9a885
Update license year (#1306)
2024-01-16 14:37:22 -05:00
2f589ffa76
Updates for 3.4 release. (#1305)
2024-01-16 13:42:51 -05:00
8236f30675
CUTLASS 3.4.0 (#1286)
...
* CUTLASS 3.4.0
* Update CHANGELOG.md
---------
Co-authored-by: Pradeep Ramani <prramani@nvidia.com>
2023-12-29 15:21:31 -05:00
e9e30c2304
Updates and Bug fixes to CUTLASS 3.3 (#1232)
2023-12-05 09:50:49 -05:00
5ae8133cfa
Doc only change changelog 3.3 (#1180)
2023-11-13 13:29:22 -05:00
c008b4aea8
CUTLASS 3.3.0 (#1167)
...
* Release 3.3.0
Adds support for mixed precision GEMMs on Hopper and Ampere
Adds support for < 16B aligned GEMMs on Hopper
Enhancements to EVT
Enhancements to Python interface
Enhancements to Sub-byte type handling in CuTe
Several other bug-fixes and performance improvements.
* minor doc update
2023-11-02 11:09:05 -04:00
90d3b0fb18
CUTLASS 3.2.1 (#1113)
...
* Updates for 3.2.1 release.
* Minor fix in gemm op profiler for raster order.
* Add scheduler mapping for raster order in the kernels.
2023-09-26 17:24:26 -04:00
4575443d44
CUTLASS 3.2 (#1024)
...
* CUTLASS 3.2
2023-08-07 20:50:32 -04:00
fde824af21
Update Hopper performance plot for CUTLASS 3.1 + CTK 12.1 (#967)
2023-06-01 14:52:40 -04:00
6f47420213
Update README.md
2023-05-24 12:40:31 -04:00
f079619f5e
More updates for 3.1 (#958)
...
* Updates for 3.1
* Minor change
* doc link fix
* Minor updates
2023-05-24 10:17:16 -04:00
d572cc1aab
CUTLASS 3.1 (#915)
...
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2023-04-14 23:19:34 -04:00
7e370c9637
Fix typos 2 (#842)
...
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2023-03-09 23:22:56 -05:00
c4f6b8c6bc
Updates for 3.0 (#857)
...
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2023-03-09 15:27:40 -05:00