7 Commits

Author SHA1 Message Date
a49a78ffef v4.2 release. (#2587)
* Fix default cluster callback values to 1 to avoid profiler failure when these values are not set in command line.

* v4.2 release.
2025-08-22 18:11:24 -04:00
fd6cfe1ed0 v4.1 release update v2. (#2481) 2025-07-21 22:03:55 -04:00
a1aaf2300a v4.1 release 2025-07-03 08:07:53 -04:00
8bdbfca682 v4.0 update. (#2371) 2025-06-06 02:39:20 -04:00
f115c3f854 Release v4.0.0 (#2294) 2025-05-13 15:55:29 -04:00
331a1f5b3f cutlass 3.9 update (#2255)
* cutlass 3.9 update

* rebase

* fixes out of shared memory for blockwise Blackwell

* doc format

* fix issue 2253

* disable host ref by default

* fix sm120 smem capacity

---------

Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-04-24 15:42:40 -04:00
62750a2b75 v3.9 (#2185)
* v3.8 update x

* fix blackwell gg

* doc change

* doc change

* doc change

---------

Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2025-03-21 01:52:23 -04:00