b6ccf34aef
Fix Copy_Atom type mismatch in sgemm_sm80.cu (#2582)
2025-09-04 16:56:17 -07:00

a49a78ffef
v4.2 release. (#2587)
* Fix default cluster callback values to 1 to avoid profiler failure when these values are not set in command line.
* v4.2 release.
2025-08-22 18:11:24 -04:00

da47886e34
Fix example bug (#2351)
2025-07-30 22:12:33 -04:00

84a27b3926
fix: examples/cute/tutorial/blackwell/04_mma_tma_2sm_sm100.cu GridDim miscalculated (#2492)
* fix: examples/cute/tutorial/blackwell/04_mma_tma_2sm_sm100.cu Launch dimGrid error
* feat: add cta tiler
* Update examples/cute/tutorial/blackwell/04_mma_tma_2sm_sm100.cu
use cluster_layout_vmnk instead of cta_tiler
Co-authored-by: Junkai-Wu <junkaiw@nvidia.com>
* feat: remove cta_tiler
---------
Co-authored-by: qinghongzeng <qinghongzeng@deeproute.ai>
Co-authored-by: Junkai-Wu <junkaiw@nvidia.com>
2025-07-30 22:11:04 -04:00

e093b4f691
Fix tutorial comment in sgemm_1.cu: use tCrC instead of tCsA in axpby explanation (#2448)
2025-07-30 22:09:55 -04:00

fd6cfe1ed0
v4.1 release update v2. (#2481)
2025-07-21 22:03:55 -04:00

8bdbfca682
v4.0 update. (#2371)
2025-06-06 02:39:20 -04:00

f115c3f854
Release v4.0.0 (#2294)
2025-05-13 15:55:29 -04:00

331a1f5b3f
cutlass 3.9 update (#2255)
* cutlass 3.9 update
* rebase
* fixes out of shared memory for blockwise Blackwell
* doc format
* fix issue 2253
* disable host ref by default
* fix sm120 smem capacity
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-04-24 15:42:40 -04:00

62750a2b75
v3.9 (#2185)
* v3.8 update x
* fix blackwell gg
* doc change
* doc change
* doc change
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2025-03-21 01:52:23 -04:00

b78588d163
CUTLASS 3.7 (#2045)
* CUTLASS 3.7
* clean up changelog
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-01-18 09:53:07 -05:00

52b35e90ce
Fix Typos (#2021)
* Fix Typo
* Fix Typo
2025-01-08 23:46:28 -05:00

3d261a5974
3.6.0 update (#2005)
* 3.6.0 update
* doc and swap stuff
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-12-25 01:34:40 -05:00

cc3c29a81a
CUTLASS 3.6.0 (#1850)
* v3.6
* update changelog
* update readme
* fix typo
* fixing typos
* hopper gemm with weight prefetch
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-10-09 15:33:27 -04:00

be60a0b272
CUTLASS 3.5.1 (#1623)
* CUTLASS 3.5.1
* updates, optimizations, fixes
2024-07-29 08:46:24 -04:00

843adf0408
Fix SMEM index for C in CuTe examples (#1477)
2024-07-10 11:14:15 -04:00

629f4653c3
CUTLASS 3.5.0 (#1411)
2024-03-19 17:51:04 -04:00

751eb9a885
Update license year (#1306)
2024-01-16 14:37:22 -05:00

8236f30675
CUTLASS 3.4.0 (#1286)
* CUTLASS 3.4.0
* Update CHANGELOG.md
---------
Co-authored-by: Pradeep Ramani <prramani@nvidia.com>
2023-12-29 15:21:31 -05:00

d572cc1aab
CUTLASS 3.1 (#915)
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2023-04-14 23:19:34 -04:00

277bd6e537
CUTLASS 3.0.0 (#786)
* CUTLASS 3.0.0
2023-01-23 20:55:28 -05:00