9354bfd7c1
Keep the documentation consistent with the sgemm_1.cu code. ( #2285 )
...
* Keep the documentation consistent with the sgemm_1.cu code.
* fix typo
---------
Co-authored-by: zky <zky@126.com>
2025-05-19 22:53:15 -04:00
5e9b8e2a25
fix docx ( #2290 )
...
Co-authored-by: xiayongqiang <xiayq1@chinatelecom.cn>
2025-05-19 22:52:37 -04:00
f115c3f854
Release v4.0.0 ( #2294 )
2025-05-13 15:55:29 -04:00
331a1f5b3f
cutlass 3.9 update ( #2255 )
...
* cutlass 3.9 update
* rebase
* fixes out of shared memory for blockwise Blackwell
* doc format
* fix issue 2253
* disable host ref by default
* fix sm120 smem capacity
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-04-24 15:42:40 -04:00
bb4dd682dd
Fix broken links and alt text in cluster launch control docs ( #2234 )
...
* Fix broken links in cluster launch control docs
* Improve titles and alt text
2025-04-21 00:01:12 -04:00
5e497243f7
fix: fig link in cute docs ( #2216 )
2025-04-10 14:51:41 -04:00
dd76dec4ef
[Doc] Make C++ code more plausible ( #2156 )
...
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-04-10 14:35:46 -04:00
09df6ac464
[Doc]fix typo ( #2174 )
...
Co-authored-by: wenju.li <wenju.li@deepctr.cn>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-04-10 12:46:53 -04:00
79fc51f4b8
v3.9 update ( #2213 )
...
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2025-04-03 02:10:16 -04:00
6f4921858b
v3.9 update ( #2203 )
...
* v3.9 update
* voidD
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2025-04-02 15:11:18 -04:00
62750a2b75
v3.9 ( #2185 )
...
* v3.8 update x
* fix blackwell gg
* doc change
* doc change
* doc change
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2025-03-21 01:52:23 -04:00
3fe62887d8
adding blackwell ( #2143 )
2025-03-17 22:20:40 -04:00
bd03b22f64
fix typo ( #2136 )
...
Co-authored-by: XiaoDong <xiaod@nvidia.com>
2025-03-17 22:19:43 -04:00
b84e9802d8
update 3.8 v2 ( #2112 )
...
* update 3.8 v2
* update 3.8
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2025-02-19 22:03:14 -05:00
0642d46dd4
Update 0x_gemm_tutorial.md ( #2090 )
2025-02-10 16:46:43 -05:00
833f6990e0
v3.8.0 update ( #2082 )
...
* 3.8 update
* fix Markus' name
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2025-02-06 21:33:40 -05:00
cc19d4d22b
fix a readme broken link ( #2069 )
2025-01-28 18:03:34 -05:00
389e493055
CUTLASS 3.8 Release ( #2059 )
...
* CUTLASS 3.8 Release
* update
* Update README.md
* Revert "Update README.md"
This reverts commit b353e36fe8.
* update
* update
---------
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-01-25 02:44:06 -05:00
b78588d163
CUTLASS 3.7 ( #2045 )
...
* CUTLASS 3.7
* clean up changelog
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-01-18 09:53:07 -05:00
cffd5d32b7
Update 0x_gemm_tutorial.md ( #1982 )
...
Shouldn't this be BLK_M, BLK_**K**, k
2025-01-06 22:04:35 -05:00
3d261a5974
3.6.0 update ( #2005 )
...
* 3.6.0 update
* doc and swap stuff
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-12-25 01:34:40 -05:00
33c584364e
Fix CuTe README Typo ( #1951 )
2024-12-10 22:05:40 -05:00
e5f3caf145
Fix README ( #1658 )
...
* Fix README
* Improve README
---------
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2024-10-23 12:52:43 -04:00
ea69cc2849
fix typo ( #1853 )
2024-10-23 12:45:28 -04:00
cc3c29a81a
CUTLASS 3.6.0 ( #1850 )
...
* v3.6
* update changelog
* update readme
* fix typo
* fixing typos
* hopper gemm with weight prefetch
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-10-09 15:33:27 -04:00
b27c49e84a
Fix cute doc ( #1529 )
2024-10-07 12:38:32 -04:00
4e5a8f6853
3.5.1 plots and updated readme ( #1708 )
...
Co-authored-by: dePaul Miller <23461061+depaulmillz@users.noreply.github.com>
2024-08-12 18:55:55 -04:00
8b2a0408bd
Profiler docs and argument update for raster order ( #1667 )
2024-07-31 16:40:10 -04:00
be60a0b272
CUTLASS 3.5.1 ( #1623 )
...
* CUTLASS 3.5.1
* updates, optimizations, fixes
2024-07-29 08:46:24 -04:00
843adf0408
Fix SMEM index for C in CuTe examples ( #1477 )
2024-07-10 11:14:15 -04:00
2448bb56e6
Update gemm_api_3x.md ( #1386 )
...
Fixed what it seems to be an obvious typo.
2024-07-10 10:59:02 -04:00
033d9efd2d
[Documentation] Fixes the confusion between concatenated vs. composed layout in CuTe documentation ( #1498 )
...
* Update 02_layout_algebra.md
* Update 02_layout_algebra.md
2024-05-02 15:35:12 -04:00
acc3ee18a1
Fix typos in cute docs ( #1486 )
...
* fix typos in 02_layout_algebra.md
* fix typos in 03_tensor.md
2024-05-02 15:34:36 -04:00
7d49e6c7e2
Updates for CUTLASS 3.5.0 ( #1468 )
2024-04-11 21:33:40 -04:00
a40e08e9d5
Update 02_layout_algebra.md ( #1451 )
...
change line 348 to reflect correct layout.
2024-04-10 10:57:57 -04:00
8f7d2789b8
[NFC] improve doc: fix typo in mma doc ( #1417 )
2024-03-27 14:07:20 -04:00
629f4653c3
CUTLASS 3.5.0 ( #1411 )
2024-03-19 17:51:04 -04:00
ffa34e7075
(NFC) improve doc: Add missing verb to sentence ( #1377 )
...
Co-authored-by: lorenzo chelini <lchelini@nvidia.com>
2024-03-04 15:30:10 -05:00
751eb9a885
Update license year ( #1306 )
2024-01-16 14:37:22 -05:00
2f589ffa76
Updates for 3.4 release. ( #1305 )
2024-01-16 13:42:51 -05:00
8236f30675
CUTLASS 3.4.0 ( #1286 )
...
* CUTLASS 3.4.0
* Update CHANGELOG.md
---------
Co-authored-by: Pradeep Ramani <prramani@nvidia.com>
2023-12-29 15:21:31 -05:00
f188f9b709
Fix typo in quickstart.md ( #1257 )
2023-12-07 09:49:52 -05:00
1d7f2a207e
Fix several broken links ( #1168 )
...
Co-authored-by: isaacw <isaacw@nvidia.com>
2023-11-03 00:01:25 -04:00
557be3ab0e
Fix several typos ( #1169 )
...
Co-authored-by: isaacw <isaacw@nvidia.com>
2023-11-02 23:54:46 -04:00
c008b4aea8
CUTLASS 3.3.0 ( #1167 )
...
* Release 3.3.0
Adds support for mixed precision GEMMs On Hopper and Ampere
Adds support for < 16B aligned GEMMs on Hopper
Enhancements to EVT
Enhancements to Python interface
Enhancements to Sub-byte type handling in CuTe
Several other bug-fixes and performance improvements.
* minor doc update
2023-11-02 11:09:05 -04:00
fb10fa5308
Fix broken pipeline link in docs ( #1143 )
2023-10-18 12:55:46 -04:00
90d3b0fb18
CUTLASS 3.2.1 ( #1113 )
...
* Updates for 3.2.1 release.
* Minor fix in gemm op profiler for raster order.
* Add scheduler mapping for raster order in the kernels.
2023-09-26 17:24:26 -04:00
3930f709ce
Fix typo in 0x_gemm_tutorial.md ( #1035 )
2023-08-17 10:52:20 -04:00
4575443d44
CUTLASS 3.2 ( #1024 )
...
* CUTLASS 3.2
2023-08-07 20:50:32 -04:00
9b923dd4c4
fix minor typos ( #984 )
2023-07-05 09:23:01 -04:00