47daa33c61
fix cuda 12.6 issues ( #2066 )
2025-01-28 17:28:29 -05:00
389e493055
CUTLASS 3.8 Release ( #2059 )
...
* CUTLASS 3.8 Release
* update
* Update README.md
* Revert "Update README.md"
This reverts commit b353e36fe8.
* update
* update
---------
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-01-25 02:44:06 -05:00
b78588d163
CUTLASS 3.7 ( #2045 )
...
* CUTLASS 3.7
* clean up changelog
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-01-18 09:53:07 -05:00
375e284e6a
Add Line Break ( #2020 )
2025-01-08 23:46:59 -05:00
3d261a5974
3.6.0 update ( #2005 )
...
* 3.6.0 update
* doc and swap stuff
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-12-25 01:34:40 -05:00
e1cd8c7866
Fix Typo ( #1962 )
2024-12-10 22:07:37 -05:00
d656afbd2a
fix undefined in device code error ( #1880 )
2024-11-06 14:56:54 -05:00
e8a8b69365
Refactor some GroupedGEMM logic ( #1899 )
2024-10-25 20:14:01 -04:00
08a49953a0
Add a print for the uint{x}b_t type. ( #1871 )
2024-10-24 14:39:22 -04:00
a424ca6cf9
fix wrong A/BLayout in MMA_Traits for binary mma and append other MMA_Traits support ( #1856 )
...
* fix wrong A/BLayout in MMA_Traits<SM80_16x8x256_S32U1U1S32_TN_XORPOPC> and append support for m8n8k128, m16n8k128 mma.and.popc in MMA_Traits instantiation
* add "print" template for subbyte_reference<T>
2024-10-24 14:38:35 -04:00
d65266a868
Add all supported GMMA shapes ( #1890 )
2024-10-22 18:13:36 -04:00
5b50a8faaf
Add GMMA shape m64n40k16 ( #1864 )
2024-10-21 20:41:47 -04:00
08101d9d0c
Improve sm90 mixed dtype kernel ( #1883 )
2024-10-17 20:06:38 -04:00
cc3c29a81a
CUTLASS 3.6.0 ( #1850 )
...
* v3.6
* update changelog
* update readme
* fix typo
* fixing typos
* hopper gemm with weight prefetch
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2024-10-09 15:33:27 -04:00
2991ce18d3
Add print_svg for mma ( #1733 )
...
* add print_svg for mma
* correct the code indentation
2024-09-18 10:37:24 -04:00
dbdae514e0
Support for TMA Epilogue for Group Gemm and add pingpong ptr array & Group Gemm ( #1795 )
2024-09-11 00:07:31 -04:00
21d0534167
fix assertion ( #1790 )
2024-09-09 14:05:27 -04:00
4dbf5dbed2
Use CUDA runtime API to retrieve function pointer to driver API ( #1700 )
...
* Query pfn to driver api
* use default for older toolkits
---------
Co-authored-by: shunfans <shunfans@nvidia.com>
2024-08-19 13:26:09 -04:00
7192f4ab23
Add CLayout_64x208 ( #1680 )
...
Without this, I get a compilation error when the extended shapes are enabled.
2024-08-08 14:00:24 -04:00
36cbfcf483
Add extended wgmma shapes for all data types ( #1666 )
2024-07-31 18:33:14 -04:00
5b283c872c
Add more GMMA shapes ( #1630 )
...
* Add more GMMA shapes
* Add more shapes for BF16
2024-07-29 19:09:51 -04:00
be60a0b272
CUTLASS 3.5.1 ( #1623 )
...
* CUTLASS 3.5.1
* updates, optimizations, fixes
2024-07-29 08:46:24 -04:00
81b06ee0e0
Fix B operand variable name and comments ( #1458 )
2024-07-10 11:06:29 -04:00
7d49e6c7e2
Updates for CUTLASS 3.5.0 ( #1468 )
2024-04-11 21:33:40 -04:00
28cbacbf64
fix stride compilation warning ( #1415 )
2024-03-29 23:50:33 -04:00
629f4653c3
CUTLASS 3.5.0 ( #1411 )
2024-03-19 17:51:04 -04:00
a8f2c80db0
fix tile_size(TiledCopy<Args...> const&) error ( #1357 )
2024-02-24 00:33:01 -05:00
bbe579a9e3
Updates for CUTLASS 3.4.1 ( #1346 )
...
* Updates for CUTLASS 3.4.1
* minor epi change
2024-02-15 15:48:34 -05:00
8825fbf1ef
fix unrecognized print format specifier for int8/uint8 ( #1303 )
...
* fix unrecognized print format specifier for int8/uint8
* use c++ static_cast instead of c cast style
2024-01-29 21:22:40 -05:00
092f14db05
fix tile_size_mnk compilation warning ( #1294 )
2024-01-29 21:21:15 -05:00
751eb9a885
Update license year ( #1306 )
2024-01-16 14:37:22 -05:00
2f589ffa76
Updates for 3.4 release. ( #1305 )
2024-01-16 13:42:51 -05:00
74d1f3e63a
Fix cute::array<T, 0> iterator ( #1273 )
2024-01-08 17:10:09 -05:00
8236f30675
CUTLASS 3.4.0 ( #1286 )
...
* CUTLASS 3.4.0
* Update CHANGELOG.md
---------
Co-authored-by: Pradeep Ramani <prramani@nvidia.com>
2023-12-29 15:21:31 -05:00
b7508e3379
Fix inline ptx escaping for predicates. ( #1264 )
...
* Fix inline ptx escaping for predicates.
Prevents `error: invalid % escape in inline assembly string` when compiling with clang.
* More double-quoting.
2023-12-14 11:16:15 -05:00
e1483d5fa0
Collection of changes to fix clang build. ( #1200 )
...
* Remove unused variables
* Qualify calls to make_fragment_? from templated base class.
Fixes clang build error.
* Add missing `#include <cstdio>`
* Various changes to fix clang compile errors.
* More changes to fix clang build.
Remaining issues:
- `params` initializer of `CollectiveEpilogue`.
- `ops` initializer of `Sm90VisitorImplBase`.
- `__usAtomicCAS` needs to be added to clang upstream.
* Fix remaining clang build issues.
* Qualify `cute::rank()` calls.
* Qualify some more calls that are otherwise ambiguous between `cute` and `std` namespace.
* Double-escape special registers in inline asm.
* small change
---------
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-12-08 14:42:12 -05:00
e9e30c2304
Updates and Bug fixes to CUTLASS 3.3 ( #1232 )
2023-12-05 09:50:49 -05:00
2375a07d01
Qualify calls to make_fragment_? from templated base class. ( #1196 )
...
Fixes clang build error.
2023-12-01 09:52:57 -05:00
eb01d5449d
fix cp.async L2 prefetch typo ( #1187 )
2023-11-28 16:58:04 -05:00
6e60b9b17c
enable L2::128B prefetch for cp.async by default ( #1177 )
2023-11-13 13:30:13 -05:00
39c6a83f23
fix missing return warning ( #1173 )
2023-11-03 22:42:59 -04:00
557be3ab0e
Fix several typos ( #1169 )
...
Co-authored-by: isaacw <isaacw@nvidia.com>
2023-11-02 23:54:46 -04:00
c008b4aea8
CUTLASS 3.3.0 ( #1167 )
...
* Release 3.3.0
Adds support for mixed precision GEMMs on Hopper and Ampere
Adds support for < 16B aligned GEMMs on Hopper
Enhancements to EVT
Enhancements to Python interface
Enhancements to Sub-byte type handling in CuTe
Several other bug-fixes and performance improvements.
* minor doc update
2023-11-02 11:09:05 -04:00
922fb5108b
clean the format ( #1140 )
2023-10-24 22:59:06 -04:00
fa8dfe631f
fix missing return warning for repeat and axpby ( #1124 )
2023-10-12 00:05:45 -04:00
14f69bddc8
[fix] fix comparison operator for integer_subbyte ( #1090 )
2023-09-26 17:26:12 -04:00
90d3b0fb18
CUTLASS 3.2.1 ( #1113 )
...
* Updates for 3.2.1 release.
* Minor fix in gemm op profiler for raster order.
* Add scheduler mapping for raster order in the kernels.
2023-09-26 17:24:26 -04:00
e0aaa3c3b3
fix GmmaDescriptor print format string error ( #1102 )
2023-09-19 23:27:58 -04:00
34fd98056b
fix cinttypes issue with STDC_FORMAT_MACROS ( #1068 )
...
* fix cinttypes issue with STDC_FORMAT_MACROS
* Update mma_sm90_desc.hpp
* Update mma_sm90_desc.hpp
---------
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2023-08-29 14:59:33 -04:00
6673df0e48
fix typos ( #1059 )
2023-08-27 00:49:26 -04:00