b1d6e2c9b3
v4.3 update. ( #2709 )
...
* v4.3 update.
* Update the cute_dsl_api changelog's doc link
* Update version to 4.3.0
* Update the example link
* Update doc to encourage users to install the DSL from requirements.txt
---------
Co-authored-by: Larry Wu <larwu@nvidia.com >
2025-10-21 14:26:30 -04:00
f874df19ac
4.2.1 update
2025-09-23 13:45:13 -07:00
7a6d4ee099
v4.2.1 update. ( #2666 )
2025-09-23 13:25:43 -04:00
b234a8c024
Rename python/cutlass to python/cutlass_cppgen ( #2652 )
2025-09-18 14:26:57 -04:00
8825e8be4f
Add required changes for github pipeline. ( #2648 )
2025-09-17 22:22:45 -04:00
6a35b4d22f
v4.2 tag release. ( #2638 )
2025-09-15 12:21:53 -04:00
b2dd65dc86
more robust imports in heuristics.py and heuristics_provider.py ( #2596 )
2025-08-28 22:32:55 -04:00
a49a78ffef
v4.2 release. ( #2587 )
...
* Fix default cluster callback values to 1 to avoid profiler failures when these values are not set on the command line.
* v4.2 release.
2025-08-22 18:11:24 -04:00
ec18e8043b
Make swizzle in pycute work ( #2553 )
2025-08-19 22:21:00 -04:00
664c4f7b3e
Update CUTLASS version to 4.1
...
Update CUTLASS version to 4.1.
2025-07-26 20:11:04 -04:00
fd6cfe1ed0
v4.1 release update v2. ( #2481 )
2025-07-21 22:03:55 -04:00
ebe98c549a
cache procedural_name in GemmOperation ( #2317 )
2025-07-16 22:25:02 -04:00
a1aaf2300a
v4.1 release
2025-07-03 08:07:53 -04:00
5c6bca0441
Update requirements.txt ( #2390 )
...
Remove the dev suffix in the wheel version
2025-06-10 02:31:49 -04:00
8bdbfca682
v4.0 update. ( #2371 )
2025-06-06 02:39:20 -04:00
1ec230c4bf
Fix typo ( #2299 )
...
pip needs `==` to parse the file
2025-05-15 09:38:42 -04:00
f115c3f854
Release v4.0.0 ( #2294 )
2025-05-13 15:55:29 -04:00
ad7b2f5e84
3.9.2 doc/version ( #2279 )
...
* 3.9.2 doc/version
* whitespace
2025-05-04 00:00:15 -04:00
f535c33634
3.9.1 doc/version change ( #2273 )
2025-05-01 00:27:00 -04:00
e3cb8a773a
Import cuda, cudart, nvrtc lazily ( #2251 )
...
* Lazy cuda import
* More lazy cuda import
* More lazy cuda imports
* minor fixes
---------
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
2025-04-30 23:10:33 -04:00
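These lazy-import changes (this entry and the scipy/pydot ones below) defer heavy optional dependencies until they are actually needed, keeping `import cutlass` itself cheap. A minimal sketch of the general pattern, using `pydot` purely as an illustration (not the code merged in these PRs):
```
# Illustrative lazy-import pattern: the heavy dependency is imported inside
# the function that needs it, not at module import time.
def render_dag(dot_source: str, output_path: str) -> None:
    """Render a DOT graph to a PNG; pydot is only imported when called."""
    import pydot  # deferred: callers that never render pay no import cost

    (graph,) = pydot.graph_from_dot_data(dot_source)
    graph.write_png(output_path)
```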
c4bdfe821c
Lazy scipy import ( #2250 )
2025-04-30 16:10:00 -04:00
b3ce7e12b7
Make cc a positional argument ( #2249 )
2025-04-30 16:09:25 -04:00
fe75ead92e
Import pydot lazily ( #2248 )
2025-04-30 16:08:17 -04:00
35136f5564
Fix wrong detection of the Python version for use_rmm. ( #2224 )
2025-04-30 15:29:33 -04:00
331a1f5b3f
cutlass 3.9 update ( #2255 )
...
* cutlass 3.9 update
* rebase
* fix out-of-shared-memory failures for blockwise Blackwell
* doc format
* fix issue 2253
* disable host ref by default
* fix sm120 smem capacity
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
2025-04-24 15:42:40 -04:00
6f4921858b
v3.9 update ( #2203 )
...
* v3.9 update
* voidD
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
2025-04-02 15:11:18 -04:00
62750a2b75
v3.9 ( #2185 )
...
* v3.8 update x
* fix blackwell gg
* doc change
* doc change
* doc change
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com >
2025-03-21 01:52:23 -04:00
afa1772203
truncate name for cutlass profiler ( #2124 )
...
Co-authored-by: yuzhai <yuzhai@nvidia.com >
2025-02-21 00:16:56 -05:00
9b3772dfa6
Hopper Grouped GEMM support for FP8 Accum ( #2123 )
...
* Add support for fp8accum, with profiler extension
* Update .gitignore
* contri
---------
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
2025-02-20 21:55:26 -05:00
b84e9802d8
update 3.8 v2 ( #2112 )
...
* update 3.8 v2
* update 3.8
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
2025-02-19 22:03:14 -05:00
833f6990e0
v3.8.0 update ( #2082 )
...
* 3.8 update
* fix Markus' name
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
2025-02-06 21:33:40 -05:00
389e493055
CUTLASS 3.8 Release ( #2059 )
...
* CUTLASS 3.8 Release
* update
* Update README.md
* Revert "Update README.md"
This reverts commit b353e36fe8.
* update
* update
---------
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com >
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
2025-01-25 02:44:06 -05:00
b78588d163
CUTLASS 3.7 ( #2045 )
...
* CUTLASS 3.7
* clean up changelog
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
2025-01-18 09:53:07 -05:00
24f991e879
Fix typo in library_defaults.py ( #2024 )
2025-01-08 15:44:11 -05:00
3d261a5974
3.6.0 update ( #2005 )
...
* 3.6.0 update
* doc and swap stuff
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
2024-12-25 01:34:40 -05:00
b0e09d7cd3
Fix cutlass python library with cuda 12.6.2.post1 ( #1942 )
...
* Fix `cutlass` python library with cuda `12.6.2.post1`
Previously we had this error (a suffix-tolerant version-parsing sketch follows this entry):
```
File "/storage/home/cutlass/python/cutlass/backend/operation.py", line 39, in <listcomp>
_version_splits = [int(x) for x in __version__.split("rc")[0].split(".")]
^^^^^^
ValueError: invalid literal for int() with base 10: 'post1'
```
* Update sm90_utils.py
* Update generator.py
* Update python/cutlass_library/generator.py
Co-authored-by: Jack Kosaian <jackkosaian@gmail.com >
* Update python/cutlass_library/sm90_utils.py
Co-authored-by: Jack Kosaian <jackkosaian@gmail.com >
---------
Co-authored-by: Jack Kosaian <jackkosaian@gmail.com >
2024-11-18 09:06:32 -05:00
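The failure above comes from assuming every dot-separated component of the version string is an integer. A minimal sketch of suffix-tolerant parsing that copes with segments such as `rc1` or `post1` (hypothetical, not the actual change merged in #1942):
```
import re

def parse_version(version: str) -> tuple:
    """Keep only the leading numeric components of e.g. '12.6.2.post1'."""
    splits = []
    for part in version.split("."):
        # Take the leading digits of each component and stop at the first
        # component that has none (e.g. a bare 'post1' segment).
        match = re.match(r"\d+", part)
        if match is None:
            break
        splits.append(int(match.group()))
    return tuple(splits)

print(parse_version("12.6.2.post1"))  # (12, 6, 2)
print(parse_version("12.4.0rc1"))     # (12, 4, 0)
```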
83ae20c740
added mapping for bf16 to torch::kBFloat16 ( #1843 )
...
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com >
2024-10-23 12:48:31 -04:00
f3a3bfcbf2
add maximum support ( #1833 )
2024-10-23 12:44:56 -04:00
cc3c29a81a
CUTLASS 3.6.0 ( #1850 )
...
* v3.6
* update changelog
* update readme
* fix typo
* fixing typos
* hopper gemm with weight prefetch
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com >
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
2024-10-09 15:33:27 -04:00
44dae8b90e
Adjust profiler space for SM89 ( #1553 )
2024-09-19 11:40:30 -04:00
e1976daacc
Add support for mixed 4-bit/8-bit data types GEMM ( #1413 )
...
* Add support for mixed 4-bit/8-bit data types GEMM
* fix ( and )
---------
Co-authored-by: Aleksandar Samardžić <asamardzic@matf.bg.ac.rs >
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
2024-08-29 23:11:06 -04:00
3f084f7f3c
Add a couple of configs to generator.py for mixed-input MM ( #1350 )
...
* Add a couple of configs to generator.py for mixed-input MM
* change one unit test name; reenable 128x32 in the profiler
* Added U8/BF16 tests.
---------
Co-authored-by: Haicheng Wu <haichengw@nvidia.com >
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com >
2024-08-16 00:59:29 -04:00
2049c6c5a2
5476 cutlass 3x gemm kernels ( #1695 )
...
Co-authored-by: dePaul Miller <23461061+depaulmillz@users.noreply.github.com >
2024-08-08 13:56:23 -04:00
e22ba590cd
support data type w2 used in cutlass_library ( #1517 )
2024-08-06 11:15:18 -04:00
eee0cab26c
Stamp out 1x1x1 clusters, 128x256 CTA shape ( #1665 )
...
Adds 128x256 tile shapes to FP16/BF16 and FP8 generators.
Also adds 1x1x1 clusters to all existing FP16/BF16/FP8 generators.
NOTE: it is important to set the kernel filter (--kernels /
CUTLASS_LIBRARY_KERNELS) to a non-empty string and skip pruning to get
all of the new configurations.
For exhaustive profiling, the filter can be set to `*`.
Number of CUTLASS 3.X GEMMs before this commit: 2868
Number of CUTLASS 3.X GEMMs after this commit: 4016
Co-authored-by: Ali Hassani <ahassani@nvidia.com >
2024-07-31 20:22:29 -04:00
be60a0b272
CUTLASS 3.5.1 ( #1623 )
...
* CUTLASS 3.5.1
* updates, optimizations, fixes
2024-07-29 08:46:24 -04:00
7d49e6c7e2
Updates for CUTLASS 3.5.0 ( #1468 )
2024-04-11 21:33:40 -04:00
f9ece1b42c
Python Gemm tile_descriptions fix ( #1439 )
...
* fix python gemm tile descriptions
* fix formatting
* fix math_operation filtering
* fix formatting
2024-03-30 09:00:46 -04:00
629f4653c3
CUTLASS 3.5.0 ( #1411 )
2024-03-19 17:51:04 -04:00
bbe579a9e3
Updates for CUTLASS 3.4.1 ( #1346 )
...
* Updates for CUTLASS 3.4.1
* minor epi change
2024-02-15 15:48:34 -05:00