cutlass

Author	SHA1	Message	Date
Junkai-Wu	b1d6e2c9b3	v4.3 update. (#2709 ) * v4.3 update. * Update the cute_dsl_api changelog's doc link * Update version to 4.3.0 * Update the example link * Update doc to encourage user to install DSL from requirements.txt --------- Co-authored-by: Larry Wu <larwu@nvidia.com>	2025-10-21 14:26:30 -04:00
Haicheng Wu	f874df19ac	4.2.1 update	2025-09-23 13:45:13 -07:00
Junkai-Wu	7a6d4ee099	v4.2.1 update. (#2666 )	2025-09-23 13:25:43 -04:00
Jack Kosaian	b234a8c024	Rename python/cutlass to python/cutlass_cppgen (#2652 )	2025-09-18 14:26:57 -04:00
Junkai-Wu	8825e8be4f	Add required changes for github pipeline. (#2648 )	2025-09-17 22:22:45 -04:00
Junkai-Wu	6a35b4d22f	v4.2 tag release. (#2638 )	2025-09-15 12:21:53 -04:00
Harrison Barclay	b2dd65dc86	more robust imports in heuristics.py and heuristics_provider.py (#2596 )	2025-08-28 22:32:55 -04:00
Junkai-Wu	a49a78ffef	v4.2 release. (#2587 ) * Fix default cluster callback values to 1 to avoid profiler failure when these values are not set in command line. * v4.2 release.	2025-08-22 18:11:24 -04:00
melonedo	ec18e8043b	Make swizzle in pycute work (#2553 )	2025-08-19 22:21:00 -04:00
Haicheng Wu	664c4f7b3e	Update CUTLASS version to 4.1.	2025-07-26 20:11:04 -04:00
Junkai-Wu	fd6cfe1ed0	v4.1 release update v2. (#2481 )	2025-07-21 22:03:55 -04:00
Colin Peppler	ebe98c549a	cache procedural_name in GemmOperation (#2317 )	2025-07-16 22:25:02 -04:00
Junkai-Wu	a1aaf2300a	v4.1 release	2025-07-03 08:07:53 -04:00
brandonsun	5c6bca0441	Update requirements.txt (#2390 ) Remove the dev suffix in the wheel version	2025-06-10 02:31:49 -04:00
Junkai-Wu	8bdbfca682	v4.0 update. (#2371 )	2025-06-06 02:39:20 -04:00
Ruyman	1ec230c4bf	Fix typo (#2299 ) Needs == for pip to parse the file	2025-05-15 09:38:42 -04:00
Kihiro Bando	f115c3f854	Release v4.0.0 (#2294 )	2025-05-13 15:55:29 -04:00
Haicheng Wu	ad7b2f5e84	3.9.2 doc/version (#2279 ) * 3.9.2 doc/version * whitespace	2025-05-04 00:00:15 -04:00
Haicheng Wu	f535c33634	3.9.1 doc/version change (#2273 )	2025-05-01 00:27:00 -04:00
Michael Lazos	e3cb8a773a	Import cuda, cudart, nvrtc lazily (#2251 ) * Lazy cuda import * More lazy cuda import * More lazy cuda imports * minor fixes --------- Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2025-04-30 23:10:33 -04:00
Michael Lazos	c4bdfe821c	Lazy scipy import (#2250 )	2025-04-30 16:10:00 -04:00
Michael Lazos	b3ce7e12b7	Make cc a positional argument (#2249 )	2025-04-30 16:09:25 -04:00
Michael Lazos	fe75ead92e	Import pydot lazily (#2248 )	2025-04-30 16:08:17 -04:00
Ruoxi	35136f5564	Fix wrong detection of python version for `use_rmm`. (#2224 )	2025-04-30 15:29:33 -04:00
Yujia Zhai	331a1f5b3f	cutlass 3.9 update (#2255 ) * cutlass 3.9 update * rebase * fixes out of shared memory for blockwise Blackwell * doc format * fix issue 2253 * disable host ref by default * fix sm120 smem capacity --------- Co-authored-by: yuzhai <yuzhai@nvidia.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2025-04-24 15:42:40 -04:00
Yujia Zhai	6f4921858b	v3.9 update (#2203 ) * v3.9 update * voidD --------- Co-authored-by: yuzhai <yuzhai@nvidia.com>	2025-04-02 15:11:18 -04:00
Yujia Zhai	62750a2b75	v3.9 (#2185 ) * v3.8 update x * fix blackwell gg * doc change * doc change * doc change --------- Co-authored-by: yuzhai <yuzhai@nvidia.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com> Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>	2025-03-21 01:52:23 -04:00
Yujia Zhai	afa1772203	truncate name for cutlass profiler (#2124 ) Co-authored-by: yuzhai <yuzhai@nvidia.com>	2025-02-21 00:16:56 -05:00
ANIKET SHIVAM	9b3772dfa6	Hopper Grouped GEMM support for FP8 Accum (#2123 ) * Add support for fp8accum, with profiler extension * Update .gitignore * contri --------- Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2025-02-20 21:55:26 -05:00
Yujia Zhai	b84e9802d8	update 3.8 v2 (#2112 ) * update 3.8 v2 * update 3.8 --------- Co-authored-by: yuzhai <yuzhai@nvidia.com>	2025-02-19 22:03:14 -05:00
Yujia Zhai	833f6990e0	v3.8.0 update (#2082 ) * 3.8 update * fix Markus' name --------- Co-authored-by: yuzhai <yuzhai@nvidia.com>	2025-02-06 21:33:40 -05:00
mihir-awatramani	389e493055	CUTLASS 3.8 Release (#2059 ) * CUTLASS 3.8 Release * update * Update README.md * Revert "Update README.md" This reverts commit `b353e36fe8`. * update * update --------- Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2025-01-25 02:44:06 -05:00
Yujia Zhai	b78588d163	CUTLASS 3.7 (#2045 ) * CUTLASS 3.7 * clean up changelog --------- Co-authored-by: yuzhai <yuzhai@nvidia.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2025-01-18 09:53:07 -05:00
ZincCat	24f991e879	Fix typo in library_defaults.py (#2024 )	2025-01-08 15:44:11 -05:00
Yujia Zhai	3d261a5974	3.6.0 update (#2005 ) * 3.6.0 update * doc and swap stuff --------- Co-authored-by: yuzhai <yuzhai@nvidia.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2024-12-25 01:34:40 -05:00
dan_the_3rd	b0e09d7cd3	Fix `cutlass` python library with cuda `12.6.2.post1` (#1942 ) * Fix `cutlass` python library with cuda `12.6.2.post1` Previously we had this error: ``` File "/storage/home/cutlass/python/cutlass/backend/operation.py", line 39, in <listcomp> _version_splits = [int(x) for x in __version__.split("rc")[0].split(".")] ^^^^^^ ValueError: invalid literal for int() with base 10: 'post1' ``` * Update sm90_utils.py * Update generator.py * Update python/cutlass_library/generator.py Co-authored-by: Jack Kosaian <jackkosaian@gmail.com> * Update python/cutlass_library/sm90_utils.py Co-authored-by: Jack Kosaian <jackkosaian@gmail.com> --------- Co-authored-by: Jack Kosaian <jackkosaian@gmail.com>	2024-11-18 09:06:32 -05:00
Bogumil Sapinski Mobica	83ae20c740	added mapping for bf16 to torch::kBFloat16 (#1843 ) Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>	2024-10-23 12:48:31 -04:00
Xinyu Yang	f3a3bfcbf2	add maximum support (#1833 )	2024-10-23 12:44:56 -04:00
Yujia Zhai	cc3c29a81a	CUTLASS 3.6.0 (#1850 ) * v3.6 * update changelog * update readme * fix typo * fixing typos * hopper gemm with weight prefetch --------- Co-authored-by: yuzhai <yuzhai@nvidia.com> Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2024-10-09 15:33:27 -04:00
Wenlei Bao	44dae8b90e	Adjust profiler space for SM89 (#1553 )	2024-09-19 11:40:30 -04:00
Aleksandar Samardžić	e1976daacc	Add support for mixed 4-bit/8-bit data types GEMM (#1413 ) * Add support for mixed 4-bit/8-bit data types GEMM * fix ( and ) --------- Co-authored-by: Aleksandar Samardžić <asamardzic@matf.bg.ac.rs> Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2024-08-29 23:11:06 -04:00
Aleksandar Samardžić	3f084f7f3c	Add couple configs into generator.py for mixed input MM (#1350 ) * Add couple configs into generator.py for mixed input MM * change one unit test name; reenable 128x32 in the profiler * Added U8/BF16 tests. --------- Co-authored-by: Haicheng Wu <haichengw@nvidia.com> Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>	2024-08-16 00:59:29 -04:00
dePaul Miller	2049c6c5a2	5476 cutlass 3x gemm kernels (#1695 ) Co-authored-by: dePaul Miller <23461061+depaulmillz@users.noreply.github.com>	2024-08-08 13:56:23 -04:00
chenwei	e22ba590cd	support data type w2 used in cutlass_library (#1517 )	2024-08-06 11:15:18 -04:00
Ali Hassani	eee0cab26c	Stamp out 1x1x1 clusters, 128x256 CTA shape (#1665 ) Adds 128x256 tile shapes to FP16/BF16 and FP8 generators. Also adds 1x1x1 clusters to all existing FP16/BF16/FP8 generators. NOTE: it is important to set kernel filter (--kernels / CUTLASS_LIBRARY_KERNELS) to a non empty string and skip pruning to get all of the new configurations. If profiling exhaustively, they can be set to `*`. Number of CUTLASS 3.X GEMMs before this commit: 2868 Number of CUTLASS 3.X GEMMs after this commit: 4016 Co-authored-by: Ali Hassani <ahassani@nvidia.com>	2024-07-31 20:22:29 -04:00
Vijay Thakkar	be60a0b272	CUTLASS 3.5.1 (#1623 ) * CUTLASS 3.5.1 * updates, optimizations, fixes	2024-07-29 08:46:24 -04:00
Vijay Thakkar	7d49e6c7e2	Updates for CUTLASS 3.5.0 (#1468 )	2024-04-11 21:33:40 -04:00
jeromeku	f9ece1b42c	Python `Gemm` `tile_descriptions` fix (#1439 ) * fix python gemm tile descriptions * fix formatting * fix math_operation filtering * fix formatting	2024-03-30 09:00:46 -04:00
Vijay Thakkar	629f4653c3	CUTLASS 3.5.0 (#1411 )	2024-03-19 17:51:04 -04:00
ANIKET SHIVAM	bbe579a9e3	Updates for CUTLASS 3.4.1 (#1346 ) * Updates for CUTLASS 3.4.1 * minor epi change	2024-02-15 15:48:34 -05:00

70 Commits