cutlass

Author	SHA1	Message	Date
Jack Kosaian	b234a8c024	Rename python/cutlass to python/cutlass_cppgen (#2652 )	2025-09-18 14:26:57 -04:00
Junkai-Wu	74825181f2	Remove old-version dsl examples. (#2644 )	2025-09-17 22:23:30 -04:00
Junkai-Wu	8825e8be4f	Add required changes for github pipeline. (#2648 )	2025-09-17 22:22:45 -04:00
wbn	7817e47154	Fxied a typo in pipeline descript docs. (#2623 )	2025-09-15 22:32:27 -04:00
Asuka	25ccb875b8	Fix: a calculation error in the example of dividing out in the 02_layout_algebra doc (#2635 )	2025-09-15 22:31:33 -04:00
Wanshe	29c1ad704a	Fix doc cute 03_tensor.md link typo (#2627 ) * Update 03_tensor.md fix link typo change path to relative path * Update 03_tensor.md --------- Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>	2025-09-15 22:26:43 -04:00
Haicheng Wu	57e3cfb47a	doc change for 4.2 (#2639 ) * doc change * fix broken links * ragged gemm doc update * move around texts about moe gemm	2025-09-15 22:02:45 -04:00
Haicheng Wu	e7e0adddac	Update version.h change version number to 4.2	2025-09-15 12:40:58 -04:00
Junkai-Wu	6a35b4d22f	v4.2 tag release. (#2638 )	2025-09-15 12:21:53 -04:00
Richard Cai	56f0718a97	ex77 backwards GQA (#2556 ) * bwd GQA init * Update examples/77_blackwell_fmha/77_blackwell_fmha_bwd.cu * ref kernel type conversion fix --------- Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>	2025-09-09 12:53:28 -04:00
Lifu Huang	76c96b0be3	Fix incorrect shapes in copy_atom doc comments. (#2575 )	2025-09-04 16:57:24 -07:00
ao jia	d98e7bf7ce	Fix comment in mma_atom.hpp (#2579 )	2025-09-04 16:56:39 -07:00
Lifu Huang	b6ccf34aef	Fix Copy_Atom type mismatch in sgemm_sm80.cu (#2582 )	2025-09-04 16:56:17 -07:00
Andrei Alexandrescu	2288c0c901	Fix bugs in matrix.h (#2598 )	2025-09-04 16:55:11 -07:00
Harrison Barclay	b2dd65dc86	more robust imports in heuristics.py and heuristics_provider.py (#2596 )	2025-08-28 22:32:55 -04:00
Javier	496654bf2c	Fix sm100 gemm wrong static constexpr that breaks compilation on Windows (#2167 ) * Fix a sm100 gemm wrong defined static constexpr that breaks compilation on Windows * Fix a sm100 gemm wrong defined static constexpr that breaks compilation on Windows * More Windows fixes Signed-off-by: Javier <25750030+SystemPanic@users.noreply.github.com> * Revert "More Windows fixes" This reverts commit `2e8cfc1382`. --------- Signed-off-by: Javier <25750030+SystemPanic@users.noreply.github.com>	2025-08-28 22:13:00 -04:00
Linfeng Zheng	9ca7e877b2	fix gqa issue for blackwell fmha.py (#2599 )	2025-08-28 11:15:20 -04:00
Junkai-Wu	a49a78ffef	v4.2 release. (#2587 ) * Fix default cluster callback values to 1 to avoid profiler failure when these values are not set in command line. * v4.2 release.	2025-08-22 18:11:24 -04:00
qqwqqw689	11cad1f67b	fix a typo. (#2561 )	2025-08-19 22:23:09 -04:00
zkyue	931359cec1	Fix typo in functional.h (#2571 )	2025-08-19 22:22:31 -04:00
Inoday Yadav	42e7c546c4	Add movmatrix support (movmatrix.sync.aligned.m8n8.trans.b16) (#2562 )	2025-08-19 22:22:02 -04:00
melonedo	ec18e8043b	Make swizzle in pycute work (#2553 )	2025-08-19 22:21:00 -04:00
Srinath Kailasa	5b76420d6a	[DOC] Add more exposition to composition example (#2536 ) * Add more exposition to composition example * Apply suggestions from code review Co-authored-by: Cris Cecka <ccecka@users.noreply.github.com> --------- Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com> Co-authored-by: Cris Cecka <ccecka@users.noreply.github.com>	2025-08-11 22:20:36 -04:00
Horace He	19772cd63e	Fix typo in smem_allocator.py (#2517 )	2025-08-10 22:44:22 -04:00
zkyue	052afcd314	fix typo (#2529 )	2025-08-10 22:44:02 -04:00
Srinath Kailasa	86cf63e2d4	NIT: Grammar (#2537 )	2025-08-10 22:42:45 -04:00
Tarun Paparaju	a267d47f9b	Update batched_gemm.cu (#2538 )	2025-08-10 22:42:21 -04:00
starwang1024	9e6ab77d27	Fix a copy error in the SM70 main loop when loading data from smem to rmem (#2540 )	2025-08-10 22:42:01 -04:00
Robert Maynard	d0eada85a3	Support both CUDA 12 and 13 cccl header locations (#2543 )	2025-08-10 22:41:25 -04:00
Lifu Huang	23139309e9	Fix incorrect K dim in CuTe MMA Atom doc. (#2544 )	2025-08-10 22:40:56 -04:00
Wenxin Cheng	6dd13d4278	Facebook:This commit makes its files safe for use with -Wimplicit-fallthrough. (#2324 )	2025-07-31 20:55:19 -04:00
Srinath Kailasa	3b054767b3	Fix typo (#2514 )	2025-07-30 22:14:54 -04:00
Ali Hassani	6fb5e667c1	[Doc fix] incorrect compute cap. for Blackwell RTX (#2511 ) Blackwell RTX is compute capability 12.0 (SM120) but incorrectly listed as SM100 in the README.	2025-07-30 22:14:13 -04:00
Wenbo Yang	6c891db9f6	Fix epilogue::thread::Convert cannot be used with cute::collective::DefaultEpilogue. (#2333 )	2025-07-30 22:12:53 -04:00
botbw	da47886e34	Fix example bug (#2351 )	2025-07-30 22:12:33 -04:00
kf-zhang	26b7450023	support fp16 accmulator for sm89 fp8 mma (#2378 ) * add support for sm89 in cute and the unit tests * support fp16 accmulator for sm89 fp8 mma * format code	2025-07-30 22:12:08 -04:00
Luca Wehrstedt	a39cf6b511	Fix example in CuTe tutorials (#2416 )	2025-07-30 22:11:47 -04:00
Aditya Kane	f09045d660	Corrected minor nit in mma_traits.hpp (#2447 ) * Corrected minor nit in mma_traits.hpp The entry and descriptions were jumbled up. * Update mma_traits.hpp * Update mma_traits.hpp	2025-07-30 22:11:23 -04:00
xiangjiaojun	84a27b3926	fix: examples/cute/tutorial/blackwell/04_mma_tma_2sm_sm100.cu GridDim miscalculated (#2492 ) * fix: examples/cute/tutorial/blackwell/04_mma_tma_2sm_sm100.cu Launch dimGrid error * feat: add cta tiler * Update examples/cute/tutorial/blackwell/04_mma_tma_2sm_sm100.cu use cluster_layout_vmnk instead of cta_tiler Co-authored-by: Junkai-Wu <junkaiw@nvidia.com> * feat: remove cta_tiler --------- Co-authored-by: qinghongzeng <qinghongzeng@deeproute.ai> Co-authored-by: Junkai-Wu <junkaiw@nvidia.com>	2025-07-30 22:11:04 -04:00
kernyan	e093b4f691	Fix tutorial comment in sgemm_1.cu: use tCrC instead of tCsA in axpby explanation (#2448 )	2025-07-30 22:09:55 -04:00
Haicheng Wu	664c4f7b3e	Update CUTLASS version to 4.1 Update CUTLASS version to 4.1.	2025-07-26 20:11:04 -04:00
Zeyu WANG	0e026982ce	Example 77 add blackwell fmha bwd for MLA shape (#2466 ) * Update examples/77_blackwell_fmha/device/fmha_device_bwd.hpp Co-authored-by: Vijay Thakkar <vijaythakkar@me.com> * bug fix & use existing value rather than pass one more argument to support different dim in bwd_convert * Fix casual mask cnt when IsQBegin==false * bug fix in casual mask backward * code sync --------- Co-authored-by: Vijay Thakkar <vijaythakkar@me.com>	2025-07-24 18:41:11 -04:00
Larry Wu	9a9a579714	Merge pull request #2489 from NVIDIA/update_workflow_script Support "CuTe DSL" auto-labeling in workflow	2025-07-23 15:33:43 +08:00
Larry Wu	51d730b8be	Support "CuTe DSL" auto-labeling in workflow	2025-07-23 00:28:01 -07:00
Larry Wu	6c0c8b7484	1. Update bug/feature report template to add component selection. (#2485 ) 2. Add workflow to apply component label automatically	2025-07-22 12:38:03 -04:00
Haicheng Wu	e51efbfe18	Update CHANGELOG.md v4.1.0	2025-07-21 22:09:56 -04:00
Junkai-Wu	fd6cfe1ed0	v4.1 release update v2. (#2481 )	2025-07-21 22:03:55 -04:00
zhang	9baa06dd57	Add Blackwell MLA forward (shape: d=192, dv=128) implementation in example_77 (#2472 )	2025-07-18 01:27:48 -04:00
Colin Peppler	ebe98c549a	cache procedural_name in GemmOperation (#2317 )	2025-07-16 22:25:02 -04:00
Oleksandr Pavlyk	9892624b66	Fix typos in the text (#2417 )	2025-07-16 21:51:12 -04:00

714 Commits