b234a8c024
Rename python/cutlass to python/cutlass_cppgen ( #2652 )
2025-09-18 14:26:57 -04:00
74825181f2
Remove old-version dsl examples. ( #2644 )
2025-09-17 22:23:30 -04:00
8825e8be4f
Add required changes for github pipeline. ( #2648 )
2025-09-17 22:22:45 -04:00
7817e47154
Fixed a typo in pipeline descript docs. ( #2623 )
2025-09-15 22:32:27 -04:00
25ccb875b8
Fix a calculation error in the dividing-out example in the 02_layout_algebra doc ( #2635 )
2025-09-15 22:31:33 -04:00
29c1ad704a
Fix doc cute 03_tensor.md link typo ( #2627 )
* Update 03_tensor.md: fix link typo
Change path to relative path
* Update 03_tensor.md
---------
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2025-09-15 22:26:43 -04:00
57e3cfb47a
doc change for 4.2 ( #2639 )
* doc change
* fix broken links
* ragged gemm doc update
* move around text about MoE GEMM
2025-09-15 22:02:45 -04:00
e7e0adddac
Update version.h
change version number to 4.2
2025-09-15 12:40:58 -04:00
6a35b4d22f
v4.2 tag release. ( #2638 )
2025-09-15 12:21:53 -04:00
56f0718a97
ex77 backwards GQA ( #2556 )
* bwd GQA init
* Update examples/77_blackwell_fmha/77_blackwell_fmha_bwd.cu
* ref kernel type conversion fix
---------
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2025-09-09 12:53:28 -04:00
76c96b0be3
Fix incorrect shapes in copy_atom doc comments. ( #2575 )
2025-09-04 16:57:24 -07:00
d98e7bf7ce
Fix comment in mma_atom.hpp ( #2579 )
2025-09-04 16:56:39 -07:00
b6ccf34aef
Fix Copy_Atom type mismatch in sgemm_sm80.cu ( #2582 )
2025-09-04 16:56:17 -07:00
2288c0c901
Fix bugs in matrix.h ( #2598 )
2025-09-04 16:55:11 -07:00
b2dd65dc86
more robust imports in heuristics.py and heuristics_provider.py ( #2596 )
2025-08-28 22:32:55 -04:00
496654bf2c
Fix an incorrectly defined static constexpr in SM100 GEMM that breaks compilation on Windows ( #2167 )
* Fix an incorrectly defined static constexpr in SM100 GEMM that breaks compilation on Windows
* Fix an incorrectly defined static constexpr in SM100 GEMM that breaks compilation on Windows
* More Windows fixes
Signed-off-by: Javier <25750030+SystemPanic@users.noreply.github.com>
* Revert "More Windows fixes"
This reverts commit 2e8cfc1382.
---------
Signed-off-by: Javier <25750030+SystemPanic@users.noreply.github.com>
2025-08-28 22:13:00 -04:00
9ca7e877b2
fix gqa issue for blackwell fmha.py ( #2599 )
2025-08-28 11:15:20 -04:00
a49a78ffef
v4.2 release. ( #2587 )
* Set default cluster callback values to 1 to avoid a profiler failure when these values are not set on the command line.
* v4.2 release.
2025-08-22 18:11:24 -04:00
11cad1f67b
fix a typo. ( #2561 )
2025-08-19 22:23:09 -04:00
931359cec1
Fix typo in functional.h ( #2571 )
2025-08-19 22:22:31 -04:00
42e7c546c4
Add movmatrix support (movmatrix.sync.aligned.m8n8.trans.b16) ( #2562 )
2025-08-19 22:22:02 -04:00
ec18e8043b
Make swizzle in pycute work ( #2553 )
2025-08-19 22:21:00 -04:00
5b76420d6a
[DOC] Add more exposition to composition example ( #2536 )
* Add more exposition to composition example
* Apply suggestions from code review
Co-authored-by: Cris Cecka <ccecka@users.noreply.github.com>
---------
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
Co-authored-by: Cris Cecka <ccecka@users.noreply.github.com>
2025-08-11 22:20:36 -04:00
19772cd63e
Fix typo in smem_allocator.py ( #2517 )
2025-08-10 22:44:22 -04:00
052afcd314
fix typo ( #2529 )
2025-08-10 22:44:02 -04:00
86cf63e2d4
NIT: Grammar ( #2537 )
2025-08-10 22:42:45 -04:00
a267d47f9b
Update batched_gemm.cu ( #2538 )
2025-08-10 22:42:21 -04:00
9e6ab77d27
Fix a copy error in the SM70 main loop when loading data from smem to rmem ( #2540 )
2025-08-10 22:42:01 -04:00
d0eada85a3
Support both CUDA 12 and 13 cccl header locations ( #2543 )
2025-08-10 22:41:25 -04:00
23139309e9
Fix incorrect K dim in CuTe MMA Atom doc. ( #2544 )
2025-08-10 22:40:56 -04:00
6dd13d4278
Facebook: This commit makes its files safe for use with -Wimplicit-fallthrough. ( #2324 )
2025-07-31 20:55:19 -04:00
3b054767b3
Fix typo ( #2514 )
2025-07-30 22:14:54 -04:00
6fb5e667c1
[Doc fix] incorrect compute capability for Blackwell RTX ( #2511 )
Blackwell RTX is compute capability 12.0 (SM120) but incorrectly listed
as SM100 in the README.
2025-07-30 22:14:13 -04:00
6c891db9f6
Fix epilogue::thread::Convert cannot be used with cute::collective::DefaultEpilogue. ( #2333 )
2025-07-30 22:12:53 -04:00
da47886e34
Fix example bug ( #2351 )
2025-07-30 22:12:33 -04:00
26b7450023
support fp16 accumulator for sm89 fp8 mma ( #2378 )
* add support for sm89 in cute and the unit tests
* support fp16 accumulator for sm89 fp8 mma
* format code
2025-07-30 22:12:08 -04:00
a39cf6b511
Fix example in CuTe tutorials ( #2416 )
2025-07-30 22:11:47 -04:00
f09045d660
Corrected minor nit in mma_traits.hpp ( #2447 )
* Corrected minor nit in mma_traits.hpp
The entry and descriptions were jumbled up.
* Update mma_traits.hpp
* Update mma_traits.hpp
2025-07-30 22:11:23 -04:00
84a27b3926
fix: examples/cute/tutorial/blackwell/04_mma_tma_2sm_sm100.cu GridDim miscalculated ( #2492 )
* fix: examples/cute/tutorial/blackwell/04_mma_tma_2sm_sm100.cu Launch dimGrid error
* feat: add cta tiler
* Update examples/cute/tutorial/blackwell/04_mma_tma_2sm_sm100.cu
use cluster_layout_vmnk instead of cta_tiler
Co-authored-by: Junkai-Wu <junkaiw@nvidia.com>
* feat: remove cta_tiler
---------
Co-authored-by: qinghongzeng <qinghongzeng@deeproute.ai>
Co-authored-by: Junkai-Wu <junkaiw@nvidia.com>
2025-07-30 22:11:04 -04:00
e093b4f691
Fix tutorial comment in sgemm_1.cu: use tCrC instead of tCsA in axpby explanation ( #2448 )
2025-07-30 22:09:55 -04:00
664c4f7b3e
Update CUTLASS version to 4.1
Update CUTLASS version to 4.1.
2025-07-26 20:11:04 -04:00
0e026982ce
Example 77: add Blackwell FMHA bwd for MLA shape ( #2466 )
* Update examples/77_blackwell_fmha/device/fmha_device_bwd.hpp
Co-authored-by: Vijay Thakkar <vijaythakkar@me.com>
* bug fix & use existing value rather than passing one more argument to support a different dim in bwd_convert
* Fix causal mask cnt when IsQBegin==false
* bug fix in causal mask backward
* code sync
---------
Co-authored-by: Vijay Thakkar <vijaythakkar@me.com>
2025-07-24 18:41:11 -04:00
9a9a579714
Merge pull request #2489 from NVIDIA/update_workflow_script
Support "CuTe DSL" auto-labeling in workflow
2025-07-23 15:33:43 +08:00
51d730b8be
Support "CuTe DSL" auto-labeling in workflow
2025-07-23 00:28:01 -07:00
6c0c8b7484
1. Update bug/feature report template to add component selection. ( #2485 )
2. Add workflow to apply component label automatically
2025-07-22 12:38:03 -04:00
e51efbfe18
Update CHANGELOG.md
v4.1.0
2025-07-21 22:09:56 -04:00
fd6cfe1ed0
v4.1 release update v2. ( #2481 )
2025-07-21 22:03:55 -04:00
9baa06dd57
Add Blackwell MLA forward (shape: d=192, dv=128) implementation in example_77 ( #2472 )
2025-07-18 01:27:48 -04:00
ebe98c549a
cache procedural_name in GemmOperation ( #2317 )
2025-07-16 22:25:02 -04:00
9892624b66
Fix typos in the text ( #2417 )
2025-07-16 21:51:12 -04:00