Commit Graph

726 Commits

Author SHA1 Message Date
8afb19d904 update CITATION.cff 2025-10-28 23:42:37 -04:00
b2ca083d2b Fixed compilation error when using StreamK scheduler + PDL. (#2686) 2025-10-21 23:11:14 -04:00
b1d6e2c9b3 v4.3 update. (#2709)
* v4.3 update.

* Update the cute_dsl_api changelog's doc link

* Update version to 4.3.0

* Update the example link

* Update doc to encourage user to install DSL from requirements.txt

---------

Co-authored-by: Larry Wu <larwu@nvidia.com>
2025-10-21 14:26:30 -04:00
e6e2cc29f5 fix (#2684) 2025-10-15 14:46:38 -04:00
c6aeb9179c Update pyproject.toml
update version to 4.2.1
2025-09-24 01:18:51 -04:00
95a5ff14c0 Update CHANGELOG.md
format change
2025-09-23 17:33:00 -04:00
fb8b43ef05 Merge pull request #2669 from NVIDIA/421_update
4.2.1 update
2025-09-23 14:02:29 -07:00
f874df19ac 4.2.1 update 2025-09-23 13:45:13 -07:00
7a6d4ee099 v4.2.1 update. (#2666) 2025-09-23 13:25:43 -04:00
GTO
2b8dff1f90 Fix bfloat16 epsilon (#2607)
* Fix bfloat16 epsilon

* just use constants

---------

Co-authored-by: Konstantin <konstantin@MacBook-Air.local>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2025-09-21 23:43:59 -04:00
fd0312ddf6 Remove duplicate function calls (#1584) 2025-09-21 23:16:59 -04:00
64579189ec Feature/add bottom causal mask (#2480)
* Rebase to latest

* update

* upd

Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

* Update fmha_fusion.hpp

* Update fmha_fusion.hpp

fixed flipped logic for isQBegin

* Update fmha_fusion.hpp

* Avoid use of booleans

The current expression is confusing

* fmt

* Update fmha_fusion.hpp

Reproduce error/fix with: 
./77_blackwell_fmha_fp16 --verify --b=1 --q=1013 --k=1024 --h=1 --h_k=1 --mask=causal --causal-type=qend

* add test, format

---------

Co-authored-by: Richard Cai <ricai@nvidia.com>
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2025-09-18 17:11:23 -04:00
b234a8c024 Rename python/cutlass to python/cutlass_cppgen (#2652) 2025-09-18 14:26:57 -04:00
74825181f2 Remove old-version dsl examples. (#2644) 2025-09-17 22:23:30 -04:00
8825e8be4f Add required changes for github pipeline. (#2648) 2025-09-17 22:22:45 -04:00
wbn
7817e47154 Fxied a typo in pipeline descript docs. (#2623) 2025-09-15 22:32:27 -04:00
25ccb875b8 Fix: a calculation error in the example of dividing out in the 02_layout_algebra doc (#2635) 2025-09-15 22:31:33 -04:00
29c1ad704a Fix doc cute 03_tensor.md link typo (#2627)
* Update 03_tensor.md fix link typo

change path to relative path

* Update 03_tensor.md

---------

Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2025-09-15 22:26:43 -04:00
57e3cfb47a doc change for 4.2 (#2639)
* doc change

* fix broken links

* ragged gemm doc update

* move around texts about moe gemm
2025-09-15 22:02:45 -04:00
e7e0adddac Update version.h
change version number to 4.2
2025-09-15 12:40:58 -04:00
6a35b4d22f v4.2 tag release. (#2638) 2025-09-15 12:21:53 -04:00
56f0718a97 ex77 backwards GQA (#2556)
* bwd GQA init

* Update examples/77_blackwell_fmha/77_blackwell_fmha_bwd.cu

* ref kernel type conversion fix

---------

Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2025-09-09 12:53:28 -04:00
76c96b0be3 Fix incorrect shapes in copy_atom doc comments. (#2575) 2025-09-04 16:57:24 -07:00
d98e7bf7ce Fix comment in mma_atom.hpp (#2579) 2025-09-04 16:56:39 -07:00
b6ccf34aef Fix Copy_Atom type mismatch in sgemm_sm80.cu (#2582) 2025-09-04 16:56:17 -07:00
2288c0c901 Fix bugs in matrix.h (#2598) 2025-09-04 16:55:11 -07:00
b2dd65dc86 more robust imports in heuristics.py and heuristics_provider.py (#2596) 2025-08-28 22:32:55 -04:00
496654bf2c Fix sm100 gemm wrong static constexpr that breaks compilation on Windows (#2167)
* Fix a sm100 gemm wrong defined static constexpr that breaks compilation on Windows

* Fix a sm100 gemm wrong defined static constexpr that breaks compilation on Windows

* More Windows fixes

Signed-off-by: Javier <25750030+SystemPanic@users.noreply.github.com>

* Revert "More Windows fixes"

This reverts commit 2e8cfc1382.

---------

Signed-off-by: Javier <25750030+SystemPanic@users.noreply.github.com>
2025-08-28 22:13:00 -04:00
9ca7e877b2 fix gqa issue for blackwell fmha.py (#2599) 2025-08-28 11:15:20 -04:00
a49a78ffef v4.2 release. (#2587)
* Fix default cluster callback values to 1 to avoid profiler failure when these values are not set in command line.

* v4.2 release.
2025-08-22 18:11:24 -04:00
11cad1f67b fix a typo. (#2561) 2025-08-19 22:23:09 -04:00
931359cec1 Fix typo in functional.h (#2571) 2025-08-19 22:22:31 -04:00
42e7c546c4 Add movmatrix support (movmatrix.sync.aligned.m8n8.trans.b16) (#2562) 2025-08-19 22:22:02 -04:00
ec18e8043b Make swizzle in pycute work (#2553) 2025-08-19 22:21:00 -04:00
5b76420d6a [DOC] Add more exposition to composition example (#2536)
* Add more exposition to composition example

* Apply suggestions from code review

Co-authored-by: Cris Cecka <ccecka@users.noreply.github.com>

---------

Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
Co-authored-by: Cris Cecka <ccecka@users.noreply.github.com>
2025-08-11 22:20:36 -04:00
19772cd63e Fix typo in smem_allocator.py (#2517) 2025-08-10 22:44:22 -04:00
052afcd314 fix typo (#2529) 2025-08-10 22:44:02 -04:00
86cf63e2d4 NIT: Grammar (#2537) 2025-08-10 22:42:45 -04:00
a267d47f9b Update batched_gemm.cu (#2538) 2025-08-10 22:42:21 -04:00
9e6ab77d27 Fix a copy error in the SM70 main loop when loading data from smem to rmem (#2540) 2025-08-10 22:42:01 -04:00
d0eada85a3 Support both CUDA 12 and 13 cccl header locations (#2543) 2025-08-10 22:41:25 -04:00
23139309e9 Fix incorrect K dim in CuTe MMA Atom doc. (#2544) 2025-08-10 22:40:56 -04:00
6dd13d4278 Facebook:This commit makes its files safe for use with -Wimplicit-fallthrough. (#2324) 2025-07-31 20:55:19 -04:00
3b054767b3 Fix typo (#2514) 2025-07-30 22:14:54 -04:00
6fb5e667c1 [Doc fix] incorrect compute cap. for Blackwell RTX (#2511)
Blackwell RTX is compute capability 12.0 (SM120) but incorrectly listed
as SM100 in the README.
2025-07-30 22:14:13 -04:00
6c891db9f6 Fix epilogue:🧵:Convert cannot be used with cute::collective::DefaultEpilogue. (#2333) 2025-07-30 22:12:53 -04:00
da47886e34 Fix example bug (#2351) 2025-07-30 22:12:33 -04:00
26b7450023 support fp16 accmulator for sm89 fp8 mma (#2378)
* add support for sm89 in cute and the unit tests

* support fp16 accmulator for sm89 fp8 mma

* format code
2025-07-30 22:12:08 -04:00
a39cf6b511 Fix example in CuTe tutorials (#2416) 2025-07-30 22:11:47 -04:00
f09045d660 Corrected minor nit in mma_traits.hpp (#2447)
* Corrected minor nit in mma_traits.hpp

The entry and descriptions were jumbled up.

* Update mma_traits.hpp

* Update mma_traits.hpp
2025-07-30 22:11:23 -04:00