Commit Graph

  • 8afb19d904 update CITATION.cff main Haicheng Wu 2025-10-28 23:42:37 -04:00
  • b2ca083d2b Fixed compilation error when using StreamK scheduler + PDL. (#2686) Qi Yuhang 2025-10-22 11:11:14 +08:00
  • b1d6e2c9b3 v4.3 update. (#2709) Junkai-Wu 2025-10-22 02:26:30 +08:00
  • e6e2cc29f5 fix (#2684) Lain 2025-10-15 11:46:38 -07:00
  • 6aa1894093 Enable mxfp8-mxfp4 group gemm on cutlass feature/enable-mxfp-group-gemm-sm120 Faraz Khoubsirat 2025-09-25 00:32:11 +00:00
  • f3fde58372 Update pyproject.toml v4.2.1 release/4.2 Haicheng Wu 2025-09-24 01:19:30 -04:00
  • c6aeb9179c Update pyproject.toml Haicheng Wu 2025-09-24 01:18:51 -04:00
  • a8749e67ba Update CHANGELOG.md Haicheng Wu 2025-09-23 17:33:42 -04:00
  • 95a5ff14c0 Update CHANGELOG.md Haicheng Wu 2025-09-23 17:33:00 -04:00
  • c609b86db2 Feature/add bottom causal mask (#2480) Aya Z. Ibrahim 2025-09-18 14:11:23 -07:00
  • 177a82e251 Rename python/cutlass to python/cutlass_cppgen (#2652) Jack Kosaian 2025-09-18 13:26:57 -05:00
  • 4260d4aef9 4.2.1 update Haicheng Wu 2025-09-23 13:45:13 -07:00
  • fb8b43ef05 Merge pull request #2669 from NVIDIA/421_update ANIKET SHIVAM 2025-09-23 14:02:29 -07:00
  • f874df19ac 4.2.1 update Haicheng Wu 2025-09-23 13:45:13 -07:00
  • ee914c3cec v4.2.1 update. (#2667) Junkai-Wu 2025-09-24 02:25:14 +08:00
  • 7a6d4ee099 v4.2.1 update. (#2666) Junkai-Wu 2025-09-24 01:25:43 +08:00
  • 2b8dff1f90 Fix bfloat16 epsilon (#2607) GTO 2025-09-22 06:43:59 +03:00
  • fd0312ddf6 Remove duplicate function calls (#1584) 103yiran 2025-09-22 11:16:59 +08:00
  • 64579189ec Feature/add bottom causal mask (#2480) Aya Z. Ibrahim 2025-09-18 14:11:23 -07:00
  • b234a8c024 Rename python/cutlass to python/cutlass_cppgen (#2652) Jack Kosaian 2025-09-18 13:26:57 -05:00
  • 59b61c606f add support matrix v4.2.0 Haicheng Wu 2025-09-17 20:20:50 -07:00
  • 6b73aedb11 Fxied a typo in pipeline descript docs. (#2623) wbn 2025-09-16 10:32:27 +08:00
  • ebf5e5effd Fix: a calculation error in the example of dividing out in the 02_layout_algebra doc (#2635) Asuka 2025-09-16 10:31:33 +08:00
  • df3923b0bb Fix doc cute 03_tensor.md link typo (#2627) Wanshe 2025-09-16 10:26:43 +08:00
  • 74825181f2 Remove old-version dsl examples. (#2644) Junkai-Wu 2025-09-18 10:23:30 +08:00
  • a49f8062e3 Remove old-version dsl examples (#2645) Junkai-Wu 2025-09-18 10:23:07 +08:00
  • 8825e8be4f Add required changes for github pipeline. (#2648) Junkai-Wu 2025-09-18 10:22:45 +08:00
  • 7817e47154 Fxied a typo in pipeline descript docs. (#2623) wbn 2025-09-16 10:32:27 +08:00
  • 25ccb875b8 Fix: a calculation error in the example of dividing out in the 02_layout_algebra doc (#2635) Asuka 2025-09-16 10:31:33 +08:00
  • 29c1ad704a Fix doc cute 03_tensor.md link typo (#2627) Wanshe 2025-09-16 10:26:43 +08:00
  • 57e3cfb47a doc change for 4.2 (#2639) v4 Haicheng Wu 2025-09-15 22:02:45 -04:00
  • e7e0adddac Update version.h Haicheng Wu 2025-09-15 12:40:58 -04:00
  • 6a35b4d22f v4.2 tag release. (#2638) Junkai-Wu 2025-09-16 00:21:53 +08:00
  • 56f0718a97 ex77 backwards GQA (#2556) Richard Cai 2025-09-09 09:53:28 -07:00
  • 76c96b0be3 Fix incorrect shapes in copy_atom doc comments. (#2575) Lifu Huang 2025-09-04 16:57:24 -07:00
  • d98e7bf7ce Fix comment in mma_atom.hpp (#2579) ao jia 2025-09-05 07:56:39 +08:00
  • b6ccf34aef Fix Copy_Atom type mismatch in sgemm_sm80.cu (#2582) Lifu Huang 2025-09-04 16:56:17 -07:00
  • 2288c0c901 Fix bugs in matrix.h (#2598) Andrei Alexandrescu 2025-09-04 19:55:11 -04:00
  • b2dd65dc86 more robust imports in heuristics.py and heuristics_provider.py (#2596) Harrison Barclay 2025-08-28 22:32:55 -04:00
  • 496654bf2c Fix sm100 gemm wrong static constexpr that breaks compilation on Windows (#2167) Javier 2025-08-28 21:13:00 -05:00
  • 9ca7e877b2 fix gqa issue for blackwell fmha.py (#2599) Linfeng Zheng 2025-08-28 23:15:20 +08:00
  • a49a78ffef v4.2 release. (#2587) Junkai-Wu 2025-08-23 06:11:24 +08:00
  • 11cad1f67b fix a typo. (#2561) qqwqqw689 2025-08-20 10:23:09 +08:00
  • 931359cec1 Fix typo in functional.h (#2571) zkyue 2025-08-20 10:22:31 +08:00
  • 42e7c546c4 Add movmatrix support (movmatrix.sync.aligned.m8n8.trans.b16) (#2562) Inoday Yadav 2025-08-19 22:22:02 -04:00
  • ec18e8043b Make swizzle in pycute work (#2553) melonedo 2025-08-20 10:21:00 +08:00
  • 5b76420d6a [DOC] Add more exposition to composition example (#2536) Srinath Kailasa 2025-08-12 03:20:36 +01:00
  • 19772cd63e Fix typo in smem_allocator.py (#2517) Horace He 2025-08-10 19:44:22 -07:00
  • 052afcd314 fix typo (#2529) zkyue 2025-08-11 10:44:02 +08:00
  • 86cf63e2d4 NIT: Grammar (#2537) Srinath Kailasa 2025-08-11 03:42:45 +01:00
  • a267d47f9b Update batched_gemm.cu (#2538) Tarun Paparaju 2025-08-10 19:42:21 -07:00
  • 9e6ab77d27 Fix a copy error in the SM70 main loop when loading data from smem to rmem (#2540) starwang1024 2025-08-11 10:42:01 +08:00
  • d0eada85a3 Support both CUDA 12 and 13 cccl header locations (#2543) Robert Maynard 2025-08-10 22:41:25 -04:00
  • 23139309e9 Fix incorrect K dim in CuTe MMA Atom doc. (#2544) Lifu Huang 2025-08-10 19:40:56 -07:00
  • 6dd13d4278 Facebook:This commit makes its files safe for use with -Wimplicit-fallthrough. (#2324) Wenxin Cheng 2025-07-31 17:55:19 -07:00
  • 3b054767b3 Fix typo (#2514) Srinath Kailasa 2025-07-31 03:14:54 +01:00
  • 6fb5e667c1 [Doc fix] incorrect compute cap. for Blackwell RTX (#2511) Ali Hassani 2025-07-30 22:14:13 -04:00
  • 6c891db9f6 Fix epilogue::thread::Convert cannot be used with cute::collective::DefaultEpilogue. (#2333) Wenbo Yang 2025-07-31 10:12:53 +08:00
  • da47886e34 Fix example bug (#2351) botbw 2025-07-31 10:12:33 +08:00
  • 26b7450023 support fp16 accmulator for sm89 fp8 mma (#2378) kf-zhang 2025-07-31 10:12:08 +08:00
  • a39cf6b511 Fix example in CuTe tutorials (#2416) Luca Wehrstedt 2025-07-31 04:11:47 +02:00
  • f09045d660 Corrected minor nit in mma_traits.hpp (#2447) Aditya Kane 2025-07-30 19:11:23 -07:00
  • 84a27b3926 fix: examples/cute/tutorial/blackwell/04_mma_tma_2sm_sm100.cu GridDim miscalculated (#2492) xiangjiaojun 2025-07-31 10:11:04 +08:00
  • e093b4f691 Fix tutorial comment in sgemm_1.cu: use tCrC instead of tCsA in axpby explanation (#2448) kernyan 2025-07-30 22:09:55 -04:00
  • 664c4f7b3e Update CUTLASS version to 4.1 Haicheng Wu 2025-07-26 20:11:04 -04:00
  • 0e026982ce Example 77 add blackwell fmha bwd for MLA shape (#2466) Zeyu WANG 2025-07-25 06:41:11 +08:00
  • 9a9a579714 Merge pull request #2489 from NVIDIA/update_workflow_script Larry Wu 2025-07-23 15:33:43 +08:00
  • 51d730b8be Support "CuTe DSL" auto-labeling in workflow Larry Wu 2025-07-23 00:28:01 -07:00
  • 6c0c8b7484 1. Update bug/feature report template to add component selection. (#2485) Larry Wu 2025-07-23 00:38:03 +08:00
  • e51efbfe18 Update CHANGELOG.md v4.1.0 Haicheng Wu 2025-07-21 22:09:56 -04:00
  • fd6cfe1ed0 v4.1 release update v2. (#2481) Junkai-Wu 2025-07-22 10:03:55 +08:00
  • 9baa06dd57 Add Blackwell MLA forward (shape: d=192, dv=128) implementation in example_77 (#2472) zhang 2025-07-18 13:27:48 +08:00
  • ebe98c549a cache procedural_name in GemmOperation (#2317) Colin Peppler 2025-07-16 19:25:02 -07:00
  • 9892624b66 Fix typos in the text (#2417) Oleksandr Pavlyk 2025-07-16 20:51:12 -05:00
  • dc9876eeb2 Delete all docs files except index.html redirect Larry Wu 2025-07-06 05:49:10 -07:00
  • 6c584d0e47 Redirect Github pages to NVIDIA Doc hub Larry Wu 2025-07-06 05:46:58 -07:00
  • a1aaf2300a v4.1 release Junkai-Wu 2025-07-03 20:07:53 +08:00
  • b995f93317 4.0 doc change (#2425) v4.0.0 Haicheng Wu 2025-06-27 21:35:06 +08:00
  • f2a17553d5 Added CODEOWNERS oss_ci Zekun Fan 2025-06-03 16:52:29 -07:00
  • 889ff20648 v4.0 update v2. (#2420) Junkai-Wu 2025-06-26 00:56:25 +08:00
  • dc4817921e v4.0 update. (#2398) Junkai-Wu 2025-06-12 21:10:29 +08:00
  • 5c6bca0441 Update requirements.txt (#2390) brandonsun 2025-06-10 14:31:49 +08:00
  • c2ad7c5b20 fix link in readme (#2379) drazi 2025-06-07 19:38:38 +08:00
  • cc23f6d1e9 fix link (#2377) drazi 2025-06-07 18:00:39 +08:00
  • 5a287538c2 "Update CHANGELOG for 4.0 tagging" (#2374) Vijay Thakkar 2025-06-06 10:07:36 -04:00
  • 58a5197b9d "Update CHANGELOG for 4.0 tagging" thakkarv/4.0-changelog Vijay Thakkar 2025-06-06 09:43:11 -04:00
  • 8bdbfca682 v4.0 update. (#2371) Junkai-Wu 2025-06-06 14:39:20 +08:00
  • 2e2af190bd Revert "[ex77] fix mla split; add fwd lse; add bwd varlen (#2366)" (#2370) Manish Gupta 2025-06-05 20:14:57 -07:00
  • f12b1d75c9 [ex77] fix mla split; add fwd lse; add bwd varlen (#2366) Markus Hoehnerbach 2025-06-05 15:39:46 -07:00
  • b244379d9b Merge pull request #2359 from NVIDIA/oss_ci zekunf-nv 2025-06-03 14:04:35 -07:00
  • 9d165a3b8e Handle get_masked_trip_count for small length in fmha example (#2292) Taebum Kim 2025-05-31 11:51:18 +09:00
  • b9b110a9ea Correct divmod order in example 77 (blackwell fmha) (#2291) Taebum Kim 2025-05-31 11:50:40 +09:00
  • 8206e7a0f5 Pre-compile in CuteDsl/ampere/elementwise_apply.py (#2340) Gabriel Wu 2025-05-28 22:24:39 +08:00
  • 6316b6f867 Fix typos (#2311) co63oc 2025-05-23 20:30:10 +08:00
  • 9354bfd7c1 Keep the documentation consistent with the sgemm_1.cu code. (#2285) zkyue 2025-05-20 10:53:15 +08:00
  • 5e9b8e2a25 fix docx (#2290) 1096125073 2025-05-20 10:52:37 +08:00
  • 1ec230c4bf Fix typo (#2299) Ruyman 2025-05-15 14:38:42 +01:00
  • f89cd95b16 Update elementwise_add.ipynb (#2298) Driss Guessous 2025-05-15 06:38:27 -07:00
  • f115c3f854 Release v4.0.0 (#2294) Kihiro Bando 2025-05-13 15:55:29 -04:00
  • ad7b2f5e84 3.9.2 doc/version (#2279) v3.9.2 Haicheng Wu 2025-05-04 00:00:15 -04:00