|
|
84a27b3926
|
fix: examples/cute/tutorial/blackwell/04_mma_tma_2sm_sm100.cu GridDim miscalculated (#2492)
* fix: examples/cute/tutorial/blackwell/04_mma_tma_2sm_sm100.cu Launch dimGrid error
* feat: add cta tiler
* Update examples/cute/tutorial/blackwell/04_mma_tma_2sm_sm100.cu
use cluster_layout_vmnk instead of cta_tiler
Co-authored-by: Junkai-Wu <junkaiw@nvidia.com>
* feat: remove cta_tiler
---------
Co-authored-by: qinghongzeng <qinghongzeng@deeproute.ai>
Co-authored-by: Junkai-Wu <junkaiw@nvidia.com>
|
2025-07-30 22:11:04 -04:00 |
|
|
|
fd6cfe1ed0
|
v4.1 release update v2. (#2481)
|
2025-07-21 22:03:55 -04:00 |
|
|
|
331a1f5b3f
|
cutlass 3.9 update (#2255)
* cutlass 3.9 update
* rebase
* fixes out of shared memory for blockwise Blackwell
* doc format
* fix issue 2253
* disable host ref by default
* fix sm120 smem capacity
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
|
2025-04-24 15:42:40 -04:00 |
|
|
|
62750a2b75
|
v3.9 (#2185)
* v3.8 update x
* fix blackwell gg
* doc change
* doc change
* doc change
---------
Co-authored-by: yuzhai <yuzhai@nvidia.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
|
2025-03-21 01:52:23 -04:00 |
|