cutlass/python/cutlass_library
Ali Hassani eee0cab26c Stamp out 1x1x1 clusters, 128x256 CTA shape (#1665)
Adds 128x256 tile shapes to FP16/BF16 and FP8 generators.
Also adds 1x1x1 clusters to all existing FP16/BF16/FP8 generators.

NOTE: it is important to set the kernel filter (--kernels /
CUTLASS_LIBRARY_KERNELS) to a non-empty string and to skip pruning in
order to get all of the new configurations.

If profiling exhaustively, the filter can be set to `*`.

Number of CUTLASS 3.X GEMMs before this commit: 2868
Number of CUTLASS 3.X GEMMs after this commit: 4016

Co-authored-by: Ali Hassani <ahassani@nvidia.com>
2024-07-31 20:22:29 -04:00