CUTLASS 3.2.1 (#1113)

* Updates for 3.2.1 release.

* Minor fix in gemm op profiler for raster order.

* Add scheduler mapping for raster order in the kernels.
ANIKET SHIVAM
2023-09-26 14:24:26 -07:00
committed by GitHub
parent e0aaa3c3b3
commit 90d3b0fb18
428 changed files with 22253 additions and 21762 deletions


@@ -10,8 +10,6 @@
"This notebook walks through a basic example of using the CUTLASS Python interface to declare\n",
"a grouped GEMM kernel and export it as a PyTorch CUDA extension. Note that GEMM and Conv2d can also be exported as PyTorch CUDA extensions. \n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NVIDIA/cutlass/tree/master/examples/00_basic_gemm.ipynb)\n",
"\n",
"## Background on grouped GEMM\n",
"Grouped GEMM enables one to execute a set of GEMMs (each with potentially different sizes and strides)\n",
"in a single CUDA kernel. It can be thought of as a generalized version of a pointer-array GEMM,\n",