CUTLASS 3.2.1 (#1113)
* Updates for 3.2.1 release.
* Minor fix in gemm op profiler for raster order.
* Add scheduler mapping for raster order in the kernels.
@@ -10,8 +10,6 @@
"This notebook walks through a basic example of using the CUTLASS Python interface to declare\n",
"a grouped GEMM kernel and export it as a PyTorch CUDA extension. Note that GEMM and Conv2d can also be exported as PyTorch CUDA extensions. \n",
"\n",
"[](https://colab.research.google.com/github/NVIDIA/cutlass/tree/master/examples/00_basic_gemm.ipynb)\n",
"\n",
"## Background on grouped GEMM\n",
"Grouped GEMM enables one to execute a set of GEMMs (each with potentially different sizes and strides)\n",
"in a single CUDA kernel. It can be thought of as a generalized version of a pointer-array GEMM,\n",
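The notebook text in this hunk describes what a grouped GEMM computes: a set of independent GEMMs, each with its own problem size, that a pointer-array GEMM (where every entry shares one shape) cannot express. The pure-Python sketch below only illustrates those semantics on the CPU. It assumes the standard epilogue `D = alpha * (A @ B) + beta * C`, and the helpers `gemm`, `problem_sizes`, `is_uniform` and `grouped_gemm` are hypothetical names invented for illustration. They are not part of the CUTLASS Python interface and say nothing about how a CUTLASS kernel schedules groups within its single launch.

```python
from typing import List, Sequence, Tuple

# Row-major matrix as nested lists (hypothetical representation for this sketch).
Matrix = List[List[float]]


def gemm(A: Matrix, B: Matrix, C: Matrix, alpha: float = 1.0, beta: float = 0.0) -> Matrix:
    """Reference GEMM: D = alpha * (A @ B) + beta * C.

    A is M x K, B is K x N, C is M x N; all are assumed non-empty.
    """
    M, K = len(A), len(A[0])
    if len(B) != K:
        raise ValueError("inner dimensions of A and B must match")
    N = len(B[0])
    return [
        [alpha * sum(A[i][k] * B[k][j] for k in range(K)) + beta * C[i][j] for j in range(N)]
        for i in range(M)
    ]


def problem_sizes(As: Sequence[Matrix], Bs: Sequence[Matrix]) -> List[Tuple[int, int, int]]:
    """Return (M, N, K) for each group; in a grouped GEMM these may all differ."""
    return [(len(A), len(B[0]), len(A[0])) for A, B in zip(As, Bs)]


def is_uniform(As: Sequence[Matrix], Bs: Sequence[Matrix]) -> bool:
    """True when every group shares one (M, N, K): the restricted case that a
    pointer-array GEMM also covers."""
    return len(set(problem_sizes(As, Bs))) <= 1


def grouped_gemm(
    As: Sequence[Matrix],
    Bs: Sequence[Matrix],
    Cs: Sequence[Matrix],
    alpha: float = 1.0,
    beta: float = 0.0,
) -> List[Matrix]:
    """Compute one independent GEMM per group.

    A real grouped-GEMM kernel runs all groups in a single CUDA launch; this
    sequential loop only shows the result, not the scheduling.
    """
    if not (len(As) == len(Bs) == len(Cs)):
        raise ValueError("As, Bs and Cs must have one entry per group")
    return [gemm(A, B, C, alpha, beta) for A, B, C in zip(As, Bs, Cs)]


# Usage: two groups with different shapes, (M, N, K) = (2, 2, 2) and (1, 1, 3).
As = [[[1, 2], [3, 4]], [[1, 2, 3]]]
Bs = [[[5, 6], [7, 8]], [[1], [1], [1]]]
Cs = [[[0, 0], [0, 0]], [[10]]]
Ds = grouped_gemm(As, Bs, Cs, beta=1.0)
```

Because the two groups above have different shapes, `is_uniform(As, Bs)` is false: this input needs a grouped GEMM rather than a pointer-array GEMM.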