CUTLASS 3.2.1 (#1113)

* Updates for 3.2.1 release.
* Minor fix in gemm op profiler for raster order.
* Add scheduler mapping for raster order in the kernels.
2023-09-26 14:24:26 -07:00
parent e0aaa3c3b3
commit 90d3b0fb18
428 changed files with 22253 additions and 21762 deletions
--- a/examples/python/02_pytorch_extension_grouped_gemm.ipynb
+++ b/examples/python/02_pytorch_extension_grouped_gemm.ipynb
@@ -10,8 +10,6 @@
    "This notebook walks through a basic example of using the CUTLASS Python interface to declare\n",
    "a grouped GEMM kernel and export it as a PyTorch CUDA extension. Note that GEMM and Conv2d can also be exported as PyTorch CUDA extensions. \n",
    "\n",
-    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/NVIDIA/cutlass/tree/master/examples/00_basic_gemm.ipynb)\n",
-    "\n",
    "## Background on grouped GEMM\n",
    "Grouped GEMM enables one to execute a set of GEMMs (each with potentially different sizes and strides)\n",
    "in a single CUDA kernel. It can be thought of as a generalized version of a pointer-array GEMM,\n",