CUTLASS 3.2.1 (#1113)

* Updates for 3.2.1 release. * Minor fix in gemm op profiler for raster order. * Add scheduler mapping for raster order in the kernels.
2023-09-26 14:24:26 -07:00
parent e0aaa3c3b3
commit 90d3b0fb18
428 changed files with 22253 additions and 21762 deletions
--- a/media/docs/build/building_in_windows_with_visual_studio.md
+++ b/media/docs/build/building_in_windows_with_visual_studio.md
@ -0,0 +1,93 @@
+[README](../README.md#documentation) > **CUTLASS 3.0: Building on Windows with Visual Studio**
+
+# Building on Windows with Visual Studio
+
+CUTLASS 3.2 reintroduces support for the Microsoft Visual Studio compiler on Windows.
+Users and developers may build either
+in Visual Studio's graphical integrated development environment,
+or on the command line with `cmake --build`.
+
+# Software prerequisites
+
+1. Windows 10 or 11
+
+2. Visual Studio 2019 version 16.11.27, or Visual Studio 2022
+
+3. CUDA Toolkit (at least 12.2; earlier 12.x versions may work)
+
+4. CMake (at least 3.18)
+
+5. git
+
+6. Python (at least 3.6)
+
+Visual Studio must be installed *before* the CUDA Toolkit.
+Otherwise, Visual Studio's build system won't know about CUDA.
+
+# Operating system settings
+
+By default, Windows restricts the maximum file path length (`MAX_PATH`) to 260 characters.
+CUTLASS has many files and directory paths that challenge this requirement.
+As a result, CUTLASS is unlikely to build with this default setting.
+The choice of source and build directories affect path lengths,
+so the kinds of errors and whether they occur may depend on this.
+Symptoms may vary, from errors when running `cmake`
+(e.g., during the "generating library instances" step) to build failures.
+
+CUTLASS recommends changing the maximum file path length setting
+and rebooting the computer before attempting to clone or build CUTLASS.
+Windows 10 (as of version 1607) and 11 permit changing this setting
+by making sure that the following registry key exists,
+and that its value is set to 1.
+
+```
+Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem\LongPathsEnabled
+```
+
+After changing the registry key's value, reboot the computer first
+before attempting to clone or build CUTLASS.
+
+[This Microsoft help article](https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry)
+explains different ways to change the registry setting.
+
+# Limitations
+
+Currently, it's possible to build examples and tests.
+Building the CUTLASS library (e.g., for profiling) with default settings does not currently work,
+because Visual Studio's linker cannot handle more than 65535 symbols in a library.
+(The symptom of this issue is a LNK1189 linker error.)
+The known way to work around this Visual Studio limitation is to disable building CUTLASS's library,
+by setting the CMake option `CUTLASS_ENABLE_LIBRARY` to `OFF`.
+Another approach may be to limit the number of kernels in the library
+by setting the CMake option `CUTLASS_LIBRARY_KERNELS`
+so that CUTLASS tries to put fewer kernels in the library.
+
+# Set up build environment
+
+1. Run "git bash" to get a familiar command-line interface
+
+2. Edit `~/.profile` and set the environment variables as needed to access the CUTLASS repository
+
+3. Clone the CUTLASS repository
+
+4. Create the `build` subdirectory in the CUTLASS clone directory, and run CMake in it,
+    specifying whatever CMake options are desired, e.g.,
+    `cmake .. -DCUTLASS_NVCC_ARCHS=90a -DCUTLASS_ENABLE_LIBRARY=OFF`
+
+Alternate approaches may rely on the CMake GUI and/or Windows' native command line.
+
+# Building
+
+A successful CMake run will create a `CUTLASS.sln` Visual Studio "solution" file in the build directory.
+One can open this in Visual Studio and build the entire solution or any subset of projects as desired.
+It may be necessary to limit maximum build parallelism by setting the appropriate Visual Studio option.
+
+Alternately, one can run `cmake --build . --config Release -j 4` in the build directory.
+Replace 4 with the desired maximum build parallelism.
+It's important to put the `--build` option before the period that signifies the build directory.
+The `--config` option specifies the kind of build;
+`--config Release` builds a Release build, while `--config Debug` builds a Debug build.
+Unlike with CMake's Makefile or Ninja generators,
+`CMAKE_BUILD_TYPE` has no effect on the Visual Studio generator,
+because the Visual Studio generator creates all build configurations.
+
--- a/media/docs/build/building_with_clang_as_host_compiler.md
+++ b/media/docs/build/building_with_clang_as_host_compiler.md
@ -0,0 +1,53 @@
+[README](../README.md#documentation) > **CUTLASS 3: Building with Clang as host compiler**
+
+# Building with Clang as host compiler
+
+CUTLASS 3.2(.1) reintroduces support for building with
+Clang as host compiler, and NVCC as device compiler.
+This is NOT the same as building with
+Clang as both host and device compiler ("CUDA Clang").
+
+# Software prerequisites
+
+1. Clang (tested with Clang 14)
+
+2. CUDA Toolkit (tested with 12.2; other versions likely work)
+
+3. CMake (at least 3.18)
+
+4. git
+
+5. Python (at least 3.6)
+
+Experience with Ubuntu 22.04 LTS is that
+clang requires the following packages to be installed.
+
+```bash
+$ sudo apt-get install clang cmake ninja-build pkg-config libgtk-3-dev liblzma-dev libstdc++-12-dev
+```
+
+A symptom of not installing all needed dependencies
+is the following error when attempting to use clang:
+`"/usr/bin/ld: cannot find -lstdc++: No such file or directory"`.
+
+# Running CMake
+
+The Clang build requires specifying the following three CMake options.
+
+* `CMAKE_CXX_COMPILER=clang++`
+* `CMAKE_CUDA_HOST_COMPILER=clang++`
+
+* `CMAKE_C_COMPILER=clang`
+
+This assumes that `clang++` and `clang` are in the user's `PATH`.
+Please note that both `CMAKE_CXX_COMPILER` and `CMAKE_C_COMPILER`
+must be set, even though CUTLASS is a C++ project, not a C project.
+
+Users can also specify a particular CUDA Toolkit version
+by setting the CMake option `CMAKE_CUDA_COMPILER`
+to the path to the `nvcc` executable
+that lives in the CUDA Toolkit's directory.  For example,
+if `${PATH_TO_CUDA_TOOLKIT}` is the CUDA Toolkit directory,
+then one can set `CMAKE_CUDA_COMPILER` as follows.
+
+* `CMAKE_CUDA_COMPILER=${PATH_TO_CUDA_TOOLKIT}/bin/nvcc`
--- a/media/docs/code_organization.md
+++ b/media/docs/code_organization.md
@ -109,14 +109,15 @@ tools/
          library.h          # defines enums and structs to describe the tiled structure of operator instances          
          manifest.h         # collection of all instances

-    scripts/                 # scripts to procedurally generate CUTLASS template instances
+    src/
+
+python/
+    cutlass_library/       # scripts to procedurally generate CUTLASS template instances

      gemm_operations.py
      library.py
-      generator.py           # entry point of procedural generation scripts - invoked by cmake
+      generator.py            # entry point of procedural generation scripts - invoked by cmake
      manifest.py
-
-    src/
 ```

 When CMake is executed, the CUTLASS Instance Library generator scripts are executed to construct a set of
--- a/media/docs/profiler.md
+++ b/media/docs/profiler.md
@ -242,6 +242,8 @@ Test your changes to gemm kernels with a quick functional test and save results
   --providers=cutlass --output=functional-test.csv
 ```

+The format of tensor argument is followed by `<type>:<layout>`. The type could be `f32` as 32-bit floating point, `s8` as 8-bit signed integer, etc. The available types can be referred to the `NumericTypeID_enumerants` in [util.cu](tools/library/src/util.cu). The layout could be `row` or `column`.
+
 ## Example CUDA Core GEMM Operation

 Example command line for profiling SGEMM kernels is as follows: