CUTLASS 3.2.1 (#1113)

* Updates for 3.2.1 release.

* Minor fix in gemm op profiler for raster order.

* Add scheduler mapping for raster order in the kernels.
ANIKET SHIVAM
2023-09-26 14:24:26 -07:00
committed by GitHub
parent e0aaa3c3b3
commit 90d3b0fb18
428 changed files with 22253 additions and 21762 deletions


@ -0,0 +1,93 @@
[README](../README.md#documentation) > **CUTLASS 3.0: Building on Windows with Visual Studio**
# Building on Windows with Visual Studio
CUTLASS 3.2 reintroduces support for the Microsoft Visual Studio compiler on Windows.
Users and developers may build either
in Visual Studio's graphical integrated development environment,
or on the command line with `cmake --build`.
# Software prerequisites
1. Windows 10 or 11
2. Visual Studio 2019 version 16.11.27, or Visual Studio 2022
3. CUDA Toolkit (at least 12.2; earlier 12.x versions may work)
4. CMake (at least 3.18)
5. git
6. Python (at least 3.6)
Visual Studio must be installed *before* the CUDA Toolkit.
Otherwise, Visual Studio's build system won't know about CUDA.
# Operating system settings
By default, Windows restricts the maximum file path length (`MAX_PATH`) to 260 characters.
CUTLASS has many file and directory paths that approach or exceed this limit.
As a result, CUTLASS is unlikely to build with this default setting.
The choice of source and build directories affects path lengths,
so whether errors occur, and what kind, may depend on where CUTLASS is cloned and built.
Symptoms may vary, from errors when running `cmake`
(e.g., during the "generating library instances" step) to build failures.
CUTLASS recommends changing the maximum file path length setting
and rebooting the computer before attempting to clone or build CUTLASS.
Windows 10 (as of version 1607) and 11 permit changing this setting
by making sure that the following registry key exists,
and that its value is set to 1.
```
Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem\LongPathsEnabled
```
After changing the registry key's value, reboot the computer
before attempting to clone or build CUTLASS.
[This Microsoft help article](https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=registry)
explains different ways to change the registry setting.
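To judge how close a particular clone or build directory comes to the default limit, one option is to measure the longest absolute path beneath it. The following is a minimal sketch for a POSIX-style shell such as git bash. The helper names `longest_path_length` and `check_max_path` are hypothetical and not part of CUTLASS, and the length reported for a git bash path (e.g. `/c/...`) is only an approximation of the corresponding Windows path length.

```bash
# Default Windows limit. MAX_PATH counts the terminating null character,
# so a path of 260 or more visible characters is already over the limit.
MAX_PATH_DEFAULT=260

# Hypothetical helper: print the length of the longest absolute path
# under the given directory (default: current directory).
longest_path_length() {
  # Resolve to an absolute path, because the limit applies to full paths.
  root_abs=$(cd "${1:-.}" && pwd) || return 1
  find "$root_abs" -print |
    awk '{ n = length($0); if (n > max) max = n } END { print max + 0 }'
}

# Hypothetical helper: report whether the longest path stays under the limit.
check_max_path() {
  longest=$(longest_path_length "${1:-.}") || return 1
  if [ "$longest" -ge "$MAX_PATH_DEFAULT" ]; then
    echo "longest path is $longest characters: at or over the default limit of $MAX_PATH_DEFAULT"
    return 1
  fi
  echo "longest path is $longest characters: under the default limit of $MAX_PATH_DEFAULT"
}
```

For example, `check_max_path .` run from the CUTLASS clone directory after a CMake run gives a rough indication of whether the chosen location is likely to cause problems when long paths are not enabled.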
# Limitations
Currently, it's possible to build examples and tests.
Building the CUTLASS library (e.g., for profiling) with default settings does not currently work,
because Visual Studio's linker cannot handle more than 65535 symbols in a library.
(The symptom of this issue is a LNK1189 linker error.)
The known way to work around this Visual Studio limitation is to disable building CUTLASS's library,
by setting the CMake option `CUTLASS_ENABLE_LIBRARY` to `OFF`.
Another approach may be to limit the number of kernels in the library
by setting the CMake option `CUTLASS_LIBRARY_KERNELS`
so that CUTLASS tries to put fewer kernels in the library.
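The two workarounds above differ only in the CMake options passed at configure time. The sketch below prints the corresponding configure command instead of running it. The function name `cutlass_configure_cmd` is hypothetical, the `90a` architecture is copied from the example elsewhere in this document, and any kernel filter value passed in (such as `sgemm`) is an illustrative assumption; whether a given filter keeps the library under the linker limit has not been verified here.

```bash
# Hypothetical helper: print a CMake configure command for one of the
# two workarounds. Nothing is executed.
#   no argument  -> disable building the CUTLASS library entirely
#   one argument -> keep the library, but restrict it to the given kernel filter
cutlass_configure_cmd() {
  cmd="cmake .. -DCUTLASS_NVCC_ARCHS=90a"
  if [ -z "${1:-}" ]; then
    cmd="$cmd -DCUTLASS_ENABLE_LIBRARY=OFF"
  else
    cmd="$cmd -DCUTLASS_LIBRARY_KERNELS=$1"
  fi
  echo "$cmd"
}
```

Calling `cutlass_configure_cmd` with no argument prints the known workaround; `cutlass_configure_cmd sgemm` prints the variant that limits the kernels in the library, which may or may not avoid the LNK1189 error.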
# Set up build environment
1. Run "git bash" to get a familiar command-line interface
2. Edit `~/.profile` and set the environment variables as needed to access the CUTLASS repository
3. Clone the CUTLASS repository
4. Create the `build` subdirectory in the CUTLASS clone directory, and run CMake in it,
specifying whatever CMake options are desired, e.g.,
`cmake .. -DCUTLASS_NVCC_ARCHS=90a -DCUTLASS_ENABLE_LIBRARY=OFF`
Alternate approaches may rely on the CMake GUI and/or Windows' native command line.
# Building
A successful CMake run will create a `CUTLASS.sln` Visual Studio "solution" file in the build directory.
One can open this in Visual Studio and build the entire solution or any subset of projects as desired.
It may be necessary to limit maximum build parallelism by setting the appropriate Visual Studio option.
Alternately, one can run `cmake --build . --config Release -j 4` in the build directory.
Replace 4 with the desired maximum build parallelism.
It's important to put the `--build` option before the period that signifies the build directory.
The `--config` option specifies the kind of build;
`--config Release` builds a Release build, while `--config Debug` builds a Debug build.
Unlike with CMake's Makefile or Ninja generators,
`CMAKE_BUILD_TYPE` has no effect on the Visual Studio generator,
because the Visual Studio generator creates all build configurations.
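Because the Visual Studio generator creates all build configurations, the same build directory can serve both Release and Debug builds; only the `--config` argument changes. The sketch below prints the build command for a chosen configuration and parallelism. The function name `cutlass_build_cmd` is hypothetical, and restricting the configuration to `Release` and `Debug` is a choice of this sketch, reflecting the two configurations named above.

```bash
# Hypothetical helper: print the command-line build invocation.
#   $1 = build configuration (Release or Debug; default Release)
#   $2 = maximum build parallelism (default 4)
cutlass_build_cmd() {
  config="${1:-Release}"
  jobs="${2:-4}"
  case "$config" in
    Release|Debug) ;;
    *) echo "unsupported config: $config" >&2; return 1 ;;
  esac
  # --build comes before the period that names the build directory.
  echo "cmake --build . --config $config -j $jobs"
}
```

Running the printed command from the build directory, first for `Release` and then for `Debug`, builds both configurations without rerunning the CMake configure step.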


@ -0,0 +1,53 @@
[README](../README.md#documentation) > **CUTLASS 3: Building with Clang as host compiler**
# Building with Clang as host compiler
CUTLASS 3.2(.1) reintroduces support for building with
Clang as host compiler, and NVCC as device compiler.
This is NOT the same as building with
Clang as both host and device compiler ("CUDA Clang").
# Software prerequisites
1. Clang (tested with Clang 14)
2. CUDA Toolkit (tested with 12.2; other versions likely work)
3. CMake (at least 3.18)
4. git
5. Python (at least 3.6)
On Ubuntu 22.04 LTS, we found that
Clang requires the following packages to be installed.
```bash
$ sudo apt-get install clang cmake ninja-build pkg-config libgtk-3-dev liblzma-dev libstdc++-12-dev
```
A symptom of not installing all needed dependencies
is the following error when attempting to use clang:
`"/usr/bin/ld: cannot find -lstdc++: No such file or directory"`.
# Running CMake
The Clang build requires specifying the following three CMake options.
* `CMAKE_CXX_COMPILER=clang++`
* `CMAKE_CUDA_HOST_COMPILER=clang++`
* `CMAKE_C_COMPILER=clang`
This assumes that `clang++` and `clang` are in the user's `PATH`.
Please note that both `CMAKE_CXX_COMPILER` and `CMAKE_C_COMPILER`
must be set, even though CUTLASS is a C++ project, not a C project.
Users can also specify a particular CUDA Toolkit version
by setting the CMake option `CMAKE_CUDA_COMPILER`
to the path to the `nvcc` executable
that lives in the CUDA Toolkit's directory. For example,
if `${PATH_TO_CUDA_TOOLKIT}` is the CUDA Toolkit directory,
then one can set `CMAKE_CUDA_COMPILER` as follows.
* `CMAKE_CUDA_COMPILER=${PATH_TO_CUDA_TOOLKIT}/bin/nvcc`
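Put together, the options above might be passed to CMake as in the following sketch, which prints the configure command rather than running it. The function name `clang_configure_cmd` is hypothetical, and running `cmake ..` from a `build` subdirectory of the CUTLASS clone is an assumption of this sketch, not something this page prescribes.

```bash
# Hypothetical helper: print a CMake configure command that uses Clang as
# host compiler and NVCC as device compiler. Nothing is executed.
#   $1 = optional CUDA Toolkit directory; if given, nvcc is taken from its bin/
clang_configure_cmd() {
  cmd="cmake .. -DCMAKE_CXX_COMPILER=clang++"
  cmd="$cmd -DCMAKE_CUDA_HOST_COMPILER=clang++"
  cmd="$cmd -DCMAKE_C_COMPILER=clang"
  if [ -n "${1:-}" ]; then
    cmd="$cmd -DCMAKE_CUDA_COMPILER=$1/bin/nvcc"
  fi
  echo "$cmd"
}
```

With no argument, the printed command relies on whichever `nvcc` CMake finds by default; passing the CUDA Toolkit directory selects that toolkit's `nvcc` explicitly.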


@ -109,14 +109,15 @@ tools/
library.h # defines enums and structs to describe the tiled structure of operator instances
manifest.h # collection of all instances
scripts/ # scripts to procedurally generate CUTLASS template instances
src/
python/
cutlass_library/ # scripts to procedurally generate CUTLASS template instances
gemm_operations.py
library.py
generator.py # entry point of procedural generation scripts - invoked by cmake
generator.py # entry point of procedural generation scripts - invoked by cmake
manifest.py
src/
```
When CMake is executed, the CUTLASS Instance Library generator scripts are executed to construct a set of


@ -242,6 +242,8 @@ Test your changes to gemm kernels with a quick functional test and save results
--providers=cutlass --output=functional-test.csv
```
Tensor arguments are specified in the format `<type>:<layout>`. The type can be, for example, `f32` for 32-bit floating point or `s8` for 8-bit signed integer. For the full list of available types, see `NumericTypeID_enumerants` in [util.cu](tools/library/src/util.cu). The layout can be `row` or `column`.
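As an illustration of the `<type>:<layout>` format, the sketch below splits a tensor argument value into its two parts and checks the layout. The function name `parse_tensor_spec` is hypothetical and is not how the profiler itself parses arguments; the sketch accepts only the two layout names mentioned here and does not validate the type against `NumericTypeID_enumerants`.

```bash
# Hypothetical helper: split "<type>:<layout>" and check the layout name.
parse_tensor_spec() {
  spec="$1"
  case "$spec" in
    *:*) ;;
    *) echo "expected <type>:<layout>, got: $spec" >&2; return 1 ;;
  esac
  elem_type="${spec%%:*}"   # text before the first colon
  layout="${spec#*:}"       # text after the first colon
  case "$layout" in
    row|column) ;;
    *) echo "layout must be row or column, got: $layout" >&2; return 1 ;;
  esac
  echo "type=$elem_type layout=$layout"
}
```

For instance, `parse_tensor_spec f32:column` reports a 32-bit floating-point tensor in column-major layout, while a value without a colon is rejected.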
## Example CUDA Core GEMM Operation
Example command line for profiling SGEMM kernels is as follows: