Added examples to enable the unity build (#102)

* Updated documentation of fused GEMM example and removed UNITY BUILD batch size. The default batch size when unity build is enabled tends to be favorable.
Andrew Kerr
2020-06-17 07:09:18 -07:00
committed by GitHub
parent 1ab1027954
commit fd7e058d0c
3 changed files with 34 additions and 5 deletions

@@ -15,10 +15,12 @@ $ make cutlass_profiler -j
To limit compilation time, only one tile size (128x128) is instantiated for each data type, math instruction, and layout.
To instantiate all sizes, set the following CMake variables when running CMake from an empty `build/` directory.
```bash
$ cmake .. -DCUTLASS_NVCC_ARCHS="70;75;80" -DCUTLASS_LIBRARY_KERNELS=all
$ cmake .. -DCUTLASS_NVCC_ARCHS="70;75;80" -DCUTLASS_LIBRARY_KERNELS=all -DCUTLASS_UNITY_BUILD_ENABLED=ON
...
$ make cutlass_profiler -j
```
Enabling the unity build places multiple kernel instances in one compilation unit, thereby reducing the size of the compiled
binary and avoiding linker limitations on some platforms.
The CUTLASS Profiler sources are stored in
```bash

@@ -403,7 +403,7 @@ $ cmake .. -DCUTLASS_NVCC_ARCHS=75 -DCUTLASS_LIBRARY_KERNELS=sgemm
Compiling only the kernels desired reduces compilation time.
To instantiate kernels of all tile sizes, data types, and alignment constraints, specify
`-DCUTLASS_LIBRARY_KERNELS=all` when running `cmake`.
`-DCUTLASS_LIBRARY_KERNELS=all` when running `cmake`.
Several recipes are defined below for convenience. They may be combined as a comma-delimited list.
@@ -412,9 +412,12 @@ Several recipes are defined below for convenience. They may be combined as a com
$ cmake .. -DCUTLASS_NVCC_ARCHS=80 -DCUTLASS_LIBRARY_KERNELS=tensorop*gemm
```
**Example.** All kernels for NVIDIA Volta, Turing, and Ampere architectures.
**Example.** All kernels for NVIDIA Volta, Turing, and Ampere architectures. Enabling
the "unity build" instantiates multiple kernel instances in each compilation unit, thereby
reducing binary size and avoiding linker limitations on some platforms.
```bash
$ cmake .. -DCUTLASS_NVCC_ARCHS="70;75;80" -DCUTLASS_LIBRARY_KERNELS=all
$ cmake .. -DCUTLASS_NVCC_ARCHS="70;75;80" -DCUTLASS_LIBRARY_KERNELS=all \
-DCUTLASS_UNITY_BUILD_ENABLED=ON
```
**Example.** All GEMM kernels targeting Turing Tensor Cores.