Added examples to enable the unity build (#102)
* Updated documentation of fused GEMM example and removed UNITY BUILD batch size. The default batch size when unity build is enabled tends to be favorable.
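For readers unfamiliar with the term, the batching that a unity build performs can be sketched in a few lines of shell. This is only an illustration: the `unity_batches` function, the batch size of 2, and the `gemm_*.cu` file names are all hypothetical, and the sketch does not reproduce how the CUTLASS build actually generates its unity sources.

```bash
#!/usr/bin/env bash
# Illustrative sketch only. unity_batches and the file names below are
# hypothetical and are not part of the CUTLASS build system.

# Usage: unity_batches BATCH_SIZE FILE...
# Prints the contents of one generated "unity" source per batch. Each unity
# source simply #includes up to BATCH_SIZE of the original source files, so
# the compiler sees them as a single compilation unit.
unity_batches() {
  local batch_size=$1
  shift
  local index=0 count=0 file
  for file in "$@"; do
    if (( count == 0 )); then
      # Start a new unity source.
      echo "// unity_${index}.cu"
    fi
    echo "#include \"${file}\""
    count=$(( count + 1 ))
    if (( count == batch_size )); then
      # Batch is full; the next file opens a new unity source.
      count=0
      index=$(( index + 1 ))
    fi
  done
}

# Three kernel sources with a batch size of 2 yield two compilation units.
unity_batches 2 gemm_a.cu gemm_b.cu gemm_c.cu
```

A larger batch size means fewer, larger compilation units. That trade-off is what the batch size mentioned in the commit message controls.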
@@ -15,10 +15,12 @@ $ make cutlass_profiler -j
 To limit compilation time, only one tile size (128x128) is instantiated for each data type, math instruction, and layout.
 To instantiate all sizes, set the following environment variable when running CMake from an empty `build/` directory.
 ```bash
-$ cmake .. -DCUTLASS_NVCC_ARCHS="70;75;80" -DCUTLASS_LIBRARY_KERNELS=all
+$ cmake .. -DCUTLASS_NVCC_ARCHS="70;75;80" -DCUTLASS_LIBRARY_KERNELS=all -DCUTLASS_UNITY_BUILD_ENABLED=ON
 ...
 $ make cutlass_profiler -j
 ```
+Enabling the unity build places multiple kernel instances in one compilation unit, thereby reducing size of the compiled
+binary and avoiding linker limitations on some platforms.

 The CUTLASS Profiler sources are stored in
 ```bash
@@ -403,7 +403,7 @@ $ cmake .. -DCUTLASS_NVCC_ARCHS=75 -DCUTLASS_LIBRARY_KERNELS=sgemm
 Compiling only the kernels desired reduces compilation time.

 To instantiate kernels of all tile sizes, data types, and alignment constraints, specify
-`-DCUTLASS_LIBRARY_KERNELS=all` when running `cmake`.
+`-DCUTLASS_LIBRARY_KERNELS=all` when running `cmake`.

 Several recipes are defined below for convenience. They may be combined as a comma-delimited list.
@@ -412,9 +412,12 @@ Several recipes are defined below for convenience. They may be combined as a com
 $ cmake .. -DCUTLASS_NVCC_ARCHS=80 -DCUTLASS_LIBRARY_KERNELS=tensorop*gemm
 ```

-**Example.** All kernels for NVIDIA Volta, Turing, and Ampere architectures.
+**Example.** All kernels for NVIDIA Volta, Turing, and Ampere architectures. Enabling
+the "unity build" instantiates multiple kernel instances in each compilation unit, thereby
+reducing binary size and avoiding linker limitations on some platforms.
 ```bash
-$ cmake .. -DCUTLASS_NVCC_ARCHS="70;75;80" -DCUTLASS_LIBRARY_KERNELS=all
+$ cmake .. -DCUTLASS_NVCC_ARCHS="70;75;80" -DCUTLASS_LIBRARY_KERNELS=all \
+  -DCUTLASS_UNITY_BUILD_ENABLED=ON
 ```

 **Example.** All GEMM kernels targeting Turing Tensor Cores.
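The documentation touched by this change notes that kernel recipes may be combined as a comma-delimited list. A minimal sketch of composing such a value in shell follows. The helper name `kernel_filter_flag` is hypothetical and not part of CUTLASS, and pairing `sgemm` with `tensorop*gemm` is an assumed combination chosen only because both filters appear in the text above.

```bash
#!/usr/bin/env bash
# kernel_filter_flag is a hypothetical helper for illustration; it is not
# provided by CUTLASS.

# Usage: kernel_filter_flag FILTER...
# Joins the given kernel filters with commas and prints the resulting
# -DCUTLASS_LIBRARY_KERNELS option.
kernel_filter_flag() {
  local IFS=,   # "$*" joins arguments with the first character of IFS
  printf -- '-DCUTLASS_LIBRARY_KERNELS=%s\n' "$*"
}

# Quote filters containing '*' so the shell does not glob-expand them.
kernel_filter_flag sgemm 'tensorop*gemm'
```

The result could then be passed to a configure step, for example `cmake .. -DCUTLASS_NVCC_ARCHS=80 "$(kernel_filter_flag sgemm 'tensorop*gemm')"`. Keeping the command substitution quoted protects the `*` from shell expansion.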