CUTLASS 2.6.1 - functional and performance enhancements to strided DGRAD, fixes, and tuning

* cutlass 2.6 update

* remove debug prints

* cutlass 2.6.1 (minor update)

* Updated CHANGELOG.

* Minor edit to readme to indicate patch version.

* Minor edit to readme.

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>, Andrew Kerr <akerr@nvidia.com>
Manish Gupta
2021-09-03 10:26:15 -07:00
committed by GitHub
parent a01feb93d9
commit 6c2f8f2fb8
55 changed files with 317 additions and 315 deletions

@@ -103,7 +103,6 @@ Profiling:
--profiling-enabled=<bool> If true, profiling is actually conducted.
Verification:
--verification-enabled=<bool> Whether to perform verification checks.

@@ -206,9 +206,12 @@ $ cmake .. -DCUTLASS_NVCC_ARCHS="50;53" # compiles for NVIDIA Maxwell G
 ## Clang
-For experimental purposes, CUTLASS may be compiled with
-[clang 8.0](https://github.com/llvm/llvm-project/releases/download/llvmorg-8.0.1/clang+llvm-8.0.1-amd64-unknown-freebsd11.tar.xz) using the
+For experimental purposes, CUTLASS has been verified to compile with the following versions of Clang and CUDA.
+* [clang 8.0](https://github.com/llvm/llvm-project/releases/download/llvmorg-8.0.1/clang+llvm-8.0.1-amd64-unknown-freebsd11.tar.xz) using the
 [CUDA 10.0 Toolkit](https://developer.nvidia.com/cuda-10.0-download-archive).
+* [clang release/13.x](https://github.com/llvm/llvm-project/tree/release/13.x) using [CUDA 11.4](https://developer.nvidia.com/cuda-toolkit-archive)
 At this time, compiling with clang enables the CUTLASS SIMT GEMM kernels (sgemm, dgemm, hgemm, igemm)
 but does not enable TensorCores.
@@ -216,6 +219,8 @@ but does not enable TensorCores.
 $ mkdir build && cd build
 $ cmake -DCUDA_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ..
+# Add -DCMAKE_CXX_FLAGS=-D__NV_NO_HOST_COMPILER_CHECK=1 -DCMAKE_CUDA_FLAGS=-D__NV_NO_HOST_COMPILER_CHECK=1 if compiler
+# checks fail during CMake configuration.
 $ make test_unit -j
 ```