CUTLASS 2.6.1 - functional and performance enhancements to strided DGRAD, fixes, and tuning

* cutlass 2.6 update
* remove debug prints
* cutlass 2.6.1 (minor update)
* Updated CHANGELOG.
* Minor edit to readme to indicate patch version.
* Minor edit to readme.

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>, Andrew Kerr <akerr@nvidia.com>
2021-09-03 10:26:15 -07:00
parent a01feb93d9
commit 6c2f8f2fb8
55 changed files with 317 additions and 315 deletions
--- a/media/docs/profiler.md
+++ b/media/docs/profiler.md
@@ -103,7 +103,6 @@ Profiling:

  --profiling-enabled=<bool>                       If true, profiling is actually conducted.

-
 Verification:
  --verification-enabled=<bool>                    Whether to perform verification checks.

--- a/media/docs/quickstart.md
+++ b/media/docs/quickstart.md
@@ -206,9 +206,12 @@ $ cmake .. -DCUTLASS_NVCC_ARCHS="50;53"          # compiles for NVIDIA Maxwell G

 ## Clang

-For experimental purposes, CUTLASS may be compiled with 
-[clang 8.0](https://github.com/llvm/llvm-project/releases/download/llvmorg-8.0.1/clang+llvm-8.0.1-amd64-unknown-freebsd11.tar.xz) using the 
+For experimental purposes, CUTLASS has been verified to compile with the following versions of Clang and CUDA.
+
+* [clang 8.0](https://github.com/llvm/llvm-project/releases/download/llvmorg-8.0.1/clang+llvm-8.0.1-amd64-unknown-freebsd11.tar.xz) using the 
 [CUDA 10.0 Toolkit](https://developer.nvidia.com/cuda-10.0-download-archive).
+* [clang release/13.x](https://github.com/llvm/llvm-project/tree/release/13.x) using [CUDA 11.4](https://developer.nvidia.com/cuda-toolkit-archive)
+
 At this time, compiling with clang enables the CUTLASS SIMT GEMM kernels (sgemm, dgemm, hgemm, igemm)
 but does not enable TensorCores.

@@ -216,6 +219,8 @@ but does not enable TensorCores.
 $ mkdir build && cd build

 $ cmake -DCUDA_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ..
+# Add -DCMAKE_CXX_FLAGS=-D__NV_NO_HOST_COMPILER_CHECK=1 -DCMAKE_CUDA_FLAGS=-D__NV_NO_HOST_COMPILER_CHECK=1 if compiler
+# checks fail during CMake configuration.

 $ make test_unit -j
 ```
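
The comment added in this hunk describes a fallback for when CMake's compiler checks fail under clang. As a rough sketch, the two configure invocations could be assembled as shown below. The variable names (`NO_CHECK`, `BASE_ARGS`, `FALLBACK_ARGS`) are hypothetical and used only for illustration; the flag values are taken from the quickstart text. The script prints the commands instead of running them, so it does not assume clang or CUDA is installed.

```shell
#!/bin/sh
# Hypothetical helper variables; only the flag values come from the quickstart text.
NO_CHECK="-D__NV_NO_HOST_COMPILER_CHECK=1"

# Default clang configuration, as in the existing quickstart command.
BASE_ARGS="-DCUDA_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++"

# Fallback: pass the define to both the C++ and CUDA compile flags.
FALLBACK_ARGS="$BASE_ARGS -DCMAKE_CXX_FLAGS=$NO_CHECK -DCMAKE_CUDA_FLAGS=$NO_CHECK"

# Print both commands rather than executing them.
echo "cmake $BASE_ARGS .."
echo "cmake $FALLBACK_ARGS .."
```

One reasonable workflow is to try the first command and use the second only if configuration stops on a compiler check.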