Compare commits

...

52 Commits

Author SHA1 Message Date
66d9cddc83 New updates for 2.11 (#775)
* New updates.

* Minor profiler updates

Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2023-01-20 16:32:57 -05:00
d49bef88f9 Enable aarch64 support (#779) 2023-01-20 15:51:58 -05:00
8b42e751c6 streamk paper link (#765)
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-01-10 22:10:43 -05:00
eb7f99d3dd @hwu36 Adding the individual arXiv link for Stream-K paper. (#764)
* Stream-K individual paper entry.

* arXiv links updated.
2023-01-10 20:39:06 -05:00
764b840d6f streamk example and performance tuning (#760)
* streamk example and performance tuning

* one missing file

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-01-10 16:10:02 -05:00
a1046d49c1 Adds missing semicolon (#759) 2023-01-09 21:50:46 -05:00
1cd994b4cf Update PUBLICATIONS.md
@neoblizz @dumerrill 

thesis covering streamk
2023-01-08 00:42:19 -05:00
7bdba07310 Add definitions for tag structs. (#752)
This commit changes the declarations of MMA operator class (SIMT, Tensor Core, WMMA Tensor Core) and operator type (multiply-add and so on) to definitions. This is done so that these tag structs are no longer incomplete types, which allows the `typeid` operator to be used on these tag structs. This is necessary for these tag structs to be used as type parameters in [GoogleTest typed tests](https://google.github.io/googletest/advanced.html#typed-tests).
2023-01-06 09:46:52 -05:00
c54ede3a9e Add const overloads for iterator functions. (#753)
This commit adds `const`-correct overloads for `Array::{begin,end,rbegin,rend}`. These overloads are necessary for usage with [the GMock Container Matchers](http://google.github.io/googletest/reference/matchers.html#container-matchers), which cast the `Container` argument to a constant reference.
2023-01-06 09:46:34 -05:00
ff6e733fe1 restore the old epilogue for everything except streamk (#749)
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-01-04 11:02:55 -05:00
5989b7e1d7 Update PUBLICATIONS.md
Add coconet paper to the publication list.  @abhijangda
2023-01-04 09:18:38 -05:00
1e64f153b3 improve streamk load balance (#743)
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-12-25 13:56:33 -05:00
78b30d3191 Update README.md 2022-12-21 11:58:19 -05:00
59de82688b Update README.md 2022-12-21 11:57:55 -05:00
b85865d1ad Add missing #include directives (#741)
This commit adds two `#include` directives so that the definitions of `cutlass::gemm::warp::WarpSize` from "cutlass/gemm/warp/mma.h" and `cutlass::arch::OpClassSimt` from "cutlass/arch/mma.h" are visible to "cutlass/epilogue/threadblock/default_epilogue_simt.h". Without them, there are compiler errors when building the header standalone:

```
In file included from cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.cu:1:
./cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h:351:32: error: no member named 'warp' in namespace 'cutlass::gemm'; did you mean simply 'warp'?
  static int const kWarpSize = cutlass::gemm::warp::WarpSize<arch::OpClassSimt>::value;
                               ^
./cutlass/include/cutlass/epilogue/warp/tile_iterator_simt.h:49:11: note: 'warp' declared here
namespace warp {
          ^
In file included from cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.cu:1:
./cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h:351:53: error: no member named 'WarpSize' in namespace 'cutlass::epilogue::warp'
  static int const kWarpSize = cutlass::gemm::warp::WarpSize<arch::OpClassSimt>::value;
                                              ~~~~~~^
./cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h:351:68: error: no member named 'OpClassSimt' in namespace 'cutlass::arch'
  static int const kWarpSize = cutlass::gemm::warp::WarpSize<arch::OpClassSimt>::value;
                                                             ~~~~~~^
./cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h:351:82: error: no member named 'value' in the global namespace
  static int const kWarpSize = cutlass::gemm::warp::WarpSize<arch::OpClassSimt>::value;
                                                                               ~~^
./cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h:367:5: error: use of class template 'OutputTileThreadMap' requires template arguments
    OutputTileThreadMap,
    ^
./cutlass/include/cutlass/epilogue/threadblock/output_tile_thread_map.h:134:8: note: template is declared here
struct OutputTileThreadMap : public OutputTileThreadMapHelpers<Iterations_, Delta_> {
       ^
In file included from cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.cu:1:
./cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h:391:5: error: use of class template 'OutputTileThreadMap' requires template arguments
    OutputTileThreadMap,
    ^
./cutlass/include/cutlass/epilogue/threadblock/output_tile_thread_map.h:134:8: note: template is declared here
struct OutputTileThreadMap : public OutputTileThreadMapHelpers<Iterations_, Delta_> {
       ^
In file included from cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.cu:1:
./cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h:405:5: error: unknown type name 'OutputTileIterator'; did you mean 'WarpTileIterator'?
    OutputTileIterator,
    ^
./cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h:380:9: note: 'WarpTileIterator' declared here
  using WarpTileIterator = cutlass::epilogue::warp::TileIteratorSimtDirect2dConv<
        ^
./cutlass/include/cutlass/epilogue/threadblock/default_epilogue_simt.h:408:5: error: use of class template 'SharedLoadIterator' requires template arguments
    SharedLoadIterator,
    ^
./cutlass/include/cutlass/epilogue/threadblock/shared_load_iterator.h:67:7: note: template is declared here
class SharedLoadIterator {
      ^
```
2022-12-21 11:40:20 -05:00
3f2bb17722 minor chagnes (#730)
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-12-10 14:44:53 -05:00
38193d76e3 Updates for stream-k (#728)
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2022-12-08 23:48:10 -05:00
1d7772f218 Add missing #include directive (#727) 2022-12-08 18:58:31 -05:00
df81d847d7 Make Python interface work for non-SM80 targets (#726)
* Make Python interface work for non-SM80 targets

* Remove line in README
2022-12-07 21:53:33 -05:00
d6117ca362 Relax stream K gemm alignment constraints (#717)
* Relax stream K gemm alignment constraints

The current alignment requirements are too strict. Make them identical
to the checks for the regular universal gemm.

* Revert "Relax stream K gemm alignment constraints"

This reverts commit 31e80a250e.

* Relax stream K gemm alignment constraints

The current alignment requirements are too strict. Make them identical
to the checks for the regular universal gemm.

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-12-07 11:17:49 -05:00
9c0518608e Fix typos in conv problem sizes (#720)
* Fix typos in conv problem sizes

* Typos
2022-12-05 15:54:58 -05:00
9f1f37aa21 misc (#719)
* misc

* minor

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-12-05 12:07:20 -05:00
84213b0b8e fix: make arch.h self contained (#714) 2022-12-01 19:25:48 -05:00
8567b87d65 Update quickstart.md (#704)
* Update quickstart.md

* Update doxygen_mainpage.md

* Update doxygen_mainpage.md

* Update terminology.md
2022-11-29 21:43:03 -05:00
c975e2ccbb releaase 2.11 (#703) 2022-11-19 09:02:15 -05:00
3c90f6aea6 add #pragma once for header file in example 42 (#698) 2022-11-15 22:50:24 -05:00
06eb90cc0d Fix identity sigmoid activation (#659)
* activation support Identity

* fix Sigmoid activation operator() with CUTLASS_HOST_DEVICE
2022-11-09 14:42:23 -05:00
168ea8b0e1 ensure singleton::get thread safe construct instance (#658)
* ensure singleton::get thread safe construct instance

* fix singleton return reference

Co-authored-by: xuweiqi <xuweiqi117@gmail.com>
2022-11-08 21:44:32 -05:00
012c62c748 bug fixes and enharcement to gemm reductionK fusion (#682)
* add two missing files

* fix bunch of bugs of gemm-reducek fusion and add a device interface

* small changes

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-11-03 11:07:50 -04:00
FZC
cc85b64cf6 fix typo (#677) 2022-11-01 14:07:33 -04:00
1b4e24470a Example 43 - DualGemm (#670)
* Ex50 wip

* IS_PROFILING mode

* MultiStage2 - but is slower

* Add SwiGLU

* Support SplitKSerial reduction
Support not storing D0/D1
Cleanup code

* Option to disable bias

* Renumber example

* Fix build

* Remove references to pb_size_0 / pb_size_1

* Add support for bf16 inputs with float accum

* small changes

Co-authored-by: danthe3rd <danthe3rd>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-10-26 14:04:42 -04:00
8c1bf9b784 Bump CUTLASS Python container version (#672)
* Update example 40 README

* Update CUTLASS Python README
2022-10-22 21:09:39 -04:00
7d0dd6706e Remove excessive includes from examples/41_multi_head_attention (#669)
The rationale behind this change is explained in #563
2022-10-21 22:23:15 -04:00
9b47403b2d Add missing CUTLASS_HOST_DEVICE (#671) 2022-10-21 22:20:38 -04:00
4db6a6140e ex42: Fused MHA imported from xFormers (#662)
* ex42: Fused MHA imported from xFormers

* Remove std:: references

* Support K>128 in the example

* Support causal option

* Support different head size for V, and different seqlength for KV

* Update FLOPS counter

* Remove bit_cast

* fix build: Replace M_LOG2E

* Add doc

* Revert "Remove bit_cast"

This reverts commit 9662fa86bb.

* Explicit casts to int32_t for windows build

Co-authored-by: danthe3rd <danthe3rd>
2022-10-17 10:49:33 -04:00
3bf95e90c2 Update labeler.yml 2022-10-13 08:03:28 -04:00
75fed7493e Update labeler.yml 2022-10-13 08:01:21 -04:00
98b73fc95d Update labeler.yml 2022-10-13 07:55:33 -04:00
4990e3686d Update labeler.yml 2022-10-13 07:52:38 -04:00
4b7365388c Update labeler.yml 2022-10-13 07:32:55 -04:00
0d8405588d Update labeler.yml 2022-10-12 15:32:38 -04:00
cb539dab78 Correct typos in comments (#639)
* Correct typos in comments

Correct comments in code on type of generated distribution. Improve Gaussian RNG to take advantage of Box Muller method

* Inline Box Muller

Added inline function for the Box Muller algorithm and updated code comments to be more concise

* Update tensor_fill.h

* Update tensor_fill.h

* small changes to pass tests

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-09-30 22:51:30 -04:00
dadc881a96 Bug fix for gemm broadcast (#650)
* gemm_universal_with_broadcast, +2 sources.

* Revert "gemm_universal_with_broadcast, +2 sources."

This reverts commit fb063251f2.

* gemm broadcast bug fix
2022-09-30 10:00:38 -04:00
f3eea3a4d7 Create labeler.yml 2022-09-29 15:08:44 -04:00
cd37e82492 change unused class member to local var (#646) 2022-09-28 23:52:35 -04:00
48a9ea223a Fix release version in the citation (#638) 2022-09-22 10:58:45 -04:00
7a458f00a6 fix(permute.h): incorrect comment in Tensor5DPermute20314 (#637)
* fix(permute.h): incorrect comment in `Tensor5DPermute20314`

* typo in usage in example 39
2022-09-22 09:21:13 -04:00
97bff52e8c add two missing files (#636)
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-09-21 15:42:42 -04:00
9f2e3faa69 fix call of GELU_Taylor in LinearCombinationGeneric (#634) 2022-09-20 21:00:55 -04:00
a821280dc7 Gemm broadcast (#632)
* gemm_universal_with_broadcast, +2 sources.

* Revert "gemm_universal_with_broadcast, +2 sources."

This reverts commit fb063251f2.

* gemm_universal_with_broadcast separated version.

* Update copyright banner.

* update banner
2022-09-20 10:37:12 -04:00
f73374a1eb fix:comment typo in example 23 (#633) 2022-09-19 09:54:14 -04:00
faab7536fc add comment (#628) 2022-09-17 21:40:30 -04:00
1376 changed files with 67939 additions and 8967 deletions

View File

@ -1,5 +1,27 @@
# NVIDIA CUTLASS Changelog
## [2.11.0](https://github.com/NVIDIA/cutlass/releases/tag/v2.11.0) (2022-11-19)
* [Stream-K](/examples/47_ampere_gemm_universal_streamk), which is a new general way to do split-K. It can not only improve performance, but can also significantly reduce the number of tile sizes that need to be profiled to find the best one.
* [Fused multi-head attention Kernel](/examples/41_fused_multi_head_attention). It has two variants: one uses batched GEMM for the fixed sequence length, and the other one uses group GEMM for the variable sequence length. Both versions just need one kernel.
* [Dual GEMM](/examples/45_dual_gemm), which can fuse A x B and A x C into one kernel. Two GEMMs has no producer-consumer dependency.
* Hopper improves [double precision matrix multiplication](/test/unit/gemm/device/gemm_f64n_f64t_f64t_tensor_op_f64_sm90.cu) by 2x compared to Ampere at iso-clocks. It is supported since CUDA 11.8.
* [BLAS3](/test/unit/gemm/device/hemm_cf64_cf64_cf64_tensor_op_f64_sm90.cu) functions with Hoppers new double precision matrix multiplication instructions.
* [ELL Block Sparse GEMM](/examples/43_ell_block_sparse_gemm), which uses an [ELL matrix](https://developer.nvidia.com/blog/accelerating-matrix-multiplication-with-block-sparse-format-and-nvidia-tensor-cores/) to describe the sparsity of A matrix. B and output matrices are still dense. The block size can be arbitary.
* Optimized [Group Conv](/examples/42_ampere_tensorop_group_conv) for SingleGroup mode, which requires that the output channel per group is a multiple of Threadblock tile N.
* [Optimized DepthWise Conv](/examples/46_depthwise_simt_conv2dfprop/depthwise_simt_conv2dfprop.cu). Two new modes are added
* [kOptimized](/test/unit/conv/device/depthwise_conv2d_fprop_direct_conv_f16nhwc_f16nhwc_f16nhwc_simt_f16_sm60.cu) - use direct conv to compute instead of implicit GEMM.
* The restrictions are: 1) input ,output channel and group number should be multiple of (128 / sizeof(input element)). 2) The input filter size should be the same as the template parameter configuration.
* [kFixedStrideDilation](/test/unit/conv/device/depthwise_conv2d_fprop_direct_conv_fixed_stride_dilation_f16nhwc_f16nhwc_f16nhwc_simt_f16_sm60.cu) - which puts stride and dilation into templates to further improve the performance. In this mode, kernel persistents some inputs into register to squeeze more performance, so large filter/stride/dilation is not recommanded.
* The restrictions are: 1) input, output channel and group number should be multiple of (128 / sizeof(input element)). 2) input filter size, stride, dilation should same as the template parameter configuration.
* [Scripts](/examples/44_multi_gemm_ir_and_codegen) to fuse multiple back-to-back GEMM. Its implementation was discussed in a GTC'22 Spring [talk](https://www.nvidia.com/en-us/on-demand/session/gtcspring22-s41606/).
* [FP8 data type definition](/include/cutlass/float8.h) and [conversion routines](/include/cutlass/numeric_conversion.h#L1274-2115).
* Updates and bugfixes from the community (thanks!). Big shout out to Meta's [xFormers](https://github.com/facebookresearch/xformers).
* **Deprecation announcement:** CUTLASS plans to deprecate the following:
* Maxwell and Pascal GPU architectures
* Ubuntu 16.04
* CUDA 10.2
## [2.10.0](https://github.com/NVIDIA/cutlass/releases/tag/v2.10.0) (2022-08-23)
* [CUTLASS Python](/examples/40_cutlass_py) now supports GEMM, CONV, Group GEMM for different data types as well as different epilogue flavours.
* Optimizations for CUTLASS's [Grouped GEMM](examples/24_gemm_grouped/gemm_grouped.cu) kernel. Threadblock scheduling part is improved. Some computation can be moved to the host side if applicable. [Grouped Syr2k](examples/38_syr2k_grouped/syr2k_grouped.cu) kernels are added, too.
@ -16,11 +38,6 @@
* Optimal performance using [**CUDA 11.6u2**](https://developer.nvidia.com/cuda-downloads)
* Updates and bugfixes from the community (thanks!)
* **Deprecation announcement:** CUTLASS plans to deprecate the following:
* Maxwell and Pascal GPU architectures
* Ubuntu 16.04
* CUDA 10.2
## [2.9.0](https://github.com/NVIDIA/cutlass/releases/tag/v2.9.0) (2022-04-21)
* [First layer Convolution kernels](/test/unit/conv/device/conv2d_fprop_fixed_channels_f16nhwc_f16nhwc_f16nhwc_tensor_op_f32_sm80.cu) specialized for small channel counts and reduced alignment
@ -268,7 +285,7 @@
## Copyright
Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: BSD-3-Clause
```

82
CITATION.cff Normal file
View File

@ -0,0 +1,82 @@
cff-version: 1.2.0
title: CUTLASS
message: >-
If you use this software, please cite using the
following metadata.
type: software
authors:
- given-names: Andrew
email: akerr@nvidia.com
family-names: Kerr
affiliation: NVIDIA
- given-names: Haicheng
family-names: Wu
affiliation: NVIDIA
email: haichengw@nvidia.com
- given-names: Manish
family-names: Gupta
affiliation: Google
email: manigupta@google.com
- given-names: Dustyn
family-names: Blasig
email: dblasig@nvidia.com
affiliation: NVIDIA
- given-names: Pradeep
family-names: Ramini
email: prramani@nvidia.com
affiliation: NVIDIA
- given-names: Duane
family-names: Merrill
email: dumerrill@nvidia.com
affiliation: NVIDIA
- given-names: Aniket
family-names: Shivam
email: ashivam@nvidia.com
affiliation: NVIDIA
- given-names: Piotr
family-names: Majcher
email: pmajcher@nvidia.com
affiliation: NVIDIA
- given-names: Paul
family-names: Springer
email: pspringer@nvidia.com
affiliation: NVIDIA
- given-names: Markus
family-names: Hohnerbach
affiliation: NVIDIA
email: mhohnerbach@nvidia.com
- given-names: Jin
family-names: Wang
email: jinw@nvidia.com
affiliation: NVIDIA
- given-names: Matt
family-names: Nicely
email: mnicely@nvidia.com
affiliation: NVIDIA
repository-code: 'https://github.com/NVIDIA/cutlass'
abstract: >-
CUTLASS is a collection of CUDA C++ template
abstractions for implementing high-performance
matrix-multiplication (GEMM) and related
computations at all levels and scales within CUDA.
It incorporates strategies for hierarchical
decomposition and data movement similar to those
used to implement cuBLAS and cuDNN. CUTLASS
decomposes these "moving parts" into reusable,
modular software components abstracted by C++
template classes. These thread-wide, warp-wide,
block-wide, and device-wide primitives can be
specialized and tuned via custom tiling sizes, data
types, and other algorithmic policy. The resulting
flexibility simplifies their use as building blocks
within custom kernels and applications.
keywords:
- 'cutlass, tensor cores, cuda'
license: BSD-3-Clause
license-url: https://github.com/NVIDIA/cutlass/blob/v2.11.0/LICENSE.txt
version: '2.11.0'
date-released: '2022-11-19'
identifiers:
- type: url
value: "https://github.com/NVIDIA/cutlass/tree/v2.11.0"
description: The GitHub release URL of tag 2.11.0

View File

@ -1,4 +1,4 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without
@ -37,8 +37,9 @@ else()
endif()
message(STATUS "CMake Version: ${CMAKE_VERSION}")
set(IMPLICIT_CMAKE_CXX_STANDARD OFF CACHE BOOL "Do not explicitly specify -std=c++11 if set")
project(CUTLASS VERSION 2.10.0 LANGUAGES CXX)
project(CUTLASS VERSION 2.11.0 LANGUAGES CXX)
include(${CMAKE_CURRENT_SOURCE_DIR}/CUDA.cmake)
if (CUDA_VERSION VERSION_LESS 10.2)
@ -52,15 +53,19 @@ find_package(Doxygen QUIET)
#
# CUTLASS 2.x requires C++11
#
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
if (NOT IMPLICIT_CMAKE_CXX_STANDARD)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_EXTENSIONS OFF)
endif()
if(CUTLASS_NATIVE_CUDA)
set(CMAKE_CUDA_STANDARD 11)
set(CMAKE_CUDA_STANDARD_REQUIRED ON)
else()
list(APPEND CUTLASS_CUDA_NVCC_FLAGS --std=c++11)
if (NOT IMPLICIT_CMAKE_CXX_STANDARD)
list(APPEND CUTLASS_CUDA_NVCC_FLAGS --std=c++11)
endif()
endif()
if(CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT)
@ -87,6 +92,7 @@ set(CUTLASS_ENABLE_EXAMPLES ${CUTLASS_ENABLE_EXAMPLES_INIT} CACHE BOOL "Enable C
set(CUTLASS_ENABLE_TOOLS ${CUTLASS_ENABLE_TOOLS_INIT} CACHE BOOL "Enable CUTLASS Tools")
set(CUTLASS_ENABLE_LIBRARY ${CUTLASS_ENABLE_LIBRARY_INIT} CACHE BOOL "Enable CUTLASS Library")
set(CUTLASS_ENABLE_PROFILER ${CUTLASS_ENABLE_LIBRARY} CACHE BOOL "Enable CUTLASS Profiler")
set(CUTLASS_ENABLE_PERFORMANCE ${CUTLASS_ENABLE_PROFILER} CACHE BOOL "Enable CUTLASS Proformance")
if(${CMAKE_PROJECT_NAME} STREQUAL ${PROJECT_NAME})
set(CUTLASS_ENABLE_TESTS_INIT ${CUTLASS_ENABLE_LIBRARY}})
@ -122,6 +128,9 @@ endif()
if (NOT CUDA_VERSION VERSION_LESS 11.1 AND NOT CUDA_COMPILER MATCHES "[Cc]lang")
list(APPEND CUTLASS_NVCC_ARCHS_SUPPORTED 86)
endif()
if (NOT CUDA_VERSION VERSION_LESS 11.8 AND NOT CUDA_COMPILER MATCHES "[Cc]lang")
list(APPEND CUTLASS_NVCC_ARCHS_SUPPORTED 90)
endif()
set(CUTLASS_NVCC_ARCHS ${CUTLASS_NVCC_ARCHS_SUPPORTED} CACHE STRING "The SM architectures requested.")
set(CUTLASS_NVCC_ARCHS_ENABLED ${CUTLASS_NVCC_ARCHS} CACHE STRING "The SM architectures to build code for.")
@ -347,11 +356,21 @@ if (CMAKE_VERSION VERSION_GREATER_EQUAL 3.18)
endif()
function(cutlass_apply_cuda_gencode_flags TARGET)
set(options)
set(oneValueArgs)
set(multiValueArgs SM_ARCHS)
cmake_parse_arguments(_ "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
if (__SM_ARCHS)
set(ARCHS_ENABLED ${__SM_ARCHS})
else()
set(ARCHS_ENABLED ${CUTLASS_NVCC_ARCHS_ENABLED})
endif()
set(NVCC_FLAGS)
set(CLANG_FLAGS)
set(__CMAKE_CUDA_ARCHS)
foreach(ARCH ${CUTLASS_NVCC_ARCHS_ENABLED})
foreach(ARCH ${ARCHS_ENABLED})
list(APPEND CLANG_FLAGS --cuda-gpu-arch=sm_${ARCH})
set(CODES)
if(CUTLASS_NVCC_EMBED_CUBIN)
@ -365,21 +384,37 @@ function(cutlass_apply_cuda_gencode_flags TARGET)
list(JOIN CODES "," CODES_STR)
list(APPEND NVCC_FLAGS -gencode=arch=compute_${ARCH},code=[${CODES_STR}])
endforeach()
if (CUDA_COMPILER MATCHES "[Cc]lang")
target_compile_options(
${TARGET}
PRIVATE
$<$<COMPILE_LANGUAGE:CXX>:${CLANG_FLAGS}>
)
elseif(CMAKE_VERSION GREATER_EQUAL 3.18)
set_property(TARGET ${TARGET} PROPERTY CUDA_ARCHITECTURES ${__CMAKE_CUDA_ARCHS})
if (NOT __SM_ARCHS)
if (CUDA_COMPILER MATCHES "[Cc]lang")
target_compile_options(
${TARGET}
PRIVATE
$<$<COMPILE_LANGUAGE:CXX>:${CLANG_FLAGS}>
)
elseif(CMAKE_VERSION GREATER_EQUAL 3.18)
set_property(TARGET ${TARGET} PROPERTY CUDA_ARCHITECTURES ${__CMAKE_CUDA_ARCHS})
else()
target_compile_options(
${TARGET}
PRIVATE
$<$<COMPILE_LANGUAGE:CUDA>:${NVCC_FLAGS}>
)
endif()
else()
target_compile_options(
${TARGET}
PRIVATE
$<$<COMPILE_LANGUAGE:CUDA>:${NVCC_FLAGS}>
)
list(JOIN CLANG_FLAGS " " CLANG_FLAGS_STR)
list(JOIN NVCC_FLAGS " " STR_NVCC_FLAGS)
if (CUDA_COMPILER MATCHES "[Cc]lang")
if(${TARGET} MATCHES ".*\.cpp")
set_source_files_properties(${TARGET} PROPERTIES COMPILE_FLAGS ${CLANG_FLAGS_STR})
endif()
elseif(CMAKE_VERSION GREATER_EQUAL 3.18)
set_source_files_properties(${TARGET} PROPERTIES CUDA_ARCHITECTURES ${STR_NVCC_FLAGS})
else()
if(${TARGET} MATCHES ".*\.cu")
set_source_files_properties(${TARGET} PROPERTIES COMPILE_FLAGS ${STR_NVCC_FLAGS})
endif()
endif()
endif()
endfunction()
@ -459,7 +494,10 @@ set(CUTLASS_TOOLS_UTIL_INCLUDE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/tools/util/includ
include_directories(${CUTLASS_INCLUDE_DIR})
target_compile_features(CUTLASS INTERFACE cxx_std_11)
target_compile_definitions(CUTLASS INTERFACE CUTLASS_NAMESPACE=${CUTLASS_NAMESPACE})
if (NOT CUTLASS_NAMESPACE STREQUAL "cutlass")
target_compile_definitions(CUTLASS INTERFACE CUTLASS_NAMESPACE=${CUTLASS_NAMESPACE})
endif()
if (NOT DEFINED CUTLASS_REVISION)
@ -569,6 +607,9 @@ install(DIRECTORY DESTINATION ${CUTLASS_TEST_INSTALL_PREFIX}/ctest)
################################################################################
set(CUTLASS_ENABLE_CUBLAS OFF CACHE BOOL "cuBLAS usage for tests")
set(CUTLASS_ENABLE_CUDNN OFF CACHE BOOL "cuDNN usage for tests")
include(${CMAKE_CURRENT_SOURCE_DIR}/cuBLAS.cmake)
if (CUTLASS_ENABLE_CUBLAS)
@ -732,7 +773,7 @@ if (CUTLASS_ENABLE_TOOLS)
add_subdirectory(tools)
if (CUTLASS_ENABLE_PROFILER)
add_dependencies(test_all test_profiler)
endif()
endif()
endif()
if (CUTLASS_ENABLE_EXAMPLES)
add_subdirectory(examples)
@ -775,15 +816,23 @@ endif()
################################################################################
include(CMakePackageConfigHelpers)
write_basic_package_version_file(
${CMAKE_CURRENT_BINARY_DIR}/NvidiaCutlassConfigVersion.cmake
COMPATIBILITY AnyNewerVersion)
install(
FILES ${CMAKE_CURRENT_SOURCE_DIR}/cmake/NvidiaCutlassConfig.cmake
DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/
FILES
${CMAKE_CURRENT_SOURCE_DIR}/cmake/NvidiaCutlassConfig.cmake
${CMAKE_CURRENT_BINARY_DIR}/NvidiaCutlassConfigVersion.cmake
DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/NvidiaCutlass/
)
install(
EXPORT NvidiaCutlass
NAMESPACE nvidia::cutlass::
DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/
DESTINATION ${CMAKE_INSTALL_LIBDIR}/cmake/NvidiaCutlass/
FILE NvidiaCutlassTargets.cmake
)

View File

@ -7,10 +7,10 @@
This is the official list of CUTLASS developers and contributors.
## DEVELOPERS
Andrew Kerr
Haicheng Wu
Manish Gupta
Dustyn Blasig
Andrew Kerr
Haicheng Wu
Manish Gupta
Dustyn Blasig
Pradeep Ramani
Cris Cecka
Vijay Thakkar
@ -20,52 +20,50 @@ Ethan Yan
Zhaodong Chen
Jack Kosaian
Yujia Zhai
Naila Farooqui
Piotr Majcher
Paul Springer
Jin Wang
Chinmay Talegaonkar
Shang Zhang
Scott Yokim
Markus Hohnerbach
Aditya Atluri
David Tanner
Manikandan Ananth
Naila Farooqui
Piotr Majcher
Paul Springer
Jin Wang
Chinmay Talegaonkar
Shang Zhang
Scott Yokim
Markus Hohnerbach
Aditya Atluri
David Tanner
Manikandan Ananth
## CUTLASS Product Manager
Matthew Nicely
## CONTRIBUTORS
Timothy Costa
Julien Demouth
Brian Fahs
Michael Goldfarb
Mostafa Hagog
Fei Hu
Alan Kaatz
Tina Li
Timmy Liu
Duane Merrill
Kevin Siu
Markus Tavenrath
John Tran
Vicki Wang
Junkai Wu
Fung Xie
Albert Xu
Jack Yang
Xiuxia Zhang
Nick Zhao
Timothy Costa
Julien Demouth
Brian Fahs
Michael Goldfarb
Mostafa Hagog
Fei Hu
Alan Kaatz
Tina Li
Timmy Liu
Duane Merrill
Kevin Siu
Markus Tavenrath
John Tran
Vicki Wang
Junkai Wu
Fung Xie
Albert Xu
Jack Yang
Xiuxia Zhang
Nick Zhao
## ACKNOWLEDGEMENTS
Girish Bharambe
Luke Durant
Olivier Giroux
Stephen Jones
Rishkul Kulkarni
Bryce Lelbach
Joel McCormack
Kyrylo Perelygin
Girish Bharambe
Luke Durant
Olivier Giroux
Stephen Jones
Rishkul Kulkarni
Bryce Lelbach
Joel McCormack
Kyrylo Perelygin

View File

@ -1,4 +1,4 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without
@ -80,7 +80,7 @@ find_library(
lib64
lib
NO_DEFAULT_PATH
# We aren't going to search any system paths. We want to find the runtime
# We aren't going to search any system paths. We want to find the runtime
# in the CUDA toolkit we're building against.
)
@ -95,10 +95,10 @@ if(NOT TARGET cudart AND CUDART_LIBRARY)
# from the PATH search.
else()
add_library(cudart SHARED IMPORTED GLOBAL)
endif()
endif()
add_library(nvidia::cudart ALIAS cudart)
set_property(
TARGET cudart
PROPERTY IMPORTED_LOCATION
@ -126,7 +126,7 @@ find_library(
lib64/stubs
lib/stubs
NO_DEFAULT_PATH
# We aren't going to search any system paths. We want to find the runtime
# We aren't going to search any system paths. We want to find the runtime
# in the CUDA toolkit we're building against.
)
@ -141,10 +141,10 @@ if(NOT TARGET cuda_driver AND CUDA_DRIVER_LIBRARY)
# from the PATH search.
else()
add_library(cuda_driver SHARED IMPORTED GLOBAL)
endif()
endif()
add_library(nvidia::cuda_driver ALIAS cuda_driver)
set_property(
TARGET cuda_driver
PROPERTY IMPORTED_LOCATION
@ -170,7 +170,7 @@ find_library(
lib64
lib
NO_DEFAULT_PATH
# We aren't going to search any system paths. We want to find the runtime
# We aren't going to search any system paths. We want to find the runtime
# in the CUDA toolkit we're building against.
)
@ -185,10 +185,10 @@ if(NOT TARGET nvrtc AND NVRTC_LIBRARY)
# from the PATH search.
else()
add_library(nvrtc SHARED IMPORTED GLOBAL)
endif()
endif()
add_library(nvidia::nvrtc ALIAS nvrtc)
set_property(
TARGET nvrtc
PROPERTY IMPORTED_LOCATION
@ -247,7 +247,7 @@ function(cutlass_unify_source_files TARGET_ARGS_VAR)
set(CUDA_FILE_ARGS)
set(TARGET_SOURCE_ARGS)
foreach(ARG ${__UNPARSED_ARGUMENTS})
if(${ARG} MATCHES ".*\.cu$")
list(APPEND CUDA_FILE_ARGS ${ARG})
@ -255,7 +255,7 @@ function(cutlass_unify_source_files TARGET_ARGS_VAR)
list(APPEND TARGET_SOURCE_ARGS ${ARG})
endif()
endforeach()
list(LENGTH CUDA_FILE_ARGS NUM_CUDA_FILE_ARGS)
while(NUM_CUDA_FILE_ARGS GREATER 0)
list(SUBLIST CUDA_FILE_ARGS 0 ${__BATCH_SIZE} CUDA_FILE_BATCH)
@ -287,7 +287,7 @@ function(cutlass_unify_source_files TARGET_ARGS_VAR)
endfunction()
function(cutlass_add_library NAME)
set(options)
set(options SKIP_GENCODE_FLAGS)
set(oneValueArgs EXPORT_NAME)
set(multiValueArgs)
cmake_parse_arguments(_ "${options}" "${oneValueArgs}" "${multiValueArgs}" ${ARGN})
@ -303,7 +303,9 @@ function(cutlass_add_library NAME)
endif()
cutlass_apply_standard_compile_options(${NAME})
if (NOT __SKIP_GENCODE_FLAGS)
cutlass_apply_cuda_gencode_flags(${NAME})
endif()
target_compile_features(
${NAME}

View File

@ -1,4 +1,4 @@
Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: BSD-3-Clause
Redistribution and use in source and binary forms, with or without

View File

@ -1,11 +1,19 @@
# Publications Using Cutlass
## 2023
- ["Stream-K: Work-centric Parallel Decomposition for Dense Matrix-Matrix Multiplication on the GPU"](https://arxiv.org/abs/2301.03598). Muhammad Osama, Duane Merrill, Cris Cecka, Michael Garland, John D. Owens. _arXiv_, January 2023.
## 2022
- ["GPU Load Balancing"](https://arxiv.org/abs/2212.08964). Muhammad Osama. _Doctoral dissertation, University of California, Davis_, December 2022.
- ["Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance"](https://arxiv.org/abs/2110.15238). Jiarong Xing, Leyuan Wang, Shang Zhang, Jack Chen, Ang Chen, Yibo Zhu. _Proceedings of the 5th MLSys Conference_, August 2022.
- ["Recovering single precision accuracy from Tensor Cores while surpassing the FP32 theoretical peak performance"](https://arxiv.org/abs/2203.03341). Hiroyuki Ootomo, Rio Yokota. _International Journal of High Performance Computing_, March 2022.
- ["Breaking the Computation and Communication Abstraction Barrier in Distributed Machine Learning Workloads"](https://arxiv.org/abs/2105.05720). Abhinav Jangda, Jun Huang, Guodong Liu, Amir Hossein Nodehi Sabet, Saeed Maleki, Youshan Miao, Madanlal Musuvathi, Todd Mytkowicz, Olli Sarikivi. _Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems_, February 2022.
## 2021
- ["Arithmetic-intensity-guided fault tolerance for neural network inference on GPUs"](https://dl.acm.org/doi/abs/10.1145/3458817.3476184). Jack Kosaian, K. V. Rashmi. _Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis_, November 2021.

105
README.md
View File

@ -1,8 +1,8 @@
![ALT](/media/images/gemm-hierarchy-with-epilogue-no-labels.png "Complete CUDA GEMM decomposition")
# CUTLASS 2.10
# CUTLASS 2.11
_CUTLASS 2.10 - August 2022_
_CUTLASS 2.11 - November 2022_
CUTLASS is a collection of CUDA C++ template abstractions for implementing
high-performance matrix-multiplication (GEMM) and related computations at all levels
@ -36,24 +36,27 @@ See the [Quick Start Guide](/media/docs/quickstart.md) to get started quickly.
See the [functionality listing](/media/docs/functionality.md) for the list of operations
supported at each level of the execution model hierarchy.
# What's New in CUTLASS 2.10
# What's New in CUTLASS 2.11
CUTLASS 2.10 is an update to CUTLASS adding:
- [CUTLASS Python](/examples/40_cutlass_py) now supports GEMM, Convolution and Grouped GEMM for different data types as well as different epilogue flavors.
- Optimizations for CUTLASS's [Grouped GEMM](examples/24_gemm_grouped/gemm_grouped.cu) kernel. It can move some scheduling into the host side if applicable.
- Optimizations for [GEMM+Softmax](examples/35_gemm_softmax).
- [Grouped GEMM for Multihead Attention](examples/41_multi_head_attention) is a general MHA that does not require equal sequence length in every GEMM.
- [GEMM + Layer norm fusion for Ampere](examples/37_gemm_layernorm_gemm_fusion/) can fuse the layernorm into GEMMs before and after.
- [GEMM Epilogue Permutation Fusion](examples/39_gemm_permute) can permute the GEMM output before storing.
- [Grouped convolution targeting implicit GEMM](test/unit/conv/device/group_conv2d_fprop_implicit_gemm_f16nhwc_f16nhwc_f16nhwc_tensor_op_f32_sm80.cu) introduces the first group convolution implementation to CUTLASS. It is an Analytical implementation, not an Optimized.
- [Depthwise separable convolution](test/unit/conv/device/depthwise_fprop_implicit_gemm_f16nhwc_f16nhwc_f16nhwc_simt_f16_sm60.cu) introduces the first depthwise convolution which is also Analytical for now.
- Standalone [Layernorm](/tools/util/include/cutlass/util/device_layernorm.h) and [Pooling](/tools/util/include/cutlass/util/device_nhwc_pooling.h) kernels.
- [Back-to-back GEMM](examples/13_two_tensor_op_fusion) enhancements.
- Updates and bugfixes from the community (thanks!)
- **Deprecation announcement:** CUTLASS plans to deprecate the following:
CUTLASS 2.11 is an update to CUTLASS adding:
- [Stream-K](/examples/47_ampere_gemm_universal_streamk), which is a new general way to do split-K. It can not only improve performance, but can also significantly reduce the number of tile sizes that need to be profiled to find the best one.
- [Fused multi-head attention kernel](/examples/41_fused_multi_head_attention). It has two variants: one for fixed sequence lengths, and another for variable sequence lengths.
- [Dual GEMM](/examples/45_dual_gemm). It can run two GEMMs that share the same left input matrix in one kernel.
- Hopper improves [double precision matrix multiplication](/test/unit/gemm/device/gemm_f64n_f64t_f64t_tensor_op_f64_sm90.cu) by 2x compared to Ampere at iso-clocks. It is supported since CUDA 11.8.
- [BLAS3](/test/unit/gemm/device/hemm_cf64_cf64_cf64_tensor_op_f64_sm90.cu) functions with Hoppers new double precision matrix multiplication instructions.
- [ELL Block Sparse GEMM](/examples/43_ell_block_sparse_gemm).
- [Optimized Group Conv](/examples/42_ampere_tensorop_group_conv).
- [Optimized DepthWise Conv](/examples/46_depthwise_simt_conv2dfprop).
- [Scripts](/examples/44_multi_gemm_ir_and_codegen) to fuse multiple back-to-back GEMM.
- [FP8 data type definition](/include/cutlass/float8.h) and [conversion routines](/include/cutlass/numeric_conversion.h#L1274-2115).
- Updates and bugfixes from the community (thanks!). Big shout out to Meta's [xFormers](https://github.com/facebookresearch/xformers).
- **Deprecation announcement:** CUTLASS plans to deprecate the following in the next major release:
- Maxwell and Pascal GPU architectures
- Ubuntu 16.04
- CUDA 10.2
- C++ 11
- **Future requirement announcement:** CUTLASS plans to add the following requirements in the next major release:
- Minimum C++ standard - C++17
**See the [CHANGELOG](CHANGELOG.md) for a detailed listing of releases and updates.**
@ -80,10 +83,11 @@ as shown in the above figure. Tensor Core operations are still implemented usin
# Compatibility
CUTLASS requires a C++11 host compiler and
performs best when built with the [**CUDA 11.6u2 Toolkit**](https://developer.nvidia.com/cuda-toolkit).
It is also compatible with CUDA 11.0, CUDA 11.1, CUDA 11.2, CUDA 11.3, CUDA 11.4, and CUDA 11.5.
CUTLASS requires a C++11 host compiler and performs best when built with the [**CUDA 11.8 Toolkit**](https://developer.nvidia.com/cuda-toolkit).
It is also compatible with CUDA 11.x.
## Operating Systems
We have tested the following environments.
|**Operating System** | **Compiler** |
@ -93,11 +97,12 @@ We have tested the following environments.
| | Microsoft Visual Studio 2019|
| Ubuntu 18.04 | GCC 7.5.0 |
| Ubuntu 20.04 | GCC 10.3.0 |
| Ubuntu 21.04 | GCC 11.2.0 |
| Ubuntu 22.04 | GCC 11.2.0 |
Additionally, CUTLASS may be built with clang.
See [these instructions](media/docs/quickstart.md#clang) for more details.
## Hardware
CUTLASS runs successfully on the following NVIDIA GPUs, and it is expected to be efficient on
any Volta-, Turing-, or NVIDIA Ampere- architecture NVIDIA GPU.
@ -110,9 +115,7 @@ any Volta-, Turing-, or NVIDIA Ampere- architecture NVIDIA GPU.
|NVIDIA A100|8.0|11.0|11.0|
|NVIDIA A10 |8.6|11.1|11.1|
|NVIDIA GeForce 3090|8.6|11.1|11.1|
For all GPUs, we recommend compiling with the [CUDA 11.6u2 Toolkit](https://developer.nvidia.com/cuda-toolkit)
for best performance.
|NVIDIA H100 PCIe|9.0|11.8|Double-precision: 11.8; Mixed precision: 12.0|
# Documentation
@ -133,9 +136,16 @@ CUTLASS is described in the following documents and the accompanying
- [CUTLASS Profiler](media/docs/profiler.md) - command-line driven profiling application
- [CUTLASS Utilities](media/docs/utilities.md) - additional templates used to facilate rapid development
# Resources
We have also described the structure of an efficient GEMM in our talk at the
[GPU Technology Conference 2018](http://on-demand.gputechconf.com/gtc/2018/presentation/s8854-cutlass-software-primitives-for-dense-linear-algebra-at-all-levels-and-scales-within-cuda.pdf).
- [CUTLASS: Software Primitives for Dense Linear Algebra at All Levels and Scales within CUDA](https://www.nvidia.com/en-us/on-demand/session/gtcsiliconvalley2018-s8854/)
- [Developing CUDA Kernels to Push Tensor Cores to the Absolute Limit on NVIDIA A100](https://www.nvidia.com/en-us/on-demand/session/gtcsj20-s21745/)
- [Accelerating Convolution with Tensor Cores in CUTLASS](https://www.nvidia.com/en-us/on-demand/session/gtcspring21-s31883/)
- [Accelerating Backward Data Gradient by Increasing Tensor Core Utilization in CUTLASS](https://www.nvidia.com/en-us/on-demand/session/gtcspring22-s41996/)
- [CUTLASS: Python API, Enhancements, and NVIDIA Hopper](https://www.nvidia.com/en-us/on-demand/session/gtcfall22-a41131/)
# Building CUTLASS
CUTLASS is a header-only template library and does not need to be built to be used by other
@ -199,6 +209,8 @@ include/ # client applications should target this directory
conv/ # code specialized for convolution
epilogue/ # code specialized for the epilogue of gemm/convolution
gemm/ # code specialized for general matrix product computations
layout/ # layout definitions for matrices, tensors, and other mathematical objects in memory
@ -206,6 +218,8 @@ include/ # client applications should target this directory
platform/ # CUDA-capable Standard Library components
reduction/ # bandwidth-limited reduction kernels that do not fit the "gemm" model
thread/ # simt code that can be performed within a CUDA thread
transform/ # code specialized for layout, type, and domain transformations
@ -216,49 +230,6 @@ include/ # client applications should target this directory
[CUTLASS SDK examples](/examples) apply CUTLASS templates to implement basic computations.
```
examples/
00_basic_gemm/ # launches a basic GEMM with single precision inputs and outputs
01_cutlass_utilities/ # demonstrates CUTLASS Utilities for allocating and initializing tensors
02_dump_reg_smem/ # debugging utilities for printing register and shared memory contents
03_visualize_layout/ # utility for visualizing all layout functions in CUTLASS
04_tile_iterator/ # example demonstrating an iterator over tiles in memory
05_batched_gemm/ # example demonstrating CUTLASS's batched strided GEMM operation
06_splitK_gemm/ # exmaple demonstrating CUTLASS's Split-K parallel reduction kernel
07_volta_tensorop_gemm/ # example demonstrating mixed precision GEMM using Volta Tensor Cores
08_turing_tensorop_gemm/ # example demonstrating integer GEMM using Turing Tensor Cores
09_turing_tensorop_conv2dfprop/ # example demonstrating integer implicit GEMM convolution (forward propagation) using Turing Tensor Cores
10_planar_complex/ # example demonstrating planar complex GEMM kernels
11_planar_complex_array/ # example demonstrating planar complex kernels with batch-specific problem sizes
12_gemm_bias_relu/ # example demonstrating GEMM fused with bias and relu
13_fused_two_gemms/ # example demonstrating two GEMMs fused in one kernel
22_ampere_tensorop_conv2dfprop/ # example demonstrating integer implicit GEMM convolution (forward propagation) using Ampere Tensor Cores
31_basic_syrk # example demonstrating Symmetric Rank-K update
32_basic_trmm # example demonstrating Triangular Matrix-Matrix multiplication
33_ampere_3xtf32_tensorop_symm # example demonstrating Symmetric Matrix-Matrix multiplication with FP32 emulation
35_gemm_softmax # example demonstrating GEMM fused with Softmax in mixed precision using Ampere Tensor Cores
40_cutlass_py # example demonstrating CUTLASS with CUDA Python
```
### Tools
```
@ -512,7 +483,7 @@ The official list of CUTLASS developers and contributors is available here: [CON
# Copyright
Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: BSD-3-Clause
```

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,4 +1,4 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -1,4 +1,4 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without
@ -29,7 +29,6 @@
set(TEST_COMMAND_00 RowMajor --extent=16,16)
set(TEST_COMMAND_01 \"ColumnMajorInterleaved<4>\" --extent=32,8 --output-shape=16 --vectorize=4)
cutlass_example_add_executable(
03_visualize_layout
@ -37,6 +36,5 @@ cutlass_example_add_executable(
register_layout.cu
TEST_COMMAND_OPTIONS
TEST_COMMAND_00
TEST_COMMAND_01
)

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without
@ -64,15 +64,15 @@ void RegisterLayouts(std::map<std::string, std::unique_ptr<VisualizeLayoutBase>
// All Ampere/Turing H/Integer matrix multiply tensor core kernels uses the same swizzling
// layout implementation with different templates.
//
// BMMA 88128 Interleaved-256
// BMMA 168256 Interleaved-256
// mma.sync.aligned.m8n8k128.s32.b1.b1.s32 Interleaved-256
// mma.sync.aligned.m16n8k256.s32.b1.b1.s32 Interleaved-256
{"TensorOpMultiplicand<1,256>",
new VisualizeLayout<cutlass::layout::TensorOpMultiplicand<1, 256>>},
// BMMA 88128 TN kblock512
// BMMA 168256 TN kblock512
// mma.sync.aligned.m8n8k128.s32.b1.b1.s32 TN kblock512
// mma.sync.aligned.m16n8k256.s32.b1.b1.s32 TN kblock512
{"TensorOpMultiplicand<1,512>",
new VisualizeLayout<cutlass::layout::TensorOpMultiplicand<1, 512>>},
// BMMA 168256 TN kblock1024
// mma.sync.aligned.m16n8k256.s32.b1.b1.s32 TN kblock1024
{"TensorOpMultiplicand<1,1024>",
new VisualizeLayout<cutlass::layout::TensorOpMultiplicand<1, 1024>>},
// Integer matrix multiply.int4 8832 Interleaved-64

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without
@ -81,7 +81,7 @@ matrix A can be seen as
---------------------------------------
batch 0 | batch 1
, where batch size is 2, M is 6 and K is 2
The stride (batch_stride_B) between the first element of two batches is lda * k
The stride (batch_stride_A) between the first element of two batches is lda * k
matrix B can be seen as
-----------------------------
@ -94,7 +94,7 @@ matrix B can be seen as
(1,1,0) | (1,1,1) | (1,1,2) |
-----------------------------
, where the batch size is 2, N is 3 and K is 2
The stride (batch_stride_C) between the first element of two batches is k
The stride (batch_stride_B) between the first element of two batches is k
*/

View File

@ -1,5 +1,5 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -86,7 +86,7 @@ threadblock. Typically this requires the 2nd Convolution uses 1x1 filter without
# Copyright
Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: BSD-3-Clause
```

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without
@ -199,15 +199,15 @@ public:
"GEMM operations.");
/// Number of cp.async instructions to load one stage of operand A
static int const TBLDGSTSIterationsA0 =
static int const TBLoadIterationsA0 =
IteratorA0::ThreadMap::Iterations::kCount;
/// Number of cp.async instructions to load one stage of operand B
static int const TBLDGSTSIterationsB0 =
static int const TBLoadIterationsB0 =
IteratorB0::ThreadMap::Iterations::kCount;
/// Number of cp.async instructions to load one stage of operand B
static int const TBLDGSTSIterationsB1 =
static int const TBLoadIterationsB1 =
IteratorB1::ThreadMap::Iterations::kCount;
/// Number of stages
@ -215,15 +215,15 @@ public:
/// Number of cp.async instructions to load on group of operand A
static int const kAccessesPerGroupA0 =
(TBLDGSTSIterationsA0 + Base::kWarpGemmIterations0 - 1) / Base::kWarpGemmIterations0;
(TBLoadIterationsA0 + Base::kWarpGemmIterations0 - 1) / Base::kWarpGemmIterations0;
/// Number of cp.async instructions to load on group of operand B
static int const kAccessesPerGroupB0 =
(TBLDGSTSIterationsB0 + Base::kWarpGemmIterations0 - 1) / Base::kWarpGemmIterations0;
(TBLoadIterationsB0 + Base::kWarpGemmIterations0 - 1) / Base::kWarpGemmIterations0;
/// Number of cp.async instructions to load on group of operand B
static int const kAccessesPerGroupB1 =
(TBLDGSTSIterationsB1 + Base::kWarpGemmIterations1 - 1) / Base::kWarpGemmIterations1;
(TBLoadIterationsB1 + Base::kWarpGemmIterations1 - 1) / Base::kWarpGemmIterations1;
};
private:
@ -304,10 +304,10 @@ public:
IteratorA0::kAccessesPerVector);
this->smem_iterator_A0_.set_iteration_index(group_start_A0);
// LDGSTS for operand A
// Load for operand A
CUTLASS_PRAGMA_UNROLL
for (int j = 0; j < Detail::kAccessesPerGroupA0; ++j) {
if (group_start_A0 + j < Detail::TBLDGSTSIterationsA0) {
if (group_start_A0 + j < Detail::TBLoadIterationsA0) {
typename IteratorA0::AccessType *dst_ptr =
reinterpret_cast<typename IteratorA0::AccessType *>(
this->smem_iterator_A0_.get());
@ -334,10 +334,10 @@ public:
IteratorB0::kAccessesPerVector);
this->smem_iterator_B0_.set_iteration_index(group_start_B0);
// LDGSTS for operand B
// Load for operand B
CUTLASS_PRAGMA_UNROLL
for (int j = 0; j < Detail::kAccessesPerGroupB0; ++j) {
if (group_start_B0 + j < Detail::TBLDGSTSIterationsB0) {
if (group_start_B0 + j < Detail::TBLoadIterationsB0) {
typename IteratorB0::AccessType *dst_ptr =
reinterpret_cast<typename IteratorB0::AccessType *>(
this->smem_iterator_B0_.get());
@ -367,10 +367,10 @@ public:
IteratorB1::kAccessesPerVector);
this->smem_iterator_B1_.set_iteration_index(group_start_B1);
// LDGSTS for operand B
// Load for operand B
CUTLASS_PRAGMA_UNROLL
for (int j = 0; j < Detail::kAccessesPerGroupB1; ++j) {
if (group_start_B1 + j < Detail::TBLDGSTSIterationsB1) {
if (group_start_B1 + j < Detail::TBLoadIterationsB1) {
typename IteratorB1::AccessType *dst_ptr =
reinterpret_cast<typename IteratorB1::AccessType *>(
this->smem_iterator_B1_.get());
@ -430,9 +430,9 @@ public:
iterator_A0.set_iteration_index(0);
this->smem_iterator_A0_.set_iteration_index(0);
// LDGSTS for operand A
// Load for operand A
CUTLASS_PRAGMA_UNROLL
for (int j = 0; j < Detail::TBLDGSTSIterationsA0; ++j) {
for (int j = 0; j < Detail::TBLoadIterationsA0; ++j) {
typename IteratorA0::AccessType *dst_ptr =
reinterpret_cast<typename IteratorA0::AccessType *>(
this->smem_iterator_A0_.get());
@ -458,9 +458,9 @@ public:
iterator_B0.set_iteration_index(0);
this->smem_iterator_B0_.set_iteration_index(0);
// LDGSTS for operand B
// Load for operand B
CUTLASS_PRAGMA_UNROLL
for (int j = 0; j < Detail::TBLDGSTSIterationsB0; ++j) {
for (int j = 0; j < Detail::TBLoadIterationsB0; ++j) {
typename IteratorB0::AccessType *dst_ptr =
reinterpret_cast<typename IteratorB0::AccessType *>(
this->smem_iterator_B0_.get());
@ -674,9 +674,9 @@ public:
iterator_B1.set_iteration_index(0);
this->smem_iterator_B1_.set_iteration_index(0);
// LDGSTS for operand B
// Load for operand B
CUTLASS_PRAGMA_UNROLL
for (int j = 0; j < Detail::TBLDGSTSIterationsB1; ++j) {
for (int j = 0; j < Detail::TBLoadIterationsB1; ++j) {
typename IteratorB1::AccessType *dst_ptr =
reinterpret_cast<typename IteratorB1::AccessType *>(
this->smem_iterator_B1_.get());

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without
@ -205,15 +205,15 @@ public:
"GEMM operations.");
/// Number of cp.async instructions to load one stage of operand A
static int const TBLDGSTSIterationsA0 =
static int const TBLoadIterationsA0 =
IteratorA0::ThreadMap::Iterations::kCount;
/// Number of cp.async instructions to load one stage of operand B
static int const TBLDGSTSIterationsB0 =
static int const TBLoadIterationsB0 =
IteratorB0::ThreadMap::Iterations::kCount;
/// Number of cp.async instructions to load one stage of operand B
static int const TBLDGSTSIterationsB1 =
static int const TBLoadIterationsB1 =
IteratorB1::ThreadMap::Iterations::kCount;
/// Number of stages
@ -221,15 +221,15 @@ public:
/// Number of cp.async instructions to load on group of operand A
static int const kAccessesPerGroupA0 =
(TBLDGSTSIterationsA0 + Base::kWarpGemmIterations0 - 1) / Base::kWarpGemmIterations0;
(TBLoadIterationsA0 + Base::kWarpGemmIterations0 - 1) / Base::kWarpGemmIterations0;
/// Number of cp.async instructions to load on group of operand B
static int const kAccessesPerGroupB0 =
(TBLDGSTSIterationsB0 + Base::kWarpGemmIterations0 - 1) / Base::kWarpGemmIterations0;
(TBLoadIterationsB0 + Base::kWarpGemmIterations0 - 1) / Base::kWarpGemmIterations0;
/// Number of cp.async instructions to load on group of operand B
static int const kAccessesPerGroupB1 =
(TBLDGSTSIterationsB1 + Base::kWarpGemmIterations1 - 1) / Base::kWarpGemmIterations1;
(TBLoadIterationsB1 + Base::kWarpGemmIterations1 - 1) / Base::kWarpGemmIterations1;
};
private:
@ -327,10 +327,10 @@ public:
IteratorA0::kAccessesPerVector);
this->smem_iterator_A0_.set_iteration_index(group_start_A0);
// LDGSTS for operand A
// cp.async for operand A
CUTLASS_PRAGMA_UNROLL
for (int j = 0; j < Detail::kAccessesPerGroupA0; ++j) {
if (group_start_A0 + j < Detail::TBLDGSTSIterationsA0) {
if (group_start_A0 + j < Detail::TBLoadIterationsA0) {
typename IteratorA0::AccessType *dst_ptr =
reinterpret_cast<typename IteratorA0::AccessType *>(
this->smem_iterator_A0_.get());
@ -357,10 +357,10 @@ public:
IteratorB0::kAccessesPerVector);
this->smem_iterator_B0_.set_iteration_index(group_start_B0);
// LDGSTS for operand B
// cp.async for operand B
CUTLASS_PRAGMA_UNROLL
for (int j = 0; j < Detail::kAccessesPerGroupB0; ++j) {
if (group_start_B0 + j < Detail::TBLDGSTSIterationsB0) {
if (group_start_B0 + j < Detail::TBLoadIterationsB0) {
typename IteratorB0::AccessType *dst_ptr =
reinterpret_cast<typename IteratorB0::AccessType *>(
this->smem_iterator_B0_.get());
@ -390,10 +390,10 @@ public:
IteratorB1::kAccessesPerVector);
this->smem_iterator_B1_.set_iteration_index(group_start_B1);
// LDGSTS for operand B
// cp.async for operand B
CUTLASS_PRAGMA_UNROLL
for (int j = 0; j < Detail::kAccessesPerGroupB1; ++j) {
if (group_start_B1 + j < Detail::TBLDGSTSIterationsB1) {
if (group_start_B1 + j < Detail::TBLoadIterationsB1) {
typename IteratorB1::AccessType *dst_ptr =
reinterpret_cast<typename IteratorB1::AccessType *>(
this->smem_iterator_B1_.get());
@ -453,9 +453,9 @@ public:
iterator_A0.set_iteration_index(0);
this->smem_iterator_A0_.set_iteration_index(0);
// LDGSTS for operand A
// cp.async for operand A
CUTLASS_PRAGMA_UNROLL
for (int j = 0; j < Detail::TBLDGSTSIterationsA0; ++j) {
for (int j = 0; j < Detail::TBLoadIterationsA0; ++j) {
typename IteratorA0::AccessType *dst_ptr =
reinterpret_cast<typename IteratorA0::AccessType *>(
this->smem_iterator_A0_.get());
@ -481,9 +481,9 @@ public:
iterator_B0.set_iteration_index(0);
this->smem_iterator_B0_.set_iteration_index(0);
// LDGSTS for operand B
// cp.async for operand B
CUTLASS_PRAGMA_UNROLL
for (int j = 0; j < Detail::TBLDGSTSIterationsB0; ++j) {
for (int j = 0; j < Detail::TBLoadIterationsB0; ++j) {
typename IteratorB0::AccessType *dst_ptr =
reinterpret_cast<typename IteratorB0::AccessType *>(
this->smem_iterator_B0_.get());
@ -689,9 +689,9 @@ public:
iterator_B1.set_iteration_index(0);
this->smem_iterator_B1_.set_iteration_index(0);
// LDGSTS for operand B
// cp.async for operand B
CUTLASS_PRAGMA_UNROLL
for (int j = 0; j < Detail::TBLDGSTSIterationsB1; ++j) {
for (int j = 0; j < Detail::TBLoadIterationsB1; ++j) {
typename IteratorB1::AccessType *dst_ptr =
reinterpret_cast<typename IteratorB1::AccessType *>(
this->smem_iterator_B1_.get());

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
# Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: BSD-3-Clause
#
# Redistribution and use in source and binary forms, with or without

View File

@ -1,5 +1,5 @@
/***************************************************************************************************
* Copyright (c) 2017 - 2022 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* Copyright (c) 2017 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
* SPDX-License-Identifier: BSD-3-Clause
*
* Redistribution and use in source and binary forms, with or without

Some files were not shown because too many files have changed in this diff Show More