Commit Graph

225 Commits

Author SHA1 Message Date
21c1fa3849 add .github (#479)
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-04-28 12:36:59 -07:00
8c339ac039 Fix compilation in clang (#478)
- adds missing commas
- adjusts misaligned usage of CUTLASS_DEVICE between
  template declaration and specializations

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
2022-04-28 14:22:06 -04:00
e49f690fd7 Update linear_combination_generic.h 2022-04-28 14:04:53 -04:00
96dad61a75 Update CHANGELOG.md 2022-04-28 10:52:10 -04:00
cc2ea4c3fc Update README.md 2022-04-28 10:50:11 -04:00
a0de301283 Used relative paths for includes (#477) 2022-04-27 12:04:23 -07:00
319a389f42 Update CMakeLists.txt (#473)
* Update CMakeLists.txt

Add 128bit int support if using nvc++ to solve #310 

@jeffhammond, would you please give it a try?

* Update CMakeLists.txt

correct copy paste error
v2.9.0
2022-04-27 07:02:26 -07:00
71def2f084 Use platform:: instead of std::abs and std::conditional (#452)
* Fixed template struct/class mismatch

* Use platform implementation instead of std::abs and std::conditional during nvrtc compilation

* Use platform implementation instead of std::abs and std::conditional during nvrtc compilation

* Revert absolute_value() usage
2022-04-25 14:40:22 -04:00
70f3ba57f5 Fix typo in shared memory layout description (#471) 2022-04-24 18:32:13 -04:00
dd77fadc70 Remove redundant offset def and init in shared_load_iterator.h (#456)
Signed-off-by: Fujun Han <fujun.han@iluvatar.ai>
2022-04-24 16:31:00 -04:00
be4578d517 Fixed template struct/class mismatch (#453) 2022-04-24 16:30:21 -04:00
d7b499deff Fix CUDA_PERROR_EXIT and print failing expression (#446)
`CUDA_PERROR_EXIT ` can lead to incorrect usage (see e.g. [this description](https://www.cs.technion.ac.il/users/yechiel/c++-faq/macros-with-if.html)) because it contains an incomplete `if` expression. Consider:

```
if (condition)
    CUDA_PERROR_EXIT(cudaFree(x))
else
    free(x);
```

The author of the code forgot to add a semicolon after the macro. In that case, the `else` will bind to the `if` inside the macro definition, leading to code that the author did not intend or expect. It the author does use a semicolon, the code will not compile, which is awkward.

The change adds a `do while` around the `if`, which always requires a semicolon.

This PR also adds the text of the failing expression to the printed error message.
2022-04-24 16:29:43 -04:00
310ed81ac3 fix description in example 12. (#444)
Co-authored-by: Exusial <Exusial>
2022-04-24 16:29:06 -04:00
4c0d6e1eb4 [BUGFIX]: Force unroll a loop that doesn't have compilation constant (#441)
loop times is dangerous.

Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2022-04-24 16:28:32 -04:00
167ac54c65 Fix link to Python example (#469) 2022-04-23 15:37:38 -04:00
12f4108ac2 CUTLASS 2.9 (#468) 2022-04-23 15:02:38 -04:00
dd571f0edb [style] fix code indentation (#449)
* [docs] fix typo in media/docs/layout.md

* [docs] fix comment error

* fix typo in include/cutlass/arch/simd_61.h

* fix stride comment errors in TensorLayout

* fix indentation
2022-04-03 21:13:17 -04:00
6d0d265047 Update PUBLICATIONS.md (#447) 2022-04-03 21:03:28 -04:00
f11fa975a5 Update PUBLICATIONS.md
@tsuki
2022-03-23 21:04:43 -04:00
0e71d9b450 Transposed conv2d and wgrad split k examples (#413)
* add split k wgrad example

* wgrad done

* begin transposed conv2d example

* update transposed conv2d example and add ref check

* update doc for conv2d transpose example

* add license

* add wgrad doc

* more clarification on GEMM output type

* typo fix

* clean up indent

* address comments

* rename example numbers to 34 and 35

* GEMM -> Implicit GEMM

* Revert "rename example numbers to 34 and 35"

This reverts commit 551a808c22.

* transposed_conv2d is 34

* add compiler and device version check to exit gracefully

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-03-23 14:52:54 -04:00
eb0d4c9213 [library] pass pointer of arguments to get_host_workspace_size() in gemm_universal() (#412)
Otherwise GemmUniversalOperation::get_host_workspace_size() will fail on SegmentFault.
2022-03-22 12:36:34 -04:00
bc45e2c023 fixed datatype error of numeric_limit for uint1b_t (#419)
Co-authored-by: Haojin Yang <haojin.yang@.hpi.uni-potsdam.de>
2022-03-22 12:30:30 -04:00
095cbba57c Example 23 - Passing correct alpha and beta values with --parallel-split-k (#424)
When split-k is enabled, we should set alpha to 1 and beta to 0 for the
split-k gemm kernel.

The fix was from hwu36. I only did fixed some minor typos along with his
fix.
2022-03-22 12:27:34 -04:00
8f1fe7a132 Fix separate compilation -dc (#433)
* Fix separate compilation `-dc`

- when cutlass is included in multiple compilation units
  compiled with `-dc` OOB_NAN_F16x8 device constant is
  instantiated multiple times causing
  Multiple definition of '_ZN7cutlass4arch13OOB_NAN_F16x8E' error
  This PR makes this variable a local constant as it is not
  modified during runtime

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

* Fix

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

* Test GH

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

* Revert test GH

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
2022-03-22 12:21:18 -04:00
3ab1eacf09 Fix typo in profiler examples (#437) 2022-03-21 12:00:13 -04:00
cd39c75e25 Fix typo in docs, code comments (#429)
* [docs] fix typo in media/docs/layout.md

* [docs] fix comment error

* fix typo in include/cutlass/arch/simd_61.h

* fix stride comment errors in TensorLayout
2022-03-15 21:54:36 -04:00
b2e1e97cb1 Update PUBLICATIONS.md
ACM Trans on Graphics from nv research.
2022-03-01 22:37:18 -05:00
96a11a1ef3 Removed trivial copy constructors on parameter classes to enable devi… (#366)
* Removed trivial copy constructors on parameter classes to enable device-side launch of CUTLASS kernels

* Added SFINAE to the `TensorRef(NonConstTensorRef const&)` constructor to avoid making it a copy-constructor for device code

* std => platform

* fix affine2

* really fix affine2

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-02-28 21:34:02 -05:00
e96f00586c Make cutlass::gemm::device::GemmArray usable (#295)
* Fix the build of cutlass/gemm/device/gemm_array.h and add a demo for GemmArray

* Add a reference to GemmArray to the docs

Co-authored-by: Ivan Komarov <dfyz@yandex-team.ru>
2022-02-17 20:01:05 -05:00
3cfa5db2a2 Actually use float accumulation in gemm_f16t_f16t_f16t_wmma_tensor_op… (#407)
* Actually use float accumulation in gemm_f16t_f16t_f16t_wmma_tensor_op_f32_sm70.cu

As title

* Update gemm_f16t_f16t_f16t_wmma_tensor_op_f32_sm70.cu

change the missing one

Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2022-02-16 09:53:21 -05:00
1db6971a8d Remove unused gemm_k_iterations in GemmKernel::Params (#406)
Otherwise we get gemm_k_iterations is uninitialized warnings.
2022-02-16 09:52:45 -05:00
b954127297 Update PUBLICATIONS.md
@jackkosaian
2022-02-14 16:54:32 -05:00
d0d941efc7 [hardswish] correct implmentation (#403)
* [hardswish] correct implmentation

* seems working

* hardswish fp32/fp16x2 optimization

* [relu] half2 support

* add relu0; add multiply_add_relu0;

* cleanup

Co-authored-by: Bing Xu <bingxu@fb.com>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-02-09 14:28:53 -05:00
8a951b2940 Enable convolution with fused epilogue for Volta Tensor Cores (#402)
* Enabled convolution with epilogue fusion for Volta Tensor Cores.

* Compilation fixes

* Disabled testing Volta on Ampere architectures.
2022-01-30 23:24:50 -05:00
1e4703cbab Support parallel split K mode for porfiling (#277)
* Support parallel split K mode for porfiling

Signed-off-by: Peter Han <fujun.han@iluvatar.ai>

* Parallel Split K support

  1. find gemm kernel by preference key
  2. switch m n for redution kernel

Signed-off-by: Peter Han <fujun.han@iluvatar.ai>

* parallel splitk for fp16 gemm

* add one missing file

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-01-27 10:37:37 -05:00
c3353add63 Merge pull request #388 from depaulmillz/fix/headersonly
Fix utils include not being installed in header only
2022-01-26 14:22:51 -06:00
ac8825b941 Minor fix to change from LIBRARY_INIT to LIBRARY 2022-01-26 15:17:46 -05:00
8fd94806e5 Update PUBLICATIONS.md
add mlsys 2022 paper.
2022-01-17 00:08:18 -05:00
d7c9cbf0b9 Fix typo in scripts/library.py (wrong data size for u8) (#393) 2022-01-07 13:29:56 -05:00
c2ee13a0fe Add epilogue functor for residual block fusion (#391)
* Add epilogue functor for residual block fusion

* Do not run split-k tests when ActivationOp is not Identity

* explain TestSplitK param

* return early
2021-12-29 22:53:40 -05:00
f78994bb40 add the missing pieces (#392)
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2021-12-25 04:29:54 -08:00
dceabd4c5a Support half precision sigmoid activation (#378)
* Support half precision sigmoid activation

* introduce a vectorized variant using fast_tanh

* move the math to fast_math.h

* fixed compile

* .raw() -> .to_half()

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2021-12-22 14:45:06 -05:00
86fa1dc30b Fix utils include not being installed in header only 2021-12-21 12:10:26 -05:00
288af365db Added missing synchronization to avoid WAR hazards between tiles. (#386) 2021-12-20 08:34:08 -08:00
0dc3ba60b3 Refactor GELU and Sigmoid epilogue to use a common template (and add SiLu, Hardswish epilogue) (#379)
* Support half precision sigmoid activation

* introduce a vectorized variant using fast_tanh

* refactored sigmoid using the new interface

* refactored gelu

* add silu activation

* add hardswish

* remove sigmoid for now

* add description to silu and hardswish, and other doc update

* Do not ignore Round

* use constant N

* Set isHeavy = true in sigmoid and silu epilogue
2021-12-18 14:58:15 -05:00
ec4f7e5194 Updates to fused epilogue (#383)
* Enhancements and fixes to fused GEMM and Convolution epilogue.
* Need to explicitly list cudart as unit test library dependency.
2021-12-17 16:04:43 -05:00
4e666e1dfd Updated README and added issue templates. (#382) 2021-12-17 09:26:20 -05:00
3799e12f25 Merge pull request #381 from Peter9606/update-makefile-version
Update project version to 2.8.0 in CMakeLists.txt
2021-12-16 21:54:57 -05:00
fc3bc85db8 Update project version to 2.8.0 in CMakeLists.txt
Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
2021-12-17 02:23:31 +00:00
49c0a58d50 Set theme jekyll-theme-minimal 2021-12-15 14:51:24 -05:00