276 Commits

SHA1 Message Date
7a458f00a6 fix(permute.h): incorrect comment in Tensor5DPermute20314 (#637)
* fix(permute.h): incorrect comment in `Tensor5DPermute20314`

* typo in usage in example 39
2022-09-22 09:21:13 -04:00
97bff52e8c add two missing files (#636)
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-09-21 15:42:42 -04:00
9f2e3faa69 fix call of GELU_Taylor in LinearCombinationGeneric (#634) 2022-09-20 21:00:55 -04:00
a821280dc7 Gemm broadcast (#632)
* gemm_universal_with_broadcast, +2 sources.

* Revert "gemm_universal_with_broadcast, +2 sources."

This reverts commit fb063251f2.

* gemm_universal_with_broadcast separated version.

* Update copyright banner.

* update banner
2022-09-20 10:37:12 -04:00
f73374a1eb fix: comment typo in example 23 (#633) 2022-09-19 09:54:14 -04:00
faab7536fc add comment (#628) 2022-09-17 21:40:30 -04:00
fc9ebc645b CUTLASS 2.10 bug fixes and minor updates. (#626) v2.10.0 2022-09-15 16:20:33 -04:00
2cc2c7ba1f Add set_k_partition function (#624)
A member function set_k_partition is required for the instantiation of cutlass::gemm::kernel::Gemm, even though SplitKSerial is false.
2022-09-13 22:34:20 -04:00
50ceed7154 Minor README fix (#623)
* minor fix

* Minor fix
2022-09-12 22:40:25 -04:00
e773429f7e CUTLASS 2.10 updates (#622)
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2022-09-12 21:26:30 -04:00
beae168f90 fix broken link (#620)
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2022-09-06 16:32:44 -04:00
f29d8f7ca9 Include vector in base_grouped.h (#618) 2022-09-06 13:21:23 -04:00
b1d3f9b2fd upstream internal updates (#616)
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2022-09-04 23:05:09 -04:00
b72cbf957d CUTLASS 2.10 (#615)
Co-authored-by: Aniket Shivam <ashivam@nvidia.com>
2022-09-03 18:48:46 -04:00
ca23ff7924 Fixed typo in class name (#608) 2022-08-29 20:51:52 -04:00
1c3d400b14 Added value_type trait to complex to make it an easier drop-in replacement for std::complex. (#607) 2022-08-28 01:12:40 -04:00
abafbf2afd Missing comma in trmm header (#604) 2022-08-25 16:07:33 -04:00
536b20763e Fixed typo in profiler README (#603) 2022-08-24 21:55:13 -04:00
497b499d9d Add residual support for shmem staging iterator used in back-to-back GEMM fusion. This allows support of problem_size_0_n that is not a multiple of 32. (#590)
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-08-15 11:19:24 -04:00
e66bfcb1f8 Fix for #596 (typo in example 03) (#597)
* [examples] Fix typos in SYRK and TRMM examples

* Fix typo in example 03
2022-08-09 09:58:36 -04:00
1617685a77 fix: fix types in example 06 (#587) 2022-07-29 12:46:06 -04:00
25ebf15d02 Ensure all arch::Mma specializations have ElementC set (#576)
Co-authored-by: danthe3rd <danthe3rd@users.noreply.github.com>
2022-07-22 23:53:03 -04:00
5d05808072 fix gather example (#574) 2022-07-19 16:18:17 -04:00
0b8cacd6f1 Remove redundant <fstream> includes (#563)
* Remove redundant <fstream> includes

* Fix fstream in examples/

* Fix <fstream> in test/

* Use consistent order for <fstream> (always after <iostream>)

* Remove an unneeded include in a file where std::ofstream usage is commented out

Co-authored-by: Ivan Komarov <dfyz@yandex-team.ru>
2022-07-19 15:23:54 -04:00
e7a61c761a fix race condition when h < stride_h or w < stride_w (#562)
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-07-12 16:37:08 -04:00
fb379eaa5b epilogue leaky relu support ScaleType (#564)
Co-authored-by: xuweiqi <xuweiqi117@gmail.com>
2022-07-11 17:30:55 -04:00
8a766804ad Fix doc in testbed_gemm_with_broadcast (#559) 2022-07-07 09:56:16 -04:00
1eb6355182 [activation] tanh (#550)
Co-authored-by: Bing Xu <bingxu@fb.com>
2022-07-02 08:00:45 -04:00
04a9777b87 Softmax (#546)
* add test layernorm g-mem version

* Delete include/configure directory

* Delete examples/test_layernorm directory

* Update gemm_with_softmax.h

* Update gemm_softmax.cu

* Update linear_combination.h

* Update fast_math.h

* remove redundant vars

Co-authored-by: yujia.zhai <yujia.zhai@bytedance.com>
Co-authored-by: yuzhai <yuzhai@nvidia.com>
2022-07-02 01:19:18 -04:00
e45e773436 Update linear_combination_generic.h (#472)
add `skip_elementwise_` to support serial split-k in `linear_combination_generic.h`
v2.9.1
2022-06-28 07:29:38 -04:00
dae6b6893b Update CHANGELOG.md 2022-06-27 23:30:49 -04:00
ba18ea9c32 Update README.md 2022-06-27 23:25:26 -04:00
9ab9110168 add leaky relu (#542)
Authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-06-26 10:07:50 -04:00
e5d4669f16 Update CHANGELOG.md (#543) 2022-06-25 13:23:49 -04:00
94f01f19d5 Add implicit gemm perf plot from @manishucsd, presented in gtc'22 cutlass talk
2022-06-23 22:47:11 -04:00
fa56763c25 Fix occupancy calculation for grouped GEMM (#532) 2022-06-18 19:53:59 -04:00
25e26a6e51 fix bugs in linear_combination_generic.h: missing include cutlass/epilogue/thread/scale_type.h (#531) 2022-06-17 23:35:14 -04:00
f248e9bdb4 Create CITATION.cff
Add initial CITATION.cff
2022-06-07 21:25:16 -04:00
dceefe4f64 Increment stride correctly in warp iterator. (#516)
Co-authored-by: peisun1115 <peis@google.com>
2022-06-06 12:33:36 -04:00
c3881d097e Fix a comment about LDSM layout. (#514)
Co-authored-by: peisun1115 <peis@google.com>
2022-06-04 23:04:00 -04:00
a29dfb1c63 Fix a bug to increment stride tile correctly (#503)
* Fix a bug to increment stride tile correctly

* Update regular_tile_access_iterator_tensor_op.h

Co-authored-by: peisun1115 <peis@google.com>
Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>
2022-06-03 22:54:52 -04:00
0abaac84ea [examples] Fix typos in SYRK and TRMM examples (#507) 2022-06-03 22:52:41 -04:00
858c735856 Update gather_scatter_fusion.cu
Correct the reference code in gather/scatter example to put bias add in the correct place.
2022-05-18 13:15:25 -04:00
d6f58b2d14 Update functionality.md 2022-05-11 09:34:24 -04:00
c4cf0dad82 Fix init-self compiler warnings (#493)
Fix a few warnings caused by trying to initialize a class member
with itself. These warnings can turn into errors if you compile
with `-Winit-self`.
2022-05-11 00:35:28 -04:00
57551902d0 Update functionality.md
add some explanations to the functionality table.
2022-05-11 00:01:19 -04:00
1604ebaf10 Update generator.py
stop generating analytical conv kernels to reduce the number of kernels
2022-05-08 21:47:15 -04:00
6023038bae add verification of the reduction tensor (#489)
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-05-06 10:24:51 -07:00
ddd8f9cf41 update float < int32_t * 4 (#488)
Co-authored-by: 赵俊涛 <zhaojuntao@zhaojuntaos-MacBook-Pro.local>
2022-05-04 13:36:05 -04:00
ec2b4fd85d b2b bias vector support (#482)
* b2b bias vector support

* add files

Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-04-30 04:16:15 -07:00