cutlass

Author	SHA1	Message	Date
Wenzhuo Liu	7a458f00a6	fix(permute.h): incorrect comment in `Tensor5DPermute20314` (#637) * fix(permute.h): incorrect comment in `Tensor5DPermute20314` * typo in usage in example 39	2022-09-22 09:21:13 -04:00
Haicheng Wu	97bff52e8c	add two missing files (#636) Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2022-09-21 15:42:42 -04:00
Tianqi Zhang (张天启)	9f2e3faa69	fix call of GELU_Taylor in LinearCombinationGeneric (#634)	2022-09-20 21:00:55 -04:00
Ying Zhang	a821280dc7	Gemm broadcast (#632) * gemm_universal_with_broadcast, +2 sources. * Revert "gemm_universal_with_broadcast, +2 sources." This reverts commit `fb063251f2`. * gemm_universal_with_broadcast separated version. * Update copyright banner. * update banner	2022-09-20 10:37:12 -04:00
Wenzhuo Liu	f73374a1eb	fix: comment typo in example 23 (#633)	2022-09-19 09:54:14 -04:00
Yujia Zhai	faab7536fc	add comment (#628)	2022-09-17 21:40:30 -04:00
Andrew Kerr	fc9ebc645b	CUTLASS 2.10 bug fixes and minor updates. (#626) v2.10.0	2022-09-15 16:20:33 -04:00
alexfreudenberg	2cc2c7ba1f	Add set_k_partition function (#624) A member function set_k_partition is required for the instantiation of cutlass::gemm::kernel::Gemm, even though SplitKSerial is false	2022-09-13 22:34:20 -04:00
ANIKET SHIVAM	50ceed7154	Minor README fix (#623) * minor fix * Minor fix	2022-09-12 22:40:25 -04:00
ANIKET SHIVAM	e773429f7e	CUTLASS 2.10 updates (#622) Co-authored-by: Aniket Shivam <ashivam@nvidia.com>	2022-09-12 21:26:30 -04:00
Yujia Zhai	beae168f90	fix broken link (#620) Co-authored-by: yuzhai <yuzhai@nvidia.com>	2022-09-06 16:32:44 -04:00
Jack Kosaian	f29d8f7ca9	Include vector in base_grouped.h (#618)	2022-09-06 13:21:23 -04:00
Yujia Zhai	b1d3f9b2fd	upstream internal updates (#616) Co-authored-by: yuzhai <yuzhai@nvidia.com>	2022-09-04 23:05:09 -04:00
ANIKET SHIVAM	b72cbf957d	CUTLASS 2.10 (#615) Co-authored-by: Aniket Shivam <ashivam@nvidia.com>	2022-09-03 18:48:46 -04:00
Cliff Burdick	ca23ff7924	Fixed typo in class name (#608)	2022-08-29 20:51:52 -04:00
Cliff Burdick	1c3d400b14	Added `value_type` trait to complex to make it an easier drop-in replacement for std::complex. (#607)	2022-08-28 01:12:40 -04:00
Cliff Burdick	abafbf2afd	Missing comma in trmm header (#604)	2022-08-25 16:07:33 -04:00
Cliff Burdick	536b20763e	Fixed typo in profiler README (#603)	2022-08-24 21:55:13 -04:00
Haicheng Wu	497b499d9d	Add residual support for shmem staging iterator used in back-to-back GEMM fusion. This allows support of problem_size_0_n that is not a multiple of 32. (#590) Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2022-08-15 11:19:24 -04:00
Jack Kosaian	e66bfcb1f8	Fix for #596 (typo in example 03) (#597) * [examples] Fix typos in SYRK and TRMM examples * Fix typo in example 03	2022-08-09 09:58:36 -04:00
Michaël Benesty	1617685a77	fix: fix types in example 06 (#587)	2022-07-29 12:46:06 -04:00
dan_the_3rd	25ebf15d02	Ensure all arch::Mma specializations have ElementC set (#576) Co-authored-by: danthe3rd <danthe3rd@users.noreply.github.com>	2022-07-22 23:53:03 -04:00
Shang Zhang	5d05808072	fix gather example (#574)	2022-07-19 16:18:17 -04:00
Ivan Komarov	0b8cacd6f1	Remove redundant <fstream> includes (#563) * Remove redundant <fstream> includes * Fix fstream in examples/ * Fix <fstream> in test/ * Use consistent order for <fstream> (always after <iostream>) * Remove an unneeded include in a file where std::ofstream usage is commented out Co-authored-by: Ivan Komarov <dfyz@yandex-team.ru>	2022-07-19 15:23:54 -04:00
Haicheng Wu	e7a61c761a	fix race condition when h < stride_h or w < stride_w (#562) Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2022-07-12 16:37:08 -04:00
seventh	fb379eaa5b	epilogue leaky relu support ScaleType (#564) Co-authored-by: xuweiqi <xuweiqi117@gmail.com>	2022-07-11 17:30:55 -04:00
Jacob He	8a766804ad	Fix doc in testbed_gemm_with_broadcast (#559)	2022-07-07 09:56:16 -04:00
Bing Xu	1eb6355182	[activation] tanh (#550) Co-authored-by: Bing Xu <bingxu@fb.com>	2022-07-02 08:00:45 -04:00
Yujia Zhai	04a9777b87	Softmax (#546) * add test layernorm g-mem version * Delete include/configure directory * Delete examples/test_layernorm directory * Update gemm_with_softmax.h * Update gemm_softmax.cu * Update linear_combination.h * Update fast_math.h * remove redundant vars Co-authored-by: yujia.zhai <yujia.zhai@bytedance.com> Co-authored-by: yuzhai <yuzhai@nvidia.com>	2022-07-02 01:19:18 -04:00
Haicheng Wu	e45e773436	Update linear_combination_generic.h (#472) add `skip_elementwise_` to support serial splitk in `linear_combination_generic.h` v2.9.1	2022-06-28 07:29:38 -04:00
Haicheng Wu	dae6b6893b	Update CHANGELOG.md	2022-06-27 23:30:49 -04:00
Haicheng Wu	ba18ea9c32	Update README.md	2022-06-27 23:25:26 -04:00
Haicheng Wu	9ab9110168	add leaky relu (#542) Authored-by: Haicheng Wu <haichengw@nvidia.com>	2022-06-26 10:07:50 -04:00
Jinze (Richard) Xue	e5d4669f16	Update CHANGELOG.md (#543)	2022-06-25 13:23:49 -04:00
Haicheng Wu	94f01f19d5	Add implicit gemm perf plot from @manishucsd, presented in gtc'22 cutlass talk	2022-06-23 22:47:11 -04:00
Jack Kosaian	fa56763c25	Fix occupancy calculation for grouped GEMM (#532)	2022-06-18 19:53:59 -04:00
LiuWei	25e26a6e51	fix bugs in linear_combination_generic.h: missing include cutlass/epilogue/thread/scale_type.h (#531)	2022-06-17 23:35:14 -04:00
Haicheng Wu	f248e9bdb4	Create CITATION.cff Add initial CITATION.cff	2022-06-07 21:25:16 -04:00
Pei Sun	dceefe4f64	Increment stride correctly in warp iterator. (#516) Co-authored-by: peisun1115 <peis@google.com>	2022-06-06 12:33:36 -04:00
Pei Sun	c3881d097e	Fix a comment about LDSM layout. (#514) Co-authored-by: peisun1115 <peis@google.com>	2022-06-04 23:04:00 -04:00
Pei Sun	a29dfb1c63	Fix a bug to increment stride tile correctly (#503) * Fix a bug to increment stride tile correctly * Update regular_tile_access_iterator_tensor_op.h Co-authored-by: peisun1115 <peis@google.com> Co-authored-by: Haicheng Wu <57973641+hwu36@users.noreply.github.com>	2022-06-03 22:54:52 -04:00
Jack Kosaian	0abaac84ea	[examples] Fix typos in SYRK and TRMM examples (#507)	2022-06-03 22:52:41 -04:00
Haicheng Wu	858c735856	Update gather_scatter_fusion.cu Correct the reference code in gather/scatter example to put bias add in the correct place.	2022-05-18 13:15:25 -04:00
Haicheng Wu	d6f58b2d14	Update functionality.md	2022-05-11 09:34:24 -04:00
Mike Iovine	c4cf0dad82	Fix init-self compiler warnings (#493) Fix a few warnings caused by trying to initialize a class member with itself. These warnings can turn into errors if you compile with `-Winit-self`.	2022-05-11 00:35:28 -04:00
Haicheng Wu	57551902d0	Update functionality.md add some explanations to the functionality table.	2022-05-11 00:01:19 -04:00
Haicheng Wu	1604ebaf10	Update generator.py stop generating analytical conv kernels to reduce kernel number	2022-05-08 21:47:15 -04:00
Haicheng Wu	6023038bae	add verification of the reduction tensor (#489) Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2022-05-06 10:24:51 -07:00
TonyZhao	ddd8f9cf41	update float < int32_t * 4 (#488) Co-authored-by: 赵俊涛 <zhaojuntao@zhaojuntaos-MacBook-Pro.local>	2022-05-04 13:36:05 -04:00
Haicheng Wu	ec2b4fd85d	b2b bias vector support (#482) * b2b bias vector support * add files Co-authored-by: Haicheng Wu <haichengw@nvidia.com>	2022-04-30 04:16:15 -07:00


276 Commits