cutlass/examples
dan_the_3rd 1b4e24470a Example 43 - DualGemm (#670)
* Ex50 wip

* IS_PROFILING mode

* MultiStage2 - but is slower

* Add SwiGLU

* Support SplitKSerial reduction
Support not storing D0/D1
Cleanup code

* Option to disable bias

* Renumber example

* Fix build

* Remove references to pb_size_0 / pb_size_1

* Add support for bf16 inputs with float accum

* small changes

Co-authored-by: danthe3rd <danthe3rd>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-10-26 14:04:42 -04:00
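The commit message only names its features ("DualGemm", "Add SwiGLU", "Option to disable bias", "Support not storing D0/D1"). The following is a minimal CPU reference sketch of the math a dual GEMM with a SwiGLU epilogue computes. It assumes that "DualGemm" means two GEMMs sharing the left operand A (D0 = A·B0, D1 = A·B1), combined element-wise as D2 = SiLU(D0) ⊙ D1. The function names are hypothetical, and this is not the CUTLASS example's API or its fused kernel. Bias, split-K serial reduction and bf16 inputs are omitted.

```python
import math


def silu(x: float) -> float:
    """SiLU / Swish activation x * sigmoid(x), written to avoid exp overflow."""
    if x >= 0.0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def dual_gemm_swiglu_reference(A, B0, B1):
    """Hypothetical reference: D2 = silu(A @ B0) * (A @ B1), element-wise.

    A is M x K; B0 and B1 are both K x N (lists of rows). No bias is added
    and the intermediate products D0 and D1 are not returned.
    """
    M, K, N = len(A), len(B0), len(B0[0])
    D2 = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            acc0 = acc1 = 0.0  # separate accumulators for the two GEMMs
            for k in range(K):
                a = A[i][k]  # a single read of A feeds both products
                acc0 += a * B0[k][j]
                acc1 += a * B1[k][j]
            D2[i][j] = silu(acc0) * acc1  # SwiGLU-style combination
    return D2
```

A fused kernel gets its benefit from the property the inner loop shows: each element of A is loaded once and used by both products. That is also why D0 and D1 need not be written to memory when only D2 is wanted.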