dan_the_3rd
1b4e24470a
Example 43 - DualGemm (#670)
* Ex50 wip
* IS_PROFILING mode
* MultiStage2 - but is slower
* Add SwiGLU
* Support SplitKSerial reduction
  Support not storing D0/D1
  Cleanup code
* Option to disable bias
* Renumber example
* Fix build
* Remove references to pb_size_0 / pb_size_1
* Add support for bf16 inputs with float accum
* small changes
Co-authored-by: danthe3rd <danthe3rd>
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2022-10-26 14:04:42 -04:00