Gregory Meyer (gregjm)
ecbd24566c
Enable shared memory intrinsics and ldmatrix PTX on Clang. (#754)
* Enable shared memory intrinsics and ldmatrix PTX on Clang.
This commit adds preprocessor checks to enable the shared memory
intrinsics `__cvta_generic_to_shared` and `__nvvm_get_smem_pointer`, as
well as the `ldmatrix` PTX instructions, on Clang. Leaving these
intrinsics disabled causes a significant latency regression on Clang.
* refine the macro
---------
Co-authored-by: Haicheng Wu <haichengw@nvidia.com>
2023-03-31 21:42:24 -04:00
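
The commit message describes preprocessor checks that let Clang use the shared memory intrinsics. The following is a minimal sketch of what such a check might look like, not the code from the commit itself. The macro names `EXAMPLE_CLANG_CUDA`, `EXAMPLE_SMEM_INTRINSICS_ENABLED`, `EXAMPLE_HOST_DEVICE` and the helper `example_smem_address` are hypothetical. It assumes that Clang defines both `__clang__` and `__CUDA__` when compiling CUDA sources, and that `__CUDA_ARCH__` is defined only during device compilation.

```cpp
#include <cstdint>

// Hypothetical macro: 1 when Clang itself is compiling CUDA code.
#if defined(__clang__) && defined(__CUDA__)
#define EXAMPLE_CLANG_CUDA 1
#else
#define EXAMPLE_CLANG_CUDA 0
#endif

// Hypothetical macro: enable the intrinsic path in device code for
// non-Clang CUDA compilers and, additionally, for Clang CUDA.
#if defined(__CUDA_ARCH__) && (!defined(__clang__) || EXAMPLE_CLANG_CUDA)
#define EXAMPLE_SMEM_INTRINSICS_ENABLED 1
#else
#define EXAMPLE_SMEM_INTRINSICS_ENABLED 0
#endif

// Annotate for host and device only when a CUDA compiler is in use,
// so the file also builds as plain C++.
#if defined(__CUDACC__) || defined(__CUDA__)
#define EXAMPLE_HOST_DEVICE __host__ __device__ inline
#else
#define EXAMPLE_HOST_DEVICE inline
#endif

// Convert a generic pointer into a 32-bit shared memory address.
EXAMPLE_HOST_DEVICE std::uint32_t example_smem_address(const void* ptr) {
#if EXAMPLE_SMEM_INTRINSICS_ENABLED
  // Device path: use the intrinsic named in the commit message.
  return static_cast<std::uint32_t>(__cvta_generic_to_shared(ptr));
#else
  // Host path: placeholder only, shared memory has no meaning here.
  (void)ptr;
  return 0u;
#endif
}
```

In a host-only build the enable macro evaluates to 0 and the helper returns the placeholder value. The same pattern could gate inline `ldmatrix` PTX, although the exact conditions used in the commit are not shown in this listing.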