1. Add '#pragma once' preprocessor directive
2. Replace prmt PTX with the __byte_perm intrinsic

Signed-off-by: Peter Han <fujun.han@iluvatar.ai>
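For reviewers, a minimal sketch of the byte selection that __byte_perm performs, which is what makes it a candidate replacement for prmt in its default mode. The helper name byte_perm_ref is hypothetical and is not part of this change. The host fallback assumes only the low 3 bits of each selector nibble are used, as the CUDA math API describes for __byte_perm. It does not model the sign-replication bit or the special modes of prmt.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; not taken from this change.
#ifdef __CUDACC__
#define BYTE_PERM_HD __host__ __device__
#else
#define BYTE_PERM_HD
#endif

// Selects four bytes out of the 8-byte pool {y, x}:
// pool bytes 0..3 come from x, pool bytes 4..7 come from y.
// Selector nibble i (low 3 bits) picks the pool byte written to result byte i.
BYTE_PERM_HD inline std::uint32_t byte_perm_ref(std::uint32_t x,
                                                std::uint32_t y,
                                                std::uint32_t s) {
#ifdef __CUDA_ARCH__
  // Device code: use the intrinsic directly instead of inline prmt PTX.
  return __byte_perm(x, y, s);
#else
  // Host code: portable emulation of the same byte selection.
  const std::uint64_t pool = (static_cast<std::uint64_t>(y) << 32) | x;
  std::uint32_t result = 0;
  for (int i = 0; i < 4; ++i) {
    const std::uint32_t sel = (s >> (4 * i)) & 0x7u;
    const std::uint32_t byte =
        static_cast<std::uint32_t>((pool >> (8 * sel)) & 0xFFu);
    result |= byte << (8 * i);
  }
  return result;
#endif
}
```

With x = 0x33221100 and y = 0x77665544, the selector 0x3210 returns x unchanged, 0x7654 returns y, and 0x0123 reverses the bytes of x.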