[Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (#7208)

Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
jon-chuang
2024-08-12 15:47:41 -07:00
committed by GitHub
parent 4ddc4743d7
commit a046f86397
10 changed files with 208 additions and 47 deletions


@@ -45,5 +45,3 @@ Here is an example of how to enable this feature:
# output w/ scaling factors: England, the United Kingdom, and one of the world's leading financial,
# output w/o scaling factors: England, located in the southeastern part of the country. It is known
Note, current prefix caching doesn't work with FP8 KV cache enabled, forward_prefix kernel should handle different KV and cache type.


@@ -32,5 +32,3 @@ Here is an example of how to enable this feature:
print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
Note, current prefix caching doesn't work with FP8 KV cache enabled, forward_prefix kernel should handle different KV and cache type.
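The commit title refers to applying K/V scales and converting dtypes when the prefix/prefill kernel reads an FP8 KV cache whose dtype differs from the query dtype. Below is a minimal sketch, in plain Python rather than Triton, of such a scaled round trip. On store, each value is divided by a per-tensor scale and rounded to FP8 E4M3. On load, it is converted back and multiplied by the same scale. The helper names (`round_to_e4m3`, `store_kv`, `load_kv`) are hypothetical and are not vLLM's API. The per-tensor scaling scheme and the saturating E4M3 rounding (bias 7, 3 mantissa bits, max finite value 448) are assumptions of this illustration, and none of this reproduces the kernel changes in the commit.

```python
import math

# Assumed FP8 E4M3 parameters (saturating variant): max finite value 448,
# smallest normal 2**-6, subnormal spacing 2**-9.
E4M3_MAX = 448.0
E4M3_MIN_NORMAL = 2.0 ** -6


def round_to_e4m3(x: float) -> float:
    """Hypothetical helper: round x to the nearest E4M3-representable value."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)  # saturate instead of overflowing
    if a < E4M3_MIN_NORMAL:
        step = 2.0 ** -9  # uniform spacing in the subnormal range
    else:
        # 3 mantissa bits -> 8 steps per power-of-two interval
        step = 2.0 ** (math.floor(math.log2(a)) - 3)
    return sign * min(round(a / step) * step, E4M3_MAX)


def store_kv(values, scale):
    # Quantize: divide by the scale, then round to FP8 (the cache dtype).
    return [round_to_e4m3(v / scale) for v in values]


def load_kv(cached, scale):
    # Dequantize: convert back to the compute dtype, multiply by the scale.
    return [c * scale for c in cached]


if __name__ == "__main__":
    vals = [1000.0, -3.3, 0.02]
    print(load_kv(store_kv(vals, 1.0), 1.0))  # unscaled: 1000.0 saturates
    scale = max(abs(v) for v in vals) / E4M3_MAX  # per-tensor amax scale
    print(load_kv(store_kv(vals, scale), scale))
```

With a scale of 1.0, the value 1000.0 saturates to 448.0. With the scale chosen as `amax / 448`, the round trip recovers it almost exactly, at the cost of coarser resolution for small values. If a kernel loads the FP8 cache but skips the multiply by the scale, it produces the kind of degraded output the example comparison above describes.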