vllm/attention at 8e192ff967b44b186ea02d30e49fddf656fdfe50 - vllm

Files

Eric Xihui Lin 8e192ff967 [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)

Co-authored-by: beagleski <yunanzhang@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>

2024-05-24 22:00:52 -07:00

attention_dtypes.h

Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)

2024-04-03 14:15:55 -07:00

attention_generic.cuh

[CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722)

2024-05-22 07:18:41 +00:00

attention_kernels.cu

[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)

2024-05-24 22:00:52 -07:00

attention_utils.cuh

[CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722)

2024-05-22 07:18:41 +00:00

dtype_bfloat16.cuh

[CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722)

2024-05-22 07:18:41 +00:00

dtype_float16.cuh

[CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722)

2024-05-22 07:18:41 +00:00

dtype_float32.cuh

[CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722)

2024-05-22 07:18:41 +00:00

dtype_fp8.cuh

[CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722)

2024-05-22 07:18:41 +00:00