vllm/piecewise at 070da660c1bf9e7a7be8b9efeff4b06f91c7342f - vllm

Files

fhl2000 74f441f4b5 [Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059)

Signed-off-by: fhl <2410591650@qq.com>
Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>

2025-08-15 10:01:39 -04:00

__init__.py

[torch.compile] rework compile control with piecewise cudagraph (#9715)

2024-10-29 23:03:49 -07:00

test_full_cudagraph.py

[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059)

2025-08-15 10:01:39 -04:00

test_multiple_graphs.py

Add test case for compiling multiple graphs (#21044)

2025-07-23 11:00:47 -07:00

test_simple.py

[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059)

2025-08-15 10:01:39 -04:00

test_toy_llama.py

[Core] Allow full cudagraph with separate attention routines and orthogonal to compilation, add support for FA2 and FlashInfer (#20059)

2025-08-15 10:01:39 -04:00