vllm/piecewise at 4ca175ea0b83e92a0886fee2a74bbd3fc9fcb478 - vllm

Files

Daisy-Ma-coder 2a8bd2b93b [CLI env var] Add VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH in env variables (#25274)

Signed-off-by: qqma <qqma@amazon.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: qqma <qqma@amazon.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>

2025-10-03 13:35:53 -07:00

__init__.py

[torch.compile] rework compile control with piecewise cudagraph (#9715)

2024-10-29 23:03:49 -07:00

test_full_cudagraph.py

[CLI env var] Add VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH in env variables (#25274)

2025-10-03 13:35:53 -07:00

test_multiple_graphs.py

[CI] execute all piecewise compilation tests together (#24502)

2025-09-09 11:05:25 -07:00

test_simple.py

[torch.compile] CUDAGraph Inductor partition integration (#24281)

2025-10-03 13:35:53 -07:00

test_toy_llama.py

[CI] execute all piecewise compilation tests together (#24502)

2025-09-09 11:05:25 -07:00