youngkingdom/vllm
Files in vllm/attention at commit 9cfa7697c1912fdb6e883eb5acea027e60fef733
Latest commit 9d70c103aa by Chendi.Xue: [BUG FIX][NON-CUDA]quick fix to avoid call cudagraph_unsafe in attention (#25298)
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 13:35:53 -07:00
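The headline fix reads as a platform gate: non-CUDA builds should never reach the CUDA-graph-unsafe code path in the attention layer. Below is a minimal Python sketch of that gating pattern, assuming nothing about vLLM's internals; every name in it (is_cuda_platform, CUDAGRAPH_UNSAFE_TAG, attention_op_tags) is invented for illustration and is not vLLM's actual API.

    # Hypothetical sketch of the gate described by commit 9d70c103aa / PR #25298:
    # apply the CUDA-graph-unsafe tag only when actually running on CUDA, so
    # non-CUDA platforms never evaluate the CUDA-only code path.

    def is_cuda_platform() -> bool:
        """Stand-in platform probe; real code would ask vLLM's platform layer."""
        try:
            import torch
            return torch.cuda.is_available()
        except ImportError:
            return False

    CUDAGRAPH_UNSAFE_TAG = "cudagraph_unsafe"  # hypothetical tag name

    def attention_op_tags() -> tuple[str, ...]:
        # The fix pattern: guard the CUDA-only tag behind a platform check
        # instead of attaching it unconditionally.
        if is_cuda_platform():
            return (CUDAGRAPH_UNSAFE_TAG,)
        return ()  # non-CUDA platforms: no CUDA-graph tags at all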
Name          Last commit                                                                            Date
backends/     Encoder model support for the Transformers backend (#25174)                           2025-09-19 19:15:22 +01:00
layers/       Directly get max encoder len from VLLM config in V1 (#24866)                          2025-09-16 17:52:31 +00:00
ops/          Optimize triton unified attention performance for sliding window attention (#24390)   2025-10-03 13:35:52 -07:00
utils/        [Attention] FlashAttn MLA (#14258)                                                     2025-09-04 02:47:59 -07:00
__init__.py   Remove duplicate entry in vllm.attention.__all__ (#23296)                             2025-08-20 17:14:59 -07:00
layer.py      [BUG FIX][NON-CUDA]quick fix to avoid call cudagraph_unsafe in attention (#25298)     2025-10-03 13:35:53 -07:00
selector.py   [gpt-oss] Enable gpt-oss on ampere (#22714)                                            2025-08-12 03:21:44 -07:00
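Taken together, the listing sketches the package layout: backends/ holds the per-platform attention implementations, layers/ and ops/ hold the layer classes and kernels, and selector.py chooses a backend at runtime. As a rough illustration of that selector pattern, here is a minimal Python sketch; the registry, class, and function names (BackendInfo, BACKEND_REGISTRY, select_backend) are hypothetical and not vLLM's actual API.

    # Minimal sketch of a backend-selector pattern like the one selector.py
    # implements. All names here are hypothetical illustrations.
    from dataclasses import dataclass
    from typing import Dict, Optional

    @dataclass(frozen=True)
    class BackendInfo:
        """Metadata for one attention backend (hypothetical)."""
        name: str
        requires_cuda: bool

    # Hypothetical registry; the real selector consults platform code instead.
    BACKEND_REGISTRY: Dict[str, BackendInfo] = {
        "FLASH_ATTN": BackendInfo("FLASH_ATTN", requires_cuda=True),
        "TORCH_SDPA": BackendInfo("TORCH_SDPA", requires_cuda=False),
    }

    def select_backend(has_cuda: bool, override: Optional[str] = None) -> BackendInfo:
        """Honor an explicit override if the platform can run it, else fall
        back to the first registered backend the platform satisfies."""
        if override is not None:
            info = BACKEND_REGISTRY[override]
            if info.requires_cuda and not has_cuda:
                raise ValueError(f"{override} requires CUDA, which is unavailable")
            return info
        for info in BACKEND_REGISTRY.values():
            if has_cuda or not info.requires_cuda:
                return info
        raise RuntimeError("no compatible attention backend found")

    if __name__ == "__main__":
        # CUDA hosts get FLASH_ATTN; CPU-only hosts fall back to TORCH_SDPA.
        print(select_backend(has_cuda=True).name)   # FLASH_ATTN
        print(select_backend(has_cuda=False).name)  # TORCH_SDPA

In this sketch an explicit override the platform cannot satisfy fails loudly rather than silently falling back, which keeps a user-requested backend from being swapped out unnoticed.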