vllm/docs/source (at commit 8e192ff967b44b186ea02d30e49fddf656fdfe50)
Latest commit: 8e192ff967 by Eric Xihui Lin, 2024-05-24 22:00:52 -07:00
[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)
Co-authored-by: beagleski <yunanzhang@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Name                 | Last commit                                                                                | Date
---------------------|--------------------------------------------------------------------------------------------|---------------------------
assets               | [Doc] add visualization for multi-stage dockerfile (#4456)                                 | 2024-04-30 17:41:59 +00:00
community            | [Docs] Add acknowledgment for sponsors (#4925)                                             | 2024-05-21 00:17:25 -07:00
dev                  | [Doc] Add API reference for offline inference (#4710)                                      | 2024-05-13 17:47:42 -07:00
getting_started      | [Doc] add ccache guide in doc (#5012)                                                      | 2024-05-23 23:21:54 +00:00
models               | [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)  | 2024-05-24 22:00:52 -07:00
offline_inference    | [Doc] Add API reference for offline inference (#4710)                                      | 2024-05-13 17:47:42 -07:00
quantization         | Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)                              | 2024-04-03 14:15:55 -07:00
serving              | Support to serve vLLM on Kubernetes with LWS (#4829)                                       | 2024-05-16 16:37:29 -07:00
conf.py              | [CI] Disable non-lazy string operation on logging (#4326)                                  | 2024-04-26 00:16:58 -07:00
generate_examples.py | Add example scripts to documentation (#4225)                                               | 2024-04-22 16:36:54 +00:00
index.rst            | [Docs] Add acknowledgment for sponsors (#4925)                                             | 2024-05-21 00:17:25 -07:00