vllm/models at e9d3aa04f6e55e2bb540f0810da97ddd0deebb13 - vllm


Eric Xihui Lin 8e192ff967 [Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)

Co-authored-by: beagleski <yunanzhang@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>

2024-05-24 22:00:52 -07:00

adding_model.rst

[Doc]: Update the doc of adding new models (#4236)

2024-04-21 09:57:08 -07:00

engine_args.rst

Don't show default value for flags in EngineArgs (#4223)

2024-04-21 09:15:28 -07:00

lora.rst

[Doc] Add docs about OpenAI compatible server (#3288)

2024-03-18 22:05:34 -07:00

performance.rst

[Scheduler] Warning upon preemption and Swapping (#4647)

2024-05-13 23:50:44 +09:00

supported_models.rst

[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)

2024-05-24 22:00:52 -07:00