5cbe8d155c
[Core] Registry for processing model inputs (#5214)
...
Co-authored-by: ywang96 <ywang@roblox.com>
2024-06-28 12:09:56 +00:00
79c92c7c8a
[Model] Add Gemma 2 (#5908)
2024-06-27 13:33:56 -07:00
96354d6a29
[Model] Add base class for LoRA-supported models (#5018)
2024-06-27 16:03:04 +08:00
3aa7b6cf66
[Misc][Doc] Add Example of using OpenAI Server with VLM (#5832)
2024-06-25 20:34:25 -07:00
f23871e9ee
[Doc] Add notice about breaking changes to VLMs (#5818)
2024-06-25 01:25:03 -07:00
1744cc99ba
[Doc] Add Phi-3-medium to list of supported models (#5788)
2024-06-24 10:48:55 -07:00
daef218b55
[Model] Initialize Phi-3-vision support (#4986)
2024-06-17 19:34:33 -07:00
0ce7b952f8
[Doc] Update LLaVA docs (#5437)
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-06-13 11:22:07 -07:00
89ec06c33b
[Docs][Spec decode] Fix docs error in code example (#5427)
2024-06-11 10:31:56 -07:00
4c2ffb28ff
[Speculative decoding] Initial spec decode docs (#5400)
2024-06-11 10:15:40 -07:00
246598a6b1
[CI] docfix (#5410)
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: ywang96 <ywang@roblox.com>
2024-06-11 01:28:50 -07:00
856c990041
[Docs] Add Docs on Limitations of VLM Support (#5383)
2024-06-10 09:53:50 -07:00
6b29d6fe70
[Model] Initial support for LLaVA-NeXT (#4199)
...
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-06-10 12:47:15 +00:00
7a9cb294ae
[Frontend] Add OpenAI Vision API Support (#5237)
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-06-07 11:23:32 -07:00
7a64d24aad
[Core] Support image processor (#4197)
2024-06-02 22:56:41 -07:00
657579113f
[Doc] Add checkmark for GPTBigCodeForCausalLM LoRA support (#5171)
2024-05-31 17:20:19 -07:00
8e192ff967
[Kernel][Backend][Model] Blocksparse flash attention kernel and Phi-3-Small model (#4799)
...
Co-authored-by: beagleski <yunanzhang@microsoft.com>
Co-authored-by: bapatra <bapatra@microsoft.com>
Co-authored-by: Barun Patra <codedecde@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-05-24 22:00:52 -07:00
f12c3b5b3d
[Model] Add Phi-2 LoRA support (#4886)
2024-05-21 14:24:17 +09:00
ac1fbf7fd2
[Doc] Shorten README by removing supported model list (#4796)
2024-05-13 16:23:54 -07:00
e7c46b9527
[Scheduler] Warning upon preemption and swapping (#4647)
...
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
2024-05-13 23:50:44 +09:00
51d4094fda
Fix chunked prefill doc syntax (#4603)
...
Fix the docs: https://docs.vllm.ai/en/latest/models/performance.html
Co-authored-by: sang <rkooo567@gmail.com>
2024-05-10 14:13:23 +09:00
36fb68f947
[Doc] Chunked Prefill Documentation (#4580)
2024-05-04 00:18:00 -07:00
fbf152d976
[Bugfix][Model] Refactor OLMo model to support new HF format in transformers 4.40.0 (#4324)
...
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-25 09:35:56 -07:00
96e90fdeb3
[Model] Add Phi-3 support (#4298)
2024-04-25 03:06:57 +00:00
7f2593b164
[Doc] Update the doc on adding new models (#4236)
2024-04-21 09:57:08 -07:00
fe7d648fe5
Don't show default value for flags in EngineArgs (#4223)
...
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-21 09:15:28 -07:00
682789d402
Fix missing docs and out of sync EngineArgs (#4219)
...
Co-authored-by: Harry Mellor <hmellor@oxts.com>
2024-04-19 20:51:33 -07:00
705578ae14
[Docs] Document that Meta Llama 3 is supported (#4175)
2024-04-18 10:55:48 -07:00
d619ae2d19
[Doc] Add better clarity for tensorizer usage (#4090)
...
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-15 13:28:25 -07:00
aceb17cf2d
[Docs] Document that Mixtral 8x22B is supported (#4073)
2024-04-14 14:35:55 -07:00
711a000255
[Frontend][Core] Add model loading using tensorizer (#3476)
2024-04-13 17:13:01 -07:00
e35397468f
[Doc] Add doc to state our model support policy (#3948)
...
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-10 17:03:02 +00:00
b4543c8f6b
[Model] Add MiniCPM (#3893)
2024-04-08 18:28:36 +08:00
95baec828f
[Core] Enable out-of-tree model registration (#3871)
2024-04-06 17:11:41 -07:00
78107fa091
[Doc] Add asynchronous engine arguments to documentation (#3810)
...
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-04-04 21:52:01 -07:00
d6ea427f04
[Model] Add support for Qwen2MoeModel (#3346)
2024-03-28 15:19:59 +00:00
6d9aa00fc4
[Docs] Add Command-R to supported models (#3669)
2024-03-27 15:20:00 -07:00
e24336b5a7
[Model] Add support for DBRX (#3660)
2024-03-27 13:01:46 -07:00
e66b629c04
[Misc] Minor fix in KVCache type (#3652)
2024-03-26 23:14:06 -07:00
76879342a3
[Doc] Add LoRA support (#3649)
2024-03-27 02:06:46 +00:00
4c07dd28c0
Add support for Jais models (#3183)
2024-03-21 09:45:24 +00:00
ef65dcfa6f
[Doc] Add docs about OpenAI-compatible server (#3288)
2024-03-18 22:05:34 -07:00
657061fdce
[Docs] Add LoRA support information for models (#3299)
2024-03-11 00:54:51 -07:00
ce4f5a29fb
Add Automatic Prefix Caching (#2762)
...
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-03-02 00:50:01 -08:00
a8683102cc
Multi-LoRA documentation fix (#3064)
2024-02-27 21:26:15 -08:00
8b430d7dea
[Minor] Fix StableLMEpochForCausalLM -> StableLmForCausalLM (#3046)
2024-02-26 20:23:50 -08:00
48a8f4a7fd
Support Orion model (#2539)
...
Co-authored-by: zhangdacheng <zhangdacheng@ainirobot.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-02-26 19:17:06 -08:00
a9c8212895
[FIX] Add Gemma model to the doc (#2966)
2024-02-21 09:46:15 -08:00
ab3a5a8259
Support OLMo models (#2832)
2024-02-18 21:05:15 -08:00
8f36444c4f
Multi-LoRA as extra models in OpenAI server (#2775)
...
How to serve the LoRAs (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):
```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-hf \
    --enable-lora \
    --lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```
The above server will list three separate models if the user queries `/models`: one for the base served model, and one for each of the specified LoRA modules. In this case `sql-lora` and `sql-lora2` point to the same underlying LoRA, but this need not be the case. The LoRA config flags take the same values they do in `EngineArgs`.
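As a quick check (a sketch assuming the OpenAI-style `/v1/models` route on the server's default `localhost:8000`), listing the models should return all three entries:
```terminal
$ curl http://localhost:8000/v1/models
# "data" should contain three models: the base model
# meta-llama/Llama-2-7b-hf plus sql-lora and sql-lora2.
```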
No work has been done here to scope client permissions to specific models.
2024-02-17 12:00:48 -08:00