ce4f5a29fb
Add Automatic Prefix Caching (#2762)
...
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-03-02 00:50:01 -08:00
a8683102cc
Multi-LoRA documentation fix (#3064)
2024-02-27 21:26:15 -08:00
8b430d7dea
[Minor] Fix StableLMEpochForCausalLM -> StableLmForCausalLM (#3046)
2024-02-26 20:23:50 -08:00
48a8f4a7fd
Support Orion model (#2539)
...
Co-authored-by: zhangdacheng <zhangdacheng@ainirobot.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-02-26 19:17:06 -08:00
a9c8212895
[FIX] Add Gemma model to the doc (#2966)
2024-02-21 09:46:15 -08:00
ab3a5a8259
Support OLMo models. (#2832)
2024-02-18 21:05:15 -08:00
8f36444c4f
Multi-LoRA as extra models in OpenAI server (#2775)
...
How to serve the LoRAs (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):
```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.api_server \
--model meta-llama/Llama-2-7b-hf \
--enable-lora \
--lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```
The above server will list three separate entries when the user queries `/models`: one for the base served model, and one for each of the specified LoRA modules. In this case `sql-lora` and `sql-lora2` point to the same underlying LoRA, but this need not be the case. LoRA config values take the same values they do in EngineArgs.
No work has been done here to scope client permissions to specific models.
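As an illustration of that `/models` listing, the individual entries can be pulled out of the response like so (the JSON shape below is an assumption modeled on the OpenAI-style list response, not output captured from this commit):

```terminal
# Hypothetical /models payload: the base model plus the two LoRA modules
# registered above (shape assumed from the OpenAI-style list response).
RESPONSE='{"object":"list","data":[{"id":"meta-llama/Llama-2-7b-hf"},{"id":"sql-lora"},{"id":"sql-lora2"}]}'
# Print one model id per line.
echo "$RESPONSE" | python3 -c 'import json, sys
for m in json.load(sys.stdin)["data"]:
    print(m["id"])'
```

In a live deployment, `RESPONSE` would instead come from something like `curl http://localhost:8000/models` against the server started above.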
2024-02-17 12:00:48 -08:00
317b29de0f
Remove Yi model definition, please use LlamaForCausalLM instead (#2854)
...
Co-authored-by: Roy <jasonailu87@gmail.com>
2024-02-13 14:22:22 -08:00
4ca2c358b1
Add documentation section about LoRA (#2834)
2024-02-12 17:24:45 +01:00
cd9e60c76c
Add Internlm2 (#2666)
2024-02-01 09:27:40 -08:00
2832e7b9f9
Fix names and license for Qwen2 (#2589)
2024-01-24 22:37:51 -08:00
223c19224b
Fix the syntax error in the doc of supported_models (#2584)
2024-01-24 11:22:51 -08:00
94b5edeb53
Add qwen2 (#2495)
2024-01-22 14:34:21 -08:00
e1957c6ebd
Add StableLM3B model (#2372)
2024-01-16 20:32:40 -08:00
fd4ea8ef5c
Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)
2024-01-03 11:30:22 -08:00
c17daa9f89
[Docs] Fix broken links (#2222)
2023-12-20 12:43:42 -08:00
de60a3fb93
Added DeciLM-7b and DeciLM-7b-instruct (#2062)
2023-12-19 02:29:33 -08:00
3ec8c25cd0
[Docs] Update documentation for gpu-memory-utilization option (#2162)
2023-12-17 10:51:57 -08:00
f8c688d746
[Minor] Add Phi 2 to supported models (#2159)
2023-12-17 02:54:57 -08:00
21d93c140d
Optimize Mixtral with expert parallelism (#2090)
2023-12-13 23:55:07 -08:00
096827c284
[Docs] Add notes on ROCm-supported models (#2087)
2023-12-13 09:45:34 -08:00
4ff0203987
Minor fixes for Mixtral (#2015)
2023-12-11 09:16:15 -08:00
d940ce497e
Fix typo in adding_model.rst (#1947)
...
adpated -> adapted
2023-12-06 10:04:26 -08:00
e5452ddfd6
Normalize head weights for Baichuan 2 (#1876)
2023-11-30 20:03:58 -08:00
0f621c2c7d
[Docs] Add information about using shared memory in docker (#1845)
2023-11-29 18:33:56 -08:00
a921d8be9d
[DOCS] Add engine args documentation (#1741)
2023-11-22 12:31:27 -08:00
edb305584b
Support downloading models from www.modelscope.cn (#1588)
2023-11-17 20:38:31 -08:00
0fc280b06c
Update the adding-model doc according to the new refactor (#1692)
2023-11-16 18:46:26 -08:00
415d109527
[Fix] Update Supported Models List (#1690)
2023-11-16 14:47:26 -08:00
0967102c6d
Fix typo in tiiuae/falcon-rw-7b model name (#1226)
2023-09-29 13:40:25 -07:00
202351d5bf
Add Mistral to supported model list (#1221)
2023-09-28 14:33:04 -07:00
002800f081
Align vLLM's beam search implementation with HF generate (#857)
2023-09-04 17:29:42 -07:00
55b28b1eee
[Docs] Minor fixes in supported models (#920)
...
* Minor fix in supported models
* Add another small fix for Aquila model
---------
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-08-31 16:28:39 -07:00
14f9c72bfd
Update Supported Model List (#825)
2023-08-22 11:51:44 -07:00
1b151ed181
Fix baichuan doc style (#748)
2023-08-13 20:57:31 -07:00
f7389f4763
[Doc] Add Baichuan 13B to supported models (#656)
2023-08-02 16:45:12 -07:00
1b0bd0fe8a
Add Falcon support (new) (#592)
2023-08-02 14:04:39 -07:00
df5dd3c68e
Add Baichuan-7B to README (#494)
2023-07-25 15:25:12 -07:00
6fc2a38b11
Add support for LLaMA-2 (#505)
2023-07-20 11:38:27 -07:00
c894836108
[Model] Add support for GPT-J (#226)
...
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2023-07-08 17:55:16 -07:00
ffa6d2f9f9
[Docs] Fix typo (#346)
2023-07-03 16:51:47 -07:00
404422f42e
[Model] Add support for MPT (#334)
2023-07-03 16:47:53 -07:00
e41f06702c
Add support for BLOOM (#331)
2023-07-03 13:12:35 -07:00
665c48963b
[Docs] Add GPTBigCode to supported models (#213)
2023-06-22 15:05:11 -07:00
794e578de0
[Minor] Fix URLs (#166)
2023-06-19 22:57:14 -07:00
b7e62d3454
Fix repo & documentation URLs (#163)
2023-06-19 20:03:40 -07:00
0b32a987dd
Add and list supported models in README (#161)
2023-06-20 10:57:46 +08:00
dcda03b4cb
Write README and front page of doc (#147)
2023-06-18 03:19:38 -07:00
0b98ba15c7
Change the name to vLLM (#150)
2023-06-17 03:07:40 -07:00
456941cfe4
[Docs] Write the Adding a New Model section (#138)
2023-06-05 20:01:26 -07:00