27a7b070db
Add document for vllm paged attention kernel. (#2978)
2024-03-04 09:23:34 -08:00
d0fae88114
[DOC] add setup document to support neuron backend (#2777)
2024-03-04 01:03:51 +00:00
ce4f5a29fb
Add Automatic Prefix Caching (#2762)
...
Co-authored-by: ElizaWszola <eliza@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-03-02 00:50:01 -08:00
49d849b3ab
docs: Add tutorial on deploying vLLM model with KServe (#2586)
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2024-03-01 11:04:14 -08:00
a8683102cc
multi-lora documentation fix (#3064)
2024-02-27 21:26:15 -08:00
8b430d7dea
[Minor] Fix StableLMEpochForCausalLM -> StableLmForCausalLM (#3046)
2024-02-26 20:23:50 -08:00
48a8f4a7fd
Support Orion model (#2539)
...
Co-authored-by: zhangdacheng <zhangdacheng@ainirobot.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-02-26 19:17:06 -08:00
ef978fe411
Port metrics from aioprometheus to prometheus_client (#2730)
2024-02-25 11:54:00 -08:00
a9c8212895
[FIX] Add Gemma model to the doc (#2966)
2024-02-21 09:46:15 -08:00
ab3a5a8259
Support OLMo models. (#2832)
2024-02-18 21:05:15 -08:00
8f36444c4f
multi-LoRA as extra models in OpenAI server (#2775)
...
How to serve the LoRAs (mimicking the [multilora inference example](https://github.com/vllm-project/vllm/blob/main/examples/multilora_inference.py)):
```terminal
$ export LORA_PATH=~/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/
$ python -m vllm.entrypoints.api_server \
--model meta-llama/Llama-2-7b-hf \
--enable-lora \
--lora-modules sql-lora=$LORA_PATH sql-lora2=$LORA_PATH
```
The above server will list 3 separate entries when a user queries `/models`: one for the base served model, and one for each of the specified LoRA modules. In this case, `sql-lora` and `sql-lora2` point to the same underlying LoRA, but this need not be the case. The LoRA config options take the same values they do in `EngineArgs`.
No work has been done here to scope client permissions to specific models.
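The mapping from `--lora-modules` pairs to the entries reported by `/models` can be sketched as follows (a minimal illustration only, not vLLM's actual implementation; the function name and paths are hypothetical):

```python
# Hypothetical sketch: the server reports one model entry for the base
# model plus one entry per "name=path" LoRA module passed on the CLI.
def listed_models(base_model: str, lora_modules: list[str]) -> list[str]:
    names = [base_model]
    for module in lora_modules:
        # Each --lora-modules argument has the form "name=path";
        # only the name is surfaced in the /models listing.
        name, _, _path = module.partition("=")
        names.append(name)
    return names

print(listed_models(
    "meta-llama/Llama-2-7b-hf",
    ["sql-lora=/path/to/lora", "sql-lora2=/path/to/lora"],
))
# ['meta-llama/Llama-2-7b-hf', 'sql-lora', 'sql-lora2']
```

Two module names pointing at the same path yield two separate entries, matching the behavior described above.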
2024-02-17 12:00:48 -08:00
317b29de0f
Remove Yi model definition, please use LlamaForCausalLM instead (#2854)
...
Co-authored-by: Roy <jasonailu87@gmail.com>
2024-02-13 14:22:22 -08:00
f964493274
[CI] Ensure documentation build is checked in CI (#2842)
2024-02-12 22:53:07 -08:00
4ca2c358b1
Add documentation section about LoRA (#2834)
2024-02-12 17:24:45 +01:00
0580aab02f
[ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention (#2768)
2024-02-10 23:14:37 -08:00
931746bc6d
Add documentation on how to do incremental builds (#2796)
2024-02-07 14:42:02 -08:00
5ed704ec8c
docs: fix langchain (#2736)
2024-02-03 18:17:55 -08:00
cd9e60c76c
Add Internlm2 (#2666)
2024-02-01 09:27:40 -08:00
1af090b57d
Bump up version to v0.3.0 (#2656)
2024-01-31 00:07:07 -08:00
9090bf02e7
Support FP8-E5M2 KV Cache (#2279)
...
Co-authored-by: zhaoyang <zhao.yang16@zte.com.cn>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-01-28 16:43:54 -08:00
6b7de1a030
[ROCm] add support to ROCm 6.0 and MI300 (#2274)
2024-01-26 12:41:10 -08:00
2832e7b9f9
fix names and license for Qwen2 (#2589)
2024-01-24 22:37:51 -08:00
223c19224b
Fix the syntax error in the doc of supported_models (#2584)
2024-01-24 11:22:51 -08:00
9c1352eb57
[Feature] Simple API token authentication and pluggable middlewares (#1106)
2024-01-23 15:13:00 -08:00
94b5edeb53
Add qwen2 (#2495)
2024-01-22 14:34:21 -08:00
e1957c6ebd
Add StableLM3B model (#2372)
2024-01-16 20:32:40 -08:00
827cbcd37c
Update quickstart.rst (#2369)
2024-01-12 12:56:18 -08:00
f745847ef7
[Minor] Fix the format in quick start guide related to Model Scope (#2425)
2024-01-11 19:44:01 -08:00
6549aef245
[DOC] Add additional comments for LLMEngine and AsyncLLMEngine (#1011)
2024-01-11 19:26:49 -08:00
fd4ea8ef5c
Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)
2024-01-03 11:30:22 -08:00
1db83e31a2
[Docs] Update installation instructions to include CUDA 11.8 xFormers (#2246)
2023-12-22 23:20:02 -08:00
c17daa9f89
[Docs] Fix broken links (#2222)
2023-12-20 12:43:42 -08:00
de60a3fb93
Added DeciLM-7b and DeciLM-7b-instruct (#2062)
2023-12-19 02:29:33 -08:00
1b7c791d60
[ROCm] Fixes for GPTQ on ROCm (#2180)
2023-12-18 10:41:04 -08:00
3ec8c25cd0
[Docs] Update documentation for gpu-memory-utilization option (#2162)
2023-12-17 10:51:57 -08:00
f8c688d746
[Minor] Add Phi 2 to supported models (#2159)
2023-12-17 02:54:57 -08:00
26c52a5ea6
[Docs] Add CUDA graph support to docs (#2148)
2023-12-17 01:49:20 -08:00
b81a6a6bb3
[Docs] Add supported quantization methods to docs (#2135)
2023-12-15 13:29:22 -08:00
21d93c140d
Optimize Mixtral with expert parallelism (#2090)
2023-12-13 23:55:07 -08:00
096827c284
[Docs] Add notes on ROCm-supported models (#2087)
2023-12-13 09:45:34 -08:00
6565d9e33e
Update installation instruction for vLLM + CUDA 11.8 (#2086)
2023-12-13 09:25:59 -08:00
f375ec8440
[ROCm] Upgrade xformers version for ROCm & update doc (#2079)
...
Co-authored-by: miloice <jeffaw99@hotmail.com>
2023-12-13 00:56:05 -08:00
c0ce15dfb2
Update run_on_sky.rst (#2025)
...
sharable -> shareable
2023-12-11 10:32:58 -08:00
4ff0203987
Minor fixes for Mixtral (#2015)
2023-12-11 09:16:15 -08:00
c85b80c2b6
[Docker] Add cuda arch list as build option (#1950)
2023-12-08 09:53:47 -08:00
6ccc0bfffb
Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836)
...
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Amir Balwel <amoooori04@gmail.com>
Co-authored-by: root <kuanfu.liu@akirakan.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com>
Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>
2023-12-07 23:16:52 -08:00
24f60a54f4
[Docker] Adding number of nvcc_threads during build as envar (#1893)
2023-12-07 11:00:32 -08:00
42c02f5892
Fix quickstart.rst typo jinja (#1964)
2023-12-07 08:34:44 -08:00
d940ce497e
Fix typo in adding_model.rst (#1947)
...
adpated -> adapted
2023-12-06 10:04:26 -08:00
c07a442854
chore(examples-docs): upgrade to OpenAI V1 (#1785)
2023-12-03 01:11:22 -08:00