[Doc][CI/Build] Update docs and tests to use vllm serve (#6431)

Cyrus Leung
2024-07-17 15:43:21 +08:00
committed by GitHub
parent a19e8d3726
commit 5bf35a91e4
23 changed files with 155 additions and 175 deletions


@@ -114,7 +114,7 @@ Just add the following lines in your code:
     from your_code import YourModelForCausalLM
     ModelRegistry.register_model("YourModelForCausalLM", YourModelForCausalLM)
 
-If you are running api server with `python -m vllm.entrypoints.openai.api_server args`, you can wrap the entrypoint with the following code:
+If you are running api server with :code:`vllm serve <args>`, you can wrap the entrypoint with the following code:
 
 .. code-block:: python
@@ -124,4 +124,4 @@ If you are running api server with `python -m vllm.entrypoints.openai.api_server
     import runpy
     runpy.run_module('vllm.entrypoints.openai.api_server', run_name='__main__')
 
-Save the above code in a file and run it with `python your_file.py args`.
+Save the above code in a file and run it with :code:`python your_file.py <args>`.
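Putting the two hunks together, the wrapper script the updated docs describe would look roughly like this (a minimal sketch: ``your_code`` and ``YourModelForCausalLM`` are the docs' own placeholders, and ``from vllm import ModelRegistry`` is an assumed import path):

.. code-block:: python

    # your_file.py -- hypothetical wrapper combining both snippets above.
    import runpy

    from vllm import ModelRegistry  # assumed import path
    from your_code import YourModelForCausalLM  # placeholder out-of-tree model

    # Register the custom model before the server entrypoint parses its args.
    ModelRegistry.register_model("YourModelForCausalLM", YourModelForCausalLM)

    # Re-run the OpenAI-compatible server as if it were invoked directly,
    # so `python your_file.py <args>` behaves like `vllm serve <args>`.
    runpy.run_module('vllm.entrypoints.openai.api_server', run_name='__main__')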


@@ -8,7 +8,7 @@ Below, you can find an explanation of every engine argument for vLLM:
 .. argparse::
     :module: vllm.engine.arg_utils
     :func: _engine_args_parser
-    :prog: -m vllm.entrypoints.openai.api_server
+    :prog: vllm serve
     :nodefaultconst:
 
 Async Engine Arguments
@@ -19,5 +19,5 @@ Below are the additional arguments related to the asynchronous engine:
 .. argparse::
     :module: vllm.engine.arg_utils
     :func: _async_engine_args_parser
-    :prog: -m vllm.entrypoints.openai.api_server
+    :prog: vllm serve
     :nodefaultconst:
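For context, the ``:func:`` targets named above are argparse parser factories, and ``:prog:`` only changes the program name printed in the rendered help text. A rough sketch of what such a factory could look like (``EngineArgs.add_cli_args`` is an assumption about vLLM internals, not something this diff shows):

.. code-block:: python

    # Hypothetical sketch of a parser factory like _engine_args_parser.
    import argparse

    from vllm.engine.arg_utils import EngineArgs  # assumed import path

    def _engine_args_parser() -> argparse.ArgumentParser:
        # Build a parser whose --help output matches what the Sphinx
        # argparse directive above renders, now titled `vllm serve`.
        parser = argparse.ArgumentParser(prog='vllm serve')
        # Assumed helper that attaches every engine flag and returns the parser.
        return EngineArgs.add_cli_args(parser)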


@@ -61,8 +61,7 @@ LoRA adapted models can also be served with the Open-AI compatible vLLM server.
 .. code-block:: bash
 
-    python -m vllm.entrypoints.openai.api_server \
-        --model meta-llama/Llama-2-7b-hf \
+    vllm serve meta-llama/Llama-2-7b-hf \
         --enable-lora \
         --lora-modules sql-lora=$HOME/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/snapshots/0dfa347e8877a4d4ed19ee56c140fa518470028c/
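Once that server is up, the adapter is addressed by the name given to ``--lora-modules``; a minimal client sketch, assuming the default ``http://localhost:8000`` endpoint and the ``openai`` Python package:

.. code-block:: python

    # Sketch: query the sql-lora adapter served above through the
    # OpenAI-compatible completions API.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    completion = client.completions.create(
        model="sql-lora",  # adapter name from --lora-modules
        prompt="SELECT ",
        max_tokens=32,
    )
    print(completion.choices[0].text)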


@@ -94,9 +94,7 @@ Below is an example on how to launch the same ``llava-hf/llava-1.5-7b-hf`` with
 .. code-block:: bash
 
-    python -m vllm.entrypoints.openai.api_server \
-        --model llava-hf/llava-1.5-7b-hf \
-        --chat-template template_llava.jinja
+    vllm serve llava-hf/llava-1.5-7b-hf --chat-template template_llava.jinja
 
 .. important::
     We have removed all vision language related CLI args in the ``0.5.1`` release. **This is a breaking change**, so please update your code to follow
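The served model then takes OpenAI-style multimodal chat requests; a minimal sketch, again assuming the default ``http://localhost:8000`` endpoint and the ``openai`` client (the image URL is an arbitrary placeholder):

.. code-block:: python

    # Sketch: send a text+image chat request to the llava server above.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="llava-hf/llava-1.5-7b-hf",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/image.jpg"}},
            ],
        }],
    )
    print(response.choices[0].message.content)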