[Doc][CI/Build] Update docs and tests to use vllm serve (#6431)
@@ -114,7 +114,7 @@ Just add the following lines in your code:
     from your_code import YourModelForCausalLM
     ModelRegistry.register_model("YourModelForCausalLM", YourModelForCausalLM)
 
-If you are running api server with `python -m vllm.entrypoints.openai.api_server args`, you can wrap the entrypoint with the following code:
+If you are running api server with :code:`vllm serve <args>`, you can wrap the entrypoint with the following code:
 
 .. code-block:: python
 
@@ -124,4 +124,4 @@ If you are running api server with `python -m vllm.entrypoints.openai.api_server
     import runpy
     runpy.run_module('vllm.entrypoints.openai.api_server', run_name='__main__')
 
-Save the above code in a file and run it with `python your_file.py args`.
+Save the above code in a file and run it with :code:`python your_file.py <args>`.
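
Taken together, the wrapper these two hunks describe looks roughly like the sketch below. Assumptions not stated in the diff: ``your_code`` is the user's own module, and ``ModelRegistry`` is importable from the top-level ``vllm`` package.

.. code-block:: python

    # your_file.py -- placeholder name used in the docs above.
    import runpy

    from vllm import ModelRegistry
    from your_code import YourModelForCausalLM  # user-provided module (assumed)

    # Register the out-of-tree model before the server starts.
    ModelRegistry.register_model("YourModelForCausalLM", YourModelForCausalLM)

    # Run the OpenAI-compatible server as if launched from the command line,
    # so it picks up the registration made above.
    runpy.run_module('vllm.entrypoints.openai.api_server', run_name='__main__')

Run it with :code:`python your_file.py <args>`, as the updated doc text says.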
@@ -8,7 +8,7 @@ Below, you can find an explanation of every engine argument for vLLM:
 .. argparse::
     :module: vllm.engine.arg_utils
     :func: _engine_args_parser
-    :prog: -m vllm.entrypoints.openai.api_server
+    :prog: vllm serve
     :nodefaultconst:
 
 Async Engine Arguments
@@ -19,5 +19,5 @@ Below are the additional arguments related to the asynchronous engine:
 .. argparse::
     :module: vllm.engine.arg_utils
     :func: _async_engine_args_parser
-    :prog: -m vllm.entrypoints.openai.api_server
+    :prog: vllm serve
     :nodefaultconst:
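
The ``:prog:`` option only changes the program name shown in the rendered usage line. A quick way to preview that effect, as a sketch (assuming ``_engine_args_parser`` takes no arguments and returns a standard ``argparse.ArgumentParser``, which is what the sphinx-argparse directive above requires):

.. code-block:: python

    # Preview the usage string the docs will render after this change.
    from vllm.engine.arg_utils import _engine_args_parser

    parser = _engine_args_parser()
    parser.prog = "vllm serve"  # mirrors the new :prog: value
    print(parser.format_usage())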
@@ -61,8 +61,7 @@ LoRA adapted models can also be served with the Open-AI compatible vLLM server.
 
 .. code-block:: bash
 
-    python -m vllm.entrypoints.openai.api_server \
-        --model meta-llama/Llama-2-7b-hf \
+    vllm serve meta-llama/Llama-2-7b-hf \
         --enable-lora \
         --lora-modules sql-lora=$HOME/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/snapshots/0dfa347e8877a4d4ed19ee56c140fa518470028c/
 
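
Once the rewritten command is running, the adapter is addressable by the name given in ``--lora-modules``. A minimal client sketch, assuming the server listens on the default ``localhost:8000`` and the ``openai`` Python client is installed:

.. code-block:: python

    # Query the `sql-lora` adapter through the OpenAI-compatible endpoint.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    completion = client.completions.create(
        model="sql-lora",  # adapter name from --lora-modules above
        prompt="SELECT",
        max_tokens=32,
    )
    print(completion.choices[0].text)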
@@ -94,9 +94,7 @@ Below is an example on how to launch the same ``llava-hf/llava-1.5-7b-hf`` with
 
 .. code-block:: bash
 
-    python -m vllm.entrypoints.openai.api_server \
-        --model llava-hf/llava-1.5-7b-hf \
-        --chat-template template_llava.jinja
+    vllm serve llava-hf/llava-1.5-7b-hf --chat-template template_llava.jinja
 
 .. important::
     We have removed all vision language related CLI args in the ``0.5.1`` release. **This is a breaking change**, so please update your code to follow
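
For completeness, a sketch of one multimodal request against the server started above (assumptions: default port 8000, ``openai`` client installed; the image URL is a placeholder):

.. code-block:: python

    # One image + text turn against the llava server launched with `vllm serve`.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
    response = client.chat.completions.create(
        model="llava-hf/llava-1.5-7b-hf",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/image.jpg"}},  # placeholder
            ],
        }],
        max_tokens=64,
    )
    print(response.choices[0].message.content)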