[Doc][CI/Build] Update docs and tests to use vllm serve (#6431)

Cyrus Leung
2024-07-17 15:43:21 +08:00
committed by GitHub
parent a19e8d3726
commit 5bf35a91e4
23 changed files with 155 additions and 175 deletions


@@ -114,7 +114,7 @@ Just add the following lines in your code:
     from your_code import YourModelForCausalLM
     ModelRegistry.register_model("YourModelForCausalLM", YourModelForCausalLM)
 
-If you are running api server with `python -m vllm.entrypoints.openai.api_server args`, you can wrap the entrypoint with the following code:
+If you are running api server with :code:`vllm serve <args>`, you can wrap the entrypoint with the following code:
 
 .. code-block:: python
@@ -124,4 +124,4 @@ If you are running api server with `python -m vllm.entrypoints.openai.api_server
     import runpy
     runpy.run_module('vllm.entrypoints.openai.api_server', run_name='__main__')
 
-Save the above code in a file and run it with `python your_file.py args`.
+Save the above code in a file and run it with :code:`python your_file.py <args>`.
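Putting the two hunks together, the wrapper script the updated docs describe would look roughly like this (a minimal sketch: ``your_code`` and ``YourModelForCausalLM`` are the docs' own placeholders, and ``from vllm import ModelRegistry`` is an assumed import path):

.. code-block:: python

    # your_file.py -- hypothetical wrapper combining both snippets above.
    import runpy

    from vllm import ModelRegistry  # assumed import path
    from your_code import YourModelForCausalLM  # placeholder out-of-tree model

    # Register the custom model before the server entrypoint parses its args.
    ModelRegistry.register_model("YourModelForCausalLM", YourModelForCausalLM)

    # Re-run the OpenAI-compatible server as if it were invoked directly,
    # so `python your_file.py <args>` behaves like `vllm serve <args>`.
    runpy.run_module('vllm.entrypoints.openai.api_server', run_name='__main__')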


@@ -8,7 +8,7 @@ Below, you can find an explanation of every engine argument for vLLM:
 .. argparse::
     :module: vllm.engine.arg_utils
     :func: _engine_args_parser
-    :prog: -m vllm.entrypoints.openai.api_server
+    :prog: vllm serve
     :nodefaultconst:
 
 Async Engine Arguments
@@ -19,5 +19,5 @@ Below are the additional arguments related to the asynchronous engine:
 .. argparse::
     :module: vllm.engine.arg_utils
     :func: _async_engine_args_parser
-    :prog: -m vllm.entrypoints.openai.api_server
+    :prog: vllm serve
     :nodefaultconst:
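For context, the ``:func:`` targets named above are argparse parser factories, and ``:prog:`` only changes the program name printed in the rendered help text. A rough sketch of what such a factory could look like (``EngineArgs.add_cli_args`` is an assumption about vLLM internals, not something this diff shows):

.. code-block:: python

    # Hypothetical sketch of a parser factory like _engine_args_parser.
    import argparse

    from vllm.engine.arg_utils import EngineArgs  # assumed import path

    def _engine_args_parser() -> argparse.ArgumentParser:
        # Build a parser whose --help output matches what the Sphinx
        # argparse directive above renders, now titled `vllm serve`.
        parser = argparse.ArgumentParser(prog='vllm serve')
        # Assumed helper that attaches every engine flag and returns the parser.
        return EngineArgs.add_cli_args(parser)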


@@ -61,8 +61,7 @@ LoRA adapted models can also be served with the Open-AI compatible vLLM server.
 .. code-block:: bash
 
-    python -m vllm.entrypoints.openai.api_server \
-        --model meta-llama/Llama-2-7b-hf \
+    vllm serve meta-llama/Llama-2-7b-hf \
         --enable-lora \
         --lora-modules sql-lora=$HOME/.cache/huggingface/hub/models--yard1--llama-2-7b-sql-lora-test/snapshots/0dfa347e8877a4d4ed19ee56c140fa518470028c/
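Once that server is up, the adapter is addressed by the name given to ``--lora-modules``; a minimal client sketch, assuming the default ``http://localhost:8000`` endpoint and the ``openai`` Python package:

.. code-block:: python

    # Sketch: query the sql-lora adapter served above through the
    # OpenAI-compatible completions API.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    completion = client.completions.create(
        model="sql-lora",  # adapter name from --lora-modules
        prompt="SELECT ",
        max_tokens=32,
    )
    print(completion.choices[0].text)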


@@ -94,9 +94,7 @@ Below is an example on how to launch the same ``llava-hf/llava-1.5-7b-hf`` with
 .. code-block:: bash
 
-    python -m vllm.entrypoints.openai.api_server \
-        --model llava-hf/llava-1.5-7b-hf \
-        --chat-template template_llava.jinja
+    vllm serve llava-hf/llava-1.5-7b-hf --chat-template template_llava.jinja
 
 .. important::
     We have removed all vision language related CLI args in the ``0.5.1`` release. **This is a breaking change**, so please update your code to follow
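The served model then takes OpenAI-style multimodal chat requests; a minimal sketch, again assuming the default ``http://localhost:8000`` endpoint and the ``openai`` client (the image URL is an arbitrary placeholder):

.. code-block:: python

    # Sketch: send a text+image chat request to the llava server above.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="llava-hf/llava-1.5-7b-hf",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/image.jpg"}},
            ],
        }],
    )
    print(response.choices[0].message.content)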