[Doc] Move examples and further reorganize user guide (#18666)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

@@ -5,11 +5,9 @@ nav:
- getting_started/quickstart.md
- getting_started/installation
- Examples:
  - Offline Inference: getting_started/examples/offline_inference
  - Online Serving: getting_started/examples/online_serving
  - Others:
    - LMCache: getting_started/examples/lmcache
    - getting_started/examples/other/*
  - Offline Inference: examples/offline_inference
  - Online Serving: examples/online_serving
  - Others: examples/others
- Quick Links:
  - User Guide: usage/README.md
  - Developer Guide: contributing/README.md

@@ -19,6 +17,7 @@ nav:
- Releases: https://github.com/vllm-project/vllm/releases
- User Guide:
  - Summary: usage/README.md
  - usage/v1_guide.md
  - General:
    - usage/*
  - Inference and Serving:

@@ -1,4 +1,9 @@
# Configuration Options

This section lists the most common options for running the vLLM engine.
For a full list, refer to the [configuration][configuration] page.
This section lists the most common options for running vLLM.

There are three main levels of configuration, from highest priority to lowest priority:

- [Request parameters][completions-api] and [input arguments][sampling-params]
- [Engine arguments](./engine_args.md)
- [Environment variables](./env_vars.md)
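As a rough sketch of how these three levels interact in the offline API (the model name, option values, and the environment variable chosen below are placeholders rather than recommended settings):

```python
import os

# Environment variables: the lowest-priority level. VLLM_LOGGING_LEVEL is one
# example; it is typically read when vLLM is imported, so it is set here first.
os.environ["VLLM_LOGGING_LEVEL"] = "DEBUG"

from vllm import LLM, SamplingParams

# Engine arguments: apply to the whole engine instance.
llm = LLM(model="facebook/opt-125m", max_model_len=2048)

# Request parameters / input arguments: the highest-priority level,
# applying only to this generate() call.
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```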

@@ -61,7 +61,7 @@ These are documented under [Inferencing and Serving -> Production Metrics](../..

### Grafana Dashboard

vLLM also provides [a reference example](https://docs.vllm.ai/en/latest/getting_started/examples/prometheus_grafana.html) for how to collect and store these metrics using Prometheus and visualize them using a Grafana dashboard.
vLLM also provides [a reference example](https://docs.vllm.ai/en/latest/examples/prometheus_grafana.html) for how to collect and store these metrics using Prometheus and visualize them using a Grafana dashboard.

The subset of metrics exposed in the Grafana dashboard gives us an indication of which metrics are especially important:
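As a quick way to see what the dashboard is built from, here is a minimal sketch that scrapes the metrics endpoint of a running OpenAI-compatible server; the host and port below assume the `vllm serve` defaults and may need adjusting:

```python
import urllib.request

# Scrape the Prometheus metrics endpoint exposed by the running server.
with urllib.request.urlopen("http://localhost:8000/metrics") as resp:
    body = resp.read().decode()

# Print only the vLLM-specific series (prefixed with "vllm:"),
# e.g. vllm:num_requests_running.
for line in body.splitlines():
    if line.startswith("vllm:"):
        print(line)
```

In a full deployment, Prometheus scrapes this same endpoint on a schedule and Grafana queries Prometheus, which is what the reference example wires together.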

@@ -673,7 +673,7 @@ v0 has support for OpenTelemetry tracing:
- [OpenTelemetry blog
  post](https://opentelemetry.io/blog/2024/llm-observability/)
- [User-facing
  docs](https://docs.vllm.ai/en/latest/getting_started/examples/opentelemetry.html)
  docs](https://docs.vllm.ai/en/latest/examples/opentelemetry.html)
- [Blog
  post](https://medium.com/@ronen.schaffer/follow-the-trail-supercharging-vllm-with-opentelemetry-distributed-tracing-aa655229b46f)
- [IBM product

@@ -9,7 +9,7 @@ from typing import Literal
|
||||
ROOT_DIR = Path(__file__).parent.parent.parent.parent
|
||||
ROOT_DIR_RELATIVE = '../../../../..'
|
||||
EXAMPLE_DIR = ROOT_DIR / "examples"
|
||||
EXAMPLE_DOC_DIR = ROOT_DIR / "docs/getting_started/examples"
|
||||
EXAMPLE_DOC_DIR = ROOT_DIR / "docs/examples"
|
||||
print(ROOT_DIR.resolve())
|
||||
print(EXAMPLE_DIR.resolve())
|
||||
print(EXAMPLE_DOC_DIR.resolve())

@@ -10,7 +10,7 @@ shorter Pod startup times and CPU memory usage. Tensor encryption is also suppor

For more information on CoreWeave's Tensorizer, please refer to
[CoreWeave's Tensorizer documentation](https://github.com/coreweave/tensorizer). For more information on serializing a vLLM model, as well as a general usage guide to using Tensorizer with vLLM, see
the [vLLM example script](https://docs.vllm.ai/en/latest/getting_started/examples/tensorize_vllm_model.html).
the [vLLM example script](https://docs.vllm.ai/en/latest/examples/tensorize_vllm_model.html).

!!! note
    Note that to use this feature you will need to install `tensorizer` by running `pip install vllm[tensorizer]`.
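For orientation, here is a minimal sketch of the loading side, assuming a model has already been serialized with the example script; the S3 URI and model name are placeholders, and argument names may vary between vLLM versions:

```python
from vllm import LLM
from vllm.model_executor.model_loader.tensorizer import TensorizerConfig

# Placeholder URI: point this at wherever the serialized tensors were written.
tensorizer_config = TensorizerConfig(
    tensorizer_uri="s3://my-bucket/opt-125m/model.tensors",
)

# Load the weights through Tensorizer instead of the default loader.
llm = LLM(
    model="facebook/opt-125m",
    load_format="tensorizer",
    model_loader_extra_config=tensorizer_config,
)

print(llm.generate(["Hello, my name is"])[0].outputs[0].text)
```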

@@ -6,6 +6,6 @@ vLLM can be used to generate the completions for RLHF. The best way to do this i

See the following basic examples to get started if you don't want to use an existing library:

- [Training and inference processes are located on separate GPUs (inspired by OpenRLHF)](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf.html)
- [Training and inference processes are colocated on the same GPUs using Ray](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_colocate.html)
- [Utilities for performing RLHF with vLLM](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_utils.html)
- [Training and inference processes are located on separate GPUs (inspired by OpenRLHF)](../examples/offline_inference/rlhf.md)
- [Training and inference processes are colocated on the same GPUs using Ray](../examples/offline_inference/rlhf_colocate.md)
- [Utilities for performing RLHF with vLLM](../examples/offline_inference/rlhf_utils.md)
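As a very rough sketch of the generation half of such a loop (the prompts and the reward function below are placeholders; synchronizing updated policy weights into vLLM is what the examples above actually demonstrate):

```python
from vllm import LLM, SamplingParams

# Placeholder policy checkpoint; in a real RLHF loop these weights would be
# refreshed from the trainer between iterations.
llm = LLM(model="facebook/opt-125m")
sampling_params = SamplingParams(temperature=1.0, max_tokens=128)

prompts = ["Explain why the sky is blue.", "Write a haiku about GPUs."]
outputs = llm.generate(prompts, sampling_params)

def reward(prompt: str, completion: str) -> float:
    # Placeholder reward model: plug in whatever scoring your setup uses.
    return float(len(completion.split()))

for out in outputs:
    completion = out.outputs[0].text
    print(reward(out.prompt, completion), out.prompt, completion[:40])
```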