[Doc] Move examples and further reorganize user guide (#18666)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

@@ -5,11 +5,9 @@ nav:
- getting_started/quickstart.md
- getting_started/installation
- Examples:
  - Offline Inference: getting_started/examples/offline_inference
  - Online Serving: getting_started/examples/online_serving
  - Others:
    - LMCache: getting_started/examples/lmcache
    - getting_started/examples/other/*
  - Offline Inference: examples/offline_inference
  - Online Serving: examples/online_serving
  - Others: examples/others
- Quick Links:
  - User Guide: usage/README.md
  - Developer Guide: contributing/README.md

@@ -19,6 +17,7 @@ nav:
- Releases: https://github.com/vllm-project/vllm/releases
- User Guide:
  - Summary: usage/README.md
  - usage/v1_guide.md
  - General:
    - usage/*
  - Inference and Serving:

@@ -1,4 +1,9 @@
# Configuration Options

This section lists the most common options for running the vLLM engine.
For a full list, refer to the [configuration][configuration] page.
This section lists the most common options for running vLLM.

There are three main levels of configuration, from highest priority to lowest priority:

- [Request parameters][completions-api] and [input arguments][sampling-params]
- [Engine arguments](./engine_args.md)
- [Environment variables](./env_vars.md)
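As a rough sketch of how these three levels interact in the offline API (the model name, option values, and the environment variable chosen below are placeholders rather than recommended settings):

```python
import os

# Environment variables: the lowest-priority level. VLLM_LOGGING_LEVEL is one
# example; it is typically read when vLLM is imported, so it is set here first.
os.environ["VLLM_LOGGING_LEVEL"] = "DEBUG"

from vllm import LLM, SamplingParams

# Engine arguments: apply to the whole engine instance.
llm = LLM(model="facebook/opt-125m", max_model_len=2048)

# Request parameters / input arguments: the highest-priority level,
# applying only to this generate() call.
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```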

@@ -61,7 +61,7 @@ These are documented under [Inferencing and Serving -> Production Metrics](../..

### Grafana Dashboard

vLLM also provides [a reference example](https://docs.vllm.ai/en/latest/getting_started/examples/prometheus_grafana.html) for how to collect and store these metrics using Prometheus and visualize them using a Grafana dashboard.
vLLM also provides [a reference example](https://docs.vllm.ai/en/latest/examples/prometheus_grafana.html) for how to collect and store these metrics using Prometheus and visualize them using a Grafana dashboard.

The subset of metrics exposed in the Grafana dashboard gives us an indication of which metrics are especially important:
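As a quick way to see what the dashboard is built from, here is a minimal sketch that scrapes the metrics endpoint of a running OpenAI-compatible server; the host and port below assume the `vllm serve` defaults and may need adjusting:

```python
import urllib.request

# Scrape the Prometheus metrics endpoint exposed by the running server.
with urllib.request.urlopen("http://localhost:8000/metrics") as resp:
    body = resp.read().decode()

# Print only the vLLM-specific series (prefixed with "vllm:"),
# e.g. vllm:num_requests_running.
for line in body.splitlines():
    if line.startswith("vllm:"):
        print(line)
```

In a full deployment, Prometheus scrapes this same endpoint on a schedule and Grafana queries Prometheus, which is what the reference example wires together.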

@@ -673,7 +673,7 @@ v0 has support for OpenTelemetry tracing:
- [OpenTelemetry blog
  post](https://opentelemetry.io/blog/2024/llm-observability/)
- [User-facing
  docs](https://docs.vllm.ai/en/latest/getting_started/examples/opentelemetry.html)
  docs](https://docs.vllm.ai/en/latest/examples/opentelemetry.html)
- [Blog
  post](https://medium.com/@ronen.schaffer/follow-the-trail-supercharging-vllm-with-opentelemetry-distributed-tracing-aa655229b46f)
- [IBM product

@@ -9,7 +9,7 @@ from typing import Literal
|
||||
ROOT_DIR = Path(__file__).parent.parent.parent.parent
|
||||
ROOT_DIR_RELATIVE = '../../../../..'
|
||||
EXAMPLE_DIR = ROOT_DIR / "examples"
|
||||
EXAMPLE_DOC_DIR = ROOT_DIR / "docs/getting_started/examples"
|
||||
EXAMPLE_DOC_DIR = ROOT_DIR / "docs/examples"
|
||||
print(ROOT_DIR.resolve())
|
||||
print(EXAMPLE_DIR.resolve())
|
||||
print(EXAMPLE_DOC_DIR.resolve())

@@ -10,7 +10,7 @@ shorter Pod startup times and CPU memory usage. Tensor encryption is also suppor

For more information on CoreWeave's Tensorizer, please refer to
[CoreWeave's Tensorizer documentation](https://github.com/coreweave/tensorizer). For more information on serializing a vLLM model, as well as a general usage guide to using Tensorizer with vLLM, see
the [vLLM example script](https://docs.vllm.ai/en/latest/getting_started/examples/tensorize_vllm_model.html).
the [vLLM example script](https://docs.vllm.ai/en/latest/examples/tensorize_vllm_model.html).

!!! note
    Note that to use this feature you will need to install `tensorizer` by running `pip install vllm[tensorizer]`.
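For orientation, here is a minimal sketch of the loading side, assuming a model has already been serialized with the example script; the S3 URI and model name are placeholders, and argument names may vary between vLLM versions:

```python
from vllm import LLM
from vllm.model_executor.model_loader.tensorizer import TensorizerConfig

# Placeholder URI: point this at wherever the serialized tensors were written.
tensorizer_config = TensorizerConfig(
    tensorizer_uri="s3://my-bucket/opt-125m/model.tensors",
)

# Load the weights through Tensorizer instead of the default loader.
llm = LLM(
    model="facebook/opt-125m",
    load_format="tensorizer",
    model_loader_extra_config=tensorizer_config,
)

print(llm.generate(["Hello, my name is"])[0].outputs[0].text)
```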

@@ -6,6 +6,6 @@ vLLM can be used to generate the completions for RLHF. The best way to do this i

See the following basic examples to get started if you don't want to use an existing library:

- [Training and inference processes are located on separate GPUs (inspired by OpenRLHF)](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf.html)
- [Training and inference processes are colocated on the same GPUs using Ray](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_colocate.html)
- [Utilities for performing RLHF with vLLM](https://docs.vllm.ai/en/latest/getting_started/examples/rlhf_utils.html)
- [Training and inference processes are located on separate GPUs (inspired by OpenRLHF)](../examples/offline_inference/rlhf.md)
- [Training and inference processes are colocated on the same GPUs using Ray](../examples/offline_inference/rlhf_colocate.md)
- [Utilities for performing RLHF with vLLM](../examples/offline_inference/rlhf_utils.md)
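As a very rough sketch of the generation half of such a loop (the prompts and the reward function below are placeholders; synchronizing updated policy weights into vLLM is what the examples above actually demonstrate):

```python
from vllm import LLM, SamplingParams

# Placeholder policy checkpoint; in a real RLHF loop these weights would be
# refreshed from the trainer between iterations.
llm = LLM(model="facebook/opt-125m")
sampling_params = SamplingParams(temperature=1.0, max_tokens=128)

prompts = ["Explain why the sky is blue.", "Write a haiku about GPUs."]
outputs = llm.generate(prompts, sampling_params)

def reward(prompt: str, completion: str) -> float:
    # Placeholder reward model: plug in whatever scoring your setup uses.
    return float(len(completion.split()))

for out in outputs:
    completion = out.outputs[0].text
    print(reward(out.prompt, completion), out.prompt, completion[:40])
```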