diff --git a/.gitignore b/.gitignore
index 5dc0f04b6f..721dd7536b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -150,7 +150,8 @@ venv.bak/
 # mkdocs documentation
 /site
 docs/argparse
-docs/examples
+docs/examples/*
+!docs/examples/README.md
 
 # mypy
 .mypy_cache/
diff --git a/docs/.nav.yml b/docs/.nav.yml
index acedc32c30..dbac0e12f1 100644
--- a/docs/.nav.yml
+++ b/docs/.nav.yml
@@ -1,25 +1,17 @@
 nav:
-  - Home:
-    - vLLM: README.md
+  - Home: README.md
+  - User Guide:
+    - usage/README.md
     - Getting Started:
       - getting_started/quickstart.md
       - getting_started/installation
     - Examples:
+      - examples/README.md
       - Offline Inference: examples/offline_inference
       - Online Serving: examples/online_serving
       - Others: examples/others
-    - Quick Links:
-      - User Guide: usage/README.md
-      - Developer Guide: contributing/README.md
-      - API Reference: api/README.md
-      - CLI Reference: cli/README.md
-    - Timeline:
-      - Roadmap: https://roadmap.vllm.ai
-      - Releases: https://github.com/vllm-project/vllm/releases
-  - User Guide:
-    - Summary: usage/README.md
-    - usage/v1_guide.md
     - General:
+      - usage/v1_guide.md
       - usage/*
     - Inference and Serving:
       - serving/offline_inference.md
@@ -32,7 +24,7 @@ nav:
       - deployment/integrations
     - Training: training
     - Configuration:
-      - Summary: configuration/README.md
+      - configuration/README.md
       - configuration/*
     - Models:
       - models/supported_models.md
@@ -45,7 +37,7 @@ nav:
       - features/*
       - features/quantization
   - Developer Guide:
-    - Summary: contributing/README.md
+    - contributing/README.md
     - General:
       - glob: contributing/*
         flatten_single_child_sections: true
diff --git a/docs/README.md b/docs/README.md
index 6823008ed3..e8d2fd953a 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -21,6 +21,17 @@ vLLM is a fast and easy-to-use library for LLM inference and serving.
 
 Originally developed in the [Sky Computing Lab](https://sky.cs.berkeley.edu) at UC Berkeley, vLLM has evolved into a community-driven project with contributions from both academia and industry.
 
+Where to get started with vLLM depends on the type of user. If you are looking to:
+
+- Run open-source models on vLLM, we recommend starting with the [Quickstart Guide](./getting_started/quickstart.md)
+- Build applications with vLLM, we recommend starting with the [User Guide](./usage)
+- Build vLLM, we recommend starting with [Developer Guide](./contributing)
+
+For information about the development of vLLM, see:
+
+- [Roadmap](https://roadmap.vllm.ai)
+- [Releases](https://github.com/vllm-project/vllm/releases)
+
 vLLM is fast with:
 
 - State-of-the-art serving throughput
diff --git a/docs/examples/README.md b/docs/examples/README.md
new file mode 100644
index 0000000000..34e4dfd408
--- /dev/null
+++ b/docs/examples/README.md
@@ -0,0 +1,7 @@
+# Examples
+
+vLLM's examples are split into three categories:
+
+- If you are using vLLM from within Python code, see [Offline Inference](./offline_inference/)
+- If you are using vLLM from an HTTP application or client, see [Online Serving](./online_serving/)
+- For examples of using some of vLLM's advanced features (e.g. LMCache or Tensorizer) which are not specific to either of the above use cases, see [Others](./others/)
diff --git a/docs/mkdocs/stylesheets/extra.css b/docs/mkdocs/stylesheets/extra.css
index fb44d9cdcf..6a1979b241 100644
--- a/docs/mkdocs/stylesheets/extra.css
+++ b/docs/mkdocs/stylesheets/extra.css
@@ -23,6 +23,13 @@ a:not(:has(svg)):not(.md-icon):not(.autorefs-external) {
   }
 }
 
+a[href*="localhost"]::after,
+a[href*="127.0.0.1"]::after,
+a[href*="org.readthedocs.build"]::after,
+a[href*="docs.vllm.ai"]::after {
+  display: none !important;
+}
+
 /* Light mode: darker section titles */
 body[data-md-color-scheme="default"] .md-nav__item--section > label.md-nav__link .md-ellipsis {
   color: rgba(0, 0, 0, 0.7) !important;
diff --git a/docs/usage/README.md b/docs/usage/README.md
index 681db57d8e..83aea12181 100644
--- a/docs/usage/README.md
+++ b/docs/usage/README.md
@@ -1,6 +1,8 @@
 # Using vLLM
 
-vLLM supports the following usage patterns:
+First, vLLM must be [installed](../getting_started/installation) for your chosen device in either a Python or Docker environment.
+
+Then, vLLM supports the following usage patterns:
 
 - [Inference and Serving](../serving/offline_inference.md): Run a single instance of a model.
 - [Deployment](../deployment/docker.md): Scale up model instances for production.
diff --git a/mkdocs.yaml b/mkdocs.yaml
index 3a64888fb4..47fe1ebce9 100644
--- a/mkdocs.yaml
+++ b/mkdocs.yaml
@@ -34,13 +34,14 @@ theme:
     - content.action.edit
     - content.code.copy
     - content.tabs.link
+    - navigation.instant
+    - navigation.instant.progress
     - navigation.tracking
     - navigation.tabs
     - navigation.tabs.sticky
     - navigation.sections
-    - navigation.prune
-    - navigation.top
     - navigation.indexes
+    - navigation.top
     - search.highlight
     - search.share
     - toc.follow