Stop using title frontmatter and fix doc that can only be reached by search (#20623)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Harry Mellor authored on 2025-07-08 11:27:40 +01:00, committed by GitHub
parent b4bab81660
commit b942c094e3
81 changed files with 82 additions and 238 deletions

@@ -1,6 +1,4 @@
----
-title: Loading models with Run:ai Model Streamer
----
+# Loading models with Run:ai Model Streamer
 Run:ai Model Streamer is a library for reading tensors concurrently and streaming them to GPU memory.
 Further reading can be found in [Run:ai Model Streamer Documentation](https://github.com/run-ai/runai-model-streamer/blob/master/docs/README.md).
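
For context on the Run:ai Model Streamer page touched above, a minimal offline-loading sketch; the model name is only an example, and it assumes the `runai_streamer` load format and the `concurrency` extra-config key are available in the installed vLLM:

```python
from vllm import LLM

# Stream weights with the Run:ai Model Streamer instead of the default loader.
# The model name is a placeholder; `concurrency` (assumed extra-config key)
# controls how many concurrent readers stream tensors to GPU memory.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    load_format="runai_streamer",
    model_loader_extra_config={"concurrency": 16},
)
```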

@@ -1,6 +1,4 @@
----
-title: Loading models with CoreWeave's Tensorizer
----
+# Loading models with CoreWeave's Tensorizer
 vLLM supports loading models with [CoreWeave's Tensorizer](https://docs.coreweave.com/coreweave-machine-learning-and-ai/inference/tensorizer).
 vLLM model tensors that have been serialized to disk, an HTTP/HTTPS endpoint, or an S3 endpoint can be deserialized
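
Similarly, for the Tensorizer page, a hedged sketch of deserializing pre-serialized tensors at load time; the model name and S3 URI are placeholders, and it assumes the `tensorizer` load format plus the `tensorizer_uri` extra-config key:

```python
from vllm import LLM

# Deserialize model tensors that were previously serialized with Tensorizer.
# Both the model name and the S3 URI below are placeholders.
llm = LLM(
    model="facebook/opt-125m",
    load_format="tensorizer",
    model_loader_extra_config={
        "tensorizer_uri": "s3://my-bucket/opt-125m/model.tensors",
    },
)
```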

@@ -1,6 +1,4 @@
----
-title: Generative Models
----
+# Generative Models
 vLLM provides first-class support for generative models, which covers most LLMs.
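
As a quick illustration of the first-class generation support mentioned in this hunk, a minimal offline-inference sketch (the model name is only an example):

```python
from vllm import LLM, SamplingParams

# Load any generative model and sample a short completion.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].outputs[0].text)
```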

@@ -1,6 +1,4 @@
----
-title: TPU
----
+# TPU
 # TPU Supported Models
 ## Text-only Language Models

@@ -1,6 +1,4 @@
----
-title: Pooling Models
----
+# Pooling Models
 vLLM also supports pooling models, including embedding, reranking and reward models.
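
To make the pooling-model statement above concrete, a hedged sketch of running an embedding model; it assumes the `embed` task and `LLM.embed()` exist in the installed version, and the model name is only an example:

```python
from vllm import LLM

# Run an embedding (pooling) model; the model choice and task name are assumptions.
llm = LLM(model="intfloat/e5-mistral-7b-instruct", task="embed")

outputs = llm.embed(["Hello, my name is"])
print(len(outputs[0].outputs.embedding))  # embedding dimensionality
```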

@@ -1,6 +1,4 @@
----
-title: Supported Models
----
+# Supported Models
 vLLM supports [generative](./generative_models.md) and [pooling](./pooling_models.md) models across various tasks.
 If a model supports more than one task, you can set the task via the `--task` argument.
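
Since the hunk above mentions selecting the task with `--task`, a sketch of the Python-API equivalent; the task name and model are assumed from the surrounding docs rather than verified against this exact revision:

```python
from vllm import LLM

# Explicitly pick a task for a model that supports more than one;
# the constructor argument mirrors the `--task` CLI flag.
llm = LLM(model="BAAI/bge-base-en-v1.5", task="embed")
```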