Stop using title frontmatter and fix doc that can only be reached by search (#20623)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@@ -1,6 +1,4 @@
----
-title: Loading models with Run:ai Model Streamer
----
+# Loading models with Run:ai Model Streamer
 
 Run:ai Model Streamer is a library to read tensors in concurrency, while streaming it to GPU memory.
 Further reading can be found in [Run:ai Model Streamer Documentation](https://github.com/run-ai/runai-model-streamer/blob/master/docs/README.md).
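The page touched in this hunk documents enabling the streamer through vLLM's `load_format` option. A minimal offline sketch, assuming an illustrative Hugging Face model ID and the optional concurrency knob from the streamer docs:

```python
from vllm import LLM

# Sketch: stream weights to GPU memory with the Run:ai Model Streamer.
# The model ID is illustrative; the extra-config entry (number of
# concurrent tensor readers) is optional.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    load_format="runai_streamer",
    model_loader_extra_config={"concurrency": 16},
)
print(llm.generate("Hello, my name is")[0].outputs[0].text)
```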
@@ -1,6 +1,4 @@
----
-title: Loading models with CoreWeave's Tensorizer
----
+# Loading models with CoreWeave's Tensorizer
 
 vLLM supports loading models with [CoreWeave's Tensorizer](https://docs.coreweave.com/coreweave-machine-learning-and-ai/inference/tensorizer).
 vLLM model tensors that have been serialized to disk, an HTTP/HTTPS endpoint, or S3 endpoint can be deserialized
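Deserializing from one of those locations is typically wired up through the `tensorizer` load format plus a loader extra-config. A rough sketch, where the S3 URI is a placeholder and the `TensorizerConfig` import path reflects recent vLLM layouts (it may differ by version):

```python
from vllm import LLM
from vllm.model_executor.model_loader.tensorizer import TensorizerConfig

# Sketch: deserialize previously tensorized weights from S3.
# The bucket/key below is a placeholder, not a real artifact.
llm = LLM(
    model="facebook/opt-125m",
    load_format="tensorizer",
    model_loader_extra_config=TensorizerConfig(
        tensorizer_uri="s3://my-bucket/vllm/facebook/opt-125m/v1/model.tensors",
    ),
)
```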
@@ -1,6 +1,4 @@
----
-title: Generative Models
----
+# Generative Models
 
 vLLM provides first-class support for generative models, which covers most of LLMs.
 
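As a quick illustration of the generative API this page goes on to describe, text completion runs through `LLM.generate` with `SamplingParams` (the model choice here is illustrative):

```python
from vllm import LLM, SamplingParams

# Sketch: basic offline text generation.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=32)
for output in llm.generate(["The capital of France is"], params):
    print(output.outputs[0].text)
```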
@@ -1,6 +1,4 @@
----
-title: TPU
----
+# TPU
 
 # TPU Supported Models
 ## Text-only Language Models
@@ -1,6 +1,4 @@
----
-title: Pooling Models
----
+# Pooling Models
 
 vLLM also supports pooling models, including embedding, reranking and reward models.
 
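A pooling model is typically loaded with an explicit task and queried through the pooling entrypoints rather than `generate`. A small sketch, assuming an embedding model (the model ID is illustrative):

```python
from vllm import LLM

# Sketch: offline embedding with a pooling model.
llm = LLM(model="BAAI/bge-base-en-v1.5", task="embed")
(output,) = llm.embed(["Hello, my name is"])
print(len(output.outputs.embedding))  # embedding dimensionality
```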
@@ -1,6 +1,4 @@
----
-title: Supported Models
----
+# Supported Models
 
 vLLM supports [generative](./generative_models.md) and [pooling](./pooling_models.md) models across various tasks.
 If a model supports more than one task, you can set the task via the `--task` argument.
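The `--task` flag mentioned in that last context line has an offline counterpart, the `task` argument to `LLM`. A sketch of pinning a multi-task-capable model to a single task; the model ID is illustrative, and the same choice maps to `--task embed` when serving:

```python
from vllm import LLM

# Sketch: a model usable for both generation and embedding,
# explicitly pinned to the embedding task.
llm = LLM(model="intfloat/e5-mistral-7b-instruct", task="embed")
```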