Stop using title frontmatter and fix doc that can only be reached by search (#20623)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-08 11:27:40 +01:00
parent b4bab81660
commit b942c094e3
81 changed files with 82 additions and 238 deletions
--- a/docs/design/arch_overview.md
+++ b/docs/design/arch_overview.md
@@ -1,6 +1,4 @@
----
-title: Architecture Overview
----
+# Architecture Overview

 This document provides an overview of the vLLM architecture.

--- a/docs/design/automatic_prefix_caching.md
+++ b/docs/design/automatic_prefix_caching.md
@@ -1,6 +1,4 @@
----
-title: Automatic Prefix Caching
----
+# Automatic Prefix Caching

 The core idea of [PagedAttention](https://blog.vllm.ai/2023/06/20/vllm.html) is to partition the KV cache of each request into KV Blocks. Each block contains the attention keys and values for a fixed number of tokens. The PagedAttention algorithm allows these blocks to be stored in non-contiguous physical memory so that we can eliminate memory fragmentation by allocating the memory on demand.

--- a/docs/design/huggingface_integration.md
+++ b/docs/design/huggingface_integration.md
@@ -1,6 +1,4 @@
----
-title: Integration with HuggingFace
----
+# Integration with HuggingFace

 This document describes how vLLM integrates with HuggingFace libraries. We will explain step by step what happens under the hood when we run `vllm serve`.

--- a/docs/design/kernel/paged_attention.md
+++ b/docs/design/kernel/paged_attention.md
@@ -1,6 +1,4 @@
----
-title: vLLM Paged Attention
----
+# vLLM Paged Attention

 Currently, vLLM utilizes its own implementation of a multi-head query
 attention kernel (`csrc/attention/attention_kernels.cu`).
--- a/docs/design/mm_processing.md
+++ b/docs/design/mm_processing.md
@@ -1,6 +1,4 @@
----
-title: Multi-Modal Data Processing
----
+# Multi-Modal Data Processing

 To enable various optimizations in vLLM such as [chunked prefill][chunked-prefill] and [prefix caching](../features/automatic_prefix_caching.md), we use [BaseMultiModalProcessor][vllm.multimodal.processing.BaseMultiModalProcessor] to provide the correspondence between placeholder feature tokens (e.g. `<image>`) and multi-modal inputs (e.g. the raw input image) based on the outputs of HF processor.

--- a/docs/design/plugin_system.md
+++ b/docs/design/plugin_system.md
@@ -1,6 +1,4 @@
----
-title: vLLM's Plugin System
----
+# vLLM's Plugin System

 The community frequently requests the ability to extend vLLM with custom features. To facilitate this, vLLM includes a plugin system that allows users to add custom features without modifying the vLLM codebase. This document explains how plugins work in vLLM and how to create a plugin for vLLM.
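
Every hunk above applies the same mechanical change: a frontmatter block containing only `title` is replaced by a level-1 heading with the same text. A minimal sketch of that transformation follows. It is not the script used for this commit (the commit does not say how the edits were made), and the function name `frontmatter_title_to_heading` is hypothetical. It assumes the frontmatter holds exactly one unquoted `title:` key, as in the files shown here.

```python
import re

# Matches a leading frontmatter block whose only key is `title`.
# Assumption: the title is a plain, unquoted, single-line value.
_TITLE_FRONTMATTER = re.compile(r"\A---\ntitle: (?P<title>[^\n]+)\n---\n")


def frontmatter_title_to_heading(text: str) -> str:
    """Replace a title-only frontmatter block with a level-1 Markdown heading.

    Text that does not start with such a block is returned unchanged.
    """
    match = _TITLE_FRONTMATTER.match(text)
    if match is None:
        return text
    title = match.group("title").strip()
    # Keep everything after the closing `---` line as-is.
    return f"# {title}\n" + text[match.end():]


if __name__ == "__main__":
    before = (
        "---\n"
        "title: Architecture Overview\n"
        "---\n"
        "\n"
        "This document provides an overview of the vLLM architecture.\n"
    )
    print(frontmatter_title_to_heading(before), end="")
```

Frontmatter with additional keys or quoted titles is deliberately left untouched by this sketch, since such files would need a real YAML parser rather than a regular expression.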