Lucia Fang 2af83ebdde remove feature for metadata dump and input reload

Signed-off-by: Lucia Fang <fanglu@fb.com>

2025-08-05 09:25:17 -07:00

Intermediate Tensor Logging

This document provides guidance on using the intermediate tensor logging feature in vLLM, which allows you to capture and save intermediate tensors during model execution.

Overview

The intermediate tensor logging feature enables you to:

Log input and output tensors of the modules selected by a configured set of filters
Filter modules by name using regex patterns
Filter by module forward-call index (e.g. dump only the 2nd forward call of the same module)
Filter tensors by device
Filter by whole-model forward step ID

This is mainly useful for debugging model accuracy gaps between two runs.

Usage

Enabling via parameters or config file

Offline Inference example

Dump all modules, all devices for step 0 (default behavior)

python3 ./examples/offline_inference/llm_engine_example.py --model "meta-llama/Llama-3.1-8B-Instruct"  --enforce-eager  --intermediate-log-config '{"enabled": true}'

Dump the modules of the first layer, all devices, for step 0

python3 ./examples/offline_inference/llm_engine_example.py --model "meta-llama/Llama-3.1-8B-Instruct"  --enforce-eager  --intermediate-log-config '{"enabled": true, "module_call_match": "layers\\.0\\."}'

Dump customized layers, devices, steps through a config file

The configuration file should be a JSON file with the following structure:

{
  "output_dir": "/tmp/vllm_intermediates",
  "module_call_match": ["layers\\.0\\.(?!.*rotary_emb).*", "rotary_emb:0", "embed_tokens", "model\\.norm"],
  "log_step_ids": [0, 1],
  "device_names": ["cuda:0"]
}

python3 ./examples/offline_inference/llm_engine_example.py --model "meta-llama/Llama-3.1-8B-Instruct"  --enforce-eager  --intermediate-log-config-path $HOME/intermediate_logging_config.json
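The config file shown above can also be generated from Python, which avoids hand-escaping the backslashes that JSON requires in regex patterns. This is a minimal sketch: the helper name `write_intermediate_log_config` is hypothetical and not part of vLLM; only the JSON keys and values come from the example above.

```python
import json
from pathlib import Path


def write_intermediate_log_config(path):
    """Write the example logging config from this document to `path`.

    Hypothetical helper for illustration; it is not a vLLM API.
    """
    config = {
        "output_dir": "/tmp/vllm_intermediates",
        # Raw strings keep the regexes readable; json.dumps emits the
        # doubled backslashes that JSON needs.
        "module_call_match": [
            r"layers\.0\.(?!.*rotary_emb).*",
            "rotary_emb:0",
            "embed_tokens",
            r"model\.norm",
        ],
        "log_step_ids": [0, 1],
        "device_names": ["cuda:0"],
    }
    path = Path(path)
    path.write_text(json.dumps(config, indent=2))
    return path


if __name__ == "__main__":
    # Same location as the --intermediate-log-config-path example above.
    print(write_intermediate_log_config(Path.home() / "intermediate_logging_config.json"))
```

The resulting file can then be passed to `--intermediate-log-config-path` as in the command above.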

Configuration Parameters

Parameter	Type	Description	Default
`output_dir`	string	Directory in which to save the intermediate tensors	`/tmp/vllm_intermediates`
`module_call_match`	array	Regex patterns to filter module names; to limit logging to the i-th call only, append `:i` to the pattern	`null` (log all modules)
`log_step_ids`	array	List of step IDs to log	`[0]`
`max_tensor_size`	integer	Maximum number of elements in tensors to log	`null` (no limit)
`device_names`	array	List of device names to log	`[]` (log all devices)
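To check which modules a set of `module_call_match` patterns would select before launching a run, the patterns can be tried against module names with Python's `re` module. The sketch below rests on two assumptions that this document does not confirm: that patterns are applied with search semantics (`re.search`), and that the optional call index is a trailing `:<digits>` suffix. The function names and the sample module names are illustrative only.

```python
import re


def split_call_index(pattern):
    """Split 'regex:i' into (regex, i); i is None when there is no suffix.

    Assumption: the call index is a trailing ':<digits>' suffix.
    """
    m = re.fullmatch(r"(.*):(\d+)", pattern)
    if m:
        return m.group(1), int(m.group(2))
    return pattern, None


def matching_modules(patterns, module_names):
    """Return the names matched by at least one pattern (re.search assumed)."""
    selected = []
    for name in module_names:
        for pattern in patterns:
            regex, _call_index = split_call_index(pattern)
            if re.search(regex, name):
                selected.append(name)
                break
    return selected


if __name__ == "__main__":
    patterns = [
        r"layers\.0\.(?!.*rotary_emb).*",
        "rotary_emb:0",
        "embed_tokens",
        r"model\.norm",
    ]
    # Sample names for illustration; real names depend on the model.
    names = [
        "model.embed_tokens",
        "model.layers.0.input_layernorm",
        "model.layers.0.self_attn.rotary_emb",
        "model.layers.1.mlp",
        "model.norm",
    ]
    for name in matching_modules(patterns, names):
        print(name)
```

With these sample names, everything except `model.layers.1.mlp` is selected. The rotary embedding module is excluded by the negative lookahead in the first pattern but picked up by `rotary_emb:0`, which under the assumption above restricts it to a single call index.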

Output Directory Structure

When you enable intermediate logging, the system creates a uniquely named directory for each run under your specified `output_dir` (a UUID-style name in the example below). This keeps multiple logging sessions separate:

/tmp/vllm_intermediates/010fed05-4a36-4c19-ab44-7cd67e3f63ce/
├── step_0
│   ├── model.embed_tokens
│   │   ├── inputs_0_cuda_0.pt
│   │   ├── inputs.json
│   │   ├── outputs_cuda_0.pt
│   │   └── outputs.json
│   └── model.layers.0.input_layernorm
│       ├── inputs_0_cuda_0.pt
│       ├── inputs.json
│       ├── outputs_cuda_0.pt
│       └── outputs.json
└── step_1
    └── ...

Each tensor is saved in two formats:

.json files containing metadata and small tensor values
.pt files containing the full PyTorch tensors (can be loaded with torch.load())
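A dump can be inspected without any vLLM tooling. The sketch below indexes a run directory with `pathlib`, assuming the `step_*/<module name>/<files>` layout shown above, and loads a single `.pt` file with `torch.load()`. The function names are illustrative, and what a loaded file contains (a single tensor or a container of tensors) is not specified in this document.

```python
from pathlib import Path


def index_run_dir(run_dir):
    """Map step name -> module name -> sorted list of file names."""
    index = {}
    for step_dir in sorted(Path(run_dir).glob("step_*")):
        if not step_dir.is_dir():
            continue
        modules = {}
        for module_dir in sorted(p for p in step_dir.iterdir() if p.is_dir()):
            modules[module_dir.name] = sorted(
                f.name for f in module_dir.iterdir() if f.is_file()
            )
        index[step_dir.name] = modules
    return index


def load_tensor(path):
    """Load one dumped .pt file onto the CPU (requires PyTorch)."""
    import torch  # imported lazily so indexing works without PyTorch

    return torch.load(path, map_location="cpu")


if __name__ == "__main__":
    # Example run directory from this document; replace with your own.
    run_dir = "/tmp/vllm_intermediates/010fed05-4a36-4c19-ab44-7cd67e3f63ce"
    for step, modules in index_run_dir(run_dir).items():
        for module, files in modules.items():
            print(step, module, files)
```

Loading with `map_location="cpu"` lets tensors dumped from `cuda:0` be inspected on a machine without a GPU.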

Comparing Intermediate Logging Results

vLLM provides a tool called compare_intermediate.py to compare intermediate tensors between two different runs. This is particularly useful for debugging accuracy differences or verifying that code changes don't affect model outputs.

Usage

python tools/compare_intermediate.py --dir1 /path/to/first/log/dir --dir2 /path/to/second/log/dir [options]

Options

Option	Description	Default
`--dir1`	First intermediate logging directory	(required)
`--dir2`	Second intermediate logging directory	(required)
`--output`	Output file for the report	stdout
`--rtol`	Relative tolerance for tensor comparison	1e-5
`--atol`	Absolute tolerance for tensor comparison	1e-8
`--steps`	Comma-separated list of steps to compare	all
`--modules`	Comma-separated list of module name patterns to compare	all
`--verbose`	Include detailed information about each tensor	false
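The document does not state how `--rtol` and `--atol` are combined. A common convention, used by `torch.allclose` and `numpy.isclose`, treats two values as matching when |a - b| <= atol + rtol * |b|. The pure-Python sketch below illustrates that convention on scalars; it is an assumption about the tool's behavior, not a description of its implementation, and the function names are illustrative.

```python
def within_tolerance(a, b, rtol=1e-5, atol=1e-8):
    """Return True when a and b match under |a - b| <= atol + rtol * |b|."""
    return abs(a - b) <= atol + rtol * abs(b)


def count_mismatches(xs, ys, rtol=1e-5, atol=1e-8):
    """Count element pairs that fall outside the tolerance."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    return sum(
        not within_tolerance(x, y, rtol=rtol, atol=atol) for x, y in zip(xs, ys)
    )


if __name__ == "__main__":
    print(within_tolerance(1.0, 1.0 + 1e-6))   # relative term dominates
    print(within_tolerance(0.0, 1e-7))         # near zero, only atol applies
    print(count_mismatches([1.0, 2.0, 3.0], [1.0, 2.1, 3.0]))
```

Under this convention `rtol` governs large-magnitude values and `atol` governs values near zero, which is why loosening only `--rtol` may still report mismatches for tensors whose entries are close to zero.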

Example

# Compare all tensors from two different runs
python tools/compare_intermediate.py --dir1 /tmp/vllm_intermediates/run1 --dir2 /tmp/vllm_intermediates/run2

# Compare only specific modules and steps with custom tolerance
python tools/compare_intermediate.py \
  --dir1 /tmp/vllm_intermediates/run1 \
  --dir2 /tmp/vllm_intermediates/run2 \
  --steps 0,1 \
  --modules ".*attention.*,.*mlp.*" \
  --rtol 1e-4 \
  --atol 1e-7 \
  --output comparison_report.md

Output

The tool generates a detailed markdown report that includes:

Overall summary of matching and mismatched tensors
Per-module comparison results
Detailed tensor differences (when using --verbose)

This makes it easy to identify which specific tensors differ between runs and by how much.
