# Intermediate Tensor Logging

This document provides guidance on using the intermediate tensor logging feature in vLLM, which allows you to capture and save intermediate tensors during model execution.

## Overview

The intermediate tensor logging feature enables you to:

- Log input and output tensors from a configured set of filters
- Filter modules by name using regex patterns
- Filter by module forward-call index (e.g., dump only the 2nd forward call on the same module)
- Filter tensors by device
- Filter by whole-model forward step ID

This is mainly useful for debugging model accuracy gaps between two runs.

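The name-based filtering can be illustrated with plain `re` matching. A minimal sketch (the module names below are hypothetical examples of how a `torch.nn.Module` hierarchy might be named; vLLM's internal matching logic may differ):

```python
import re

# Hypothetical module names, as a model's module hierarchy might report them.
module_names = [
    "model.embed_tokens",
    "model.layers.0.input_layernorm",
    "model.layers.0.self_attn.rotary_emb",
    "model.layers.1.input_layernorm",
    "model.norm",
]

# Same pattern as in the CLI example below: match everything under the first layer.
pattern = re.compile(r"layers\.0\.")

selected = [name for name in module_names if pattern.search(name)]
print(selected)
# → ['model.layers.0.input_layernorm', 'model.layers.0.self_attn.rotary_emb']
```
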
## Usage

### Enabling via parameters or config file

**Offline Inference example**

Dump all modules, all devices, for step 0 (default behavior):

```bash
python3 ./examples/offline_inference/llm_engine_example.py --model "meta-llama/Llama-3.1-8B-Instruct" --enforce-eager --intermediate-log-config '{"enabled": true}'
```

Dump only the first layer's modules, all devices, for step 0:

```bash
python3 ./examples/offline_inference/llm_engine_example.py --model "meta-llama/Llama-3.1-8B-Instruct" --enforce-eager --intermediate-log-config '{"enabled": true, "module_call_match": "layers\\.0\\."}'
```

To dump a customized set of layers, devices, and steps, use a configuration file.

The configuration file should be a JSON file with the following structure:

```json
{
  "output_dir": "/tmp/vllm_intermediates",
  "module_call_match": ["layers\\.0\\.(?!.*rotary_emb).*", "rotary_emb:0", "embed_tokens", "model\\.norm"],
  "log_step_ids": [0, 1],
  "device_names": ["cuda:0"]
}
```

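Such a config file can also be generated programmatically. A sketch using the stdlib `json` module (the output path here is illustrative; point `--intermediate-log-config-path` at wherever you write the file):

```python
import json
import os
import tempfile

config = {
    "output_dir": "/tmp/vllm_intermediates",
    "module_call_match": [
        "layers\\.0\\.(?!.*rotary_emb).*",
        "rotary_emb:0",
        "embed_tokens",
        "model\\.norm",
    ],
    "log_step_ids": [0, 1],
    "device_names": ["cuda:0"],
}

# Illustrative location; any readable path works.
path = os.path.join(tempfile.gettempdir(), "intermediate_logging_config.json")
with open(path, "w") as f:
    json.dump(config, f, indent=2)
```
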
```bash
python3 ./examples/offline_inference/llm_engine_example.py --model "meta-llama/Llama-3.1-8B-Instruct" --enforce-eager --intermediate-log-config-path $HOME/intermediate_logging_config.json
```

#### Configuration Parameters

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `output_dir` | string | Directory in which to save the intermediate tensors | `/tmp/vllm_intermediates` |
| `module_call_match` | array | Regex patterns to filter module names; to limit logging to the i-th call of a matched module, append `:i` | `null` (log all modules) |
| `log_step_ids` | array | List of step IDs to log | `[0]` |
| `max_tensor_size` | integer | Maximum number of elements in tensors to log | `null` (no limit) |
| `device_names` | array | List of device names to log | `[]` (log all devices) |

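The `:i` call-index suffix amounts to splitting each entry into a (regex, call index) pair. A hypothetical parser sketch (`parse_module_call_match` is not part of vLLM; it only illustrates the syntax):

```python
def parse_module_call_match(entry: str):
    """Split a module_call_match entry into (regex, call_index).

    A trailing ':i' limits logging to the i-th forward call of the
    matched module; without it, every call is logged (index None).
    """
    pattern, sep, index = entry.rpartition(":")
    if sep and index.isdigit():
        return pattern, int(index)
    return entry, None

print(parse_module_call_match("rotary_emb:0"))   # → ('rotary_emb', 0)
print(parse_module_call_match("embed_tokens"))   # → ('embed_tokens', None)
```
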
### Output Directory Structure

When you enable intermediate logging, the system creates a uniquely named session directory under your specified `output_dir`. This helps organize multiple logging sessions:

```
/tmp/vllm_intermediates/010fed05-4a36-4c19-ab44-7cd67e3f63ce/
├── step_0/
│   ├── model.embed_tokens/
│   │   ├── inputs_0_cuda_0.pt
│   │   ├── inputs.json
│   │   ├── outputs_cuda_0.pt
│   │   └── outputs.json
│   └── model.layers.0.input_layernorm/
│       ├── inputs_0_cuda_0.pt
│       ├── inputs.json
│       ├── outputs_cuda_0.pt
│       └── outputs.json
└── step_1/
    └── ...
```

Each tensor is saved in two formats:

1. `.json` files containing metadata and small tensor values
2. `.pt` files containing the full PyTorch tensors (can be loaded with `torch.load()`)

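This layout lends itself to programmatic inspection. A sketch that lists the logged modules per step with `pathlib` (it builds a miniature copy of the structure in a temporary directory purely for illustration; point `root` at a real session directory instead):

```python
import tempfile
from pathlib import Path

# Build a miniature copy of the logged layout, for illustration only.
root = Path(tempfile.mkdtemp())
for step in ("step_0", "step_1"):
    for module in ("model.embed_tokens", "model.layers.0.input_layernorm"):
        mod_dir = root / step / module
        mod_dir.mkdir(parents=True)
        (mod_dir / "outputs_cuda_0.pt").touch()

# Enumerate the modules logged at each step.
layout = {
    step.name: sorted(m.name for m in step.iterdir() if m.is_dir())
    for step in sorted(root.iterdir())
}
print(layout)
```
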
## Comparing Intermediate Logging Results

vLLM provides a tool called `compare_intermediate.py` to compare intermediate tensors between two different runs. This is particularly useful for debugging accuracy differences or verifying that code changes don't affect model outputs.

### Usage

```bash
python tools/compare_intermediate.py --dir1 /path/to/first/log/dir --dir2 /path/to/second/log/dir [options]
```

### Options

| Option | Description | Default |
|--------|-------------|---------|
| `--dir1` | First intermediate logging directory | (required) |
| `--dir2` | Second intermediate logging directory | (required) |
| `--output` | Output file for the report | stdout |
| `--rtol` | Relative tolerance for tensor comparison | 1e-5 |
| `--atol` | Absolute tolerance for tensor comparison | 1e-8 |
| `--steps` | Comma-separated list of steps to compare | all |
| `--modules` | Comma-separated list of module name patterns to compare | all |
| `--verbose` | Include detailed information about each tensor | false |

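The `--rtol`/`--atol` tolerances presumably follow the usual allclose rule, `|a - b| <= atol + rtol * |b|`, applied elementwise (as in `torch.allclose`). A pure-Python sketch of that check; the tool itself operates on full tensors:

```python
def allclose(xs, ys, rtol=1e-5, atol=1e-8):
    """Elementwise |x - y| <= atol + rtol * |y|, as in torch.allclose."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(xs, ys))

print(allclose([1.0, 2.0], [1.0, 2.0001]))              # → False with default tolerances
print(allclose([1.0, 2.0], [1.0, 2.0001], rtol=1e-4))   # → True with a looser rtol
```

Loosening `rtol` in this way is how you suppress expected small numerical drift (e.g. from kernel nondeterminism) while still flagging real divergence.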
### Example

```bash
# Compare all tensors from two different runs
python tools/compare_intermediate.py --dir1 /tmp/vllm_intermediates/run1 --dir2 /tmp/vllm_intermediates/run2

# Compare only specific modules and steps with custom tolerances
python tools/compare_intermediate.py \
    --dir1 /tmp/vllm_intermediates/run1 \
    --dir2 /tmp/vllm_intermediates/run2 \
    --steps 0,1 \
    --modules ".*attention.*,.*mlp.*" \
    --rtol 1e-4 \
    --atol 1e-7 \
    --output comparison_report.md
```

### Output

The tool generates a detailed markdown report that includes:

- Overall summary of matching and mismatched tensors
- Per-module comparison results
- Detailed tensor differences (when using `--verbose`)

This makes it easy to identify which specific tensors differ between runs and by how much.