Merge remote-tracking branch 'origin/main' into il_tool

Signed-off-by: Lucia Fang <fanglu@fb.com>
Lucia Fang
2025-08-05 09:48:28 -07:00
543 changed files with 20183 additions and 13429 deletions

@@ -26,6 +26,8 @@ See <gh-file:LICENSE>.
## Developing
--8<-- "docs/getting_started/installation/python_env_setup.inc.md"
Depending on the kind of development you'd like to do (e.g. Python, CUDA), you can choose to build vLLM with or without compilation.
Check out the [building from source][build-from-source] documentation for details.
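For example, a Python-only editable install that skips kernel compilation might look like the sketch below; the `VLLM_USE_PRECOMPILED` flag is assumed here, so treat the build-from-source docs as authoritative:

```bash
# Sketch: editable install that reuses precompiled kernels instead of compiling them
# (VLLM_USE_PRECOMPILED usage is an assumption; see the build-from-source docs)
VLLM_USE_PRECOMPILED=1 uv pip install -e .
```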
@@ -42,7 +44,7 @@ For an optimized workflow when iterating on C++/CUDA kernels, see the [Increment
Install MkDocs along with the [plugins](https://github.com/vllm-project/vllm/blob/main/mkdocs.yaml) used in the vLLM documentation, as well as required dependencies:
```bash
pip install -r requirements/docs.txt
uv pip install -r requirements/docs.txt
```
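With the requirements installed, the documentation can usually be previewed locally with MkDocs' built-in development server:

```bash
# Serve the docs with live reload (by default at http://127.0.0.1:8000)
mkdocs serve
```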
!!! note
@@ -98,13 +100,14 @@ For additional features and advanced configurations, refer to the official [MkDo
??? console "Commands"
```bash
pip install -r requirements/common.txt -r requirements/dev.txt
# These commands are only for Nvidia CUDA platforms.
uv pip install -r requirements/common.txt -r requirements/dev.txt --torch-backend=auto
# Linting, formatting and static type checking
pre-commit install --hook-type pre-commit --hook-type commit-msg
pre-commit install
# You can manually run pre-commit with
pre-commit run --all-files
pre-commit run --all-files --show-diff-on-failure
# To manually run something from CI that does not run
# locally by default, you can run:
@@ -122,6 +125,10 @@ For additional features and advanced configurations, refer to the official [MkDo
Therefore, we recommend developing with Python 3.12 to minimise the chance of your local environment clashing with our CI environment.
!!! note "Install python3-dev if Python.h is missing"
If any of the above commands fails with `Python.h: No such file or directory`, install
`python3-dev` with `sudo apt install python3-dev`.
!!! note
Currently, the repository is not fully checked by `mypy`.
@@ -153,7 +160,7 @@ Using `-s` with `git commit` will automatically add this header.
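As a quick command-line illustration, a signed-off commit and the trailer it produces look roughly like this (the name and email are taken from your `git config`):

```bash
# Create a signed-off commit; Git appends the DCO trailer automatically
git commit -s -m "Fix typo in docs"
# The commit message then ends with a line such as:
#   Signed-off-by: Your Name <you@example.com>
```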
!!! tip
You can enable automatic sign-off via your IDE:
- **PyCharm**: Click on the `Show Commit Options` icon to the right of the `Commit and Push...` button in the `Commit` window.
It will bring up a `git` window where you can modify the `Author` and enable `Sign-off commit`.
- **VSCode**: Open the [Settings editor](https://code.visualstudio.com/docs/configure/settings)

@@ -20,19 +20,19 @@ the failure?
- **Use this title format:**
```
```text
[CI Failure]: failing-test-job - regex/matching/failing:test
```
- **For the environment field:**
```
Still failing on main as of commit abcdef123
```text
Still failing on main as of commit abcdef123
```
- **In the description, include failing tests:**
```
```text
FAILED failing/test.py:failing_test1 - Failure description
FAILED failing/test.py:failing_test2 - Failure description
https://github.com/orgs/vllm-project/projects/20

@@ -57,8 +57,7 @@ cc the PyTorch release team to initiate discussion on how to address them.
## Update CUDA version
The PyTorch release matrix includes both stable and experimental [CUDA versions](https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-compatibility-matrix). Due to limitations, only the latest stable CUDA version (for example,
`torch2.7.0+cu12.6`) is uploaded to PyPI. However, vLLM may require a different CUDA version,
The PyTorch release matrix includes both stable and experimental [CUDA versions](https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-compatibility-matrix). Due to limitations, only the latest stable CUDA version (for example, torch `2.7.1+cu126`) is uploaded to PyPI. However, vLLM may require a different CUDA version,
such as 12.8 for Blackwell support.
This complicates the process as we cannot use the out-of-the-box
`pip install torch torchvision torchaudio` command. The solution is to use
@@ -107,6 +106,7 @@ releases (which would take too much time), they can be built from
source to unblock the update process.
### FlashInfer
Here is how to build and install it from source with `torch2.7.0+cu128` in vLLM [Dockerfile](https://github.com/vllm-project/vllm/blob/27bebcd89792d5c4b08af7a65095759526f2f9e1/docker/Dockerfile#L259-L271):
```bash
@@ -122,6 +122,7 @@ public location for immediate installation, such as [this FlashInfer wheel link]
team if you want to get the package published there.
### xFormers
Similar to FlashInfer, here is how to build and install xFormers from source:
```bash
@@ -139,7 +140,7 @@ uv pip install --system \
### causal-conv1d
```
```bash
uv pip install 'git+https://github.com/Dao-AILab/causal-conv1d@v1.5.0.post8'
```

@@ -31,7 +31,7 @@ Features that fall under this policy include (at a minimum) the following:
The deprecation process consists of several clearly defined stages that span
multiple Y releases:
**1. Deprecated (Still On By Default)**
### 1. Deprecated (Still On By Default)
- **Action**: Feature is marked as deprecated.
- **Timeline**: A removal version is explicitly stated in the deprecation
@@ -46,7 +46,7 @@ warning (e.g., "This will be removed in v0.10.0").
- GitHub Issue (RFC) for feedback
- Documentation and use of the `@typing_extensions.deprecated` decorator for Python APIs
**2. Deprecated (Off By Default)**
### 2. Deprecated (Off By Default)
- **Action**: Feature is disabled by default, but can still be re-enabled via a
CLI flag or environment variable. Feature throws an error when used without
@@ -55,7 +55,7 @@ re-enabling.
while signaling imminent removal. Ensures any remaining usage is clearly
surfaced and blocks silent breakage before full removal.
**3. Removed**
### 3. Removed
- **Action**: Feature is completely removed from the codebase.
- **Note**: Only features that have passed through the previous deprecation

@@ -5,7 +5,12 @@
## Profile with PyTorch Profiler
We support tracing vLLM workers using the `torch.profiler` module. You can enable tracing by setting the `VLLM_TORCH_PROFILER_DIR` environment variable to the directory where you want to save the traces: `VLLM_TORCH_PROFILER_DIR=/mnt/traces/`
We support tracing vLLM workers using the `torch.profiler` module. You can enable tracing by setting the `VLLM_TORCH_PROFILER_DIR` environment variable to the directory where you want to save the traces: `VLLM_TORCH_PROFILER_DIR=/mnt/traces/`. Additionally, you can control the profiling content by specifying the following environment variables:
- `VLLM_TORCH_PROFILER_RECORD_SHAPES=1` to enable recording of tensor shapes (off by default)
- `VLLM_TORCH_PROFILER_WITH_PROFILE_MEMORY=1` to enable recording of memory usage (off by default)
- `VLLM_TORCH_PROFILER_WITH_STACK=1` to enable recording of stack information (on by default)
- `VLLM_TORCH_PROFILER_WITH_FLOPS=1` to enable recording of FLOPs (off by default)
The OpenAI server also needs to be started with the `VLLM_TORCH_PROFILER_DIR` environment variable set.
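Putting these together, a server launch with profiling enabled might look like the following sketch; the model name and trace directory are placeholders:

```bash
# Placeholder model and trace directory; toggle the optional profiler settings as needed
VLLM_TORCH_PROFILER_DIR=/mnt/traces/ \
VLLM_TORCH_PROFILER_RECORD_SHAPES=1 \
VLLM_TORCH_PROFILER_WITH_FLOPS=1 \
vllm serve meta-llama/Llama-3.1-8B-Instruct
```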
@@ -112,13 +117,13 @@ vllm bench serve \
In practice, you should set the `--duration` argument to a large value. Whenever you want the server to stop profiling, run:
```
```bash
nsys sessions list
```
to get the session id in the form of `profile-XXXXX`, then run:
```
```bash
nsys stop --session=profile-XXXXX
```
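Once the session has stopped, the resulting report can be summarized from the command line; the report filename below is a placeholder and depends on how the capture was started:

```bash
# Print summary statistics from the generated Nsight Systems report
nsys stats report1.nsys-rep
```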

@@ -32,9 +32,9 @@ We prefer to keep all vulnerability-related communication on the security report
on GitHub. However, if you need to contact the VMT directly for an urgent issue,
you may contact the following individuals:
- Simon Mo - simon.mo@hey.com
- Russell Bryant - rbryant@redhat.com
- Huzaifa Sidhpurwala - huzaifas@redhat.com
- Simon Mo - <simon.mo@hey.com>
- Russell Bryant - <rbryant@redhat.com>
- Huzaifa Sidhpurwala - <huzaifas@redhat.com>
## Slack Discussion