Commit Graph

654 Commits

Author SHA1 Message Date
c53dc466b1 [Doc] Remove performance warning for auto_awq.md (#12743) 2025-02-04 22:43:11 -08:00
815079de8e [VLM] merged multimodal processor and V1 support for idefics3 (#12660)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-02-04 20:00:51 +08:00
d1ca7df84d [VLM] Merged multi-modal processor for InternVL-based models (#12553)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-02-04 16:44:52 +08:00
bb392af434 [Doc] Replace ibm-fms with ibm-ai-platform (#12709)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-02-04 07:05:04 +00:00
a1a2aaadb9 [Model]: Add transformers backend support (#11330)
# Adds support for `transformers` as a backend

Following https://github.com/huggingface/transformers/pull/35235, a
bunch of models should already be supported, we are ramping up support
for more models.

Thanks @Isotr0py for the TP support, and @hmellor for his help as well!
This includes: 
- `trust_remote_code=True` support: any model on the hub, if it
implements attention the correct way can be natively supported!!
- tensor parallel support

---------

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <41363108+Isotr0py@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-02-03 21:30:38 +08:00
e64330910b [doc][misc] clarify VLLM_HOST_IP for multi-node inference (#12667)
As more and more people are trying deepseek models with multi-node
inference, https://github.com/vllm-project/vllm/issues/7815 becomes more
frequent. Let's give clear message to users.

Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-03 09:32:18 +08:00
e489ad7a21 [Misc] Add SPDX-License-Identifier headers to python source files (#12628)
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**

commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:18:24 2025 -0500

    Add SPDX license headers to python source files
    
This commit adds SPDX license headers to python source files as
recommended to
the project by the Linux Foundation. These headers provide a concise way
that is
both human and machine readable for communicating license information
for each
source file. It helps avoid any ambiguity about the license of the code
and can
    also be easily used by tools to help manage license compliance.
    
The Linux Foundation runs license scans against the codebase to help
ensure
    we are in compliance with the licenses of the code we use, including
dependencies. Having these headers in place helps that tool do its job.
    
    More information can be found on the SPDX site:
    
    - https://spdx.dev/learn/handling-license-info/
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:36:32 2025 -0500

    Check for SPDX headers using pre-commit
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

---------

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
f256ebe4df [Hardware][Intel GPU] add XPU bf16 support (#12392)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-02-02 10:17:26 +00:00
44bbca78d7 [Doc] int4 w4a16 example (#12585)
Based on a request by @mgoin , with @kylesayrs we have added an example
doc for int4 w4a16 quantization, following the pre-existing int8 w8a8
quantization example and the example available in
[`llm-compressor`](https://github.com/vllm-project/llm-compressor/blob/main/examples/quantization_w4a16/llama3_example.py)

FIX #n/a (no issue created)

@kylesayrs and I have discussed a couple additional improvements for the
quantization docs. We will revisit at a later date, possibly including:
- A section for "choosing the correct quantization scheme/ compression
technique"
- Additional vision or audio calibration datasets

---------

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-01-31 15:38:48 -08:00
60808bd4c7 [Doc] Improve installation signposting (#12575)
- Make device tab names more explicit
- Add comprehensive list of devices to
https://docs.vllm.ai/en/latest/getting_started/installation/index.html
- Add `attention` blocks to the intro of all devices that don't have
pre-built wheels/images

---------

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-31 15:38:35 -08:00
60bcef000e [Docs][V1] Prefix caching design (#12598)
- Create v1 design document section in docs.
- Add prefix caching design doc.

@WoosukKwon @ywang96

---------

Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-01-31 12:30:46 -08:00
847f883232 [Git] Automatically sign-off commits (#12595)
It's very annoying when I forgot to add `-s` in `git commit` to
sign-off, because I then need to `git rebase HEAD~1 --signoff` and `git
push -f` to fix the DCO. This PR adds a hook to sign off commits
automatically when `-s` is missing to solve this problem. The only
change from the user side is now users have to install 2 hooks, so
instead of just

```
pre-commit install
```

Now we need to

```
pre-commit install --hook-type pre-commit --hook-type commit-msg
```

Note that even if users still only install the pre-commit hook, they
won't get any error in `git commit`. Just the sign-off hook won't run.

cc @hmellor @youkaichao

---------

Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
2025-01-31 12:30:33 -08:00
e3f7ff65e7 Add favicon to docs (#12611)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-31 09:20:34 -08:00
a2769032ca Set ?device={device} when changing tab in installation guides (#12560)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-30 00:05:42 -08:00
d93bf4da85 [Model] Refactoring of MiniCPM-V and add MiniCPM-o-2.6 support for vLLM (#12069)
Signed-off-by: hzh <hezhihui_thu@163.com>
Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Signed-off-by: shaochangxu.scx <shaochangxu.scx@antgroup.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Oleg Mosalov <oleg@krai.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
Signed-off-by: Chenguang Li <757486878@qq.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Shanshan Shen <467638484@qq.com>
Signed-off-by: elijah <f1renze.142857@gmail.com>
Signed-off-by: Yikun <yikunkero@gmail.com>
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
Co-authored-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com>
Co-authored-by: shaochangxu <85155497+shaochangxu@users.noreply.github.com>
Co-authored-by: shaochangxu.scx <shaochangxu.scx@antgroup.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: sixgod <evethwillbeok@outlook.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Akshat Tripathi <Akshat.tripathi6568@gmail.com>
Co-authored-by: Oleg Mosalov <oleg@krai.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Avshalom Manevich <12231371+avshalomman@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Yangcheng Li <liyangcheng.lyc@alibaba-inc.com>
Co-authored-by: Siyuan Li <94890248+liaoyanqing666@users.noreply.github.com>
Co-authored-by: Concurrensee <yida.wu@amd.com>
Co-authored-by: Chenguang Li <757486878@qq.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Alex Brooks <alex.brooks@ibm.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Shanshan Shen <467638484@qq.com>
Co-authored-by: elijah <30852919+e1ijah1@users.noreply.github.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>
Co-authored-by: Steve Luo <36296769+SunflowerAries@users.noreply.github.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: TJian <tunjian1996@gmail.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: maang-h <55082429+maang-h@users.noreply.github.com>
Co-authored-by: Elfie Guo <164945471+elfiegg@users.noreply.github.com>
Co-authored-by: Rui Qiao <161574667+ruisearch42@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-01-29 09:24:59 +00:00
dd6a3a02cb [Doc] Convert docs to use colon fences (#12471)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-29 11:38:29 +08:00
a7e3eba66f [Frontend] Support reasoning content for deepseek r1 (#12473)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
2025-01-29 11:38:08 +08:00
925d2f1908 [Doc] Fix typo for x86 CPU installation (#12514)
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
2025-01-28 16:37:10 +00:00
8f58a51358 [VLM] Merged multi-modal processor and V1 support for Qwen-VL (#12504)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-28 16:25:05 +00:00
582cf78798 [DOC] Add link to vLLM blog (#12460)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-27 03:46:19 +00:00
0034b09ceb [Frontend] Rerank API (Jina- and Cohere-compatible API) (#12376)
Signed-off-by: Kyle Mistele <kyle@mistele.com>
2025-01-26 19:58:45 -07:00
9a0f3bdbe5 [Hardware][Gaudi][Doc] Add missing step in setup instructions (#12382) 2025-01-24 09:43:49 +00:00
c5cffcd0cd [Docs] Update spec decode + structured output in compat matrix (#12373)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-01-24 01:15:52 +00:00
682b55bc07 [Docs] Add meetup slides (#12345)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-01-23 14:10:03 -08:00
2cbeedad09 [Docs] Document Phi-4 support (#12362)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-23 19:18:51 +00:00
e97f802b2d [FP8][Kernel] Dynamic kv cache scaling factors computation (#11906)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
2025-01-23 18:04:03 +00:00
d07efb31c5 [Doc] Troubleshooting errors during model inspection (#12351)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-23 22:46:58 +08:00
511627445e [doc] explain common errors around torch.compile (#12340)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-23 14:56:02 +08:00
7551a34032 [Docs] Document vulnerability disclosure process (#12326)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-01-23 03:44:09 +00:00
01a55941f5 [Docs] Update FP8 KV Cache documentation (#12238)
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-23 11:18:09 +08:00
09ccc9c8f7 [Documentation][AMD] Add information about prebuilt ROCm vLLM docker for perf validation purpose (#12281)
Signed-off-by: Hongxia Yang <hongxyan@amd.com>
2025-01-22 07:49:22 +08:00
96912550c8 [Misc] Rename MultiModalInputsV2 -> MultiModalInputs (#12244)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-21 07:31:19 +00:00
d4b62d4641 [AMD][Build] Porting dockerfiles from the ROCm/vllm fork (#11777)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-01-21 12:22:23 +08:00
83609791d2 [Model] Add Qwen2 PRM model support (#12202)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-20 14:59:46 +08:00
3ea7b94523 Move linting to pre-commit (#11975)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-20 14:58:01 +08:00
81763c58a0 [V1] Add V1 support of Qwen2-VL (#12128)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: imkero <kerorek@outlook.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-19 19:52:13 +08:00
02798ecabe [Model] Port deepseek-vl2 processor, remove dependency (#12169)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-18 13:59:39 +08:00
c09503ddd6 [AMD][CI/Build][Bugfix] use pytorch stale wheel (#12172)
Signed-off-by: hongxyan <hongxyan@amd.com>
2025-01-18 11:15:53 +08:00
1475847a14 [Doc] Add instructions on using Podman when SELinux is active (#12136)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-17 04:45:36 +00:00
62b06ba23d [Model] Add support for deepseek-vl2-tiny model (#12068)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-16 17:14:48 +00:00
f8ef146f03 [Doc] Add documentation for specifying model architecture (#12105) 2025-01-16 15:53:43 +08:00
97eb97b5a4 [Model]: Support internlm3 (#12037) 2025-01-15 11:35:17 +00:00
3f9b7ab9f5 [Doc] Update examples to remove SparseAutoModelForCausalLM (#12062)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-01-15 06:36:01 +00:00
c9d6ff530b Explain where the engine args go when using Docker (#12041)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-14 16:05:50 +00:00
8a1f938e6f [Doc] Update Quantization Hardware Support Documentation (#12025)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-01-14 04:37:52 +00:00
1a401252b5 [Docs] Add Sky Computing Lab to project intro (#12019)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-01-13 17:24:36 -08:00
e8c23ff989 [Doc] Organise installation documentation into categories and tabs (#11935)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-13 12:27:36 +00:00
cd8249903f [Doc][V1] Update model implementation guide for V1 support (#11998)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-01-13 11:58:54 +00:00
8bddb73512 [Hardware][CPU] Multi-LoRA implementation for the CPU backend (#11100)
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Oleg Mosalov <oleg@krai.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Oleg Mosalov <oleg@krai.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-01-12 13:01:52 +00:00
f967e51f38 [Model] Initialize support for Deepseek-VL2 models (#11578)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-12 00:17:24 -08:00