Commit Graph

699 Commits

SHA1 Message Date
5157338ed9 [Misc] Improve LoRA spelling (#13831) 2025-02-25 23:43:01 -08:00
07c4353057 [Model] Support Grok1 (#13795)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-26 01:07:12 +00:00
cdc1fa12eb Remove unused kwargs from model definitions (#13555) 2025-02-24 17:13:52 -08:00
444b0f0f62 [Misc][Docs] Raise error when flashinfer is not installed and VLLM_ATTENTION_BACKEND is set (#12513)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-02-24 10:43:21 -05:00
8354f6640c [Doc] Dockerfile instructions for optional dependencies and dev transformers (#13699) 2025-02-22 06:04:31 -08:00
2cb8c1540e [Metrics] Add --show-hidden-metrics-for-version CLI arg (#13295) 2025-02-22 00:20:45 -08:00
8c0dd3d4df docs: Add a note on full CI run in contributing guide (#13646) 2025-02-21 21:53:59 -08:00
1c3c975766 [FEATURE] Enables /score endpoint for embedding models (#12846) 2025-02-20 22:09:47 -08:00
44c33f01f3 Add llmaz as another integration (#13643)
Signed-off-by: kerthcet <kerthcet@gmail.com>
2025-02-21 03:52:40 +00:00
bfbc0b32c6 [Frontend] Add backend-specific options for guided decoding (#13505)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-02-20 15:07:58 -05:00
992e5c3d34 Merge similar examples in offline_inference into single basic example (#12737) 2025-02-20 04:53:51 -08:00
512368e34a [Misc] Qwen2.5 VL support LoRA (#13261) 2025-02-19 18:37:55 -08:00
01c184b8f3 Fix copyright year to auto get current year (#13561) 2025-02-19 16:55:34 +00:00
ad5a35c21b [doc] clarify multi-node serving doc (#13558)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-19 22:32:17 +08:00
52ce14d31f [doc] clarify profiling is only for developers (#13554)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-19 20:55:58 +08:00
fd84857f64 [Doc] Add clarification note regarding paligemma (#13511) 2025-02-18 22:24:03 -08:00
00b69c2d27 [Misc] Remove dangling references to --use-v2-block-manager (#13492)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-19 03:37:26 +00:00
7b203b7694 [misc] fix debugging code (#13487)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-02-18 09:37:11 -08:00
2358ca527b [Doc]: Improve feature tables (#13224)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-18 18:52:39 +08:00
67ef8f666a [Model] Enable quantization support for transformers backend (#12960) 2025-02-17 19:52:47 -08:00
7b623fca0b [VLM] Check required fields before initializing field config in DictEmbeddingItems (#13380) 2025-02-17 01:36:07 -08:00
f857311d13 Fix spelling error in index.md (#13369) 2025-02-17 06:53:20 +00:00
46cdd59577 [Feature][Spec Decode] Simplify the use of Eagle Spec Decode (#12304)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-02-16 19:32:26 -08:00
da833b0aee [Docs] Change myenv to vllm. Update python_env_setup.inc.md (#13325) 2025-02-16 16:04:21 +00:00
b7d309860e [V1] Update doc and examples for H2O-VL (#13349)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-02-16 10:35:54 +00:00
367cb8ce8c [Doc] [2/N] Add Fuyu E2E example for multimodal processor (#13331) 2025-02-15 07:06:23 -08:00
579d7a63b2 [Bugfix][Docs] Fix offline Whisper (#13274) 2025-02-14 21:32:37 -08:00
d84cef76eb [Frontend] Add /v1/audio/transcriptions OpenAI API endpoint (#12909) 2025-02-13 07:23:45 -08:00
1bc3b5e71b [VLM] Separate text-only and vision variants of the same model architecture (#13157) 2025-02-13 06:19:15 -08:00
c9d3ecf016 [VLM] Merged multi-modal processor for Molmo (#12966) 2025-02-13 04:34:00 -08:00
d46d490c27 [Frontend] Move CLI code into vllm.cmd package (#12971) 2025-02-12 23:12:21 -08:00
60c68df6d1 [Build] Automatically use the wheel of the base commit with Python-only build (#13178) 2025-02-12 23:10:28 -08:00
deb6c1c6b4 [Doc] Improve OpenVINO installation doc (#13102)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-11 18:02:46 +00:00
08b2d845d6 [Model] Ultravox Model: Support v0.5 Release (#12912)
Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>
2025-02-10 22:02:48 +00:00
51f0b5f7f6 [Bugfix] Clean up and fix multi-modal processors (#13012)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-10 10:45:21 +00:00
243137143c [Doc] Add link to tool_choice tracking issue in tool_calling.md (#13003)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-10 06:09:33 +00:00
86222a3dab [VLM] Merged multi-modal processor for GLM4V (#12449)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-02-08 20:32:16 +00:00
8a69e0e20e [CI/Build] Auto-fix Markdown files (#12941) 2025-02-08 04:25:15 -08:00
256a2d29dc [Doc] Correct HF repository for TeleChat2 models (#12949) 2025-02-08 01:42:15 -08:00
eaa92d4437 [ROCm] [Feature] [Doc] [Dockerfile] [BugFix] Support Per-Token-Activation Per-Channel-Weight FP8 Quantization Inferencing (#12501) 2025-02-07 08:13:43 -08:00
afe74f7a96 [Doc] double quote cmake package in build.inc.md (#12840) 2025-02-06 09:17:55 -08:00
d88506dda4 [Model] LoRA Support for Ultravox model (#11253) 2025-02-05 19:54:13 -08:00
75404d041b [VLM] Update compatibility with transformers 4.49 2025-02-05 19:09:45 -08:00
bf3b79efb8 [VLM] Qwen2.5-VL 2025-02-05 13:31:38 -08:00
9a5b1554b4 [Docs] Drop duplicate [source] links 2025-02-05 13:30:50 -08:00
c53dc466b1 [Doc] Remove performance warning for auto_awq.md (#12743) 2025-02-04 22:43:11 -08:00
815079de8e [VLM] merged multimodal processor and V1 support for idefics3 (#12660)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-02-04 20:00:51 +08:00
d1ca7df84d [VLM] Merged multi-modal processor for InternVL-based models (#12553)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-02-04 16:44:52 +08:00
bb392af434 [Doc] Replace ibm-fms with ibm-ai-platform (#12709)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-02-04 07:05:04 +00:00
a1a2aaadb9 [Model]: Add transformers backend support (#11330)
# Adds support for `transformers` as a backend

Following https://github.com/huggingface/transformers/pull/35235, a
number of models should already be supported, and we are ramping up
support for more models.

Thanks @Isotr0py for the TP support, and @hmellor for his help as well!
This includes:
- `trust_remote_code=True` support: any model on the Hub that
implements attention the correct way can be natively supported
- tensor parallel support

---------

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <41363108+Isotr0py@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-02-03 21:30:38 +08:00