Commit Graph

424 Commits

Author SHA1 Message Date
211fe91aa8 [TPU] Correctly profile peak memory usage & Upgrade PyTorch XLA (#9438) 2024-10-30 09:41:38 +00:00
04a3ae0aca [Bugfix] Fix multi nodes TP+PP for XPU (#8884)
Signed-off-by: YiSheng5 <syhm@mail.ustc.edu.cn>
Signed-off-by: yan ma <yan.ma@intel.com>
Co-authored-by: YiSheng5 <syhm@mail.ustc.edu.cn>
2024-10-29 21:34:45 -07:00
882a1ad0de [Model] tool calling support for ibm-granite/granite-20b-functioncalling (#8339)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
2024-10-29 15:07:37 -07:00
c5d7fb9ddc [Doc] fix third-party model example (#9771)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-28 19:39:21 -07:00
6650e6a930 [Model] Add classification Task with Qwen2ForSequenceClassification (#9704)
Signed-off-by: Kevin-Yang <ykcha9@gmail.com>
Co-authored-by: Kevin-Yang <ykcha9@gmail.com>
2024-10-26 17:53:35 +00:00
228cfbd03f [Doc] Improve quickstart documentation (#9256)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-10-25 14:32:10 -07:00
b979143d5b [Doc] Move additional tips/notes to the top (#9647) 2024-10-24 09:43:59 +00:00
8a02cd045a [torch.compile] Adding torch compile annotations to some models (#9639)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-10-24 00:54:57 -07:00
836e8ef6ee [Bugfix] Fix PP for ChatGLM and Molmo (#9422) 2024-10-24 06:12:05 +00:00
33bab41060 [Bugfix]: Make chat content text allow type content (#9358)
Signed-off-by: Vinay Damodaran <vrdn@hey.com>
2024-10-24 05:05:49 +00:00
fc6c274626 [Model] Add Qwen2-Audio model support (#9248)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-23 17:54:22 +00:00
831540cf04 [Model] Support E5-V (#9576) 2024-10-23 11:35:29 +08:00
208cb34c81 [Doc]: Update tensorizer docs to include vllm[tensorizer] (#7889)
Co-authored-by: Kaunil Dhruv <dhruv.kaunil@gmail.com>
2024-10-22 15:43:25 -07:00
32a1ee74a0 [Hardware][Intel CPU][DOC] Update docs for CPU backend (#6212)
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Gubrud, Aaron D <aaron.d.gubrud@intel.com>
Co-authored-by: adgubrud <96072084+adgubrud@users.noreply.github.com>
2024-10-22 10:38:04 -07:00
bb392ea2d2 [Model][VLM] Initialize support for Mono-InternVL model (#9528) 2024-10-22 16:01:46 +00:00
f7db5f0fa9 [Doc] Use shell code-blocks and fix section headers (#9508)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-10-22 06:43:24 +00:00
d621c43df7 [doc] fix format (#9562) 2024-10-21 13:54:57 -07:00
f6b97293aa [Model] FalconMamba Support (#9325) 2024-10-21 12:50:16 -04:00
3921a2f29e [Model] Support Pixtral models in the HF Transformers format (#9036) 2024-10-18 13:29:56 -06:00
051eaf6db3 [Model] Add user-configurable task for models that support both generation and embedding (#9424) 2024-10-18 11:31:58 -07:00
d2b1bf55ec [Frontend][Feature] Add jamba tool parser (#9154) 2024-10-18 10:27:48 +00:00
81ede99ca4 [Core] Deprecating block manager v1 and make block manager v2 default (#8704)
Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).
2024-10-17 11:38:15 -05:00
5eda21e773 [Hardware][CPU] compressed-tensor INT8 W8A8 AZP support (#9344) 2024-10-17 12:21:04 -04:00
5b8a1fde84 [Model][Bugfix] Add FATReLU activation and support for openbmb/MiniCPM-S-1B-sft (#9396) 2024-10-16 16:40:24 +00:00
59230ef32b [Misc] Consolidate example usage of OpenAI client for multimodal models (#9412)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-16 11:20:51 +00:00
cee711fdbb [Core] Rename input data types (#8688) 2024-10-16 10:49:37 +00:00
7abba39ee6 [Model] VLM2Vec, the first multimodal embedding model in vLLM (#9303) 2024-10-16 14:31:00 +08:00
8e836d982a [Doc] Fix code formatting in spec_decode.rst (#9348) 2024-10-14 21:29:11 -07:00
169b530607 [Bugfix] Clean up some cruft in mamba.py (#9343) 2024-10-15 00:24:25 +00:00
dfe43a2071 [Model] Molmo vLLM Integration (#9016)
Co-authored-by: sanghol <sanghol@allenai.org>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-10-14 07:56:24 -07:00
2b184ddd4f [Misc][Installation] Improve source installation script and doc (#9309)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-10-12 09:36:40 -07:00
8baf85e4e9 [Doc] Compatibility matrix for mutual exclusive features (#8512)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2024-10-11 11:18:50 -07:00
6cf1167c1a [Model] Add GLM-4v support and meet vllm==0.6.2 (#9242) 2024-10-11 17:36:13 +00:00
7342a7d7f8 [Model] Support Mamba (#6484) 2024-10-11 15:40:06 +00:00
e808156f30 [Misc] Collect model support info in a single process per model (#9233) 2024-10-11 11:08:11 +00:00
f990bab2a4 [Doc][Neuron] add note to neuron documentation about resolving triton issue (#9257)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
2024-10-10 23:36:32 +00:00
055f3270d4 [Doc] Improve debugging documentation (#9204)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-10-10 10:48:51 -07:00
04de9057ab [Model] support input image embedding for minicpmv (#9237) 2024-10-10 15:00:47 +00:00
ca77dd7a44 [Hardware][CPU] Support AWQ for CPU backend (#7515) 2024-10-09 10:28:08 -06:00
dc4aea677a [Doc] Fix VLM prompt placeholder sample bug (#9170) 2024-10-09 08:59:42 +00:00
acce7630c1 Update link to KServe deployment guide (#9173) 2024-10-09 03:58:49 +00:00
9ba0bd6aa6 Add lm-eval directly to requirements-test.txt (#9161) 2024-10-08 18:22:31 -07:00
de24046fcd [Doc] Improve contributing and installation documentation (#9132)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
2024-10-08 20:22:08 +00:00
1874c6a1b0 [Doc] Update vlm.rst to include an example on videos (#9155)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-10-08 18:12:29 +00:00
93cf74a8a7 [Doc]: Add deploying_with_k8s guide (#8451) 2024-10-07 13:31:45 -07:00
151ef4efd2 [Model] Support NVLM-D and fix QK Norm in InternViT (#9045)
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2024-10-07 11:55:12 +00:00
b22b798471 [Model] PP support for embedding models and update docs (#9090)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-10-06 16:35:27 +08:00
f22619fe96 [Misc] Remove user-facing error for removed VLM args (#9104) 2024-10-06 01:33:52 -07:00
5df1834895 [Bugfix] Fix order of arguments matters in config.yaml (#8960) 2024-10-05 17:35:11 +00:00
26aa325f4f [Core][VLM] Test registration for OOT multimodal models (#8717)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-04 10:38:25 -07:00