Commit Graph

334 Commits

Author SHA1 Message Date
d394787e52 Pixtral (#8377)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-11 14:41:55 -07:00
3b7fea770f [Model][VLM] Add Qwen2-VL model support (#7905)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-11 09:31:19 -07:00
6a512a00df [model] Support for Llava-Next-Video model (#7559)
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-09-10 22:21:36 -07:00
a1d874224d Add NVIDIA Meetup slides, announce AMD meetup, and add contact info (#8319) 2024-09-09 23:21:00 -07:00
e807125936 [Model][VLM] Support multi-images inputs for InternVL2 models (#8201) 2024-09-07 16:38:23 +08:00
2f707fcb35 [Model] Multi-input support for LLaVA (#8238) 2024-09-07 02:57:24 +00:00
12dd715807 [misc] [doc] [frontend] LLM torch profiler support (#7943) 2024-09-06 17:48:48 -07:00
23f322297f [Misc] Remove SqueezeLLM (#8220) 2024-09-06 16:29:03 -06:00
db3bf7c991 [Core] Support load and unload LoRA in api server (#6566)
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2024-09-05 18:10:33 -07:00
2febcf2777 [Documentation][Spec Decode] Add documentation about lossless guarantees in Speculative Decoding in vLLM (#7962) 2024-09-05 16:25:29 -04:00
9da25a88aa [MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) (#8029)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-05 12:48:10 +00:00
288a938872 [Doc] Indicate more information about supported modalities (#8181) 2024-09-05 10:51:53 +00:00
e02ce498be [Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649)
Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
Co-authored-by: Kyle Mistele <kyle@constellate.ai>
2024-09-04 13:18:13 -07:00
61f4a93d14 [TPU][Bugfix] Use XLA rank for persistent cache path (#8137) 2024-09-03 18:35:33 -07:00
1248e8506a [Model] Adding support for MSFT Phi-3.5-MoE (#7729)
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Zeqi Lin <zelin@microsoft.com>
Co-authored-by: Zeqi Lin <Zeqi.Lin@microsoft.com>
2024-08-30 13:42:57 -06:00
058344f89a [Frontend]-config-cli-args (#7737)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Kaunil Dhruv <kaunil_dhruv@intuit.com>
2024-08-30 08:21:02 -07:00
dc13e99348 [MODEL] add Exaone model support (#7819) 2024-08-29 23:34:20 -07:00
8c56e57def [Doc] fix 404 link (#7966) 2024-08-28 13:54:23 -07:00
eeffde1ac0 [TPU] Upgrade PyTorch XLA nightly (#7967) 2024-08-28 13:10:21 -07:00
98c12cffe5 [Doc] fix the autoAWQ example (#7937) 2024-08-28 12:12:32 +00:00
fab5f53e2d [Core][VLM] Stack multimodal tensors to represent multiple images within each prompt (#7902) 2024-08-28 01:53:56 +00:00
57792ed469 [Doc] Fix incorrect docs from #7615 (#7788) 2024-08-22 10:02:06 -07:00
df1a21131d [Model] Fix Phi-3.5-vision-instruct 'num_crops' issue (#7710) 2024-08-22 09:36:24 +08:00
1ca0d4f86b [Model] Add UltravoxModel and UltravoxConfig (#7615) 2024-08-21 22:49:39 +00:00
dd53c4b023 [misc] Add Torch profiler support (#7451)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-08-21 15:39:26 -07:00
4506641212 [Doc] Section for Multimodal Language Models (#7719) 2024-08-20 23:24:01 -07:00
398521ad19 [OpenVINO] Updated documentation (#7687) 2024-08-20 07:33:56 -06:00
d4f0f17b02 [Doc] Update quantization supported hardware table (#7595) 2024-08-16 13:59:27 -07:00
b3f4e17935 [Doc] Add docs for llmcompressor INT8 and FP8 checkpoints (#7444) 2024-08-16 13:59:16 -07:00
22b39e11f2 llama_index serving integration documentation (#6973)
Co-authored-by: pavanmantha <pavan.mantha@thevaslabs.io>
2024-08-14 15:38:37 -07:00
3f674a49b5 [VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126) 2024-08-14 17:55:42 +00:00
199adbb7cf [doc] update test script to include cudagraph (#7501) 2024-08-13 21:52:58 -07:00
dd164d72f3 [Bugfix][Docs] Update list of mock imports (#7493) 2024-08-13 20:37:30 -07:00
a08df8322e [TPU] Support multi-host inference (#7457) 2024-08-13 16:31:20 -07:00
00c3d68e45 [Frontend][Core] Add plumbing to support audio language models (#7446) 2024-08-13 17:39:33 +00:00
e20233d361 Revert "[Doc] Update supported_hardware.rst (#7276)" (#7467) 2024-08-13 01:37:08 -07:00
a046f86397 [Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (#7208)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-08-12 22:47:41 +00:00
e6e42e4b17 [Core][VLM] Support image embeddings as input (#6613) 2024-08-12 16:16:06 +08:00
f020a6297e [Docs] Update readme (#7316) 2024-08-11 17:13:37 -07:00
02b1988b9f [Doc] building vLLM with VLLM_TARGET_DEVICE=empty (#7403) 2024-08-11 14:38:17 -07:00
90bab18f24 [TPU] Use mark_dynamic to reduce compilation time (#7340) 2024-08-10 18:12:22 -07:00
5923532e15 Add Skywork AI as Sponsor (#7314) 2024-08-08 13:59:57 -07:00
757ac70a64 [Model] Rename MiniCPMVQwen2 to MiniCPMV2.6 (#7273) 2024-08-08 14:02:41 +00:00
6d94420246 [Doc] Update supported_hardware.rst (#7276) 2024-08-07 14:21:50 -07:00
0e12cd67a8 [Doc] add online speculative decoding example (#7243) 2024-08-07 09:58:02 -07:00
80cbe10c59 [OpenVINO] migrate to latest dependencies versions (#7251) 2024-08-07 09:49:10 -07:00
2385c8f374 [Doc] Mock new dependencies for documentation (#7245) 2024-08-07 06:43:03 +00:00
789937af2e [Doc] [SpecDecode] Update MLPSpeculator documentation (#7100)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2024-08-05 23:29:43 +00:00
4db5176d97 bump version to v0.5.4 (#7139) 2024-08-05 14:39:48 -07:00
179a6a36f2 [Model]Refactor MiniCPMV (#7020)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-04 08:12:41 +00:00