Commit Graph

307 Commits

Author SHA1 Message Date
d4f0f17b02 [Doc] Update quantization supported hardware table (#7595) 2024-08-16 13:59:27 -07:00
b3f4e17935 [Doc] Add docs for llmcompressor INT8 and FP8 checkpoints (#7444) 2024-08-16 13:59:16 -07:00
22b39e11f2 llama_index serving integration documentation (#6973)
Co-authored-by: pavanmantha <pavan.mantha@thevaslabs.io>
2024-08-14 15:38:37 -07:00
3f674a49b5 [VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126) 2024-08-14 17:55:42 +00:00
199adbb7cf [doc] update test script to include cudagraph (#7501) 2024-08-13 21:52:58 -07:00
dd164d72f3 [Bugfix][Docs] Update list of mock imports (#7493) 2024-08-13 20:37:30 -07:00
a08df8322e [TPU] Support multi-host inference (#7457) 2024-08-13 16:31:20 -07:00
00c3d68e45 [Frontend][Core] Add plumbing to support audio language models (#7446) 2024-08-13 17:39:33 +00:00
e20233d361 Revert "[Doc] Update supported_hardware.rst (#7276)" (#7467) 2024-08-13 01:37:08 -07:00
a046f86397 [Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (#7208)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-08-12 22:47:41 +00:00
e6e42e4b17 [Core][VLM] Support image embeddings as input (#6613) 2024-08-12 16:16:06 +08:00
f020a6297e [Docs] Update readme (#7316) 2024-08-11 17:13:37 -07:00
02b1988b9f [Doc] building vLLM with VLLM_TARGET_DEVICE=empty (#7403) 2024-08-11 14:38:17 -07:00
90bab18f24 [TPU] Use mark_dynamic to reduce compilation time (#7340) 2024-08-10 18:12:22 -07:00
5923532e15 Add Skywork AI as Sponsor (#7314) 2024-08-08 13:59:57 -07:00
757ac70a64 [Model] Rename MiniCPMVQwen2 to MiniCPMV2.6 (#7273) 2024-08-08 14:02:41 +00:00
6d94420246 [Doc] Update supported_hardware.rst (#7276) 2024-08-07 14:21:50 -07:00
0e12cd67a8 [Doc] add online speculative decoding example (#7243) 2024-08-07 09:58:02 -07:00
80cbe10c59 [OpenVINO] migrate to latest dependencies versions (#7251) 2024-08-07 09:49:10 -07:00
2385c8f374 [Doc] Mock new dependencies for documentation (#7245) 2024-08-07 06:43:03 +00:00
789937af2e [Doc] [SpecDecode] Update MLPSpeculator documentation (#7100)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2024-08-05 23:29:43 +00:00
4db5176d97 bump version to v0.5.4 (#7139) 2024-08-05 14:39:48 -07:00
179a6a36f2 [Model]Refactor MiniCPMV (#7020)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-04 08:12:41 +00:00
654bc5ca49 Support for guided decoding for offline LLM (#6878)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-08-04 03:12:09 +00:00
b482b9a5b1 [CI/Build] Add support for Python 3.12 (#7035) 2024-08-02 13:51:22 -07:00
fc912e0886 [Models] Support Qwen model with PP (#6974)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-08-01 12:40:43 -07:00
7ecee34321 [Kernel][RFC] Refactor the punica kernel based on Triton (#5036) 2024-07-31 17:12:24 -07:00
2f4e108f75 [Bugfix] Clean up MiniCPM-V (#6939)
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-31 14:39:19 +00:00
f230cc2ca6 [Bugfix] Fix broadcasting logic for multi_modal_kwargs (#6836) 2024-07-31 10:38:45 +08:00
5895b24677 [OpenVINO] Updated OpenVINO requirements and build docs (#6948) 2024-07-30 11:33:01 -07:00
7cbd9ec7a9 [Model] Initialize support for InternVL2 series models (#6514)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-29 10:16:30 +00:00
fad5576c58 [TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856) 2024-07-27 10:28:33 -07:00
f954d0715c [Docs] Add RunLLM chat widget (#6857) 2024-07-27 09:24:46 -07:00
1ad86acf17 [Model] Initial support for BLIP-2 (#5920)
Co-authored-by: ywang96 <ywang@roblox.com>
2024-07-27 11:53:07 +00:00
ecb33a28cb [CI/Build][Doc] Update CI and Doc for VLM example changes (#6860) 2024-07-27 09:54:14 +00:00
c53041ae3b [Doc] Add missing mock import to docs conf.py (#6834) 2024-07-27 04:47:33 +00:00
3c3012398e [Doc] add VLLM_TARGET_DEVICE=neuron to documentation for neuron (#6844)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
2024-07-26 20:20:16 -07:00
ced36cd89b [ROCm] Upgrade PyTorch nightly version (#6845) 2024-07-26 20:16:13 -07:00
150a1ffbfd [Doc] Update SkyPilot doc for wrong indents and instructions for update service (#4283) 2024-07-26 14:39:10 -07:00
281977bd6e [Doc] Add Nemotron to supported model docs (#6843) 2024-07-26 17:32:44 -04:00
3bbb4936dc [Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125) 2024-07-26 13:50:10 -07:00
85ad7e2d01 [doc][debugging] add known issues for hangs (#6816) 2024-07-25 21:48:05 -07:00
b7215de2c5 [Docs] Publish 5th meetup slides (#6799) 2024-07-25 16:47:55 -07:00
f3ff63c3f4 [doc][distributed] improve multinode serving doc (#6804) 2024-07-25 15:38:32 -07:00
6a1e25b151 [Doc] Add documentations for nightly benchmarks (#6412) 2024-07-25 11:57:16 -07:00
9e169a4c61 [Model] Adding support for MiniCPM-V (#4087) 2024-07-24 20:59:30 -07:00
d88c458f44 [Doc][AMD][ROCm]Added tips to refer to mi300x tuning guide for mi300x users (#6754) 2024-07-24 14:32:57 -07:00
ccc4a73257 [Docs][ROCm] Detailed instructions to build from source (#6680) 2024-07-24 01:07:23 -07:00
87525fab92 [bitsandbytes]: support read bnb pre-quantized model (#5753)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-07-23 23:45:09 +00:00
71950af726 [doc][distributed] fix doc argument order (#6691) 2024-07-23 08:55:33 -07:00