Commit Graph

356 Commits

Author SHA1 Message Date
13f9f7a3d0 [[Misc]Upgrade bitsandbytes to the latest version 0.44.0 (#8768) 2024-09-24 17:08:55 -07:00
3185fb0cca Revert "[Core] Rename PromptInputs to PromptType, and inputs to prompt" (#8750) 2024-09-24 05:45:20 +00:00
530821d00c [Hardware][AMD] ROCm6.2 upgrade (#8674) 2024-09-23 18:52:39 -07:00
ee5f34b1c2 [CI/Build] use setuptools-scm to set __version__ (#4738)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-09-23 09:44:26 -07:00
d23679eb99 [Bugfix] fix docker build for xpu (#8652) 2024-09-22 22:54:18 -07:00
d4a2ac8302 [build] enable existing pytorch (for GH200, aarch64, nightly) (#8713) 2024-09-22 12:47:54 -07:00
5b59532760 [Model][VLM] Add LLaVA-Onevision model support (#8486)
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-22 10:51:44 -07:00
4dfdf43196 [Doc] Fix typo in AMD installation guide (#8689) 2024-09-21 00:24:12 -07:00
0057894ef7 [Core] Rename PromptInputs and inputs(#8673) 2024-09-20 19:00:54 -07:00
7c8566aa4f [Doc] neuron documentation update (#8671)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
2024-09-20 15:04:37 -07:00
3b63de9353 [Model] Add OLMoE (#7922) 2024-09-20 09:31:41 -07:00
260d40b5ea [Core] Support Lora lineage and base model metadata management (#6315) 2024-09-20 06:20:56 +00:00
ea4647b7d7 [Doc] Add documentation for GGUF quantization (#8618) 2024-09-19 13:15:55 -06:00
e18749ff09 [Model] Support Solar Model (#8386)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-09-18 11:04:00 -06:00
7c7714d856 [Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH (#8157)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-09-18 13:56:58 +00:00
fa0c114fad [doc] improve installation doc (#8550)
Co-authored-by: Andy Dai <76841985+Imss27@users.noreply.github.com>
2024-09-17 16:24:06 -07:00
2759a43a26 [doc] update doc on testing and debugging (#8514) 2024-09-16 12:10:23 -07:00
8a0cf1ddc3 [Model] support minicpm3 (#8297)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-14 14:50:26 +00:00
f57092c00b [Doc] Add oneDNN installation to CPU backend documentation (#8467) 2024-09-13 18:06:30 +00:00
a84e598e21 [CI/Build] Reorganize models tests (#7820) 2024-09-13 10:20:06 -07:00
cab69a15e4 [doc] recommend pip instead of conda (#8446) 2024-09-12 23:52:41 -07:00
c6202daeed [Model] Support multiple images for qwen-vl (#8247)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-12 10:10:54 -07:00
d394787e52 Pixtral (#8377)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-11 14:41:55 -07:00
3b7fea770f [Model][VLM] Add Qwen2-VL model support (#7905)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-11 09:31:19 -07:00
6a512a00df [model] Support for Llava-Next-Video model (#7559)
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-09-10 22:21:36 -07:00
a1d874224d Add NVIDIA Meetup slides, announce AMD meetup, and add contact info (#8319) 2024-09-09 23:21:00 -07:00
e807125936 [Model][VLM] Support multi-images inputs for InternVL2 models (#8201) 2024-09-07 16:38:23 +08:00
2f707fcb35 [Model] Multi-input support for LLaVA (#8238) 2024-09-07 02:57:24 +00:00
12dd715807 [misc] [doc] [frontend] LLM torch profiler support (#7943) 2024-09-06 17:48:48 -07:00
23f322297f [Misc] Remove SqueezeLLM (#8220) 2024-09-06 16:29:03 -06:00
db3bf7c991 [Core] Support load and unload LoRA in api server (#6566)
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2024-09-05 18:10:33 -07:00
2febcf2777 [Documentation][Spec Decode] Add documentation about lossless guarantees in Speculative Decoding in vLLM (#7962) 2024-09-05 16:25:29 -04:00
9da25a88aa [MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) (#8029)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-05 12:48:10 +00:00
288a938872 [Doc] Indicate more information about supported modalities (#8181) 2024-09-05 10:51:53 +00:00
e02ce498be [Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649)
Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
Co-authored-by: Kyle Mistele <kyle@constellate.ai>
2024-09-04 13:18:13 -07:00
61f4a93d14 [TPU][Bugfix] Use XLA rank for persistent cache path (#8137) 2024-09-03 18:35:33 -07:00
1248e8506a [Model] Adding support for MSFT Phi-3.5-MoE (#7729)
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Zeqi Lin <zelin@microsoft.com>
Co-authored-by: Zeqi Lin <Zeqi.Lin@microsoft.com>
2024-08-30 13:42:57 -06:00
058344f89a [Frontend]-config-cli-args (#7737)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Kaunil Dhruv <kaunil_dhruv@intuit.com>
2024-08-30 08:21:02 -07:00
dc13e99348 [MODEL] add Exaone model support (#7819) 2024-08-29 23:34:20 -07:00
8c56e57def [Doc] fix 404 link (#7966) 2024-08-28 13:54:23 -07:00
eeffde1ac0 [TPU] Upgrade PyTorch XLA nightly (#7967) 2024-08-28 13:10:21 -07:00
98c12cffe5 [Doc] fix the autoAWQ example (#7937) 2024-08-28 12:12:32 +00:00
fab5f53e2d [Core][VLM] Stack multimodal tensors to represent multiple images within each prompt (#7902) 2024-08-28 01:53:56 +00:00
57792ed469 [Doc] Fix incorrect docs from #7615 (#7788) 2024-08-22 10:02:06 -07:00
df1a21131d [Model] Fix Phi-3.5-vision-instruct 'num_crops' issue (#7710) 2024-08-22 09:36:24 +08:00
1ca0d4f86b [Model] Add UltravoxModel and UltravoxConfig (#7615) 2024-08-21 22:49:39 +00:00
dd53c4b023 [misc] Add Torch profiler support (#7451)
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
2024-08-21 15:39:26 -07:00
4506641212 [Doc] Section for Multimodal Language Models (#7719) 2024-08-20 23:24:01 -07:00
398521ad19 [OpenVINO] Updated documentation (#7687) 2024-08-20 07:33:56 -06:00
d4f0f17b02 [Doc] Update quantization supported hardware table (#7595) 2024-08-16 13:59:27 -07:00