| Commit | Message | Date |
|---|---|---|
| 3b63de9353 | [Model] Add OLMoE (#7922) | 2024-09-20 09:31:41 -07:00 |
| 260d40b5ea | [Core] Support Lora lineage and base model metadata management (#6315) | 2024-09-20 06:20:56 +00:00 |
| ea4647b7d7 | [Doc] Add documentation for GGUF quantization (#8618) | 2024-09-19 13:15:55 -06:00 |
| e18749ff09 | [Model] Support Solar Model (#8386)<br>Co-authored-by: Michael Goin <michael@neuralmagic.com> | 2024-09-18 11:04:00 -06:00 |
| 7c7714d856 | [Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH (#8157)<br>Co-authored-by: Nick Hill <nickhill@us.ibm.com><br>Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com><br>Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com><br>Co-authored-by: Simon Mo <simon.mo@hey.com> | 2024-09-18 13:56:58 +00:00 |
| fa0c114fad | [doc] improve installation doc (#8550)<br>Co-authored-by: Andy Dai <76841985+Imss27@users.noreply.github.com> | 2024-09-17 16:24:06 -07:00 |
| 2759a43a26 | [doc] update doc on testing and debugging (#8514) | 2024-09-16 12:10:23 -07:00 |
| 8a0cf1ddc3 | [Model] support minicpm3 (#8297)<br>Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2024-09-14 14:50:26 +00:00 |
| f57092c00b | [Doc] Add oneDNN installation to CPU backend documentation (#8467) | 2024-09-13 18:06:30 +00:00 |
| a84e598e21 | [CI/Build] Reorganize models tests (#7820) | 2024-09-13 10:20:06 -07:00 |
| cab69a15e4 | [doc] recommend pip instead of conda (#8446) | 2024-09-12 23:52:41 -07:00 |
| c6202daeed | [Model] Support multiple images for qwen-vl (#8247)<br>Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com><br>Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com><br>Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2024-09-12 10:10:54 -07:00 |
| d394787e52 | Pixtral (#8377)<br>Co-authored-by: Roger Wang <ywang@roblox.com> | 2024-09-11 14:41:55 -07:00 |
| 3b7fea770f | [Model][VLM] Add Qwen2-VL model support (#7905)<br>Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com><br>Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2024-09-11 09:31:19 -07:00 |
| 6a512a00df | [model] Support for Llava-Next-Video model (#7559)<br>Co-authored-by: Roger Wang <ywang@roblox.com><br>Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk><br>Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> | 2024-09-10 22:21:36 -07:00 |
| a1d874224d | Add NVIDIA Meetup slides, announce AMD meetup, and add contact info (#8319) | 2024-09-09 23:21:00 -07:00 |
| e807125936 | [Model][VLM] Support multi-images inputs for InternVL2 models (#8201) | 2024-09-07 16:38:23 +08:00 |
| 2f707fcb35 | [Model] Multi-input support for LLaVA (#8238) | 2024-09-07 02:57:24 +00:00 |
| 12dd715807 | [misc] [doc] [frontend] LLM torch profiler support (#7943) | 2024-09-06 17:48:48 -07:00 |
| 23f322297f | [Misc] Remove SqueezeLLM (#8220) | 2024-09-06 16:29:03 -06:00 |
| db3bf7c991 | [Core] Support load and unload LoRA in api server (#6566)<br>Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> | 2024-09-05 18:10:33 -07:00 |
| 2febcf2777 | [Documentation][Spec Decode] Add documentation about lossless guarantees in Speculative Decoding in vLLM (#7962) | 2024-09-05 16:25:29 -04:00 |
| 9da25a88aa | [MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) (#8029)<br>Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com><br>Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> | 2024-09-05 12:48:10 +00:00 |
| 288a938872 | [Doc] Indicate more information about supported modalities (#8181) | 2024-09-05 10:51:53 +00:00 |
| e02ce498be | [Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649)<br>Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com><br>Co-authored-by: Kyle Mistele <kyle@constellate.ai> | 2024-09-04 13:18:13 -07:00 |
| 61f4a93d14 | [TPU][Bugfix] Use XLA rank for persistent cache path (#8137) | 2024-09-03 18:35:33 -07:00 |
| 1248e8506a | [Model] Adding support for MSFT Phi-3.5-MoE (#7729)<br>Co-authored-by: Your Name <you@example.com><br>Co-authored-by: Zeqi Lin <zelin@microsoft.com><br>Co-authored-by: Zeqi Lin <Zeqi.Lin@microsoft.com> | 2024-08-30 13:42:57 -06:00 |
| 058344f89a | [Frontend]-config-cli-args (#7737)<br>Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com><br>Co-authored-by: Kaunil Dhruv <kaunil_dhruv@intuit.com> | 2024-08-30 08:21:02 -07:00 |
| dc13e99348 | [MODEL] add Exaone model support (#7819) | 2024-08-29 23:34:20 -07:00 |
| 8c56e57def | [Doc] fix 404 link (#7966) | 2024-08-28 13:54:23 -07:00 |
| eeffde1ac0 | [TPU] Upgrade PyTorch XLA nightly (#7967) | 2024-08-28 13:10:21 -07:00 |
| 98c12cffe5 | [Doc] fix the autoAWQ example (#7937) | 2024-08-28 12:12:32 +00:00 |
| fab5f53e2d | [Core][VLM] Stack multimodal tensors to represent multiple images within each prompt (#7902) | 2024-08-28 01:53:56 +00:00 |
| 57792ed469 | [Doc] Fix incorrect docs from #7615 (#7788) | 2024-08-22 10:02:06 -07:00 |
| df1a21131d | [Model] Fix Phi-3.5-vision-instruct 'num_crops' issue (#7710) | 2024-08-22 09:36:24 +08:00 |
| 1ca0d4f86b | [Model] Add UltravoxModel and UltravoxConfig (#7615) | 2024-08-21 22:49:39 +00:00 |
| dd53c4b023 | [misc] Add Torch profiler support (#7451)<br>Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> | 2024-08-21 15:39:26 -07:00 |
| 4506641212 | [Doc] Section for Multimodal Language Models (#7719) | 2024-08-20 23:24:01 -07:00 |
| 398521ad19 | [OpenVINO] Updated documentation (#7687) | 2024-08-20 07:33:56 -06:00 |
| d4f0f17b02 | [Doc] Update quantization supported hardware table (#7595) | 2024-08-16 13:59:27 -07:00 |
| b3f4e17935 | [Doc] Add docs for llmcompressor INT8 and FP8 checkpoints (#7444) | 2024-08-16 13:59:16 -07:00 |
| 22b39e11f2 | llama_index serving integration documentation (#6973)<br>Co-authored-by: pavanmantha <pavan.mantha@thevaslabs.io> | 2024-08-14 15:38:37 -07:00 |
| 3f674a49b5 | [VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126) | 2024-08-14 17:55:42 +00:00 |
| 199adbb7cf | [doc] update test script to include cudagraph (#7501) | 2024-08-13 21:52:58 -07:00 |
| dd164d72f3 | [Bugfix][Docs] Update list of mock imports (#7493) | 2024-08-13 20:37:30 -07:00 |
| a08df8322e | [TPU] Support multi-host inference (#7457) | 2024-08-13 16:31:20 -07:00 |
| 00c3d68e45 | [Frontend][Core] Add plumbing to support audio language models (#7446) | 2024-08-13 17:39:33 +00:00 |
| e20233d361 | Revert "[Doc] Update supported_hardware.rst (#7276)" (#7467) | 2024-08-13 01:37:08 -07:00 |
| a046f86397 | [Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (#7208)<br>Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> | 2024-08-12 22:47:41 +00:00 |
| e6e42e4b17 | [Core][VLM] Support image embeddings as input (#6613) | 2024-08-12 16:16:06 +08:00 |