2529d09b5a
[Frontend] Batch inference for llm.chat() API (#8648)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-09-24 09:44:11 -07:00

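
The batch-inference commit above (2529d09b5a) lets `llm.chat()` accept a list of conversations rather than a single one. A minimal sketch under that assumption — the model name is purely illustrative, and the inference call itself is commented out because it needs a GPU and model weights:

```python
# Each conversation is an OpenAI-style message list; a batch is simply a
# list of such lists (assumption: a vLLM build that includes #8648).
conversations = [
    [{"role": "user", "content": "What is the capital of France?"}],
    [
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "Name one prime number."},
    ],
]

# Sketch of the call itself (not runnable without a GPU):
# from vllm import LLM
# llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # illustrative model
# outputs = llm.chat(conversations)  # one RequestOutput per conversation,
#                                    # aligned index-by-index with the input
```
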
8ff7ced996
[Model] Expose Phi3v num_crops as a mm_processor_kwarg (#8658)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-24 07:36:46 +00:00

5b59532760
[Model][VLM] Add LLaVA-Onevision model support (#8486)
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-22 10:51:44 -07:00

8ca5051b9a
[Misc] Use NamedTuple in Multi-image example (#8705)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2024-09-22 20:56:20 +08:00

a54ed80249
[Model] Add mistral function calling format to all models loaded with "mistral" format (#8515)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-09-17 17:50:37 +00:00

ba77527955
[bugfix] torch profiler bug for single gpu with GPUExecutor (#8354)
2024-09-12 21:30:00 -07:00

360ddbd37e
[Misc] Update Pixtral example (#8431)
2024-09-12 17:31:18 -07:00

c6202daeed
[Model] Support multiple images for qwen-vl (#8247)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-12 10:10:54 -07:00

d394787e52
Pixtral (#8377)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-09-11 14:41:55 -07:00

3b7fea770f
[Model][VLM] Add Qwen2-VL model support (#7905)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-11 09:31:19 -07:00

6a512a00df
[model] Support for Llava-Next-Video model (#7559)
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-09-10 22:21:36 -07:00

efcf946a15
[Hardware][NV] Add support for ModelOpt static scaling checkpoints. (#6112)
2024-09-11 00:38:40 -04:00

e807125936
[Model][VLM] Support multi-image inputs for InternVL2 models (#8201)
2024-09-07 16:38:23 +08:00

41e95c5247
[Bugfix] Fix Hermes tool call chat template bug (#8256)
Co-authored-by: Kyle Mistele <kyle@constellate.ai>
2024-09-07 10:49:01 +08:00

12dd715807
[misc] [doc] [frontend] LLM torch profiler support (#7943)
2024-09-06 17:48:48 -07:00

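
The torch profiler commit just above (12dd715807) is driven by an environment variable read at engine start-up. A minimal sketch, assuming the variable name and the `start_profile`/`stop_profile` methods described in PR #7943; the engine calls are commented out because they need a GPU:

```python
import os

# Export a trace output directory before constructing the engine
# (assumption: variable name as introduced in PR #7943).
os.environ["VLLM_TORCH_PROFILER_DIR"] = "/tmp/vllm_profile"

# With the variable set, profiling is toggled around the calls of interest:
# from vllm import LLM
# llm = LLM(model="facebook/opt-125m")  # illustrative model
# llm.start_profile()
# llm.generate(["Hello, my name is"])
# llm.stop_profile()  # writes a trace into the directory exported above
```
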
23f322297f
[Misc] Remove SqueezeLLM (#8220)
2024-09-06 16:29:03 -06:00

9da25a88aa
[MODEL] Qwen Multimodal Support (Qwen-VL / Qwen-VL-Chat) (#8029)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-09-05 12:48:10 +00:00

288a938872
[Doc] Indicate more information about supported modalities (#8181)
2024-09-05 10:51:53 +00:00

008cf886c9
[Neuron] Adding support for adding/ overriding neuron configuration a… (#8062)
Co-authored-by: Harsha Bikki <harbikh@amazon.com>
2024-09-04 16:33:43 -07:00

e02ce498be
[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649)
Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
Co-authored-by: Kyle Mistele <kyle@constellate.ai>
2024-09-04 13:18:13 -07:00

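
The tools commit just above (e02ce498be) makes the server accept tool declarations in the OpenAI chat-completions schema. A sketch of the request payload — the tool name, its parameters, and the model choice are invented for illustration, not part of vLLM:

```python
# An OpenAI-compatible chat-completions body with one declared tool.
# The get_weather function is a hypothetical example.
request_body = {
    "model": "NousResearch/Hermes-2-Pro-Llama-3-8B",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "auto",
}

# Posting it requires a running server started in tool-calling mode
# (flags per PR #5649), so only sketched:
# import requests
# resp = requests.post("http://localhost:8000/v1/chat/completions",
#                      json=request_body)
```
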
2be8ec6e71
[Model] Add Ultravox support for multiple audio chunks (#7963)
2024-09-04 04:38:21 +00:00

5231f0898e
[Frontend][VLM] Add support for multiple multi-modal items (#8049)
2024-08-31 16:35:53 -07:00

257afc37c5
[Neuron] Adding support for context-length, token-gen buckets. (#7885)
Co-authored-by: Harsha Bikki <harbikh@amazon.com>
2024-08-29 13:58:14 -07:00

0b769992ec
[Bugfix]: Use float32 for base64 embedding (#7855)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
2024-08-26 03:16:38 +00:00

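
The base64-embedding fix just above (0b769992ec) means a base64 embedding payload decodes as packed float32 values. A self-contained round-trip sketch with a made-up 3-dimensional vector; little-endian byte order is an assumption (it matches what `numpy.float32.tobytes()` produces on common platforms):

```python
import base64
import struct

# Hypothetical 3-dimensional embedding; these values are exactly
# representable in float32, so the round trip is lossless.
vector = [0.5, -1.25, 2.0]

# Encode as the server would: pack as little-endian float32, then base64.
encoded = base64.b64encode(struct.pack(f"<{len(vector)}f", *vector)).decode()

# Decode as a client would: base64 to bytes, then 4 bytes per float32.
raw = base64.b64decode(encoded)
decoded = list(struct.unpack(f"<{len(raw) // 4}f", raw))
# decoded == [0.5, -1.25, 2.0]
```
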
57792ed469
[Doc] Fix incorrect docs from #7615 (#7788)
2024-08-22 10:02:06 -07:00

1ca0d4f86b
[Model] Add UltravoxModel and UltravoxConfig (#7615)
2024-08-21 22:49:39 +00:00

2aa00d59ad
[CI/Build] Pin OpenTelemetry versions and make availability errors clearer (#7266)
2024-08-20 10:02:21 -07:00

3b19e39dc5
Chat method for offline llm (#5049)
Co-authored-by: nunjunj <ray@g-3ff9f30f2ed650001.c.vllm-405802.internal>
Co-authored-by: nunjunj <ray@g-1df6075697c3f0001.c.vllm-405802.internal>
Co-authored-by: nunjunj <ray@g-c5a2c23abc49e0001.c.vllm-405802.internal>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-08-15 19:41:34 -07:00

249b88228d
[Frontend] Support embeddings in the run_batch API (#7132)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-08-09 09:48:21 -07:00

67abdbb42f
[VLM][Doc] Add stop_token_ids to InternVL example (#7354)
2024-08-09 14:51:04 +00:00

7eb4a51c5f
[Core] Support serving encoder/decoder models (#7258)
2024-08-09 10:39:41 +08:00

757ac70a64
[Model] Rename MiniCPMVQwen2 to MiniCPMV2.6 (#7273)
2024-08-08 14:02:41 +00:00

fd95e026e0
[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942)
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-08-06 16:51:47 -04:00

360bd67cf0
[Core] Support loading GGUF model (#5191)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-08-05 17:54:23 -06:00

c0d8f1636c
[Model] SiglipVisionModel ported from transformers (#6942)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-08-05 06:22:12 +00:00

7cbd9ec7a9
[Model] Initialize support for InternVL2 series models (#6514)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-29 10:16:30 +00:00

1ad86acf17
[Model] Initial support for BLIP-2 (#5920)
Co-authored-by: ywang96 <ywang@roblox.com>
2024-07-27 11:53:07 +00:00

a57d75821c
[bugfix] make args.stream work (#6831)
2024-07-27 09:07:02 +00:00

925de97e05
[Bugfix] Fix VLM example typo (#6859)
2024-07-27 14:24:08 +08:00

aa46953a20
[Misc][VLM][Doc] Consolidate offline examples for vision language models (#6858)
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-07-26 22:44:13 -07:00

b5f49ee55b
Update README.md (#6847)
2024-07-27 00:26:45 +00:00

b75e314fff
[Bugfix] Add image placeholder for OpenAI Compatible Server of MiniCPM-V (#6787)
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-07-25 09:42:49 -07:00

316a41ac1d
[Bugfix] Fix encoding_format in examples/openai_embedding_client.py (#6755)
2024-07-24 22:48:07 -07:00

9e169a4c61
[Model] Adding support for MiniCPM-V (#4087)
2024-07-24 20:59:30 -07:00

c051bfe4eb
[doc][distributed] add more doc for setting up multi-node environment (#6529)
2024-07-22 21:22:09 -07:00

1c27d25fb5
[core][model] yet another cpu offload implementation (#6496)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-07-17 20:54:35 -07:00

5bf35a91e4
[Doc][CI/Build] Update docs and tests to use vllm serve (#6431)
2024-07-17 07:43:21 +00:00

d97011512e
[CI/Build] vLLM cache directory for images (#6444)
2024-07-15 23:12:25 -07:00

4552e37b55
[CI/Build][TPU] Add TPU CI test (#6277)
Co-authored-by: kevin <kevin@anyscale.com>
2024-07-15 14:31:16 -07:00

540c0368b1
[Model] Initialize Fuyu-8B support (#3924)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-14 05:27:14 +00:00