4db72e57f6
[Bugfix][Refactor] Unify model management in frontend (#11660)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-01-01 02:21:51 +00:00

74fa1d123c
[Bugfix] Fix OpenAI parallel sampling when using xgrammar (#11637)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-31 03:43:54 +00:00

101418096f
[VLM] Support caching in merged multi-modal processor (#11396)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-27 17:22:48 +00:00

7af553ea30
[Misc] Abstract the logic for reading and writing media content (#11527)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-27 19:21:23 +08:00

9edca6bf8f
[Frontend] Online Pooling API (#11457)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-24 17:54:30 +08:00

63afbe9215
[CI] Expand OpenAI test_chat.py guided decoding tests (#11048)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-23 18:35:38 +00:00

5bfb30a529
[Bugfix] Fix CFGGuide and use outlines for grammars that can't convert to GBNF (#11389)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-23 23:06:20 +08:00

29c748930e
[CI] Fix flaky entrypoint tests (#11403)
Signed-off-by: Roger Wang <ywang@roblox.com>
2024-12-21 21:08:44 -08:00

5aef49806d
[Feature] Add load generation config from model (#11164)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
Signed-off-by: Yanyi Liu <wolfsonliu@163.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-12-19 10:50:38 +00:00

a30482f054
[CI] Expand test_guided_generate to test all backends (#11313)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-19 04:00:38 +00:00

c77eb8a33c
[Bugfix] Set temperature=0.7 in test_guided_choice_chat (#11264)
2024-12-17 16:34:06 -08:00

2d1b9baa8f
[Bugfix] Fix request cancellation without polling (#11190)
2024-12-17 12:26:32 -08:00

66d4b16724
[Frontend] Add OpenAI API support for input_audio (#11027)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-16 22:09:58 -08:00

0064f697d3
[CI] Add test case with JSON schema using references + use xgrammar by default with OpenAI parse (#10935)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-17 11:39:58 +08:00

551603feff
[core] overhaul memory profiling and fix backward compatibility (#10511)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-16 13:32:25 -08:00

d927dbcd88
[Model] Refactor Ultravox to use merged input processor (#11198)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-12-16 10:09:53 +00:00

9c3dadd1c9
[Frontend] Add logits_processors as an extra completion argument (#11150)
Signed-off-by: Brad Hilton <brad.hilton.nw@gmail.com>
2024-12-14 16:46:42 +00:00

0920ab9131
[Doc] Reorganize online pooling APIs (#11172)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-14 00:22:22 +08:00

eeec9e3390
[Frontend] Separate pooling APIs in offline inference (#11129)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-13 10:40:07 +00:00

85362f028c
[Misc][LoRA] Ensure Lora Adapter requests return adapter name (#11094)
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-12 09:25:16 +00:00

8f10d5e393
[Misc] Split up pooling tasks (#10820)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-11 01:28:00 -08:00

a811dd6608
[Model] merged input processor for Phi-3-Vision models (#10977)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-12-09 12:55:10 -08:00

8d370e91cb
[Bugfix] Fallback to outlines for complex json schemas (#10899)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-05 11:14:06 +08:00

9323a3153b
[Core][Performance] Add XGrammar support for guided decoding and set it as default (#10785)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-12-03 15:17:00 +08:00

d2f058e76c
[Misc] Rename embedding classes to pooling (#10801)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-01 14:36:51 +08:00

395b1c7454
[Frontend] don't block event loop in tokenization (preprocess) in OpenAI compatible server (#10635)
Signed-off-by: Tomer Asida <tomera@ai21.com>
2024-11-27 13:21:10 -08:00

308cc5e21e
[ci] fix slow tests (#10698)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-27 09:26:14 -08:00

334d64d1e8
[ci] add vllm_test_utils (#10659)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-26 00:20:04 -08:00

d04b13a380
[Bug]: Authorization ignored when root_path is set (#10606)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2024-11-25 16:21:41 +00:00

214efc2c3c
Support Cross encoder models (#10400)
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
2024-11-24 18:56:20 -08:00

7d8ffb344f
[Bugfix] Internal Server Error when tool_choice is incorrect. (#10567)
Signed-off-by: Varun Shenoy <varun.vinayak.shenoy@oracle.com>
2024-11-22 21:13:29 -08:00

9195dbdbca
[Bugfix][Frontend] Update Llama Chat Templates to also support Non-Tool use (#10164)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-11-23 10:17:38 +08:00

da7e702c6f
[Bug]: When apply continue_final_message for OpenAI server, the "echo":false is ignored (#10180)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2024-11-21 16:24:32 +00:00

c68f7ede6a
[Bugfix]: allow extra fields in requests to openai compatible server (#10463)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2024-11-20 16:42:21 -05:00

32e46e000f
[Frontend] Automatic detection of chat content format from AST (#9919)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-16 13:35:40 +08:00

f67ce05d0b
[Frontend] Pythonic tool parser (#9859)
Signed-off-by: Mike Depinet <mike@fixie.ai>
2024-11-14 04:14:34 +00:00

6ace6fba2c
[V1] AsyncLLM Implementation (#9826)
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2024-11-11 23:05:38 +00:00

28b2877d30
Online video support for VLMs (#10020)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: litianjian <litianjian@bytedance.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-11-07 20:25:59 +00:00

d58268c56a
[V1] Make v1 more testable (#9888)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-11-06 11:57:35 -08:00

ac04a97a9f
[Frontend] Add max_tokens prometheus metric (#9881)
Signed-off-by: Tomer Asida <tomera@ai21.com>
2024-11-04 22:53:24 +00:00

1c45f4c385
[CI] Basic Integration Test For TPU (#9968)
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
2024-11-04 11:34:26 -08:00

ba0d892074
[Frontend] Use a proper chat template for VLM2Vec (#9912)
2024-11-01 14:09:07 +00:00

06386a64dd
[Frontend] Chat-based Embeddings API (#9759)
2024-11-01 08:13:35 +00:00

031a7995f3
[Bugfix][Frontend] Reject guided decoding in multistep mode (#9892)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-11-01 01:09:46 +00:00

abbfb6134d
[Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint (#9837)
2024-10-30 18:15:56 -07:00

67bdf8e523
[Bugfix][Frontend] Guard against bad token ids (#9634)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-10-29 14:13:20 -07:00

ef7865b4f9
[Frontend] re-enable multi-modality input in the new beam search implementation (#9427)
Signed-off-by: Qishuai Ferdinandzhong@gmail.com
2024-10-29 11:49:47 +00:00

33bab41060
[Bugfix]: Make chat content text allow type content (#9358)
Signed-off-by: Vinay Damodaran <vrdn@hey.com>
2024-10-24 05:05:49 +00:00

150b779081
[Frontend] Enable Online Multi-image Support for MLlama (#9393)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-10-23 17:28:57 +00:00

c0292211ce
[CI/Build] Replaced some models on tests for smaller ones (#9570)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2024-10-22 04:52:14 +00:00