ac04a97a9f  2024-11-04 22:53:24 +00:00
  [Frontend] Add max_tokens prometheus metric (#9881)
  Signed-off-by: Tomer Asida <tomera@ai21.com>

1c45f4c385  2024-11-04 11:34:26 -08:00
  [CI] Basic Integration Test For TPU (#9968)
  Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>

ba0d892074  2024-11-01 14:09:07 +00:00
  [Frontend] Use a proper chat template for VLM2Vec (#9912)

06386a64dd  2024-11-01 08:13:35 +00:00
  [Frontend] Chat-based Embeddings API (#9759)

031a7995f3  2024-11-01 01:09:46 +00:00
  [Bugfix][Frontend] Reject guided decoding in multistep mode (#9892)
  Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>

abbfb6134d  2024-10-30 18:15:56 -07:00
  [Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint (#9837)

67bdf8e523  2024-10-29 14:13:20 -07:00
  [Bugfix][Frontend] Guard against bad token ids (#9634)
  Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>

ef7865b4f9  2024-10-29 11:49:47 +00:00
  [Frontend] re-enable multi-modality input in the new beam search implementation (#9427)
  Signed-off-by: Qishuai <Ferdinandzhong@gmail.com>

33bab41060  2024-10-24 05:05:49 +00:00
  [Bugfix]: Make chat content text allow type content (#9358)
  Signed-off-by: Vinay Damodaran <vrdn@hey.com>

150b779081  2024-10-23 17:28:57 +00:00
  [Frontend] Enable Online Multi-image Support for MLlama (#9393)
  Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
  Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

c0292211ce  2024-10-22 04:52:14 +00:00
  [CI/Build] Replaced some models on tests for smaller ones (#9570)
  Signed-off-by: Wallas Santos <wallashss@ibm.com>

76a5e13270  2024-10-22 00:31:44 +00:00
  [core] move parallel sampling out from vllm core (#9302)

5b59fe0f08  2024-10-20 00:05:02 +00:00
  [Bugfix] Pass json-schema to GuidedDecodingParams and make test stronger (#9530)

c5eea3c8ba  2024-10-18 23:17:07 -07:00
  [Frontend] Support simpler image input format (#9478)

337ed76671  2024-10-18 18:12:32 -07:00
  [Bugfix] Fix offline mode when using mistral_common (#9457)

d11bf435a0  2024-10-18 14:30:55 -07:00
  [MISC] Consolidate cleanup() and refactor offline_inference_with_prefix.py (#9510)

051eaf6db3  2024-10-18 11:31:58 -07:00
  [Model] Add user-configurable task for models that support both generation and embedding (#9424)

de4008e2ab  2024-10-17 22:47:27 -04:00
  [Bugfix][Core] Use torch.cuda.memory_stats() to profile peak memory usage (#9352)
  Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>

ba30942240  2024-10-15 15:40:43 -07:00
  [Bugfix] Fix vLLM UsageInfo and logprobs None AssertionError with empty token_ids (#9034)
  Co-authored-by: Nick Hill <nickhill@us.ibm.com>

e9d517f276  2024-10-14 23:19:48 -07:00
  [BugFix] Fix chat API continuous usage stats (#9357)

cbc2ef5529  2024-10-10 21:30:44 -07:00
  [misc] hide best_of from engine (#9261)
  Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>

9a94ca4a5d  2024-10-08 09:38:40 -07:00
  [Bugfix] fix OpenAI API server startup with --disable-frontend-multiprocessing (#8537)

069d3bd8d0  2024-10-08 14:31:26 +00:00
  [Frontend] Add Early Validation For Chat Template / Tool Call Parser (#9151)
  Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>

8c746226c9  2024-10-08 05:51:43 +00:00
  [Frontend] API support for beam search for MQLLMEngine (#9117)

168cab6bbf  2024-10-05 23:39:03 -07:00
  [Frontend] API support for beam search (#9087)
  Co-authored-by: youkaichao <youkaichao@126.com>

0dcc8cbe5a  2024-10-04 18:31:40 +00:00
  Adds truncate_prompt_tokens param for embeddings creation (#8999)
  Signed-off-by: Flavia Beo <flavia.beo@ibm.com>

26aa325f4f  2024-10-04 10:38:25 -07:00
  [Core][VLM] Test registration for OOT multimodal models (#8717)
  Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>

062c89e7c9  2024-10-01 09:34:25 +08:00
  [Frontend][Core] Move guided decoding params into sampling params (#8252)
  Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
  Co-authored-by: Nick Hill <nickhill@us.ibm.com>

6c9ba48fde  2024-09-29 17:59:47 +00:00
  [Frontend] Added support for HF's new continue_final_message parameter (#8942)

3b00b9c26c  2024-09-26 20:35:15 -07:00
  [Core] rename PromptInputs and inputs (#8876)

4b377d6feb  2024-09-26 16:46:43 -07:00
  [BugFix] Fix test breakages from transformers 4.45 upgrade (#8829)

4f1ba0844b  2024-09-25 10:36:26 -07:00
  Revert "rename PromptInputs and inputs with backward compatibility (#8760)" (#8810)

28e1299e60  2024-09-25 09:36:47 -07:00
  rename PromptInputs and inputs with backward compatibility (#8760)

2529d09b5a  2024-09-24 09:44:11 -07:00
  [Frontend] Batch inference for llm.chat() API (#8648)
  Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
  Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
  Co-authored-by: Roger Wang <ywang@roblox.com>
  Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>

1a2aef3e59  2024-09-23 15:38:04 -07:00
  Add output streaming support to multi-step + async while ensuring RequestOutput obj reuse (#8335)

260d40b5ea  2024-09-20 06:20:56 +00:00
  [Core] Support Lora lineage and base model metadata management (#6315)

7c7714d856  2024-09-18 13:56:58 +00:00
  [Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH (#8157)
  Co-authored-by: Nick Hill <nickhill@us.ibm.com>
  Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
  Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
  Co-authored-by: Simon Mo <simon.mo@hey.com>

f2e263b801  2024-09-12 11:11:57 -07:00
  [Bugfix] Offline mode fix (#8376)
  Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>

cea95dfb94  2024-09-11 05:30:11 +00:00
  [Frontend] Create ErrorResponse instead of raising exceptions in run_batch (#8347)

db3bf7c991  2024-09-05 18:10:33 -07:00
  [Core] Support load and unload LoRA in api server (#6566)
  Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

855c262a6b  2024-09-04 05:22:17 +00:00
  [Frontend] Multimodal support in offline chat (#8098)

5231f0898e  2024-08-31 16:35:53 -07:00
  [Frontend][VLM] Add support for multiple multi-modal items (#8049)

39178c7fbc  2024-08-26 21:33:17 -07:00
  [Tests] Disable retries and use context manager for openai client (#7565)

0b769992ec  2024-08-26 03:16:38 +00:00
  [Bugfix]: Use float32 for base64 embedding (#7855)
  Signed-off-by: Hollow Man <hollowman@opensuse.org>

7d9ffa2ae1  2024-08-24 00:51:38 -07:00
  [misc][core] lazy import outlines (#7831)

d81abefd2e  2024-08-23 23:07:24 -07:00
  [Frontend] add json_schema support from OpenAI protocol (#7654)

8da48e4d95  2024-08-23 23:04:22 -07:00
  [Frontend] Publish Prometheus metrics in run_batch API (#7641)

e25fee57c2  2024-08-23 13:12:44 +00:00
  [BugFix] Fix server crash on empty prompt (#7746)
  Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

b903e1ba7f  2024-08-22 21:50:21 +00:00
  [Frontend] error suppression cleanup (#7786)
  Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>

cde9183b40  2024-08-22 02:18:11 +00:00
  [Bug][Frontend] Improve ZMQ client robustness (#7443)
  Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>