233df6f5c4
[V1][Metrics] Add request_success_total counter, labelled with finish reason ( #12579 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-02-04 19:46:54 -05:00
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files ( #12628 )
...
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**
commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com >
Date: Fri Jan 31 14:18:24 2025 -0500
Add SPDX license headers to python source files
This commit adds SPDX license headers to python source files as
recommended to
the project by the Linux Foundation. These headers provide a concise way
that is
both human and machine readable for communicating license information
for each
source file. It helps avoid any ambiguity about the license of the code
and can
also be easily used by tools to help manage license compliance.
The Linux Foundation runs license scans against the codebase to help
ensure
we are in compliance with the licenses of the code we use, including
dependencies. Having these headers in place helps that tool do its job.
More information can be found on the SPDX site:
- https://spdx.dev/learn/handling-license-info/
Signed-off-by: Russell Bryant <rbryant@redhat.com >
commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com >
Date: Fri Jan 31 14:36:32 2025 -0500
Check for SPDX headers using pre-commit
Signed-off-by: Russell Bryant <rbryant@redhat.com >
---------
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-02-02 11:58:18 -08:00
f17f1d4608
[V1][Metrics] Add GPU cache usage % gauge ( #12561 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-01-29 18:31:01 -08:00
46fb056749
[V1][Metrics] Add TTFT and TPOT histograms ( #12530 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-01-29 04:11:16 +00:00
a7e3eba66f
[Frontend] Support reasoning content for deepseek r1 ( #12473 )
...
Signed-off-by: Ce Gao <cegao@tensorchord.ai >
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Michael Goin <mgoin@redhat.com >
2025-01-29 11:38:08 +08:00
c386c43ca3
[V1][Metrics] Add per-request prompt/generation_tokens histograms ( #12516 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-01-28 22:07:22 +00:00
3fd1fb63ef
[V1][Metrics] Hook up IterationStats for Prometheus metrics ( #12478 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-01-28 16:38:38 +00:00
01ba927040
[V1][Metrics] Add initial Prometheus logger ( #12416 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-01-27 12:26:28 -05:00
0cc6b383d7
[Frontend] Support scores endpoint in run_batch ( #12430 )
...
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io >
2025-01-27 04:30:17 +00:00
0034b09ceb
[Frontend] Rerank API (Jina- and Cohere-compatible API) ( #12376 )
...
Signed-off-by: Kyle Mistele <kyle@mistele.com >
2025-01-26 19:58:45 -07:00
9ddc35220b
[Frontend] generation_config.json for maximum tokens( #12242 )
...
Signed-off-by: Matthew Hendrey <matthew.hendrey@gmail.com >
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com >
Signed-off-by: Isotr0py <2037008807@qq.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
Co-authored-by: shangmingc <caishangming@linux.alibaba.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-01-26 19:59:25 +08:00
58fd57ff1d
[Bugfix] Fix score api for missing max_model_len validation ( #12119 )
...
Signed-off-by: Wallas Santos <wallashss@ibm.com >
2025-01-17 16:24:22 +00:00
07934cc237
[Misc][LoRA] Improve the readability of LoRA error messages ( #12102 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-01-17 19:32:28 +08:00
ac2f3f7fee
[Bugfix] Validate lora adapters to avoid crashing server ( #11727 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-01-10 15:56:36 +08:00
1fe554bac3
treat do_lower_case in the same way as the sentence-transformers library ( #11815 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
2025-01-09 11:05:43 +08:00
4db72e57f6
[Bugfix][Refactor] Unify model management in frontend ( #11660 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
2025-01-01 02:21:51 +00:00
74fa1d123c
[Bugfix] Fix OpenAI parallel sampling when using xgrammar ( #11637 )
...
Signed-off-by: mgoin <michael@neuralmagic.com >
2024-12-31 03:43:54 +00:00
101418096f
[VLM] Support caching in merged multi-modal processor ( #11396 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-12-27 17:22:48 +00:00
7af553ea30
[Misc] Abstract the logic for reading and writing media content ( #11527 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-12-27 19:21:23 +08:00
9edca6bf8f
[Frontend] Online Pooling API ( #11457 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-12-24 17:54:30 +08:00
63afbe9215
[CI] Expand OpenAI test_chat.py guided decoding tests ( #11048 )
...
Signed-off-by: mgoin <michael@neuralmagic.com >
2024-12-23 18:35:38 +00:00
29c748930e
[CI] Fix flaky entrypoint tests ( #11403 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2024-12-21 21:08:44 -08:00
5aef49806d
[Feature] Add load generation config from model ( #11164 )
...
Signed-off-by: liuyanyi <wolfsonliu@163.com >
Signed-off-by: Yanyi Liu <wolfsonliu@163.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2024-12-19 10:50:38 +00:00
c77eb8a33c
[Bugfix] Set temperature=0.7 in test_guided_choice_chat ( #11264 )
2024-12-17 16:34:06 -08:00
2d1b9baa8f
[Bugfix] Fix request cancellation without polling ( #11190 )
2024-12-17 12:26:32 -08:00
66d4b16724
[Frontend] Add OpenAI API support for input_audio ( #11027 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-12-16 22:09:58 -08:00
d927dbcd88
[Model] Refactor Ultravox to use merged input processor ( #11198 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2024-12-16 10:09:53 +00:00
9c3dadd1c9
[Frontend] Add logits_processors as an extra completion argument ( #11150 )
...
Signed-off-by: Brad Hilton <brad.hilton.nw@gmail.com >
2024-12-14 16:46:42 +00:00
0920ab9131
[Doc] Reorganize online pooling APIs ( #11172 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-12-14 00:22:22 +08:00
eeec9e3390
[Frontend] Separate pooling APIs in offline inference ( #11129 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-12-13 10:40:07 +00:00
85362f028c
[Misc][LoRA] Ensure Lora Adapter requests return adapter name ( #11094 )
...
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com >
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2024-12-12 09:25:16 +00:00
8f10d5e393
[Misc] Split up pooling tasks ( #10820 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-12-11 01:28:00 -08:00
a811dd6608
[Model] merged input processor for Phi-3-Vision models ( #10977 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2024-12-09 12:55:10 -08:00
395b1c7454
[Frontend] don't block event loop in tokenization (preprocess) in OpenAI compatible server ( #10635 )
...
Signed-off-by: Tomer Asida <tomera@ai21.com >
2024-11-27 13:21:10 -08:00
d04b13a380
[Bug]: Authorization ignored when root_path is set ( #10606 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2024-11-25 16:21:41 +00:00
214efc2c3c
Support Cross encoder models ( #10400 )
...
Signed-off-by: Max de Bayser <maxdebayser@gmail.com >
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
Signed-off-by: Flavia Beo <flavia.beo@ibm.com >
Co-authored-by: Flavia Beo <flavia.beo@ibm.com >
2024-11-24 18:56:20 -08:00
7d8ffb344f
[Bugfix] Internal Server Error when tool_choice is incorrect. ( #10567 )
...
Signed-off-by: Varun Shenoy <varun.vinayak.shenoy@oracle.com >
2024-11-22 21:13:29 -08:00
da7e702c6f
[Bug]: When apply continue_final_message for OpenAI server, the "echo":false is ignored ( #10180 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2024-11-21 16:24:32 +00:00
c68f7ede6a
[Bugfix]: allow extra fields in requests to openai compatible server ( #10463 )
...
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com >
2024-11-20 16:42:21 -05:00
32e46e000f
[Frontend] Automatic detection of chat content format from AST ( #9919 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-11-16 13:35:40 +08:00
f67ce05d0b
[Frontend] Pythonic tool parser ( #9859 )
...
Signed-off-by: Mike Depinet <mike@fixie.ai >
2024-11-14 04:14:34 +00:00
6ace6fba2c
[V1] AsyncLLM Implementation ( #9826 )
...
Signed-off-by: Nick Hill <nickhill@us.ibm.com >
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nickhill@us.ibm.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2024-11-11 23:05:38 +00:00
28b2877d30
Online video support for VLMs ( #10020 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: litianjian <litianjian@bytedance.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-11-07 20:25:59 +00:00
ac04a97a9f
[Frontend] Add max_tokens prometheus metric ( #9881 )
...
Signed-off-by: Tomer Asida <tomera@ai21.com >
2024-11-04 22:53:24 +00:00
1c45f4c385
[CI] Basic Integration Test For TPU ( #9968 )
...
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com >
2024-11-04 11:34:26 -08:00
ba0d892074
[Frontend] Use a proper chat template for VLM2Vec ( #9912 )
2024-11-01 14:09:07 +00:00
06386a64dd
[Frontend] Chat-based Embeddings API ( #9759 )
2024-11-01 08:13:35 +00:00
031a7995f3
[Bugfix][Frontend] Reject guided decoding in multistep mode ( #9892 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
2024-11-01 01:09:46 +00:00
abbfb6134d
[Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint ( #9837 )
2024-10-30 18:15:56 -07:00
67bdf8e523
[Bugfix][Frontend] Guard against bad token ids ( #9634 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
2024-10-29 14:13:20 -07:00