Commit Graph

307 Commits

Author SHA1 Message Date
41c5dd45b9 [V1][Metrics] Add GPU prefix cache hit rate % gauge (#12592) 2025-02-11 08:27:25 +00:00
fc6485d277 [Bugfix]: Reasoning output bug according to the chat template change (#13025)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
2025-02-11 15:49:03 +08:00
08b2d845d6 [Model] Ultravox Model: Support v0.5 Release (#12912)
Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>
2025-02-10 22:02:48 +00:00
ce26b16268 [Misc] Remove unnecessary detokenization in multimodal processing (#12868) 2025-02-07 06:21:17 -08:00
6e1fc61f0f Prevent unecessary requests to huggingface hub (#12837) 2025-02-06 21:37:41 -08:00
75404d041b [VLM] Update compatibility with transformers 4.49 2025-02-05 19:09:45 -08:00
233df6f5c4 [V1][Metrics] Add request_success_total counter, labelled with finish reason (#12579)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-02-04 19:46:54 -05:00
e489ad7a21 [Misc] Add SPDX-License-Identifier headers to python source files (#12628)
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**

commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:18:24 2025 -0500

    Add SPDX license headers to python source files
    
This commit adds SPDX license headers to python source files as
recommended to
the project by the Linux Foundation. These headers provide a concise way
that is
both human and machine readable for communicating license information
for each
source file. It helps avoid any ambiguity about the license of the code
and can
    also be easily used by tools to help manage license compliance.
    
The Linux Foundation runs license scans against the codebase to help
ensure
    we are in compliance with the licenses of the code we use, including
dependencies. Having these headers in place helps that tool do its job.
    
    More information can be found on the SPDX site:
    
    - https://spdx.dev/learn/handling-license-info/
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:36:32 2025 -0500

    Check for SPDX headers using pre-commit
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

---------

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
f17f1d4608 [V1][Metrics] Add GPU cache usage % gauge (#12561)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-01-29 18:31:01 -08:00
46fb056749 [V1][Metrics] Add TTFT and TPOT histograms (#12530)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-01-29 04:11:16 +00:00
a7e3eba66f [Frontend] Support reasoning content for deepseek r1 (#12473)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
2025-01-29 11:38:08 +08:00
c386c43ca3 [V1][Metrics] Add per-request prompt/generation_tokens histograms (#12516)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-01-28 22:07:22 +00:00
3fd1fb63ef [V1][Metrics] Hook up IterationStats for Prometheus metrics (#12478)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-01-28 16:38:38 +00:00
01ba927040 [V1][Metrics] Add initial Prometheus logger (#12416)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-01-27 12:26:28 -05:00
0cc6b383d7 [Frontend] Support scores endpoint in run_batch (#12430)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
2025-01-27 04:30:17 +00:00
0034b09ceb [Frontend] Rerank API (Jina- and Cohere-compatible API) (#12376)
Signed-off-by: Kyle Mistele <kyle@mistele.com>
2025-01-26 19:58:45 -07:00
9ddc35220b [Frontend] generation_config.json for maximum tokens(#12242)
Signed-off-by: Matthew Hendrey <matthew.hendrey@gmail.com>
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: shangmingc <caishangming@linux.alibaba.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-01-26 19:59:25 +08:00
58fd57ff1d [Bugfix] Fix score api for missing max_model_len validation (#12119)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2025-01-17 16:24:22 +00:00
87a0c076af [core] allow callable in collective_rpc (#12151)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-17 20:47:01 +08:00
07934cc237 [Misc][LoRA] Improve the readability of LoRA error messages (#12102)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-17 19:32:28 +08:00
d75ab55f10 [Misc] Add deepseek_vl2 chat template (#12143)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-17 06:34:48 +00:00
edce722eaa [Bugfix] use right truncation for non-generative tasks (#12050)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-01-16 00:31:01 +08:00
ac2f3f7fee [Bugfix] Validate lora adapters to avoid crashing server (#11727)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-10 15:56:36 +08:00
9a228348d2 [Misc] Provide correct Pixtral-HF chat template (#11891)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-09 10:19:37 -07:00
1fe554bac3 treat do_lower_case in the same way as the sentence-transformers library (#11815)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-01-09 11:05:43 +08:00
4db72e57f6 [Bugfix][Refactor] Unify model management in frontend (#11660)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-01-01 02:21:51 +00:00
74fa1d123c [Bugfix] Fix OpenAI parallel sampling when using xgrammar (#11637)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-31 03:43:54 +00:00
101418096f [VLM] Support caching in merged multi-modal processor (#11396)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-27 17:22:48 +00:00
7af553ea30 [Misc] Abstract the logic for reading and writing media content (#11527)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-27 19:21:23 +08:00
9edca6bf8f [Frontend] Online Pooling API (#11457)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-24 17:54:30 +08:00
63afbe9215 [CI] Expand OpenAI test_chat.py guided decoding tests (#11048)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-23 18:35:38 +00:00
5bfb30a529 [Bugfix] Fix CFGGuide and use outlines for grammars that can't convert to GBNF (#11389)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-23 23:06:20 +08:00
29c748930e [CI] Fix flaky entrypoint tests (#11403)
Signed-off-by: Roger Wang <ywang@roblox.com>
2024-12-21 21:08:44 -08:00
5aef49806d [Feature] Add load generation config from model (#11164)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
Signed-off-by: Yanyi Liu <wolfsonliu@163.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2024-12-19 10:50:38 +00:00
a30482f054 [CI] Expand test_guided_generate to test all backends (#11313)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-19 04:00:38 +00:00
c77eb8a33c [Bugfix] Set temperature=0.7 in test_guided_choice_chat (#11264) 2024-12-17 16:34:06 -08:00
2d1b9baa8f [Bugfix] Fix request cancellation without polling (#11190) 2024-12-17 12:26:32 -08:00
66d4b16724 [Frontend] Add OpenAI API support for input_audio (#11027)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-16 22:09:58 -08:00
0064f697d3 [CI] Add test case with JSON schema using references + use xgrammar by default with OpenAI parse (#10935)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-17 11:39:58 +08:00
551603feff [core] overhaul memory profiling and fix backward compatibility (#10511)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-12-16 13:32:25 -08:00
d927dbcd88 [Model] Refactor Ultravox to use merged input processor (#11198)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-12-16 10:09:53 +00:00
9c3dadd1c9 [Frontend] Add logits_processors as an extra completion argument (#11150)
Signed-off-by: Brad Hilton <brad.hilton.nw@gmail.com>
2024-12-14 16:46:42 +00:00
0920ab9131 [Doc] Reorganize online pooling APIs (#11172)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-14 00:22:22 +08:00
eeec9e3390 [Frontend] Separate pooling APIs in offline inference (#11129)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-13 10:40:07 +00:00
85362f028c [Misc][LoRA] Ensure Lora Adapter requests return adapter name (#11094)
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-12 09:25:16 +00:00
8f10d5e393 [Misc] Split up pooling tasks (#10820)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-11 01:28:00 -08:00
a811dd6608 [Model] merged input processor for Phi-3-Vision models (#10977)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-12-09 12:55:10 -08:00
8d370e91cb [Bugfix] Fallback to outlines for complex json schemas (#10899)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-12-05 11:14:06 +08:00
9323a3153b [Core][Performance] Add XGrammar support for guided decoding and set it as default (#10785)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-12-03 15:17:00 +08:00
d2f058e76c [Misc] Rename embedding classes to pooling (#10801)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-12-01 14:36:51 +08:00