f690372b68
[Core] Update dtype detection and defaults ( #14858 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-19 13:49:33 +08:00
2bb0e1a799
[Bugfix][ROCm] running new process using spawn method for rocm in tests. ( #14810 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-03-17 11:33:35 +00:00
a73e183e36
[Misc] Replace os environ to monkeypatch in test suite ( #14516 )
...
Signed-off-by: sibi <85477603+t-sibiraj@users.noreply.github.com >
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Aaron Pham <contact@aarnphm.xyz >
2025-03-16 20:35:57 -07:00
74bc397b0a
[Core] Expose API endpoint /is_sleeping ( #14312 )
...
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com >
2025-03-15 06:28:14 -07:00
d4d93db2c5
[V1] V1 Enablement Oracle ( #13726 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Michael Goin <michael@neuralmagic.com >
2025-03-14 22:02:20 -07:00
73deea2fdb
[Frontend] track server_load ( #13950 )
2025-03-14 09:53:17 -07:00
cb8bdfade2
[V1] TPU - Add tensor parallel support via Ray ( #13618 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-03-08 08:19:38 -05:00
33f227e16b
[CI/Build] Use a fixed seed to avoid flaky tests ( #14480 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-08 11:30:09 +00:00
47512b3200
Default to generation_config from model ( #12622 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-08 14:46:15 +08:00
cc10281498
[Misc] Set default value of seed to None ( #14274 )
...
Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com >
2025-03-07 10:40:01 +00:00
fa82b93853
[Frontend][Docs] Transcription API streaming ( #13301 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-03-06 10:39:35 +00:00
32985bed7c
[Frontend] Allow return_tokens_as_token_ids to be passed as a request param ( #14066 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
2025-03-05 06:30:40 +00:00
ae122b1cbd
[WIP][V1][Metrics] Implement max_num_generation_tokens, request_params_n, and request_params_max_tokens metrics ( #14055 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-03-03 19:04:45 +00:00
cf069aa8aa
Update deprecated Python 3.8 typing ( #13971 )
2025-03-02 17:34:51 -08:00
4be4b26cb7
Fix entrypoint tests for embedding models ( #14052 )
2025-02-28 08:56:44 -08:00
76c89fcadd
Use smaller embedding model when not testing model specifically ( #13891 )
2025-02-28 00:50:43 -08:00
cd711c48b2
[V1][Metrics] Handle preemptions ( #13169 )
2025-02-26 20:04:59 -08:00
4cb6fa0a9c
[Bugfix] Backend option to disable xgrammar any_whitespace ( #12744 )
...
Signed-off-by: Wallas Santos <wallashss@ibm.com >
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com >
2025-02-26 10:52:34 -08:00
934bb99c71
[Bugfix] Update expected token counts for Ultravox tests ( #13895 )
2025-02-26 04:56:50 -08:00
5157338ed9
[Misc] Improve LoRA spelling ( #13831 )
2025-02-25 23:43:01 -08:00
2c5e637b57
[ci] Use env var to control whether to use S3 bucket in CI ( #13634 )
2025-02-22 19:19:45 -08:00
8db1b9d0a1
Support SSL Key Rotation in HTTP Server ( #13495 )
2025-02-22 05:17:44 -08:00
1cd981da4f
[V1][Metrics] Support vllm:cache_config_info ( #13299 )
2025-02-22 00:20:00 -08:00
0ffdf8ce0c
[HTTP Server] Make model param optional in request ( #13568 )
2025-02-21 21:55:50 -08:00
1c3c975766
[FEATURE] Enables /score endpoint for embedding models ( #12846 )
2025-02-20 22:09:47 -08:00
bfbc0b32c6
[Frontend] Add backend-specific options for guided decoding ( #13505 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
2025-02-20 15:07:58 -05:00
ba81163997
[core] add sleep and wake up endpoint and v1 support ( #12987 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
Signed-off-by: cennn <2523403608@qq.com >
Co-authored-by: cennn <2523403608@qq.com >
2025-02-20 12:41:17 +08:00
d5d214ac7f
[1/n][CI] Load models in CI from S3 instead of HF ( #13205 )
...
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal >
2025-02-19 07:34:59 +00:00
b53d79983c
Add outlines fallback when JSON schema has enum ( #13449 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-02-18 06:49:41 +00:00
2ad1bc7afe
[V1][Metrics] Add iteration_tokens_total histogram from V0 ( #13288 )
2025-02-15 03:56:19 -08:00
45f90bcbba
[WIP] TPU V1 Support Refactored ( #13049 )
2025-02-14 00:21:53 -08:00
f2b20fe491
Consolidate Llama model usage in tests ( #13094 )
2025-02-13 22:18:03 -08:00
d84cef76eb
[Frontend] Add /v1/audio/transcriptions OpenAI API endpoint ( #12909 )
2025-02-13 07:23:45 -08:00
37dfa60037
[Bugfix] Missing Content Type returns 500 Internal Server Error ( #13193 )
2025-02-13 06:52:22 -08:00
04f50ad9d1
[Bugfix] deepseek_r1_reasoning_parser put reason content in wrong field in certain edge case ( #13097 )
2025-02-12 23:11:26 -08:00
75e6e14516
[V1][Metrics] Add several request timing histograms ( #12644 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-02-11 10:14:00 -05:00
41c5dd45b9
[V1][Metrics] Add GPU prefix cache hit rate % gauge ( #12592 )
2025-02-11 08:27:25 +00:00
fc6485d277
[Bugfix]: Reasoning output bug according to the chat template change ( #13025 )
...
Signed-off-by: Ce Gao <cegao@tensorchord.ai >
2025-02-11 15:49:03 +08:00
08b2d845d6
[Model] Ultravox Model: Support v0.5 Release ( #12912 )
...
Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai >
2025-02-10 22:02:48 +00:00
ce26b16268
[Misc] Remove unnecessary detokenization in multimodal processing ( #12868 )
2025-02-07 06:21:17 -08:00
6e1fc61f0f
Prevent unnecessary requests to huggingface hub ( #12837 )
2025-02-06 21:37:41 -08:00
75404d041b
[VLM] Update compatibility with transformers 4.49
2025-02-05 19:09:45 -08:00
233df6f5c4
[V1][Metrics] Add request_success_total counter, labelled with finish reason ( #12579 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-02-04 19:46:54 -05:00
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files ( #12628 )
...
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**
commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com >
Date: Fri Jan 31 14:18:24 2025 -0500
Add SPDX license headers to python source files
This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance.
The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job.
More information can be found on the SPDX site:
- https://spdx.dev/learn/handling-license-info/
Signed-off-by: Russell Bryant <rbryant@redhat.com >
commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com >
Date: Fri Jan 31 14:36:32 2025 -0500
Check for SPDX headers using pre-commit
Signed-off-by: Russell Bryant <rbryant@redhat.com >
---------
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-02-02 11:58:18 -08:00
f17f1d4608
[V1][Metrics] Add GPU cache usage % gauge ( #12561 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-01-29 18:31:01 -08:00
46fb056749
[V1][Metrics] Add TTFT and TPOT histograms ( #12530 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-01-29 04:11:16 +00:00
a7e3eba66f
[Frontend] Support reasoning content for deepseek r1 ( #12473 )
...
Signed-off-by: Ce Gao <cegao@tensorchord.ai >
Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Michael Goin <mgoin@redhat.com >
2025-01-29 11:38:08 +08:00
c386c43ca3
[V1][Metrics] Add per-request prompt/generation_tokens histograms ( #12516 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-01-28 22:07:22 +00:00
3fd1fb63ef
[V1][Metrics] Hook up IterationStats for Prometheus metrics ( #12478 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-01-28 16:38:38 +00:00
01ba927040
[V1][Metrics] Add initial Prometheus logger ( #12416 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-01-27 12:26:28 -05:00