Commit Graph

266 Commits

Author SHA1 Message Date
9665313c39 [V1] Set structured output backend to auto by default (#15724)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-10 17:53:26 +00:00
87b4ac56c2 [CI][Bugfix] Fix bad tolerance for test_batch_base64_embedding (#16221)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-09 04:14:46 +00:00
4ebc0b9640 [Bugfix] Proper input validation for multi-modal encoder-decoder models (#16156)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-08 09:45:21 -07:00
b99733d092 [Bugfix] Do not skip "empty" parts of chats that are parsable (#16219)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-08 05:14:15 +00:00
24f1c01e0f [Bugfix][V0] XGrammar structured output supports Enum (#15878)
Signed-off-by: Leon Seidel <leon.seidel@fau.de>
2025-04-07 22:38:25 +00:00
b6be6f8d1e [TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. (#15732)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-04-03 14:23:28 -07:00
cefb9e5a28 [Frontend] Implement Tool Calling with tool_choice='required' (#13483)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Signed-off-by: Matt, Matthias <matthias.matt@tuwien.ac.at>
Co-authored-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2025-04-02 07:45:45 -07:00
98d7367b61 [Metrics] Hide deprecated metrics (#15458)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-04-02 07:37:19 -07:00
594a8b9030 [Bugfix] Fix the issue where the model name is empty string, causing no response with the model name. (#15938)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-04-02 06:33:52 -07:00
ddb94c2605 [core] Add tags parameter to wake_up() (#15500)
Signed-off-by: Eric <erictang000@gmail.com>
2025-04-02 01:59:27 -07:00
79455cf421 [Misc] Enable V1 LoRA by default (#15320)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-04-01 16:53:56 +08:00
9a2160fa55 [V1] TPU CI - Add basic perf regression test (#15414)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-31 13:25:20 -04:00
7fd8c0f85c fix test_phi3v (#15321)
Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
2025-03-30 02:01:34 -07:00
1286211f57 [Bugfix] LoRA V1: add and fix entrypoints tests (#15715)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-28 21:10:41 -07:00
7329ff5468 [V1] Support disable_any_whtespace for guidance backend (#15584)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-03-28 23:46:45 +08:00
a10314c6b3 [Misc] Fix test_sleep to use query parameters (#14373)
Signed-off-by: Lize Cai <lize.cai@sap.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-03-28 18:00:14 +08:00
32b14baf8a [Refactor][Frontend] Keep all logic about reasoning into one class (#14428)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
2025-03-28 00:23:30 -07:00
1711b929b6 [Model] Add Reasoning Parser for Granite Models (#14202)
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
2025-03-26 14:28:07 +00:00
99f536f830 [Misc] Enhance warning information to user-defined chat template (#15408)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-03-26 02:21:15 -07:00
cbcdf2c609 [Bugfix] Fix chat template loading (#15143)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-03-24 13:50:09 +00:00
d6cd59f122 [Frontend] Support tool calling and reasoning parser (#14511)
Signed-off-by: WangErXiao <863579016@qq.com>
2025-03-23 14:00:07 -07:00
f68cce8e64 [ci/build] fix broken tests in LLM.collective_rpc (#15350)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-03-23 14:49:48 +08:00
1f16b7fe74 [Core][V0] Add guidance backend for structured output (#14589)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <lohuynh@microsoft.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-19 21:33:51 -07:00
f690372b68 [Core] Update dtype detection and defaults (#14858)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-19 13:49:33 +08:00
2bb0e1a799 [Bugfix][ROCm] running new process using spawn method for rocm in tests. (#14810)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-03-17 11:33:35 +00:00
a73e183e36 [Misc] Replace os environ to monkeypatch in test suite (#14516)
Signed-off-by: sibi <85477603+t-sibiraj@users.noreply.github.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
2025-03-16 20:35:57 -07:00
74bc397b0a [Core] Expose API endpoint /is_sleeping (#14312)
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>
2025-03-15 06:28:14 -07:00
d4d93db2c5 [V1] V1 Enablement Oracle (#13726)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-03-14 22:02:20 -07:00
73deea2fdb [Frontend] track server_load (#13950) 2025-03-14 09:53:17 -07:00
cb8bdfade2 [V1] TPU - Add tensor parallel support via Ray (#13618)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-03-08 08:19:38 -05:00
33f227e16b [CI/Build] Use a fixed seed to avoid flaky tests (#14480)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-03-08 11:30:09 +00:00
47512b3200 Default to generation_config from model (#12622)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-03-08 14:46:15 +08:00
cc10281498 [Misc] Set default value of seed to None (#14274)
Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
2025-03-07 10:40:01 +00:00
fa82b93853 [Frontend][Docs] Transcription API streaming (#13301)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-03-06 10:39:35 +00:00
32985bed7c [Frontend] Allow return_tokens_as_token_ids to be passed as a request param (#14066)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-03-05 06:30:40 +00:00
ae122b1cbd [WIP][[V1][Metrics] Implement max_num_generation_tokens, request_params_n, and request_params_max_tokens metrics (#14055)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-03-03 19:04:45 +00:00
cf069aa8aa Update deprecated Python 3.8 typing (#13971) 2025-03-02 17:34:51 -08:00
4be4b26cb7 Fix entrypoint tests for embedding models (#14052) 2025-02-28 08:56:44 -08:00
76c89fcadd Use smaller embedding model when not testing model specifically (#13891) 2025-02-28 00:50:43 -08:00
cd711c48b2 [V1][Metrics] Handle preemptions (#13169) 2025-02-26 20:04:59 -08:00
4cb6fa0a9c [Bugfix] Backend option to disable xgrammar any_whitespace (#12744)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
2025-02-26 10:52:34 -08:00
934bb99c71 [Bugfix] Update expected token counts for Ultravox tests (#13895) 2025-02-26 04:56:50 -08:00
5157338ed9 [Misc] Improve LoRA spelling (#13831) 2025-02-25 23:43:01 -08:00
2c5e637b57 [ci] Use env var to control whether to use S3 bucket in CI (#13634) 2025-02-22 19:19:45 -08:00
8db1b9d0a1 Support SSL Key Rotation in HTTP Server (#13495) 2025-02-22 05:17:44 -08:00
1cd981da4f [V1][Metrics] Support vllm:cache_config_info (#13299) 2025-02-22 00:20:00 -08:00
0ffdf8ce0c [HTTP Server] Make model param optional in request (#13568) 2025-02-21 21:55:50 -08:00
1c3c975766 [FEATURE] Enables /score endpoint for embedding models (#12846) 2025-02-20 22:09:47 -08:00
bfbc0b32c6 [Frontend] Add backend-specific options for guided decoding (#13505)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-02-20 15:07:58 -05:00
ba81163997 [core] add sleep and wake up endpoint and v1 support (#12987)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: cennn <2523403608@qq.com>
Co-authored-by: cennn <2523403608@qq.com>
2025-02-20 12:41:17 +08:00