|
|
c29fb540ff
|
[gpt-oss] tool parser supports for /chat/completions [1/n] (#22386)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-09-04 20:39:12 -07:00 |
|
|
|
51d5e9be7d
|
[Core][Model] Terratorch backend integration (#23513)
Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com>
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: Christian Pinto <christian.pinto@ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-09-04 00:22:41 -07:00 |
|
|
|
712b273f65
|
[Refactor] Introduce basic Renderer for completion-style request (#24010)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2025-09-04 05:21:12 +00:00 |
|
|
|
a38f8bd54c
|
[Feature][Responses API]Support MCP tools with streaming mode + background mode (#23927)
Signed-off-by: wuhang <wuhang6@huawei.com>
|
2025-09-04 04:05:10 +00:00 |
|
|
|
51383bd472
|
[CI] Accelerate mteb test by setting SentenceTransformers mteb score to a constant (#24088)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-09-03 17:23:56 +08:00 |
|
|
|
70549c1245
|
[CI/Build] Serve images used by multimodal tests through local HTTP Server (#23907)
Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com>
Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-09-03 16:13:11 +08:00 |
|
|
|
d7e1e59972
|
[Doc]: fix typos in Python comments (#24093)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-09-02 21:05:45 -07:00 |
|
|
|
2417798471
|
[Metrics] Deprecate TPOT in favor of ITL (#24110)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-09-02 18:10:10 +00:00 |
|
|
|
f399182e8c
|
Run ruff format on a few files. (#24075)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-09-02 17:55:32 +00:00 |
|
|
|
ce30dca5c4
|
[CI]: reduce HTTP calls inside entrypoints openai tests (#23646)
Signed-off-by: AzizCode92 <azizbenothman76@gmail.com>
Signed-off-by: Aziz <azizbenothman76@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-02 10:49:32 +00:00 |
|
|
|
d46934b229
|
[Frontend] Gemma3n audio transcriptions/translations endpoint (#23735)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-09-01 18:07:46 +08:00 |
|
|
|
628d00cd7b
|
[Bugfix] Fix test_lora_resolvers.py (#23984)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-30 11:16:11 +00:00 |
|
|
|
5b31cb1781
|
[Bugfix] Fix --config arg expansion called from api_server.py (#23944)
Signed-off-by: Jean-Francois Dube <dubejf+gh@gmail.com>
Co-authored-by: Jean-Francois Dube <dubejf+gh@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-29 21:36:39 -07:00 |
|
|
|
4d7fe40fc0
|
[RL][BugFix] Fix missing tokenizer error for token-in-token-out (#23904)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-08-30 01:09:55 +08:00 |
|
|
|
d9e00dbd1f
|
[Performance] V1 Classify Models E2E Performance Optimization (#23541)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-08-29 03:12:32 -07:00 |
|
|
|
2554b27baa
|
[V0 Deprecation] Remove pooling model support in V0 (#23434)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-29 00:04:02 -07:00 |
|
|
|
b4f9e9631c
|
[CI/Build] Clean up LoRA test (#23890)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-28 23:28:35 -07:00 |
|
|
|
c8b3b299c9
|
[tests] Improve speed and reliability of test_transcription_api_correctness (#23854)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-08-29 04:25:33 +00:00 |
|
|
|
eb1995167e
|
[gpt-oss] Enable unit test for response API harmony integration (#23533)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-08-26 18:23:26 -07:00 |
|
|
|
ebd5a77bb5
|
feat: add usage to TranscriptionResponse (text and json response_format) (#23576)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
|
2025-08-26 05:26:26 -07:00 |
|
|
|
ce0e9dbd43
|
[CI/Build] Fix typo in #23561 (#23616)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-25 23:13:03 -07:00 |
|
|
|
6fd45e7b8a
|
[CI/Build] Use vLLM client's user agent to fetch images (#23561)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-25 19:34:12 -07:00 |
|
|
|
5e021b4981
|
(Misc): add missing test for zero truncation size. (#23457)
Signed-off-by: teekenl <teekenlau@gmail.com>
|
2025-08-24 18:12:47 +08:00 |
|
|
|
d9a55204ba
|
fix(tests): Correct unreachable assertion in truncation test (#23425)
Signed-off-by: AzizCode92 <azizbenothman76@gmail.com>
|
2025-08-23 05:23:54 +00:00 |
|
|
|
8896eb72eb
|
[Deprecation] Remove prompt_token_ids arg fallback in LLM.generate and LLM.embed (#18800)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-22 10:56:57 +08:00 |
|
|
|
582bbe6bd7
|
[Fix] correct tool_id for kimi-k2 when use tool_choice=required (#21259)
Co-authored-by: wangzhengtao <wangzhengtao@msh.team>
|
2025-08-20 12:59:54 -07:00 |
|
|
|
38217877aa
|
[Fix] fix offline env use local mode path (#22526)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-08-20 13:34:49 +00:00 |
|
|
|
8fd920924c
|
[BugFix] Fix stuck stats/metrics after requests are aborted (#22995)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-08-20 13:50:29 +08:00 |
|
|
|
80141bbf2f
|
fix: use cache_salt for gpt-oss (#23186)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
|
2025-08-19 18:12:25 +00:00 |
|
|
|
f7cf5b512e
|
[Frontend] Add /collective_rpc API endpoint (#23075)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-08-19 17:29:32 +00:00 |
|
|
|
24f4d1a224
|
Add return_token_ids parameter to OpenAI API endpoints (#22587)
Signed-off-by: Yuge Zhang <scottyugochang@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-08-19 09:48:31 -07:00 |
|
|
|
3253ae765e
|
[Flaky CI] Increase timeout tolerance for test_mp_crash_detection+test_default_mm_lora_chat_completions (#23028)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-16 18:33:08 +00:00 |
|
|
|
68373d3126
|
[Frontend] Added support for HermesToolParser for models without special tokens (#16890)
Signed-off-by: minpeter <kali2005611@gmail.com>
|
2025-08-16 17:38:42 +00:00 |
|
|
|
78863f8c5c
|
[BugFix] Add support for loading prompt embeds tensors serialized on unavailable devices and sparse tensors (#22962)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
|
2025-08-16 06:25:10 +00:00 |
|
|
|
8a87cd27d9
|
[CI] Speed up Whisper tests by reusing server (#22859)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-15 16:56:31 -04:00 |
|
|
|
540d54ca8d
|
[CI] Re-enable transcriptions test_long_audio_request (#22890)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-08-14 11:34:34 +00:00 |
|
|
|
a353bd083d
|
[CI] remove flaky v0 test (#22864)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-08-13 21:41:51 -07:00 |
|
|
|
b6af24fba7
|
[CI][Entrypoints]: add filter to generation to filter out invalid tool calls (#22826)
Signed-off-by: Will Eaton <weaton@redhat.com>
|
2025-08-13 20:09:07 -07:00 |
|
|
|
653124bd46
|
[Frontend] Add chunked processing to handle long inputs in embedding models (#22280)
Signed-off-by: x22x22 <wadeking@qq.com>
Signed-off-by: Kdump <rootshellexp@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-13 04:14:24 -07:00 |
|
|
|
71683ca6f6
|
[V0 Deprecation] Remove multi-step scheduling (#22138)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-08-12 20:18:39 -07:00 |
|
|
|
ea1292ad3e
|
[CI Failure] Use float32 for tests/entrypoints/openai/test_audio.py (#22686)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-11 20:20:42 -07:00 |
|
|
|
839ab00349
|
Re-enable Xet on TPU tests now that hf_xet has been updated (#22666)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-11 19:54:40 -07:00 |
|
|
|
1891a265d3
|
[gpt-oss] Add test for response API + harmony (but skipped) (#22554)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-08-11 17:47:24 -07:00 |
|
|
|
84cf78acee
|
[Model] Pooling models default to using chunked prefill & prefix caching if supported. (#20930)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-08-11 09:41:37 -07:00 |
|
|
|
39052dbca8
|
Support token_type_ids in V1 with less code changes (#21985)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2025-08-10 22:54:59 -07:00 |
|
|
|
b799f4b9ea
|
[CI/Build] Fix tensorizer test for load_format change (#22583)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-08-10 19:30:00 -07:00 |
|
|
|
311d875614
|
Drop flaky test_healthcheck_response_time (#22539)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-08-08 16:56:47 -07:00 |
|
|
|
baece8c3d2
|
[Frontend] Add unix domain socket support (#18097)
Signed-off-by: <yyweiss@gmail.com>
Signed-off-by: yyw <yyweiss@gmail.com>
|
2025-08-08 16:23:44 -07:00 |
|
|
|
370661856b
|
[Frontend] Update OpenAI error response to upstream format (#22099)
Signed-off-by: Moritz Sanft <58110325+msanft@users.noreply.github.com>
|
2025-08-06 23:06:00 -07:00 |
|
|
|
586f286789
|
[Model] Pooling model activation supports per request control by PoolingParams (#20538)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-08-05 00:37:00 -07:00 |
|