youngkingdom/vllm - vllm

Author	SHA1	Message	Date
Cyrus Leung	f690372b68	[Core] Update dtype detection and defaults (#14858 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-19 13:49:33 +08:00
vllmellm	2bb0e1a799	[Bugfix][ROCm] running new process using spawn method for rocm in tests. (#14810 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-03-17 11:33:35 +00:00
Sibi	a73e183e36	[Misc] Replace os environ to monkeypatch in test suite (#14516 ) Signed-off-by: sibi <85477603+t-sibiraj@users.noreply.github.com> Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2025-03-16 20:35:57 -07:00
Jun Duan	74bc397b0a	[Core] Expose API endpoint `/is_sleeping` (#14312 ) Signed-off-by: Jun Duan <jun.duan.phd@outlook.com>	2025-03-15 06:28:14 -07:00
Robert Shaw	d4d93db2c5	[V1] V1 Enablement Oracle (#13726 ) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-03-14 22:02:20 -07:00
daniel-salib	73deea2fdb	[Frontend] track server_load (#13950 )	2025-03-14 09:53:17 -07:00
Alexander Matveev	cb8bdfade2	[V1] TPU - Add tensor parallel support via Ray (#13618 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com>	2025-03-08 08:19:38 -05:00
Cyrus Leung	33f227e16b	[CI/Build] Use a fixed seed to avoid flaky tests (#14480 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-08 11:30:09 +00:00
Harry Mellor	47512b3200	Default to `generation_config` from model (#12622 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-03-08 14:46:15 +08:00
மனோஜ்குமார் பழனிச்சாமி	cc10281498	[Misc] Set default value of seed to None (#14274 ) Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>	2025-03-07 10:40:01 +00:00
Nicolò Lucchesi	fa82b93853	[Frontend][Docs] Transcription API streaming (#13301 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-03-06 10:39:35 +00:00
Benjamin Chislett	32985bed7c	[Frontend] Allow return_tokens_as_token_ids to be passed as a request param (#14066 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-03-05 06:30:40 +00:00
Mark McLoughlin	ae122b1cbd	[WIP][[V1][Metrics] Implement max_num_generation_tokens, request_params_n, and request_params_max_tokens metrics (#14055 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-03-03 19:04:45 +00:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971 )	2025-03-02 17:34:51 -08:00
Harry Mellor	4be4b26cb7	Fix entrypoint tests for embedding models (#14052 )	2025-02-28 08:56:44 -08:00
Harry Mellor	76c89fcadd	Use smaller embedding model when not testing model specifically (#13891 )	2025-02-28 00:50:43 -08:00
Mark McLoughlin	cd711c48b2	[V1][Metrics] Handle preemptions (#13169 )	2025-02-26 20:04:59 -08:00
Wallas Henrique	4cb6fa0a9c	[Bugfix] Backend option to disable xgrammar any_whitespace (#12744 ) Signed-off-by: Wallas Santos <wallashss@ibm.com> Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>	2025-02-26 10:52:34 -08:00
Cyrus Leung	934bb99c71	[Bugfix] Update expected token counts for Ultravox tests (#13895 )	2025-02-26 04:56:50 -08:00
Jee Jee Li	5157338ed9	[Misc] Improve LoRA spelling (#13831 )	2025-02-25 23:43:01 -08:00
Kevin H. Luu	2c5e637b57	[ci] Use env var to control whether to use S3 bucket in CI (#13634 )	2025-02-22 19:19:45 -08:00
Keyun Tong	8db1b9d0a1	Support SSL Key Rotation in HTTP Server (#13495 )	2025-02-22 05:17:44 -08:00
Mark McLoughlin	1cd981da4f	[V1][Metrics] Support `vllm:cache_config_info` (#13299 )	2025-02-22 00:20:00 -08:00
Keyun Tong	0ffdf8ce0c	[HTTP Server] Make model param optional in request (#13568 )	2025-02-21 21:55:50 -08:00
Gabriel Marinho	1c3c975766	[FEATURE] Enables /score endpoint for embedding models (#12846 )	2025-02-20 22:09:47 -08:00
Joe Runde	bfbc0b32c6	[Frontend] Add backend-specific options for guided decoding (#13505 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2025-02-20 15:07:58 -05:00
youkaichao	ba81163997	[core] add sleep and wake up endpoint and v1 support (#12987 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: cennn <2523403608@qq.com> Co-authored-by: cennn <2523403608@qq.com>	2025-02-20 12:41:17 +08:00
Kevin H. Luu	d5d214ac7f	[1/n][CI] Load models in CI from S3 instead of HF (#13205 ) Signed-off-by: <> Co-authored-by: EC2 Default User <ec2-user@ip-172-31-20-117.us-west-2.compute.internal>	2025-02-19 07:34:59 +00:00
Michael Goin	b53d79983c	Add outlines fallback when JSON schema has enum (#13449 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-02-18 06:49:41 +00:00
Mark McLoughlin	2ad1bc7afe	[V1][Metrics] Add iteration_tokens_total histogram from V0 (#13288 )	2025-02-15 03:56:19 -08:00
Alexander Matveev	45f90bcbba	[WIP] TPU V1 Support Refactored (#13049 )	2025-02-14 00:21:53 -08:00
Harry Mellor	f2b20fe491	Consolidate Llama model usage in tests (#13094 )	2025-02-13 22:18:03 -08:00
Nicolò Lucchesi	d84cef76eb	[Frontend] Add `/v1/audio/transcriptions` OpenAI API endpoint (#12909 )	2025-02-13 07:23:45 -08:00
Vaibhav Jain	37dfa60037	[Bugfix] Missing Content Type returns 500 Internal Server Error (#13193 )	2025-02-13 06:52:22 -08:00
LikeSundayLikeRain	04f50ad9d1	[Bugfix] deepseek_r1_reasoning_parser put reason content in wrong field in certain edge case (#13097 )	2025-02-12 23:11:26 -08:00
Mark McLoughlin	75e6e14516	[V1][Metrics] Add several request timing histograms (#12644 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-02-11 10:14:00 -05:00
Cody Yu	41c5dd45b9	[V1][Metrics] Add GPU prefix cache hit rate % gauge (#12592 )	2025-02-11 08:27:25 +00:00
Ce Gao	fc6485d277	[Bugfix]: Reasoning output bug according to the chat template change (#13025 ) Signed-off-by: Ce Gao <cegao@tensorchord.ai>	2025-02-11 15:49:03 +08:00
Farzad Abdolhosseini	08b2d845d6	[Model] Ultravox Model: Support v0.5 Release (#12912 ) Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai>	2025-02-10 22:02:48 +00:00
Cyrus Leung	ce26b16268	[Misc] Remove unnecessary detokenization in multimodal processing (#12868 )	2025-02-07 06:21:17 -08:00
Maximilien de Bayser	6e1fc61f0f	Prevent unecessary requests to huggingface hub (#12837 )	2025-02-06 21:37:41 -08:00
Cyrus Leung	75404d041b	[VLM] Update compatibility with transformers 4.49	2025-02-05 19:09:45 -08:00
Mark McLoughlin	233df6f5c4	[V1][Metrics] Add request_success_total counter, labelled with finish reason (#12579 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-02-04 19:46:54 -05:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628 ) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Mark McLoughlin	f17f1d4608	[V1][Metrics] Add GPU cache usage % gauge (#12561 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-01-29 18:31:01 -08:00
Mark McLoughlin	46fb056749	[V1][Metrics] Add TTFT and TPOT histograms (#12530 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-01-29 04:11:16 +00:00
Ce Gao	a7e3eba66f	[Frontend] Support reasoning content for deepseek r1 (#12473 ) Signed-off-by: Ce Gao <cegao@tensorchord.ai> Co-authored-by: Rafael Vasquez <rafvasq21@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Michael Goin <mgoin@redhat.com>	2025-01-29 11:38:08 +08:00
Mark McLoughlin	c386c43ca3	[V1][Metrics] Add per-request prompt/generation_tokens histograms (#12516 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-01-28 22:07:22 +00:00
Mark McLoughlin	3fd1fb63ef	[V1][Metrics] Hook up IterationStats for Prometheus metrics (#12478 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-01-28 16:38:38 +00:00
Mark McLoughlin	01ba927040	[V1][Metrics] Add initial Prometheus logger (#12416 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-01-27 12:26:28 -05:00

243 Commits