youngkingdom/vllm - vllm

Author	SHA1	Message	Date
Nick Hill	7f6d47c1a2	[V1][BugFix] Exit properly if engine core fails during startup (#16137) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-07 15:30:15 -07:00
Cyrus Leung	66d433b94f	[V1] Revert the default `max_num_seqs` to V0 values for most hardware (#16158) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-07 13:54:36 -04:00
Nick Hill	15dac210f0	[V1] AsyncLLM data parallel (#13923) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-27 16:14:41 -07:00
Cody Yu	54aa619459	[V1] Refactor num_computed_tokens logic (#15307) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-27 04:54:36 +00:00
marko	27df5199d9	Support SHA256 as hash function in prefix caching (#15297) Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>	2025-03-26 11:11:28 -07:00
Nick Hill	9d72daf4ce	[V1][Perf] Simpler request output queues (#15156) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-03-24 22:44:08 +00:00
Jason	d8e82bc06d	[Bugfix] fix V1 Engine crash while handling requests with duplicate request id (#15043) Signed-off-by: Jiahui Sun <jhsun2020@gmail.com>	2025-03-20 10:01:02 -07:00
Murali Andoorveedu	61c7a1b856	[V1] Minor V1 async engine test refactor (#15075) Signed-off-by: andoorve <murali.andoorveedu@mail.utoronto.ca> Co-authored-by: andoorve <murali.andoorveedu@mail.utoronto.ca>	2025-03-19 10:37:17 -07:00
Cyrus Leung	f690372b68	[Core] Update dtype detection and defaults (#14858) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-19 13:49:33 +08:00
vllmellm	2bb0e1a799	[Bugfix][ROCm] running new process using spawn method for rocm in tests. (#14810) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-03-17 11:33:35 +00:00
Sibi	a73e183e36	[Misc] Replace os environ to monkeypatch in test suite (#14516) Signed-off-by: sibi <85477603+t-sibiraj@users.noreply.github.com> Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2025-03-16 20:35:57 -07:00
Robert Shaw	d4d93db2c5	[V1] V1 Enablement Oracle (#13726) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-03-14 22:02:20 -07:00
afeldman-nm	02fcaa3d0a	[V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output (#14624) Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>	2025-03-13 19:07:34 +00:00
Nick Hill	f5d3acd474	[BugFix][V1] Fix parallel sampling finishing/aborts (#14512) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-12 10:29:48 -07:00
afeldman-nm	ef64044079	[V1] Prompt logprobs + APC compatibility; prompt logprobs reqs cannot fill APC (#13949)	2025-03-08 01:48:12 +00:00
Nick Hill	8ed5421aaa	[V1] Eagerly remove finished requests from the batch (#14388) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-07 10:56:00 -08:00
Nick Hill	5db6b2c961	[V1][BugFix] Fix remaining sync engine client shutdown errors/hangs (#13869) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-04 15:06:47 +00:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971)	2025-03-02 17:34:51 -08:00
afeldman-nm	befc402d34	[V1] V1 engine implements parallel sampling (AsyncLLM and LLMEngine) (#10980) Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-02-24 08:29:41 -08:00
Nick Hill	cbae7af552	[V1][BugFix] Fix engine core client shutdown hangs (#13298) Even though ZMQ context.destroy() is meant to close open sockets before terminating the context, it appears to be necessary to do this explicitly or else it can hang in the context.term() method. Close zmq sockets explicitly before terminating context, make shutdown of client resource more robust, shut down engine core process prior to terminating zmq context. Signed-off-by: Nick Hill <nhill@redhat.com>	2025-02-23 13:07:43 -08:00
youkaichao	eb24dc4a45	[v1] torchrun compatibility (#13642) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-23 22:47:24 +08:00
Nick Hill	caf7ff4456	[V1][Core] Generic mechanism for handling engine utility (#13060) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-02-19 17:09:22 +08:00
Murali Andoorveedu	a4d577b379	[V1][Tests] Adding additional testing for multimodal models to V1 (#13308) Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>	2025-02-18 09:53:14 -08:00
Cody Yu	9206b3d7ec	[V1][PP] Run engine busy loop with batch queue (#13064)	2025-02-15 03:59:01 -08:00
Harry Mellor	f2b20fe491	Consolidate Llama model usage in tests (#13094)	2025-02-13 22:18:03 -08:00
Mark McLoughlin	75e6e14516	[V1][Metrics] Add several request timing histograms (#12644) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-02-11 10:14:00 -05:00
afeldman-nm	0630d4537a	[V1] Logprobs and prompt logprobs support (#9880 ) This PR is adding support for sample logprobs & prompt logprobs to vLLM v1. New behavior: - During model execution, model runner computes sample logprobs (if user-provided logprobs setting is not None) and prompt logprobs (if user-provided prompt_logprobs setting is not None). For both sample and prompt logprobs, the engine core returns 3 vectors: token ids, token logprob values, token ranks. Ranks reflect tokens' 1-indexed positions in the vocabulary vector after sorting the vocabulary by log probability in descending order. - In scheduler.update_from_output(), sample and prompt logprobs are incorporated into the EngineCoreOutput data structure which is transferred to the engine client. If multiprocessing is enabled, then sample and prompt logprobs will be (de)serialized when the EngineCoreOutput data structure is (de)serialized. - During output processing, the LogprobsProcessor transforms the triplet of token ids, token logprobs values, and token ranks into the OpenAI-compatible List[Dict[token id,Logprob]] format (for sample and prompt logprobs respectively.) - Each Logprob instance (whether sample- or prompt-) consists of a token's log-probability, rank, and detokenized string representation. Note that logprob detokenization is handled by the LogprobsProcessor not the detokenizer. Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-02-07 07:26:20 -08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628 ) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Woosuk Kwon	3f1fc7425a	[V1][CI/Test] Do basic test for top-p & top-k sampling (#12469) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-01-27 09:40:04 -08:00
Nick Hill	24b0205f58	[V1][Frontend] Coalesce bunched `RequestOutput`s (#12298) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2025-01-23 17:17:41 -08:00
Robert Shaw	619ae268c3	[V1] [2/n] Logging and Metrics - `OutputProcessor` Abstraction (#11973) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2025-01-13 04:54:10 +00:00
Robert Shaw	9597a095f2	[V1][Core][1/n] Logging and Metrics (#11962) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2025-01-12 21:02:02 +00:00
Chen Zhang	cf5f000d21	[torch.compile] Hide KV cache behind torch.compile boundary (#11677) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-01-10 13:14:42 +08:00
Rui Qiao	022c5c6944	[V1] Refactor get_executor_cls (#11754)	2025-01-06 07:59:16 +00:00
Robert Shaw	80c751e7f6	[V1] Simplify Shutdown (#11659)	2025-01-03 17:25:38 +00:00
Robert Shaw	4fb8e329fd	[V1] [5/N] API Server: unify `Detokenizer` and `EngineCore` input (#11545) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>	2024-12-28 20:51:57 +00:00
Robert Shaw	df04dffade	[V1] [4/N] API Server: ZMQ/MP Utilities (#11541)	2024-12-28 01:45:08 +00:00
sroy745	dcb1a944d4	[V1] Adding min tokens/repetition/presence/frequence penalties to V1 sampler (#10681) Signed-off-by: Sourashis Roy <sroy@roblox.com> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-12-26 19:02:58 +09:00
Cody Yu	bf8717ebae	[V1] Prefix caching for vision language models (#11187) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>	2024-12-17 16:37:59 -08:00
Alexander Matveev	4e11683368	[V1] VLM preprocessor hashing (#11020) Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: Alexander Matveev <alexm@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-12-12 00:55:30 +00:00
Alexander Matveev	3bc94cab69	[V1] VLM - Run the mm_mapper preprocessor in the frontend process (#10640) Signed-off-by: Roger Wang <ywang@roblox.com> Co-authored-by: Michael Goin <michael@neuralmagic.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-12-03 10:33:10 +00:00
Ricky Xu	d9b4b3f069	[Bug][CLI] Allow users to disable prefix caching explicitly (#10724) Signed-off-by: rickyx <rickyx@anyscale.com>	2024-11-27 23:59:28 -08:00
Ricky Xu	519e8e4182	[v1] EngineArgs for better config handling for v1 (#10382) Signed-off-by: rickyx <rickyx@anyscale.com>	2024-11-25 21:09:43 -08:00
youkaichao	25d806e953	[misc] add torch.compile compatibility check (#10618) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-24 23:40:08 -08:00
Woosuk Kwon	112fa0bbe5	[V1] Fix CI tests on V1 engine (#10272) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-11-12 16:17:20 -08:00
Robert Shaw	6ace6fba2c	[V1] `AsyncLLM` Implementation (#9826) Signed-off-by: Nick Hill <nickhill@us.ibm.com> Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-11-11 23:05:38 +00:00

46 Commits
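
Note on commit 0630d4537a ([V1] Logprobs and prompt logprobs support): the message describes a triplet of token ids, logprob values, and 1-indexed ranks being turned into a per-position dict of token id to Logprob. The sketch below illustrates that description in plain Python and is not the vLLM implementation. The `Logprob` dataclass here is a hypothetical stand-in, the helper names (`log_softmax`, `token_rank`, `build_triplet`, `to_logprob_dict`) are invented for illustration, and both the "sampled token first, then top-k" ordering and the tie rule are assumptions.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Logprob:
    """Hypothetical stand-in for the per-token record the commit describes."""
    logprob: float
    rank: int
    decoded_token: Optional[str] = None


def log_softmax(logits: list[float]) -> list[float]:
    """Numerically stable log-softmax over one vocabulary-sized vector."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_sum for x in logits]


def token_rank(logprobs: list[float], token_id: int) -> int:
    """1-indexed position of token_id after sorting by logprob, descending.

    Tied tokens share the best rank here; that tie rule is an assumption.
    """
    return 1 + sum(1 for lp in logprobs if lp > logprobs[token_id])


def build_triplet(logprobs: list[float], sampled_id: int, k: int):
    """Return (token ids, logprob values, ranks) for one position.

    Assumed layout: the sampled token first, followed by the top-k tokens.
    """
    top_ids = sorted(range(len(logprobs)),
                     key=lambda i: logprobs[i], reverse=True)[:k]
    ids = [sampled_id] + top_ids
    values = [logprobs[i] for i in ids]
    ranks = [token_rank(logprobs, i) for i in ids]
    return ids, values, ranks


def to_logprob_dict(ids, values, ranks, vocab: list[str]) -> dict[int, Logprob]:
    """Fold the triplet into a dict of token id -> Logprob for one position."""
    return {tid: Logprob(lp, rk, vocab[tid])
            for tid, lp, rk in zip(ids, values, ranks)}


# Toy 4-token vocabulary; token 1 ("b") is treated as the sampled token.
vocab = ["a", "b", "c", "d"]
logprobs = log_softmax([2.0, 0.5, 1.0, -1.0])
ids, values, ranks = build_triplet(logprobs, sampled_id=1, k=2)
position = to_logprob_dict(ids, values, ranks, vocab)
```

A full response would hold one such dict per generated (or prompt) position, which matches the List[Dict[token id, Logprob]] shape named in the commit message. Using a dict means a sampled token that also falls in the top-k appears only once.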