youngkingdom/vllm - vllm

Author	SHA1	Message	Date
Aaron Pham	4a98edff1f	[Structured Outputs][V1] Skipping with models doesn't contain tokenizers (#20365) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-07-04 15:05:49 +08:00
Nick Hill	657f2f301a	[DP] Support external DP Load Balancer mode (#19790) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-07-02 10:21:52 -07:00
Seiji Eicher	65397e40f5	[Bugfix] Allow `CUDA_VISIBLE_DEVICES=''` in `Platform.device_id_to_physical_device_id` (#18979) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2025-06-26 00:01:57 -07:00
Isotr0py	ee9a1531aa	[CI/Build][Bugfix] Fix deadlock on v1 engine test CI (#19872) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-06-20 09:51:07 +08:00
kourosh hakhamaneshi	e2148dc5ea	[Bugfix] Add check_health to v1 async client. (#19821) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>	2025-06-18 21:47:01 -07:00
Maximilien de Bayser	799397ee4f	Support embedding models in V1 (#16188) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com> Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-18 21:36:33 -07:00
Isotr0py	1173804dca	[Bugfix] Fix TP inference for Flex attention backend (#19657) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-06-16 11:21:37 +00:00
Isotr0py	2db9044ab6	[Bugfix] Fix auto dtype casting for BatchFeature (#19316) Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-06-14 15:13:08 +00:00
Nick Hill	d5bdf899e4	[BugFix] Work-around incremental detokenization edge case error (#19449) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-06-12 06:43:20 +00:00
Adolfo Victoria	ca27f0f9c1	[Bugfix][Core] Update cancellation logic in `generate()` to handle Generator exits (#19225) Co-authored-by: Adolfo Victoria <adovi@meta.com>	2025-06-06 20:17:54 +00:00
jmswen	7353492a47	[Core] Raise when non-multi-instance DP clients target a DP rank (#19227) Signed-off-by: Jon Swenson <jmswen@gmail.com>	2025-06-06 19:03:01 +08:00
jmswen	c8dcc15921	Allow AsyncLLMEngine.generate to target a specific DP rank (#19102) Signed-off-by: Jon Swenson <jmswen@gmail.com>	2025-06-04 08:26:47 -07:00
Yan Ru Pei	b712be98c7	feat: add data parallel rank to KVEventBatch (#18925)	2025-06-03 17:14:20 -07:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Nick Hill	2dbe8c0774	[Perf] API-server scaleout with many-to-many server-engine comms (#17546)	2025-05-30 08:17:00 -07:00
Nick Hill	d1d61f3351	[BugFix] Make DP work with connector-delayed new requests (#18559) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Will Eaton <weaton@redhat.com>	2025-05-29 18:04:18 +00:00
Mark McLoughlin	06a0338015	[V1][Metrics] Add API for accessing in-memory Prometheus metrics (#17010) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-27 09:37:06 +00:00
David Xia	de71fec81b	[CI] don't skip fixed `test_kv_cache_events()` (#18183) Signed-off-by: David Xia <david@davidxia.com>	2025-05-14 23:17:16 -07:00
Russell Bryant	78aa341d12	[CI] Fix race condition in test_kv_cache_events test (#18169) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-05-14 16:27:48 -07:00
Robert Shaw	856865008e	[CI] Disable Failing Tests (#18165)	2025-05-14 13:49:56 -07:00
Nick Hill	55aa7af994	[V1] DP scale-out (2/N): Decouple engine process management and comms (#15977) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-13 10:48:21 -07:00
Nick Hill	5ea5c514da	[BugFix] Increase timeout for startup failure test (#17642) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-05-05 20:53:19 +00:00
Michael Goin	aa4502e7f3	[CI][Bugfix] Fix failing V1 Test due to missing 'cache_salt' arg (#17500) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-30 21:03:30 -07:00
Alec	0be6d05b5e	[V1][Metrics] add support for kv event publishing (#16750) Signed-off-by: alec-flowers <aflowers@nvidia.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-04-30 07:44:45 -07:00
Marko Rosenmueller	77073c77bc	[Core] Prevent side-channel attacks via cache salting (#17045) Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>	2025-04-30 20:27:21 +08:00
Nick Hill	df6f3ce883	[Core] Remove prompt string from engine core data structures (#17214) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-25 23:41:05 -07:00
Zijing Liu	53e8cf53a4	[V1][Metrics] Allow V1 AsyncLLM to use custom logger (#14661) Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-04-25 22:05:40 -07:00
Rui Qiao	c0dfd97519	[V1][PP] Optimization: continue scheduling prefill chunks (#17080) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2025-04-24 05:27:08 -07:00
Harry Mellor	0a05ed57e6	Simplify `TokenizerGroup` (#16790) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-24 04:43:56 -07:00
Jeffrey Li	0e4254492f	[Bugfix]: fix issue with n>1 sampling on v1 requests overriding each other (#16863) Signed-off-by: Jeffrey Li <jeffrey.dot.li@gmail.com>	2025-04-22 11:40:19 +08:00
Nick Hill	7f6d47c1a2	[V1][BugFix] Exit properly if engine core fails during startup (#16137) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-04-07 15:30:15 -07:00
Cyrus Leung	66d433b94f	[V1] Revert the default `max_num_seqs` to V0 values for most hardware (#16158) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-07 13:54:36 -04:00
Nick Hill	15dac210f0	[V1] AsyncLLM data parallel (#13923) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-27 16:14:41 -07:00
Cody Yu	54aa619459	[V1] Refactor num_computed_tokens logic (#15307) Signed-off-by: Cody Yu <hao.yu.cody@gmail.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-03-27 04:54:36 +00:00
marko	27df5199d9	Support SHA256 as hash function in prefix caching (#15297) Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>	2025-03-26 11:11:28 -07:00
Nick Hill	9d72daf4ce	[V1][Perf] Simpler request output queues (#15156) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com> Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>	2025-03-24 22:44:08 +00:00
Jason	d8e82bc06d	[Bugfix] fix V1 Engine crash while handling requests with duplicate request id (#15043) Signed-off-by: Jiahui Sun <jhsun2020@gmail.com>	2025-03-20 10:01:02 -07:00
Murali Andoorveedu	61c7a1b856	[V1] Minor V1 async engine test refactor (#15075) Signed-off-by: andoorve <murali.andoorveedu@mail.utoronto.ca> Co-authored-by: andoorve <murali.andoorveedu@mail.utoronto.ca>	2025-03-19 10:37:17 -07:00
Cyrus Leung	f690372b68	[Core] Update dtype detection and defaults (#14858) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-03-19 13:49:33 +08:00
vllmellm	2bb0e1a799	[Bugfix][ROCm] running new process using spawn method for rocm in tests. (#14810) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-03-17 11:33:35 +00:00
Sibi	a73e183e36	[Misc] Replace os environ to monkeypatch in test suite (#14516) Signed-off-by: sibi <85477603+t-sibiraj@users.noreply.github.com> Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2025-03-16 20:35:57 -07:00
Robert Shaw	d4d93db2c5	[V1] V1 Enablement Oracle (#13726) Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-03-14 22:02:20 -07:00
afeldman-nm	02fcaa3d0a	[V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output (#14624) Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>	2025-03-13 19:07:34 +00:00
Nick Hill	f5d3acd474	[BugFix][V1] Fix parallel sampling finishing/aborts (#14512) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-12 10:29:48 -07:00
afeldman-nm	ef64044079	[V1] Prompt logprobs + APC compatibility; prompt logprobs reqs cannot fill APC (#13949)	2025-03-08 01:48:12 +00:00
Nick Hill	8ed5421aaa	[V1] Eagerly remove finished requests from the batch (#14388) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-07 10:56:00 -08:00
Nick Hill	5db6b2c961	[V1][BugFix] Fix remaining sync engine client shutdown errors/hangs (#13869) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-03-04 15:06:47 +00:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971)	2025-03-02 17:34:51 -08:00
afeldman-nm	befc402d34	[V1] V1 engine implements parallel sampling (AsyncLLM and LLMEngine) (#10980) Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-02-24 08:29:41 -08:00
Nick Hill	cbae7af552	[V1][BugFix] Fix engine core client shutdown hangs (#13298) Even though ZMQ context.destroy() is meant to close open sockets before terminating the context, it appears to be necessary to do this explicitly or else it can hang in the context.term() method. Close zmq sockets explicitly before terminating context, make shutdown of client resource more robust, shut down engine core process prior to terminating zmq context. Signed-off-by: Nick Hill <nhill@redhat.com>	2025-02-23 13:07:43 -08:00
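
Note on cbae7af552 (#13298): the commit message describes closing ZMQ sockets explicitly before terminating the context, rather than relying on context.destroy(). The sketch below illustrates that ordering with pyzmq. It is not the code from the commit; `shutdown_zmq`, `demo`, and the `inproc://shutdown-demo` endpoint are hypothetical names chosen for illustration, and the step of stopping the engine core process first is left out.

```python
import zmq


def shutdown_zmq(ctx, sockets, linger_ms=0):
    """Close every socket explicitly, then terminate the context.

    Hypothetical helper for illustration; not part of vLLM.
    """
    for sock in sockets:
        if not sock.closed:
            # linger=0 drops unsent messages, which would otherwise
            # keep term() waiting for them to be delivered.
            sock.close(linger=linger_ms)
    # With no open sockets left, term() has nothing to wait on.
    ctx.term()


def demo():
    ctx = zmq.Context()
    server = ctx.socket(zmq.PAIR)
    client = ctx.socket(zmq.PAIR)
    # In-process transport keeps the example free of network ports.
    server.bind("inproc://shutdown-demo")
    client.connect("inproc://shutdown-demo")
    shutdown_zmq(ctx, [client, server])
    return client.closed, server.closed, ctx.closed


if __name__ == "__main__":
    print(demo())
```

Passing the sockets in explicitly, instead of letting the context track them, makes the close order visible at the call site, which is the point the commit message stresses.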

76 Commits