youngkingdom/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Ye (Charlotte) Qi	6fb2788163	[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-09-09 10:02:35 +00:00
Ye (Charlotte) Qi	e7c4f9ee86	[CI/Build][Doc] Move existing benchmark scripts in CI/document/example to vllm bench CLI (#21355 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-07-26 07:10:14 -07:00
Reid	6fa718a460	[Misc] Modularize CLI Argument Parsing in Benchmark Scripts (#19593 ) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-14 16:54:52 +08:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Divakar Verma	774c5fde30	[V1] fix torch profiling for V1 offline scenarios (#18445 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-05-28 04:16:30 +00:00
cascade	aaa4ac1c95	Disable prefix cache by default for benchmark (#18639 ) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-05-27 20:06:34 +08:00
Harry Mellor	009d9e7590	Convert `benchmarks` to `ruff format` (#18068 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 13:43:29 +00:00
Jeremy Arnold	58abe35455	[Benchmarks] Make detokenization optional in benchmark scripts (#11697 ) Signed-off-by: Jeremy Arnold <Jeremy.Arnold@amd.com>	2025-03-07 08:09:00 -08:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971 )	2025-03-02 17:34:51 -08:00
Huy Do	e7ef74e26e	Fix some issues with benchmark data output (#13641 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-02-24 10:23:18 +08:00
Cyrus Leung	7f6bae561c	[CI/Build] Fix pre-commit errors (#13696 )	2025-02-22 00:31:26 -08:00
Robin	8aca27fa11	[Bugfix] Fix benchmark script bug: inaccurate stats for vllm backend when max_model_len < input_len + output_len (#13691 ) Signed-off-by: WangErXiao <863579016@qq.com>	2025-02-22 14:10:38 +08:00
Huy Do	45186834a0	Run v1 benchmark and integrate with PyTorch OSS benchmark database (#13068 ) Signed-off-by: Huy Do <huydhn@gmail.com>	2025-02-17 08:16:32 +00:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628 ) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Ye (Charlotte) Qi	1d967acb45	[Bugfix] fix beam search input errors and latency benchmark script (#11875 ) Signed-off-by: Ye Qi <yeq@meta.com> Co-authored-by: yeq <yeq@devgpu004.lla3.facebook.com>	2025-01-09 17:36:39 +08:00
Divakar Verma	4d29e91be8	[Misc] sort torch profiler table by kernel timing (#11813 )	2025-01-08 10:57:04 +08:00
Jeremy Arnold	cb6fdaa0a0	[Misc] Make benchmarks use EngineArgs (#9529 )	2024-10-22 15:40:38 -07:00
Kuntai Du	81ede99ca4	[Core] Deprecating block manager v1 and make block manager v2 default (#8704 ) Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).	2024-10-17 11:38:15 -05:00
sroy745	f3a507f1d3	[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149 )	2024-10-10 14:17:17 +08:00
youkaichao	18b296fdb2	[core] remove beam search from the core (#9105 )	2024-10-07 05:47:04 +00:00
Cyrus Leung	3b00b9c26c	[Core] rename`PromptInputs` and `inputs` (#8876 )	2024-09-26 20:35:15 -07:00
Simon Mo	4f1ba0844b	Revert "rename PromptInputs and inputs with backward compatibility (#8760 ) (#8810 )	2024-09-25 10:36:26 -07:00
Cyrus Leung	28e1299e60	rename PromptInputs and inputs with backward compatibility (#8760 )	2024-09-25 09:36:47 -07:00
Simon Mo	3185fb0cca	Revert "[Core] Rename `PromptInputs` to `PromptType`, and `inputs` to `prompt`" (#8750 )	2024-09-24 05:45:20 +00:00
Cyrus Leung	0057894ef7	[Core] Rename `PromptInputs` and `inputs`(#8673 )	2024-09-20 19:00:54 -07:00
Aarni Koskela	8baa454937	[Misc] Move device options to a single place (#8322 )	2024-09-11 13:25:58 -07:00
Cyrus Leung	739b61a348	[Frontend] Refactor prompt processing (#4028 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-22 10:13:53 -07:00
Alexander Matveev	3476ed0809	[Core] Optimize block_manager_v2 vs block_manager_v1 (to make V2 default) (#5602 )	2024-07-01 20:10:37 -07:00
Ilya Lavrenov	57f09a419c	[Hardware][Intel] OpenVINO vLLM backend (#5379 )	2024-06-28 13:50:16 +00:00
Woo-Yeon Lee	2ce5d6688b	[Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414 )	2024-06-25 09:56:06 +00:00
Michael Goin	8065a7e220	[Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718 )	2024-06-20 17:00:13 -06:00
DearPlanet	d8714530d1	[Misc]Add param max-model-len in benchmark_latency.py (#5629 )	2024-06-19 18:19:08 +08:00
Ronen Schaffer	7879f24dcc	[Misc] Add OpenTelemetry support (#4687 ) This PR adds basic support for OpenTelemetry distributed tracing. It includes changes to enable tracing functionality and improve monitoring capabilities. I've also added a markdown with print-screens to guide users how to use this feature. You can find it here	2024-06-19 01:17:03 +09:00
Kuntai Du	9e4e6fe207	[CI] the readability of benchmarking and prepare for dashboard (#5571 ) [CI] Improve the readability of performance benchmarking results and prepare for upcoming performance dashboard (#5571)	2024-06-17 11:41:08 -07:00
Kunshang Ji	728c4c8a06	[Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814 ) Co-authored-by: Jiang Li <jiang1.li@intel.com> Co-authored-by: Abhilash Majumder <abhilash.majumder@intel.com> Co-authored-by: Abhilash Majumder <30946547+abhilash1910@users.noreply.github.com>	2024-06-17 11:01:25 -07:00
Kuntai Du	319ad7f1d3	[CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with `perf-benchmarks` label (#5073 ) Co-authored-by: simon-mo <simon.mo@hey.com>	2024-06-13 22:36:20 -07:00
Woosuk Kwon	1a8bfd92d5	[Hardware] Initial TPU integration (#5292 )	2024-06-12 11:53:03 -07:00
Benjamin Kitor	b3376e5c76	[Misc] Add args for selecting distributed executor to benchmarks (#5335 )	2024-06-08 09:20:16 +08:00
Marut Pandya	616e600e0b	[Misc] add gpu_memory_utilization arg (#5079 ) Signed-off-by: pandyamarut <pandyamarut@gmail.com>	2024-05-28 17:16:18 -07:00
Cyrus Leung	5ae5ed1e60	[Core] Consolidate prompt arguments to LLM engines (#4328 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-05-28 13:29:31 -07:00
Cody Yu	a3a73ab069	[Misc] Load FP8 kv-cache scaling factors from checkpoints (#4893 ) The 2nd PR for #4532. This PR supports loading FP8 kv-cache scaling factors from a FP8 checkpoint (with .kv_scale parameter).	2024-05-22 13:28:20 -07:00
Simon Mo	f09edd8a25	Add JSON output support for benchmark_latency and benchmark_throughput (#4848 )	2024-05-16 10:02:56 -07:00
Cody Yu	973617ae02	[Speculative decoding][Re-take] Enable TP>1 speculative decoding (#4840 ) Co-authored-by: Cade Daniel <edacih@gmail.com> Co-authored-by: Cade Daniel <cade@anyscale.com>	2024-05-16 00:53:51 -07:00
Michael Goin	53b018edcb	[Bugfix] Get available quantization methods from quantization registry (#4098 )	2024-04-18 00:21:55 -07:00
SangBin Cho	67b4221a61	[Core][5/N] Fully working chunked prefill e2e (#3884 )	2024-04-10 17:56:48 -07:00
Zedong Peng	c013d32c75	[Benchmark] Add cpu options to bench scripts (#3915 )	2024-04-09 21:30:03 -07:00
youkaichao	e4be7d70bb	[CI/Benchmark] add more iteration and use median for robust latency benchmark (#3889 )	2024-04-06 21:32:30 +00:00
Adrian Abeyta	2ff767b513	Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290 ) Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: HaiShaw <hixiao@gmail.com> Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com> Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com> Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu> Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com> Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com> Co-authored-by: guofangze <guofangze@kuaishou.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-04-03 14:15:55 -07:00
SangBin Cho	b51c1cc9d2	[2/N] Chunked prefill data update (#3538 )	2024-03-28 10:06:01 -07:00
AmadeusChan	1956931436	[Misc] add the "download-dir" option to the latency/throughput benchmarks (#3621 )	2024-03-27 13:39:05 -07:00

72 Commits