youngkingdom/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Aaron Pham	21063c11c7	[CI/Build] drop support for Python 3.8 EOL (#8464) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2024-11-06 07:11:55 +00:00
lkchen	d2e80332a7	[Feature] Update benchmark_throughput.py to support image input (#9851) Signed-off-by: Linkun Chen <github+anyscale@lkchen.net> Co-authored-by: Linkun Chen <github+anyscale@lkchen.net>	2024-11-05 19:30:02 +00:00
lkchen	9a5664d4a4	[Misc] Refactor benchmark_throughput.py (#9779) Signed-off-by: Linkun Chen <github+anyscale@lkchen.net> Co-authored-by: Linkun Chen <lkchen@github.com> Co-authored-by: Linkun Chen <github+anyscale@lkchen.net>	2024-11-04 14:32:16 -08:00
Tran Quang Dai	ea4adeddc1	[Bugfix] Fix E2EL mean and median stats (#9984) Signed-off-by: daitran2k1 <tranquangdai7a@gmail.com>	2024-11-04 09:37:58 +00:00
Guillaume Calmettes	abbfb6134d	[Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint (#9837)	2024-10-30 18:15:56 -07:00
wangshuai09	622b7ab955	[Hardware] using current_platform.seed_everything (#9785) Signed-off-by: wangshuai09 <391746016@qq.com>	2024-10-29 14:47:44 +00:00
youkaichao	32176fee73	[torch.compile] support moe models (#9632) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-10-27 21:58:04 -07:00
Michael Goin	fd0e2cfdb2	[Misc] Separate total and output tokens in benchmark_throughput.py (#8914)	2024-10-23 16:47:20 +00:00
Chen Zhang	65050a40e6	[Bugfix] Generate exactly input_len tokens in benchmark_throughput (#9592)	2024-10-22 17:45:35 -07:00
Jeremy Arnold	cb6fdaa0a0	[Misc] Make benchmarks use EngineArgs (#9529)	2024-10-22 15:40:38 -07:00
Andy Dai	855e0e6f97	[Frontend][Misc] Goodput metric support (#9338)	2024-10-20 18:39:32 +00:00
Russell Bryant	7dbe738d65	[Misc] benchmark: Add option to set max concurrency (#9390) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-10-18 11:15:28 -07:00
Kai Wu	d65049daab	[Bugfix] Add random_seed to sample_hf_requests in benchmark_serving script (#9013) Co-authored-by: Isotr0py <2037008807@qq.com>	2024-10-17 21:11:11 +00:00
Kuntai Du	81ede99ca4	[Core] Deprecating block manager v1 and make block manager v2 default (#8704) Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).	2024-10-17 11:38:15 -05:00
Cyrus Leung	7e7eae338d	[Misc] Standardize RoPE handling for Qwen2-VL (#9250)	2024-10-16 13:56:17 +08:00
Grace Ho	5d264f4ab8	pass ignore_eos parameter to all benchmark_serving calls (#9349)	2024-10-15 13:30:44 -07:00
Andy Dai	94bf9ae4e9	[Misc] Fix sampling from sonnet for long context case (#9235)	2024-10-11 00:33:16 +00:00
sroy745	f3a507f1d3	[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149)	2024-10-10 14:17:17 +08:00
youkaichao	18b296fdb2	[core] remove beam search from the core (#9105)	2024-10-07 05:47:04 +00:00
Brendan Wong	168cab6bbf	[Frontend] API support for beam search (#9087) Co-authored-by: youkaichao <youkaichao@126.com>	2024-10-05 23:39:03 -07:00
Cody Yu	27302dd584	[Misc] Fix CI lint (#9085)	2024-10-04 16:07:54 -07:00
Andy Dai	0cc566ca8f	[Misc] Add random seed for prefix cache benchmark (#9081)	2024-10-04 21:58:57 +00:00
Kuntai Du	fbb74420e7	[CI] Update performance benchmark: upgrade trt-llm to r24.07, and add SGLang (#7412)	2024-10-04 14:01:44 -07:00
vlsav	22f5851b80	Update benchmark_serving.py to read and write json-datasets, results in UTF8, for better compatibility with Windows (#8997)	2024-10-01 11:07:06 -07:00
Chen Zhang	e585b583a9	[Bugfix] Support testing prefill throughput with benchmark_serving.py --hf-output-len 1 (#8891)	2024-09-28 18:51:22 +00:00
Peter Pan	0e088750af	[MISC] Fix invalid escape sequence '\' (#8830) Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>	2024-09-27 01:13:25 -07:00
Cyrus Leung	3b00b9c26c	[Core] rename `PromptInputs` and `inputs` (#8876)	2024-09-26 20:35:15 -07:00
Simon Mo	4f1ba0844b	Revert "rename PromptInputs and inputs with backward compatibility (#8760) (#8810)	2024-09-25 10:36:26 -07:00
Cyrus Leung	28e1299e60	rename PromptInputs and inputs with backward compatibility (#8760)	2024-09-25 09:36:47 -07:00
Archit Patke	6da1ab6b41	[Core] Adding Priority Scheduling (#5958)	2024-09-24 19:50:50 -07:00
Simon Mo	3185fb0cca	Revert "[Core] Rename `PromptInputs` to `PromptType`, and `inputs` to `prompt`" (#8750)	2024-09-24 05:45:20 +00:00
youkaichao	0250dd68c5	re-implement beam search on top of vllm core (#8726) Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>	2024-09-23 22:08:12 -07:00
Lucas Wilkinson	86e9c8df29	[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701) Co-authored-by: mgoin <michael@neuralmagic.com> Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-09-23 13:46:26 -04:00
Cyrus Leung	0057894ef7	[Core] Rename `PromptInputs` and `inputs` (#8673)	2024-09-20 19:00:54 -07:00
Kunshang Ji	855c8ae2c9	[MISC] remove engine_use_ray in benchmark_throughput.py (#8615)	2024-09-18 22:33:20 -07:00
Kuntai Du	c52ec5f034	[Bugfix] fixing sonnet benchmark bug in benchmark_serving.py (#8616)	2024-09-19 05:24:24 +00:00
Aaron Pham	9d104b5beb	[CI/Build] Update Ruff version (#8469) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-09-18 11:00:56 +00:00
Cyrus Leung	6ffa3f314c	[CI/Build] Avoid CUDA initialization (#8534)	2024-09-18 10:38:11 +00:00
Isotr0py	1b6de8352b	[Benchmark] Support sample from HF datasets and image input for benchmark_serving (#8495)	2024-09-17 07:34:27 +00:00
Aarni Koskela	8baa454937	[Misc] Move device options to a single place (#8322)	2024-09-11 13:25:58 -07:00
Wei-Sheng Chin	795b662cff	Enable Random Prefix Caching in Serving Profiling Tool (benchmark_serving.py) (#8241)	2024-09-06 20:18:16 -07:00
afeldman-nm	e5cab71531	[Frontend] Add --logprobs argument to `benchmark_serving.py` (#8191)	2024-09-06 09:01:14 -07:00
Cody Yu	77d9e514a2	[MISC] Replace input token throughput with total token throughput (#8164) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-09-04 20:23:22 +00:00
Nick Hill	d4db9f53c8	[Benchmark] Add `--async-engine` option to benchmark_throughput.py (#7964)	2024-09-03 20:57:41 -04:00
Wei-Sheng Chin	0c785d344d	Add more percentiles and latencies (#7759)	2024-08-29 16:48:11 -07:00
Philipp Schmid	345be0e244	[benchmark] Update TGI version (#7917)	2024-08-27 15:07:53 -07:00
Megha Agarwal	2eedede875	[Core] Asynchronous Output Processor (#7049) Co-authored-by: Alexander Matveev <alexm@neuralmagic.com>	2024-08-26 20:53:20 -07:00
Alexander Matveev	9db93de20c	[Core] Add multi-step support to LLMEngine (#7789)	2024-08-23 12:45:53 -07:00
Jiaxin Shan	d3b5b98021	[Misc] Enhance prefix-caching benchmark tool (#6568)	2024-08-22 09:32:02 -07:00
Luka Govedič	7937009a7e	[Kernel] Replaced `blockReduce[...]` functions with `cub::BlockReduce` (#7233) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-21 20:18:00 -04:00

184 Commits