youngkingdom/vllm - vllm

Author	SHA1	Message	Date
ElizaWszola	9fb2d22032	[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762) Signed-off-by: ElizaWszola <ewszola@redhat.com>	2025-07-17 09:56:44 -04:00
Asher	5a7fb3ab9e	[Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820) Signed-off-by: Asher Zhang <asherszhang@tencent.com>	2025-07-17 09:10:09 +00:00
Pavani Majety	7bd4c37ae7	[Core] Add Flashinfer TRTLLM Backend for Flashinfer decode path (SM100). (#19825) Signed-off-by: Pavani Majety <pmajety@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: shuw <shuw@nvidia.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-11 09:23:23 +00:00
Luka Govedič	31d5c1797f	[Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf (#19830) Signed-off-by: Luka Govedic <lgovedic@redhat.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-11 04:56:28 +00:00
Wentao Ye	e2de455c34	[Feature] Integrate SM100 DeepGEMM support (#20087)	2025-07-10 20:18:05 -07:00
Kuntai Du	5b6fe23d05	[Bugfix][Benchmark] Make sure the output length > 0 when testing prefill workload. (#20786) Signed-off-by: KuntaiDu <kuntai@uchicago.edu> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-07-10 14:52:46 -07:00
Michael Goin	0bbac1c1b4	[Bench] Add NVFP4 GEMM benchmark script (#20578) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-09 13:23:48 -04:00
Li Wang	9ff2af6d2b	[Benchmark] Parameterization of streaming loading of multimodal datasets (#20528) Signed-off-by: wangli <wangli858794774@gmail.com>	2025-07-09 13:35:16 +00:00
Brayden Zhong	cede942b87	[Benchmark] Add support for multiple batch size benchmark through CLI in `benchmark_moe.py` (#20516) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-07-06 09:20:11 +00:00
Jee Jee Li	1caca5a589	[Misc] Add SPDX-FileCopyrightText (#20428) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-04 07:40:42 +00:00
bnellnm	c1909e7e8c	[Kernels] MoE refactor (#19636) Signed-off-by: Bill Nell <bnell@redhat.com> Signed-off-by: ElizaWszola <ewszola@redhat.com> Co-authored-by: ElizaWszola <ewszola@redhat.com>	2025-07-02 06:08:27 -07:00
Kebe	b1c1fe35a5	[Misc] remove redundant char (#20287) Signed-off-by: Kebe <mail@kebe7jun.com>	2025-07-01 15:33:22 +08:00
czhu-cohere	9909726d2a	Enable ZP Support for Machete (#20268) Signed-off-by: czhu-cohere <conway.zhu@cohere.com>	2025-07-01 07:12:20 +00:00
Reid	167aca45cb	[Misc] Use collapsible blocks for benchmark examples. (#20017) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-26 03:35:16 -07:00
Ekagra Ranjan	9502c38138	[Benchmark][Bug] Fix multiple bugs in bench and add args to spec_decode offline (#20083)	2025-06-25 22:06:27 -07:00
Wentao Ye	879f69bed3	[Refactor] Remove duplicate `ceil_div` (#20023) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-25 05:19:09 +00:00
Wentao Ye	a6c4b87fbc	Revert "[Feature] Integrate new deepgemm (#19820)" (#20049) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-24 19:45:22 -07:00
Wentao Ye	c6e3bba8e6	[Feature] Integrate new deepgemm (#19820) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-24 12:51:56 -07:00
d.transposed	c635c5f744	[Misc][Benchmarking] Add variable request-rate ("ramp-up") to the benchmarking client. (#19423) Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal> Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal> Co-authored-by: Roger Wang <hey@rogerw.me>	2025-06-24 18:41:49 +00:00
Reid	3014c920da	add some examples for other benchmark scripts (#19893) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-24 05:57:46 +00:00
Chenyaaang	ee5ad8d2c5	[Misc][Tools][Benchmark] Add profile to autotune script (#19711) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-06-24 00:59:41 +00:00
22quinn	4671ac6e2a	[Bugfix][Benchmark] Fix Marlin benchmark (#19929)	2025-06-24 07:25:12 +09:00
Wang, Yi	202c5df935	[Benchmark] fix request loss if "ping" is returned (#19535) Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-06-22 07:21:04 +00:00
Brayden Zhong	5aa4a015ce	[Benchmark] Fix `Value of type "SampleRequest" is not indexable` (#18032) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-06-19 21:28:55 -07:00
Robert Shaw	10d82f9ac5	[Benchmark][Bugfix] Fix Dataset Length Calculation (#19868) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2025-06-19 18:30:41 -07:00
afeldman-nm	dfada85eee	[Frontend] Expose custom args in OpenAI APIs (#16862) Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com> Signed-off-by: Andrew Feldman <afeldman@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-06-18 17:41:11 -07:00
Wentao Ye	ffb2cd6b54	[Perf] Optimize `moe_align_block_size` CUDA kernel (#19572) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-06-17 11:49:26 -07:00
Wentao Ye	3d330c4c09	[Benchmark] Refactor benchmark script for fp8 & int8 (#19627) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-15 15:15:37 +08:00
Reid	6fa718a460	[Misc] Modularize CLI Argument Parsing in Benchmark Scripts (#19593) Signed-off-by: reidliu41 <reid201711@gmail.com> Co-authored-by: reidliu41 <reid201711@gmail.com>	2025-06-14 16:54:52 +08:00
Wentao Ye	b6efafd9e4	[Perf] Vectorize static / dynamic INT8 quant kernels (#19233) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-12 06:51:41 -07:00
Tianyu Guo	4589b94032	[Bugfix] Fix benchmark_moe.py (#19016) Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>	2025-06-09 18:04:36 -07:00
Lifans	4e4f63ad45	[Nit][Benchmark]Fix example in benchmark_serving_structured_output.py (#19311) Signed-off-by: Lifan Shen <lifans@meta.com>	2025-06-07 18:25:38 +08:00
ElizaWszola	84166fee97	[Kernel] Integrate CUTLASS MoE kernel with PPLX (#18762) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-06-06 18:26:11 -07:00
Chenyaaang	441b65d8c7	[Misc][Tools][Benchmark] Fix and improve auto tune script (#19163) Signed-off-by: Chenyaaang <chenyangli@google.com>	2025-06-06 23:31:19 +00:00
Benjamin Chislett	3465b87ef8	[Bugfix] Fix EAGLE vocab embedding construction for Llama 70B (#19033) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-06-05 19:10:08 -07:00
Chiyue Wei	61059bee40	[Hardware][NVIDIA] FP4 MoE kernel optimization (#19110) Signed-off-by: Chiyue Wei <chiyuew@nvidia.com> Co-authored-by: Chiyue Wei <chiyuew@nvidia.com>	2025-06-05 09:48:26 -07:00
Huy Do	0678b52251	Handle non-serializable objects when dumping benchmark results (#19114)	2025-06-04 22:40:04 -07:00
Ekagra Ranjan	135cf55cd1	[V1][Spec Decode][Ngram] 1.35x gain -> 1.95x gain on InstructCoder with prompt fix (#18971)	2025-06-03 15:26:33 -07:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Ekagra Ranjan	bbfa0c61d1	[Misc][Benchmark] Add support for CustomDataset (#18511)	2025-05-31 19:07:38 +00:00
Michael Goin	f49239cb45	Benchmark script for fp8 vs bf16 gemm (#17126) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-30 10:56:11 -06:00
Rabi Mishra	6acb7a6285	[Misc]Fix benchmarks/README.md for speculative decoding (#18897) Signed-off-by: rabi <ramishra@redhat.com>	2025-05-30 07:58:04 +00:00
Cyrus Leung	1aa2f81b43	[Misc] Update type annotation for rotary embedding `base` (#18914) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-30 10:17:01 +08:00
Duyi-Wang	b169d5f7b6	[Misc][Tools][Benchmark] Add benchmark_serving supports for llama.cpp. (#18692) Signed-off-by: Duyi-Wang <duyi.wang@intel.com>	2025-05-29 20:02:08 +08:00
Divakar Verma	774c5fde30	[V1] fix torch profiling for V1 offline scenarios (#18445) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-05-28 04:16:30 +00:00
cascade	aaa4ac1c95	Disable prefix cache by default for benchmark (#18639) Signed-off-by: cascade812 <cascade812@outlook.com>	2025-05-27 20:06:34 +08:00
Calvin Chen	4693a3438c	[Doc] cleanup deprecated flag for doc (#18715) Signed-off-by: calvin chen <120380290@qq.com>	2025-05-27 07:12:02 +00:00
Cyrus Leung	82e2339b06	[Doc] Move examples and further reorganize user guide (#18666) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-26 07:38:04 -07:00
Feng XiaoLong	4fc1bf813a	[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454) Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com> Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>	2025-05-23 16:16:26 -07:00
Teruaki Ishizaki	4be2255c81	[Bugfix][Benchmarks] Fix a benchmark of deepspeed-mii backend to use api_key (#17291) Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com>	2025-05-23 12:30:47 +08:00

369 Commits