|
|
0271c2ff2f
|
[Test] Add Benchmark and Unit Test for per_token_group_quant (#21860)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-30 07:15:02 -07:00 |
|
|
|
533db0935d
|
[benchmark] add max-concurrency in result table (#21095)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
|
2025-07-30 01:15:43 -07:00 |
|
|
|
ba5c5e5404
|
[Docs] Switch to better markdown linting pre-commit hook (#21851)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-29 19:45:08 -07:00 |
|
|
|
58b11b24a6
|
[Bugfix] Fix workspace buffer None issue for Flashinfer TRTLLM Backend (#21525)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-07-29 10:34:00 -04:00 |
|
|
|
18cc33dd60
|
[bugfix] fix profile impact benchmark results (#21507)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-07-27 22:44:24 -07:00 |
|
|
|
57c22e57f9
|
Fix CUDA permute/unpermute for use with DeepGemm Moe (#17934)
Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>
|
2025-07-27 07:08:00 -07:00 |
|
|
|
01a395e9e7
|
[CI/Build][Doc] Clean up more docs that point to old bench scripts (#21667)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-07-27 04:02:12 +00:00 |
|
|
|
e7c4f9ee86
|
[CI/Build][Doc] Move existing benchmark scripts in CI/document/example to vllm bench CLI (#21355)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-07-26 07:10:14 -07:00 |
|
|
|
56e544f24b
|
[Refactor] Remove moe_align_block_size_triton (#21335)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-07-26 07:08:29 -07:00 |
|
|
|
e98def439c
|
[Take 2] Correctly kill vLLM processes after benchmarks (#21646)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2025-07-26 06:06:05 -07:00 |
|
|
|
947edd099e
|
[Misc][Tools] make max-model-len a parameter in auto_tune script (#21321)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-07-24 22:46:43 -07:00 |
|
|
|
34ddcf9ff4
|
[Frontend] run-batch supports V1 (#21541)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-24 20:05:55 -07:00 |
|
|
|
4f76a05f4f
|
[BugFix] Update python to python3 calls for image; fix prefix & input calculations. (#21391)
Signed-off-by: Eric Hanley <ericehanley@google.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-07-22 20:33:00 -07:00 |
|
|
|
10904e6d75
|
[benchmark] Port benchmark request sent optimization to benchmark_serving (#21209)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-07-22 05:28:00 -07:00 |
|
|
|
e7b2042681
|
Revert "[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762) (#21334)
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-07-21 21:49:01 -07:00 |
|
|
|
0ec82edda5
|
[perf] Speed up align sum kernels (#21079)
Signed-off-by: Himanshu Jaju <hj@mistral.ai>
|
2025-07-21 11:19:23 -07:00 |
|
|
|
10eb24cc91
|
GLM-4 Update (#20736)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Lu Fang <fanglu@fb.com>
|
2025-07-19 22:40:31 +00:00 |
|
|
|
3a2cb2649d
|
[Misc][Tools][Benchmark] Add readme file for auto_tune script (#20779)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-07-19 09:06:59 +00:00 |
|
|
|
0f199f197b
|
[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue (#21005)
Signed-off-by: Jialin Ouyang <jialino@meta.com>
|
2025-07-18 12:34:40 -07:00 |
|
|
|
9fb2d22032
|
[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
|
2025-07-17 09:56:44 -04:00 |
|
|
|
5a7fb3ab9e
|
[Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820)
Signed-off-by: Asher Zhang <asherszhang@tencent.com>
|
2025-07-17 09:10:09 +00:00 |
|
|
|
7bd4c37ae7
|
[Core] Add Flashinfer TRTLLM Backend for Flashinfer decode path (SM100). (#19825)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: shuw <shuw@nvidia.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-11 09:23:23 +00:00 |
|
|
|
31d5c1797f
|
[Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf (#19830)
Signed-off-by: Luka Govedic <lgovedic@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-11 04:56:28 +00:00 |
|
|
|
e2de455c34
|
[Feature] Integrate SM100 DeepGEMM support (#20087)
|
2025-07-10 20:18:05 -07:00 |
|
|
|
5b6fe23d05
|
[Bugfix][Benchmark] Make sure the output length > 0 when testing prefill workload. (#20786)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-07-10 14:52:46 -07:00 |
|
|
|
0bbac1c1b4
|
[Bench] Add NVFP4 GEMM benchmark script (#20578)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-09 13:23:48 -04:00 |
|
|
|
9ff2af6d2b
|
[Benchmark] Parameterization of streaming loading of multimodal datasets (#20528)
Signed-off-by: wangli <wangli858794774@gmail.com>
|
2025-07-09 13:35:16 +00:00 |
|
|
|
cede942b87
|
[Benchmark] Add support for multiple batch size benchmark through CLI in benchmark_moe.py (#20516)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-07-06 09:20:11 +00:00 |
|
|
|
1caca5a589
|
[Misc] Add SPDX-FileCopyrightText (#20428)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-04 07:40:42 +00:00 |
|
|
|
c1909e7e8c
|
[Kernels] MoE refactor (#19636)
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: ElizaWszola <ewszola@redhat.com>
|
2025-07-02 06:08:27 -07:00 |
|
|
|
b1c1fe35a5
|
[Misc] remove redundant char (#20287)
Signed-off-by: Kebe <mail@kebe7jun.com>
|
2025-07-01 15:33:22 +08:00 |
|
|
|
9909726d2a
|
Enable ZP Support for Machete (#20268)
Signed-off-by: czhu-cohere <conway.zhu@cohere.com>
|
2025-07-01 07:12:20 +00:00 |
|
|
|
167aca45cb
|
[Misc] Use collapsible blocks for benchmark examples. (#20017)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-26 03:35:16 -07:00 |
|
|
|
9502c38138
|
[Benchmark][Bug] Fix multiple bugs in bench and add args to spec_decode offline (#20083)
|
2025-06-25 22:06:27 -07:00 |
|
|
|
879f69bed3
|
[Refactor] Remove duplicate ceil_div (#20023)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-25 05:19:09 +00:00 |
|
|
|
a6c4b87fbc
|
Revert "[Feature] Integrate new deepgemm (#19820)" (#20049)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-24 19:45:22 -07:00 |
|
|
|
c6e3bba8e6
|
[Feature] Integrate new deepgemm (#19820)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-24 12:51:56 -07:00 |
|
|
|
c635c5f744
|
[Misc][Benchmarking] Add variable request-rate ("ramp-up") to the benchmarking client. (#19423)
Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-06-24 18:41:49 +00:00 |
|
|
|
3014c920da
|
add some examples for other benchmark scripts (#19893)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-24 05:57:46 +00:00 |
|
|
|
ee5ad8d2c5
|
[Misc][Tools][Benchmark] Add profile to autotune script (#19711)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-06-24 00:59:41 +00:00 |
|
|
|
4671ac6e2a
|
[Bugfix][Benchmark] Fix Marlin benchmark (#19929)
|
2025-06-24 07:25:12 +09:00 |
|
|
|
202c5df935
|
[Benchmark] fix request loss if "ping" is returned (#19535)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-06-22 07:21:04 +00:00 |
|
|
|
5aa4a015ce
|
[Benchmark] Fix Value of type "SampleRequest" is not indexable (#18032)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-06-19 21:28:55 -07:00 |
|
|
|
10d82f9ac5
|
[Benchmark][Bugfix] Fix Dataset Length Calculation (#19868)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-06-19 18:30:41 -07:00 |
|
|
|
dfada85eee
|
[Frontend] Expose custom args in OpenAI APIs (#16862)
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com>
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-06-18 17:41:11 -07:00 |
|
|
|
ffb2cd6b54
|
[Perf] Optimize moe_align_block_size CUDA kernel (#19572)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-06-17 11:49:26 -07:00 |
|
|
|
3d330c4c09
|
[Benchmark] Refactor benchmark script for fp8 & int8 (#19627)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-15 15:15:37 +08:00 |
|
|
|
6fa718a460
|
[Misc] Modularize CLI Argument Parsing in Benchmark Scripts (#19593)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-06-14 16:54:52 +08:00 |
|
|
|
b6efafd9e4
|
[Perf] Vectorize static / dynamic INT8 quant kernels (#19233)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-06-12 06:51:41 -07:00 |
|
|
|
4589b94032
|
[Bugfix] Fix benchmark_moe.py (#19016)
Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>
|
2025-06-09 18:04:36 -07:00 |
|