|
|
d544d141ec
|
update benchmark_serving_structured_output to include auto backend (#16438)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-04-11 12:25:52 +08:00 |
|
|
|
3e397a9484
|
check input length of sonnet samples (#16423)
Signed-off-by: alexey-belyakov <alexey.belyakov@intel.com>
|
2025-04-11 10:15:06 +08:00 |
|
|
|
268c325078
|
Fix range_ratio Bug in RandomDataset (#16126)
Signed-off-by: jadewang21 <jadewangcn@outlook.com>
|
2025-04-10 15:31:17 -07:00 |
|
|
|
7cd0bd7212
|
[Bugfix] Fix output token length check logic (#16419)
Signed-off-by: look <eeslook@163.com>
|
2025-04-10 20:16:48 +00:00 |
|
|
|
5fbab20e02
|
[Bugfix] Fix bug when dataset is json (#15899)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-04-10 18:35:41 +00:00 |
|
|
|
417bcefbae
|
fix sonnet dataset sample when prefix len is very small (#16379)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2025-04-10 05:35:07 +00:00 |
|
|
|
b2ce859bd2
|
Fix benchmark_throughput.py --backend=hf (#16352)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-09 19:09:28 +00:00 |
|
|
|
04149cce27
|
[BugFix] fix some typos found by typos. (#16314)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
|
2025-04-09 03:43:59 -07:00 |
|
|
|
55dcce91df
|
Upstream Llama4 Support to Main (#16113)
Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com>
Signed-off-by: Chris Thi <chris.c.thi@gmail.com>
Signed-off-by: drisspg <drisspguessous@gmail.com>
Signed-off-by: Jon Swenson <jmswen@gmail.com>
Signed-off-by: Keyun Tong <tongkeyun@gmail.com>
Signed-off-by: Lu Fang <fanglu@meta.com>
Signed-off-by: Xiaodong Wang <xdwang@meta.com>
Signed-off-by: Yang Chen <yangche@fb.com>
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Signed-off-by: Lu Fang <lufang@fb.com>
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-07 08:06:27 -07:00 |
|
|
|
ba10801961
|
[Benchmark] Add sampling parameters to benchmark_serving. (#16022)
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
|
2025-04-06 12:30:35 +08:00 |
|
|
|
95862f7b4d
|
[Benchmark][Doc] Update throughput benchmark and README (#15998)
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-04-04 09:39:02 -07:00 |
|
|
|
06f21ce7a5
|
[Benchmark] Add AIMO Dataset to Benchmark (#15955)
Signed-off-by: Ziji Shi <shi.ziji.sm@gmail.com>
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>
|
2025-04-03 06:09:18 +00:00 |
|
|
|
252937806c
|
[Bugfix][Benchmarks] Ensure async_request_deepspeed_mii uses the OpenAI choices key (#15926)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-04-02 02:19:35 -07:00 |
|
|
|
aa557e6422
|
[Benchmark]Fix error message (#15866)
Signed-off-by: wangli <wangli858794774@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2025-04-02 01:32:24 -07:00 |
|
|
|
e59ca942f5
|
Add option to use DeepGemm contiguous grouped gemm kernel for fused MoE operations. (#13932)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2025-04-01 12:07:43 -04:00 |
|
|
|
effc5d24fa
|
[Benchmark] Update Vision Arena Dataset and HuggingFaceDataset Setup (#15748)
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
|
2025-03-31 15:38:58 +08:00 |
|
|
|
70e132244a
|
[Minor] Remove TGI launching script (#15646)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-28 09:30:08 -07:00 |
|
|
|
e7f720ea56
|
[Misc]add coding benchmark for speculative decoding (#15303)
Signed-off-by: CXIAAAAA <cxia0209@gmail.com>
|
2025-03-28 10:47:05 +08:00 |
|
|
|
9239bf718e
|
[Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972)
Signed-off-by: ElizaWszola <eliza@neuralmagic.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com>
|
2025-03-27 00:54:44 +00:00 |
|
|
|
23114d3364
|
[Misc] Warn about v0 in benchmark_paged_attn.py (#15495)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-03-25 20:31:04 -07:00 |
|
|
|
f90d34b498
|
[Misc] Add tuned R1 w8a8 and MoE configs for NVIDIA L20 (#15322)
Signed-off-by: DefTruth <qiustudent_r@163.com>
|
2025-03-23 01:10:10 -07:00 |
|
|
|
1f16b7fe74
|
[Core][V0] Add guidance backend for structured output (#14589)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Loc Huynh <lohuynh@microsoft.com>
Co-authored-by: Michal Moskal <michal@moskal.me>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-03-19 21:33:51 -07:00 |
|
|
|
b88be22165
|
[Benchmark] Allow oversample request in benchmark dataset (#15170)
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
|
2025-03-20 12:32:58 +08:00 |
|
|
|
40828ce5fe
|
fix "Total generated tokens:" is 0 if using --backend tgi and --endpo… (#14673)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
|
2025-03-19 20:56:16 -07:00 |
|
|
|
6c5a3195db
|
[Misc][Benchmark] Add support for different tokenizer_mode (#15040)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-19 14:56:50 +00:00 |
|
|
|
400d483e87
|
[Kernels] LoRA - Retire SGMV and BGMV Kernels (#14685)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-18 09:47:53 +00:00 |
|
|
|
583a9778e0
|
[Benchmark] Do not save detailed info to json by default (#14879)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-03-16 21:48:11 -07:00 |
|
|
|
3453b964a3
|
[Misc][Doc] Minor benchmark README update (#14874)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-16 09:46:17 +08:00 |
|
|
|
09269b3127
|
[BugFix]Fix performance serving benchmark when enable profiling (#14737)
Signed-off-by: wangli <wangli858794774@gmail.com>
|
2025-03-14 07:02:05 +00:00 |
|
|
|
a6e0d096dd
|
[Feature] Add visionarena offline support for benchmark_throughput (#14654)
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Jennifer Zhao <JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2025-03-14 04:07:54 +00:00 |
|
|
|
a73122de96
|
[Bugfix] fix benchmark moe (#14653)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-13 16:12:42 +08:00 |
|
|
|
4a42b9f5d6
|
[Doc] Update benchmarks README (#14646)
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2025-03-11 19:23:04 -07:00 |
|
|
|
a1c8f3796c
|
dynamic dispatch of fp8 kernels (#14245)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
|
2025-03-11 10:54:56 -04:00 |
|
|
|
08a1a1121d
|
benchmarks: simplify test jsonschema (#14567)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-11 13:39:30 +00:00 |
|
|
|
432d6dad15
|
Fix typo in benchmark_serving_structured_output.py (#14566)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-10 14:58:58 -07:00 |
|
|
|
5ff0d32580
|
[V1] LoRA - Add triton kernels for V1 (#13096)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-10 17:27:53 -04:00 |
|
|
|
3b352a2f92
|
Correct capitalisation: VLLM -> vLLM (#14562)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-10 16:36:21 +00:00 |
|
|
|
1253b15774
|
[Feature] Consolidate performance benchmark datasets (#14036)
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-03-10 07:23:11 +00:00 |
|
|
|
9085aabd62
|
[benchmarks] Add option to use unique jsonschema for each request (#14457)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-08 06:36:39 -08:00 |
|
|
|
58abe35455
|
[Benchmarks] Make detokenization optional in benchmark scripts (#11697)
Signed-off-by: Jeremy Arnold <Jeremy.Arnold@amd.com>
|
2025-03-07 08:09:00 -08:00 |
|
|
|
80e9afb5bc
|
[V1][Core] Support for Structured Outputs (#12388)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-03-07 07:19:11 -08:00 |
|
|
|
0ca3b8e01c
|
[BUGFIX] Skip tokenization support for throughput benchmark (#12712)
Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
|
2025-03-07 02:51:47 -08:00 |
|
|
|
c34eeec58d
|
[Bugfix] Correctly call cudaProfilerStop in benchmarks script (#14183)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-03-07 00:42:49 +00:00 |
|
|
|
ad60bbb2b2
|
[Doc] Fix a typo (#14385)
|
2025-03-06 16:31:52 -08:00 |
|
|
|
ca100c90fe
|
Add benchmark for DeepGEMM and vLLM Block FP8 Dense GEMM (#13917)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-03-05 17:08:51 -08:00 |
|
|
|
a4f1ee35d6
|
Deprecate best_of Sampling Parameter in anticipation for vLLM V1 (#13997)
Signed-off-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com>
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-05 20:22:43 +00:00 |
|
|
|
7bab4bb048
|
[Misc] Add Qwen2MoeForCausalLM moe tuning support (#14276)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-05 23:11:29 +08:00 |
|
|
|
f78c0be80a
|
Fix benchmark_moe.py tuning for CUDA devices (#14164)
|
2025-03-03 21:11:03 -08:00 |
|
|
|
bb5b640359
|
[core] moe fp8 block quant tuning support (#14068)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-03-04 01:30:23 +00:00 |
|
|
|
848a6438ae
|
[ROCm] Faster Custom Paged Attention kernels (#12348)
|
2025-03-03 09:24:45 -08:00 |
|