youngkingdom/vllm - vllm

Author	SHA1	Message	Date
Cyrus Leung	6c117cff7d	[Frontend] Pass API server count to each process (#23717 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-20 01:15:19 +08:00
Aaron Pham	29283e8976	[Chore] Cleanup guided namespace, move to structured outputs config (#22772 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-18 09:20:27 +00:00
bnellnm	5963b98b46	[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-09-17 17:43:31 -06:00
Karan Goel	2a4d6412e6	Add a batched auto tune script (#25076 ) Signed-off-by: Karan Goel <karangoel@google.com> Signed-off-by: Karan Goel <3261985+karan@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-17 22:41:18 +00:00
dolpm	1b962e2457	[fix] lora benchmarks pass no_lora_flag_cpu (#23774 ) Signed-off-by: Dylan Maloy <34420038+dolpm@users.noreply.github.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-17 21:22:25 +08:00
Daniel Serebrenik	43a62c51be	Add more documentation and improve usability of lognormal dist (benchmark_serving_multi_turn) (#23255 ) Signed-off-by: daniels <daniels@pliops.com>	2025-09-17 05:53:17 +00:00
Isotr0py	5a411ef6c4	[Benchmarks] Add MMVU video dataset support and clean up deprecated datasets (#24719 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-17 03:29:43 +00:00
Tahsin Tunan	cef32104b4	[FP8] Extend per-token-group quantization support to QuantFP8 (#24342 ) Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com>	2025-09-16 18:31:06 -07:00
Ye (Charlotte) Qi	85e0df1392	[Docs] move benchmarks README to contributing guides (#24820 )	2025-09-16 05:52:57 -07:00
Jee Jee Li	04ad0dc275	[benchmark] Add triton version in the moe tuned config (#24769 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-16 14:10:54 +08:00
Elvir Crnčević	98229db244	[Kernels][DP/EP] Optimize Silu Kernel for R1 (#24054 ) Signed-off-by: elvircrn <elvircrn@gmail.com>	2025-09-13 00:17:27 -07:00
Didier Durand	bcb06d7baf	[Doc]: fix typos in various files (#24726 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-12 06:43:12 -07:00
Michael Goin	c3aea10dc8	[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-11 15:43:14 -07:00
Ilya Markov	1fdd5c42d7	[Kernels] Enable Torch Symmetric Memory All-Reduce By Default (#24111 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-09-11 09:45:31 -07:00
Jee Jee Li	d11ec124a0	[Bench] Add qwen-next in benchmark_moe.py (#24661 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-11 21:29:43 +08:00
TaehyunKim	9bd831f501	[Model] New model support for Motif-1-Tiny (#23414 ) Signed-off-by: ca1207 <ca1207zzz@gmail.com> Signed-off-by: TaehyunKim <73943231+ca1207@users.noreply.github.com> Co-authored-by: WyldeCat <skan1543@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-10 23:29:40 -07:00
Ekagra Ranjan	0dc9cbb527	[Benchmark] Update bench doc with mtbench, blazedit, spec bench (#24450 ) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>	2025-09-09 21:15:41 +00:00
Ye (Charlotte) Qi	6fb2788163	[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-09-09 10:02:35 +00:00
elvischenv	bba1042c6f	[Flashinfer] Support Flashinfer TRTLLM FP8-qkv BF16/FP16-out Attention Kernel (#23647 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-09-08 20:53:07 -07:00
Jee Jee Li	62f66be1f7	[Bugfix] Fix Qwen3-coder moe tuned config (#24072 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-07 05:19:46 +00:00
Jiangyun Zhu	77aec83b8c	[Benchmark] add benchmark for custom activation op (#23908 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-06 20:12:05 -07:00
Didier Durand	83609ca91d	[Doc]: fix typos in Python comments (#24173 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-04 08:52:17 -07:00
anthonsu	04f3c35cff	Improve flexibility of auto_tune.sh execution. (#23766 ) Signed-off-by: Anthony Su <50185138+anthonsu@users.noreply.github.com> Signed-off-by: anthonsu <50185138+anthonsu@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-04 09:41:41 +00:00
Weida Hong	12e1e63cc5	[Misc] Enhance output readability of helper script (#24214 ) Signed-off-by: Weida Hong <wdhongtw@google.com>	2025-09-04 06:38:26 +00:00
Peter Pan	b5ee1e3261	Remove deprecated `PyNcclConnector` (#24151 ) Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>	2025-09-03 22:49:16 +00:00
Didier Durand	02d411fdb2	[Doc]: fix typos in Python comments (#24115 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-02 21:14:07 -07:00
co63oc	1bd007f234	fix some typos (#24071 ) Signed-off-by: co63oc <co63oc@users.noreply.github.com>	2025-09-02 20:44:50 -07:00
Jiangyun Zhu	c83c4ff815	[Benchmark] Add support for local hf dataset path in benchmark (#23999 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-09-02 17:49:16 +00:00
Michael Goin	b7adf94c4a	Tuned H100/H200 triton fp8 block configs for fused_qkv_a_proj (#23939 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-29 10:28:35 -07:00
YUQI.CHENG	66548f6603	[Bugfix] Fix benchmark_moe.py for blockwise fp8. (#23823 ) Signed-off-by: crischeng <420985011@qq.com> Co-authored-by: cris <grace@guisenbindeMacBook-Pro.local>	2025-08-28 21:44:09 +08:00
Michael Goin	a781e84ec2	[Perf] Tune configs for triton block fp8 gemm H100/H200 (#23748 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-28 11:12:53 +08:00
Chen Zhang	142ac08030	[Frontend] Optimize beam search performance by limiting concurrency (#23599 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-27 04:59:14 +00:00
Didier Durand	7c04779afa	[Doc]: fix various spelling issues in multiple files (#23636 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-08-26 14:05:29 +00:00
Breno Baldas Skuk	0cb7b065c3	Feature/benchmark/random mm data/images (#23119 ) Signed-off-by: breno.skuk <breno.skuk@hcompany.ai>	2025-08-25 01:28:35 -07:00
Ming Yang	504d914314	[Perf] Add Triton config for DeepSeek V3 FP8 EP32 H200 (#23504 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-08-24 18:06:35 -07:00
czhu-cohere	e76e233540	[kernel] Support W4A8 on Hopper (#23198 ) Signed-off-by: czhu-cohere <conway.zhu@cohere.com>	2025-08-24 06:18:04 +00:00
elvischenv	24d0c9e6ed	[NVIDIA][torch.compile] Support Flashinfer TRTLLM FP8-q/kv NVFP4-out Attention Kernel (#22703 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-08-22 22:09:05 +00:00
Michael Goin	3bbe11cc13	[Perf] Small optimizations for silu_mul_fp8_quant_deep_gemm (#23265 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-21 17:56:15 -04:00
Pavani Majety	1d353b6352	[Core] Always use tensor cores for Flashinfer Decode Wrapper (#23214 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-08-21 16:02:11 -04:00
Cyrus Leung	0c31e28e95	[Bugfix] Fix extra whitespace in strings caused by newline (#23272 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-20 22:03:00 -07:00
Michael Goin	0cdbf5e61c	[Kernel/Quant] Remove the original marlin format and qqq (#23204 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-20 15:13:36 -04:00
shixianc	b17109beea	[Kernel] CUTLASS MoE FP8: Integrate cuda moe permute/unpermute (#23045 ) Signed-off-by: Shixian Cui <shixian@amazon.com>	2025-08-20 10:35:26 -04:00
Zhewen Li	f729023272	[CI/Build] Also check DP in benchmarks throughput script (#23038 ) Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-08-20 04:09:27 +00:00
Chenheli Hua	1630cc8d0f	[Benchmarks] Add video inputs to ShareGPTDataset. (#23199 ) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-08-19 23:42:31 +00:00
Ruixiang Tan	03d4235fd2	[Misc] Fix the benchmark's README and improve the error messages for the benchmark's argument checks (#22654 ) Signed-off-by: tanruixiang <tanruixiang0104@gmail.com>	2025-08-19 10:18:51 -07:00
Jee Jee Li	4d9c61993a	[Bugfix] Fix benchmark_moe.py (#23177 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-19 13:39:40 +00:00
elvischenv	03752dba8f	[NVIDIA] Support Flashinfer TRTLLM FP8-q/kv/out Attention Kernel (#21716 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-08-19 08:22:15 -04:00
hustxiayang	31436e8b4f	[Misc] Add request_id into benchmark_serve.py (#23065 ) Signed-off-by: yangxia <yangxiast@gmail.com>	2025-08-19 08:32:18 +00:00
Daniel Serebrenik	3c8a787247	[Benchmark] Add flag --served-model-name to benchmark_serving_multi_turn (#22889 ) Signed-off-by: daniels <daniels@pliops.com>	2025-08-19 07:48:07 +00:00
Michael Goin	4fc722eca4	[Kernel/Quant] Remove AQLM (#22943 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-08-16 19:38:21 +00:00

456 Commits