youngkingdom/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Jee Jee Li	67f3fb0844	[Bench] Add DeepSeekV32 to MoE benchmark (#25962 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-30 14:13:48 -07:00
bnellnm	5963b98b46	[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-09-17 17:43:31 -06:00
Jee Jee Li	04ad0dc275	[benchmark] Add triton version in the moe tuned config (#24769 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-16 14:10:54 +08:00
Jee Jee Li	d11ec124a0	[Bench] Add qwen-next in benchmark_moe.py (#24661 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-11 21:29:43 +08:00
Jee Jee Li	62f66be1f7	[Bugfix] Fix Qwen3-coder moe tuned config (#24072 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-07 05:19:46 +00:00
YUQI.CHENG	66548f6603	[Bugfix] Fix benchmark_moe.py for blockwise fp8. (#23823 ) Signed-off-by: crischeng <420985011@qq.com> Co-authored-by: cris <grace@guisenbindeMacBook-Pro.local>	2025-08-28 21:44:09 +08:00
Jee Jee Li	4d9c61993a	[Bugfix] Fix benchmark_moe.py (#23177 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-19 13:39:40 +00:00
Jee Jee Li	6d3da472bc	[Misc] Add --save-dir option to benchmark_moe (#23020 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-16 07:26:10 +00:00
Jee Jee Li	384a052971	[Misc] benchmark_moe supports expert parallel (#22251 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-11 00:13:27 -07:00
Jee Jee Li	8d705996df	[Misc] Minor enhancement of benchmark_moe (#22068 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-02 01:35:30 +08:00
Yuxuan Zhang	10eb24cc91	GLM-4 Update (#20736 ) Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Lu Fang <fanglu@fb.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Lu Fang <fanglu@fb.com>	2025-07-19 22:40:31 +00:00
Asher	5a7fb3ab9e	[Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820 ) Signed-off-by: Asher Zhang <asherszhang@tencent.com>	2025-07-17 09:10:09 +00:00
Wentao Ye	e2de455c34	[Feature] Integrate SM100 DeepGEMM support (#20087 )	2025-07-10 20:18:05 -07:00
Brayden Zhong	cede942b87	[Benchmark] Add support for multiple batch size benchmark through CLI in `benchmark_moe.py` (#20516 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-07-06 09:20:11 +00:00
Wentao Ye	a6c4b87fbc	Revert "[Feature] Integrate new deepgemm (#19820 )" (#20049 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-24 19:45:22 -07:00
Wentao Ye	c6e3bba8e6	[Feature] Integrate new deepgemm (#19820 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-24 12:51:56 -07:00
Tianyu Guo	4589b94032	[Bugfix] Fix benchmark_moe.py (#19016 ) Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>	2025-06-09 18:04:36 -07:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Harry Mellor	009d9e7590	Convert `benchmarks` to `ruff format` (#18068 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 13:43:29 +00:00
xsank	0a9bbaa104	[Misc] support model prefix & add deepseek vl2 tiny fused moe config (#17763 ) Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com> Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com>	2025-05-08 07:50:22 +00:00
Mengqing Cao	f9bc5a0693	[Bugfix] Fix triton import with local TritonPlaceholder (#17446 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-05-06 17:53:09 +08:00
Xiaodong Wang	9352cdb56d	[Hardware][AMD] Improve OAM device ID + llama4 Maverick MOE tuning (#16263 ) Signed-off-by: Lu Fang <lufang@fb.com> Co-authored-by: Lu Fang <lufang@fb.com>	2025-05-02 19:44:19 +00:00
Caleb_Du	3e887d2e0c	permute/unpermute kernel for moe optimization (#14568 ) Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>	2025-05-02 11:31:55 -07:00
Michael Goin	8fc88d63f1	[Model] Add tuned triton fused_moe configs for Qwen3Moe (#17328 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-04-28 15:20:24 -07:00
Harry Mellor	423e9f1cbe	Use Transformers helper `get_text_config()` instead of checking for `text_config` (#17105 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-25 08:47:35 -07:00
Lu Fang	55dcce91df	Upstream Llama4 Support to Main (#16113 ) Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com> Signed-off-by: Chris Thi <chris.c.thi@gmail.com> Signed-off-by: drisspg <drisspguessous@gmail.com> Signed-off-by: Jon Swenson <jmswen@gmail.com> Signed-off-by: Keyun Tong <tongkeyun@gmail.com> Signed-off-by: Lu Fang <fanglu@meta.com> Signed-off-by: Xiaodong Wang <xdwang@meta.com> Signed-off-by: Yang Chen <yangche@fb.com> Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com> Signed-off-by: Yong Hoon Shin <yhshin@meta.com> Signed-off-by: Zijing Liu <liuzijing2014@gmail.com> Signed-off-by: Lu Fang <lufang@fb.com> Signed-off-by: Lu Fang <fanglu@fb.com> Signed-off-by: Lucia Fang <fanglu@fb.com> Signed-off-by: Roger Wang <ywang@roblox.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Lu Fang <fanglu@fb.com> Co-authored-by: Roger Wang <ywang@roblox.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-04-07 08:06:27 -07:00
bnellnm	e59ca942f5	Add option to use DeepGemm contiguous grouped gemm kernel for fused MoE operations. (#13932 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-04-01 12:07:43 -04:00
Jee Jee Li	a73122de96	[Bugfix] fix benchmark moe (#14653 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-13 16:12:42 +08:00
Jeff Daily	a1c8f3796c	dynamic distpatch of fp8 kernels (#14245 ) Signed-off-by: Jeff Daily <jeff.daily@amd.com>	2025-03-11 10:54:56 -04:00
Brayden Zhong	c34eeec58d	[Bugfix] Correctly call `cudaProfilerStop` in benchmarks script (#14183 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-03-07 00:42:49 +00:00
Jee Jee Li	7bab4bb048	[Misc] Add Qwen2MoeForCausalLM moe tuning support (#14276 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-05 23:11:29 +08:00
Michael Goin	f78c0be80a	Fix benchmark_moe.py tuning for CUDA devices (#14164 )	2025-03-03 21:11:03 -08:00
Divakar Verma	bb5b640359	[core] moe fp8 block quant tuning support (#14068 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-03-04 01:30:23 +00:00
Harry Mellor	cf069aa8aa	Update deprecated Python 3.8 typing (#13971 )	2025-03-02 17:34:51 -08:00
Jongseok Park	781096e385	Expert Parallelism (EP) Support for DeepSeek V2 (#12583 )	2025-02-24 07:33:20 -08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628 ) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Tyler Michael Smith	cfa134d247	[Bugfix/CI] Fixup benchmark_moe.py (#12562 ) Fixes `is_marlin` not being passed into `get_default_config` Also allow `--tensor-parallel-size` in addition to `-tp` and `--tp-size` Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-01 13:41:35 +08:00
Divakar Verma	1c1bb0bbf2	[Misc][MoE] add Deepseek-V3 moe tuning support (#12558 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-01-30 00:47:30 +00:00
Divakar Verma	2acba47d9b	[bugfix] moe tuning. rm is_navi() (#12273 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-01-21 22:47:32 +00:00
Divakar Verma	8027a72461	[ROCm][MoE] moe tuning support for rocm (#12049 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2025-01-17 14:49:16 +08:00
wangshuai09	622b7ab955	[Hardware] using current_platform.seed_everything (#9785 ) Signed-off-by: wangshuai09 <391746016@qq.com>	2024-10-29 14:47:44 +00:00
youkaichao	32176fee73	[torch.compile] support moe models (#9632 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-10-27 21:58:04 -07:00
Cyrus Leung	6ffa3f314c	[CI/Build] Avoid CUDA initialization (#8534 )	2024-09-18 10:38:11 +00:00
Mor Zusman	7fc23be81c	[Kernel] W8A16 Int8 inside FusedMoE (#7415 )	2024-08-16 10:06:51 -07:00
Michael Goin	8065a7e220	[Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718 )	2024-06-20 17:00:13 -06:00
Cyrus Leung	0e9164b40a	[mypy] Enable type checking for test directory (#5017 )	2024-06-15 04:45:31 +00:00
Philipp Moritz	51a08e7d8f	[Kernel] Re-tune Mixtral MoE configurations for FP8 on H100 (#5238 )	2024-06-05 10:59:14 -07:00
Woosuk Kwon	27208be66e	[Kernel] Add back batch size 1536 and 3072 to MoE tuning (#5242 )	2024-06-04 09:58:47 -07:00
Woosuk Kwon	3a434b07ed	[Kernel] Enhance MoE benchmarking & tuning script (#4921 )	2024-06-03 20:06:59 -07:00

49 Commits