youngkingdom/vllm - vllm

Author	SHA1	Message	Date
Jee Jee Li	61a6905ab0	[Model] Refactor JambaForCausalLM (#21394 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-29 18:25:07 +08:00
Reza Barazesh	37efc63b64	[V0 deprecation] Guided decoding (#21347 ) Signed-off-by: Reza Barazesh <rezabarazesh@meta.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-29 03:15:30 -07:00
Isotr0py	a4528f0cac	[Model]: Fused MoE for nomic-embed-text-v2-moe (#18321 ) Signed-off-by: isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-07-29 03:13:27 -07:00
Cyrus Leung	a2480251ec	[Doc] Link to RFC for pooling optimizations (#21806 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-28 23:53:18 -07:00
Nick Hill	7234fe2685	[Misc] Rework process titles (#21780 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-07-29 05:14:47 +00:00
Benji Beck	f1e2c095ec	Migrate InternVLImageInputs and InternVLVideoInputs to TensorSchema (#21684 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-07-28 22:09:45 -07:00
Gregory Shtrasberg	12a223ef9b	[AMD][CI/Build][Bugfix] Guarding CUDA specific functions by ifndef ROCM (#21766 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-07-29 03:35:37 +00:00
Calvin Chen	e18f085103	skip fusedmoe layer for start_load_kv (#21378 ) Signed-off-by: calvin chen <wen.chen@dynamia.ai>	2025-07-28 18:59:44 -07:00
Michael Goin	afa2607596	[CI] Parallelize Kernels MoE Test (#21764 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-28 18:56:24 -07:00
Wentao Ye	48b763d6b5	[Refactor] Merge Compressed Tensor FP8 `CompressedTensorsW8A8Fp8MoEMethod` and `CompressedTensorsW8A8Fp8MoECutlassMethod` (#21775 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-28 19:47:21 -06:00
Michael Goin	947e982ede	[Docs] Minimize spacing for supported_hardware.md table (#21779 )	2025-07-28 18:46:39 -07:00
lyrisz	c6c9122d50	[Kernel] SM90 CUTLASS FP8 GEMM: add support for swap AB + kernel tuning (#20396 ) Signed-off-by: Faqin Zhong <faqin.zhong@gmail.com> Co-authored-by: Duncan Moss <djm.moss@gmail.com>	2025-07-28 23:13:58 +00:00
Lucas Wilkinson	8aa1485fcf	[Perf] Disable chunked local attention by default with llama4 (#21761 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-07-28 18:49:04 -04:00
Nikhil Gupta	89ac266b26	[Feat]: Add support for Dynamic Quant 4 bit CPU kleidiai kernels (#17112 ) Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-28 20:55:15 +00:00
Clayton Coleman	c6f36cfa26	[Bugfix] DeepGEMM is not enabled on B200 due to `_lazy_init()` (#21472 ) Signed-off-by: Clayton Coleman <smarterclayton@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-28 20:51:22 +00:00
Kuntai Du	b18b417fbf	Revert "[V1] Exception Handling when Loading KV Cache from Remote Store" (#21778 ) Signed-off-by: KuntaiDu <kuntai@uchicago.edu>	2025-07-28 20:15:18 +00:00
Lu Fang	9ba1c88a93	[AMD][CI/Build] Fix the AMD issue caused by inappropriate of symbol exposure (#21647 ) Signed-off-by: Lu Fang <lufang@fb.com>	2025-07-28 20:11:16 +00:00
Wentao Ye	e0e58f9729	[Bug] Enforce contiguous input for `dynamic_scaled_fp8_quant` and `static_scaled_fp8_quant` (#21773 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-07-28 19:55:48 +00:00
rasmith	b361f14e39	[AMD][BugFix] Fix omission of wvSplitK kernel for small batch sizes (1-4) due to torch.compile (#21350 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-07-28 15:38:20 -04:00
weiliang	01c753ed98	update flashinfer to v0.2.9rc2 (#21701 ) Signed-off-by: Weiliang Liu <weiliangl@nvidia.com>	2025-07-28 19:31:47 +00:00
Harry Mellor	94b71ae106	Use `metavar` to list the choices for a CLI arg when custom values are also accepted (#21760 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-28 19:31:10 +00:00
Nick Hill	7d44c691b0	[P/D] Log warnings related to prefill KV expiry (#21753 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-07-28 18:40:53 +00:00
Cyrus Leung	e17a4d3bf9	[Bugfix] Fix granite speech shape validation (#21762 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-28 14:19:21 -04:00
Chaojun Zhang	ec261b0291	[XPU] IPEX-optimized Punica Wrapper on XPU (#21703 ) Signed-off-by: chzhang <chaojun.zhang@intel.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-28 16:43:37 +00:00
Cyrus Leung	04fe61aa3d	[CI/Build] Fix plugin tests (#21758 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-28 15:08:05 +00:00
Michard Hugo	25708d317a	[Bugfix] Mistral crashes on tool with no description (#21167 ) Signed-off-by: HugoMichard <hugo@harfanglab.fr>	2025-07-28 08:03:35 -07:00
Cyrus Leung	0e18a5d058	[Misc] Reduce logs for model resolution (#21765 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-28 07:59:56 -07:00
Michael Goin	34a20c49b3	[Logs] Change flashinfer sampler logs to once (#21759 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-07-28 06:59:51 -07:00
Isotr0py	31084b3b1f	[Bugfix][CI/Build] Update peft version in test requirement (#21729 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-07-28 06:17:43 -07:00
wuhang	bccc43c033	[Bugfix]check health for engine core process exiting unexpectedly (#21728 ) Signed-off-by: wuhang <wuhang6@huawei.com>	2025-07-28 06:17:31 -07:00
Harry Mellor	1395dd9c28	[Docs] Add revision date to rendered docs (#21752 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-28 06:12:46 -07:00
Keyang Ru	9ace2eaf35	[Bugfix] Improve JSON extraction in LlamaToolParser (#19024 ) Signed-off-by: keru <keyang.ru@oracle.com> Co-authored-by: keru <keyang.ru@oracle.com>	2025-07-28 12:36:58 +00:00
Anton Vlasjuk	656c24f1b5	[`Ernie 4.5`] Name Change for Base 0.3B Model (#21735 ) Signed-off-by: vasqu <antonprogamer@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-28 12:22:32 +00:00
Chauncey	63fe3a700f	[PD] let p2p nccl toy proxy handle /chat/completions (#21734 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-07-28 11:45:50 +00:00
Isotr0py	0ae970ed15	[Bugfix] Fix glm4.1v video_grid_thw tensor shape scheme (#21744 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-07-28 04:26:49 -07:00
Li, Jiang	65e8466c37	[Bugfix] Fix environment variable setting in CPU Dockerfile (#21730 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-07-28 11:02:39 +00:00
Jee Jee Li	1b769dccf3	[Bugfix] Fix Ernie4_5_MoeForCausalLM shared experts (#21717 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-07-28 11:02:25 +00:00
rongfu.leng	2cc571199b	[feature] add log non default args in LLM (#21680 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-07-28 02:21:22 -07:00
Cyrus Leung	a4ed731546	[Model] Prioritize Transformers fallback over suffix matching (#21719 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-28 02:15:31 -07:00
Benji Beck	d128d0d554	Migrate KeyeImageInputs and KeyeVideoInputs to TensorSchema (#21686 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-07-28 01:16:35 -07:00
Asaf Joseph Gardin	a6c050286a	[v1][mamba] Added mamba_type into MambaSpec (#21715 ) Signed-off-by: asafg <asafg@ai21.com> Co-authored-by: asafg <asafg@ai21.com>	2025-07-28 08:15:55 +00:00
Lucas Wilkinson	139a7f07bd	[BugFix] Fix ChunkedLocalAttention when the hybrid kv-cache is disabled (#21707 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-07-28 07:18:47 +00:00
Ning Xie	150d9e6337	[Bugfix] fix max-file-size type from str to int (#21675 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-07-28 00:06:52 -07:00
Cyrus Leung	139a97ec56	[Bugfix] Fix shape checking for Fuyu (#21709 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-07-28 00:05:56 -07:00
rongfu.leng	18cc33dd60	[bugfix] fix profile impact benchmark results (#21507 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-07-27 22:44:24 -07:00
Hongsheng Liu	7656cf4cf3	[Bugfix] [issue-21565] Fix the incompatibility issue with stream and named function calling when Thinking is disabled (#21573 ) Signed-off-by: wangzi <3220100013@zju.edu.cn> Co-authored-by: wangzi <3220100013@zju.edu.cn>	2025-07-27 22:43:50 -07:00
Benji Beck	3ea57a56d9	Migrate Idefics3ImagePixelInputs and Idefics3ImageEmbeddingInputs to … (#21683 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-07-27 22:37:23 -07:00
Benji Beck	75856bc2cb	Migrate GraniteSpeechAudioInputs to TensorSchema (#21682 ) Signed-off-by: Benji Beck <benjibeck@meta.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-07-27 22:37:20 -07:00
Benji Beck	304dcdf575	Migrate GLMVImagePixelInputs to TensorSchema (#21679 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-07-27 22:36:11 -07:00
Benji Beck	88e46c7c8d	Migrate Glm4vImageInputs, Glm4vVideoInputs to TensorSchema (#21678 ) Signed-off-by: Benji Beck <benjibeck@meta.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-07-27 22:36:08 -07:00

8088 Commits