youngkingdom/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Robert Shaw	0470cac520	updaed Signed-off-by: Robert Shaw <robshaw@redhat.com>	2025-08-14 02:14:03 +00:00
Cyrus Leung	0ca2393b47	[CI/Build] Increase pooling tolerance to pass CI (#22844 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-08-13 18:52:48 -04:00
Jialin Ouyang	31a500c86f	[Core] [N-gram SD Optimization][1/n] Propose tokens with a single KMP (#22437 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-08-13 14:44:06 -07:00
Isotr0py	df0e0f023e	[CI/Build] Skip gpt_big model test because of broken HF model (#22848 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-13 20:36:28 +00:00
Cyrus Leung	b4b78d6317	[CI/Build] Fix param mismatch in `test_eagle_correctness` (#22847 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-13 10:55:25 -07:00
Nicolò Lucchesi	12817a8ac7	[CI] Fix `tests/v1/e2e/test_kv_sharing_fast_prefill.py` import on test (#22815 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-13 10:35:50 -07:00
Cyrus Leung	c9232d41f4	[CI/Build] Update VLM common tests (#22841 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-13 10:03:05 -07:00
Cyrus Leung	19b927e52d	[Core] Use individual MM items in P0/P1 cache and model runner (#22570 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-13 07:18:07 -07:00
Nicolò Lucchesi	6b794c756c	[Nixl][CI] Fix tests (#22806 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-13 06:03:53 -07:00
Kdump	653124bd46	[Frontend] Add chunked processing to handle long inputs in embedding models (#22280 ) Signed-off-by: x22x22 <wadeking@qq.com> Signed-off-by: Kdump <rootshellexp@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-13 04:14:24 -07:00
Giancarlo Delfin	d94e3026de	[V1] Add tree drafting tests for eagle spec decoding (#22705 ) Signed-off-by: Giancarlo Delfin <gdelfin@meta.com>	2025-08-13 04:11:28 -07:00
Duc-Viet Hoang	a01e0018b5	[Bugfix] Fix Nemotron VL image processing (#22739 ) Co-authored-by: ducviet00-h2 <viet.d.hoang@h2corporation.jp>	2025-08-13 03:11:36 -07:00
shixianc	4c558cf62e	[Perf] Support topk softmax fused kernel for broader num_experts (#22211 ) Signed-off-by: Shixian Cui <shixian@amazon.com> Co-authored-by: Shixian Cui <shixian@amazon.com>	2025-08-12 21:34:47 -07:00
Woosuk Kwon	c5830381af	[V0 Deprecation] Remove args for multi-step scheduling (#22779 ) Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-08-12 20:38:18 -07:00
Woosuk Kwon	d31f97cf57	[Misc] Remove tests/multi_step/__init__.py (#22778 ) Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-08-12 20:21:18 -07:00
Woosuk Kwon	71683ca6f6	[V0 Deprecation] Remove multi-step scheduling (#22138 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-08-12 20:18:39 -07:00
RUTHLESS-BOT	53c730286c	[Misc] parametrize 'dtype' in test_flash_mla (#22641 ) Signed-off-by: RUTHLESS-BOT <wujiafeng@cmbchina.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-12 16:31:48 -04:00
Nicolò Lucchesi	422f22e012	[CI][Nixl] Check kv cache layout during handshake (#22745 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-12 12:53:52 -07:00
TeeKen Lau	c42fe0b63a	Add more test scenario for tensor schema (#22733 ) Signed-off-by: teekenl <teekenlau@gmail.com>	2025-08-12 16:34:41 +00:00
Rahul Tuli	5a4b4b3729	Add: `SupportsEagle3` interface for explicit EAGLE3 support (#22642 ) Signed-off-by: Rahul Tuli <rtuli@redhat.com>	2025-08-12 09:24:52 -07:00
Nicolò Lucchesi	3d9d40efde	[Bugfix][CI] Fix `test_remote_decode_lifecycle.py::test_short_prompt_lifecycle` (#22727 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-08-12 07:30:17 -07:00
Harry Mellor	80bb1e8afe	Officially support SmolLM3 using the Transformers backend (#22665 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-12 05:38:48 -07:00
Yongye Zhu	007dd90859	[gpt-oss] Enable gpt-oss on ampere (#22714 ) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>	2025-08-12 03:21:44 -07:00
RishiAstra	46ae7f6666	[Bugfix] Mamba2 SSD varlen bug fix initstates decay, improve test, assert chunk pwr 2 (#21783 ) Signed-off-by: Rishi Astra <40644327+RishiAstra@users.noreply.github.com>	2025-08-12 02:04:37 -07:00
phantomlei	bc8372efc3	[Bugfix] Fix erroneous randomly generated cases in bad word testing (#22170 ) Signed-off-by: phantomlei <phantomlei3@gmail.com>	2025-08-12 02:03:22 -07:00
dongluw	9f909b8996	[New Model] Support Command-A-Vision (#22660 ) Signed-off-by: donglu <donglu@cohere.com>	2025-08-12 01:39:54 -07:00
wang.yuqi	6d729c43fb	[Bugfix] Fix ModernBert load & Enable sliding window attention for bidirectional attention. (#22637 ) Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Max de Bayser <mbayser@br.ibm.com>	2025-08-12 00:23:17 -07:00
Michael Goin	93d0652433	[CI] Increase timeout for test_completion_with_image_embeds (#22670 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-11 20:31:36 -07:00
Michael Goin	ea1292ad3e	[CI Failure] Use float32 for tests/entrypoints/openai/test_audio.py (#22686 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-11 20:20:42 -07:00
Harry Mellor	839ab00349	Re-enable Xet on TPU tests now that `hf_xet` has been updated (#22666 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-11 19:54:40 -07:00
Chen Zhang	1891a265d3	[gpt-oss] Add test for response API + harmony (but skipped) (#22554 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-08-11 17:47:24 -07:00
TJian	65abe111a3	[CI] Skip Tree Attn Test in `test_max_len.py` to unblock CI (#22664 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-08-11 10:36:05 -07:00
22quinn	807d21b80d	[BugFix] [Spec Decode] Remove LlamaForCausalLMEagle3 to fix CI (#22611 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-11 10:31:36 -07:00
Isotr0py	c90fb03df5	[CI/Build] Skip Mllama HF runner tests with Transformers v4.55.0 (#22659 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-08-11 10:00:58 -07:00
wang.yuqi	84cf78acee	[Model] Pooling models default to using chunked prefill & prefix caching if supported. (#20930 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-08-11 09:41:37 -07:00
GuanLuo	16fb668b61	fix: NIXL connector transfers partial block to pass full multi-modal context (#21074 ) Signed-off-by: GuanLuo <gluo@nvidia.com>	2025-08-11 09:40:55 -07:00
Wentao Ye	f7dcce7a4a	[Feature] Add `VLLM_USE_DEEP_GEMM_E8M0` Env to Control E8M0 Scale (#21968 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-08-11 09:39:08 -07:00
Cyrus Leung	ebf7605b0d	[Misc] Move tensor schema tests (#22612 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-08-11 00:15:27 -07:00
Maximilien de Bayser	39052dbca8	Support token_type_ids in V1 with less code changes (#21985 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2025-08-10 22:54:59 -07:00
Nick Hill	5898b135ab	[BugFix] Fix KVConnectorOutput TPU breakage (#22598 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-08-10 19:33:48 -07:00
22quinn	b799f4b9ea	[CI/Build] Fix tensorizer test for load_format change (#22583 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-08-10 19:30:00 -07:00
Benji Beck	68b254d673	Fix TensorSchema validation test for symbolic dims (#22366 ) Signed-off-by: Benji Beck <benjibeck@meta.com>	2025-08-10 17:16:44 +00:00
Isotr0py	b76753f0b5	[Bugfix][Kernel] Support partial rotary embedding for MRoPE triton kernel (#22593 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-08-10 09:00:36 -07:00
Isotr0py	049c245143	[Misc] Replace flaky image urls in pixtral test (#22574 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-08-10 06:18:21 -07:00
Ning Xie	326976291b	[Misc] code clean duplicate set_current_vllm_config in _set_vllm_config (#22566 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-08-10 00:08:48 -07:00
Harry Mellor	c49848396d	Refactor sliding window configuration to Transformers best practice (#21927 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-09 20:50:48 -07:00
Chengji Yao	2a84fb422f	[TPU] kv cache update kernel doesn't need to be padded slices to multiple of num_slices_per_block (#22394 ) Signed-off-by: Chengji Yao <chengjiyao@gmail.com> Co-authored-by: Chengji Yao <chengjiyao@gmail.com>	2025-08-09 20:49:04 -07:00
Le Chen	3d7363e61c	[Config] add "qwen" as a native eagle3 target supported model (#22333 ) Signed-off-by: lechen <lecself@163.com> Signed-off-by: LeChen <lecself@163.com>	2025-08-09 20:21:05 -07:00
Thomas Parnell	61f67d8acd	[V1] [Hybrid] Enable Full CUDA Graph (decode-only) for Mamba layers (#21401 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-08-09 20:16:11 -07:00
TJian	42172ad18f	[FEAT] [Performance] Add triton mrope to replace the torch code path (#22375 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-08-09 11:50:03 -07:00

2596 Commits