Commit Graph

2343 Commits

Author SHA1 Message Date
5895afd780 [Bugfix] The special_tokens in tokenizer should also be controlled by do_lower_case in encoder_config. (#20750)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-07-18 09:10:47 +00:00
ca4eb82bcb [Model] Re-add the implicit conversion feature for as_seq_cls_model (#21103)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-07-18 07:15:07 +00:00
5780121c95 [Perf] Add swap_ab to SM90 FP8 non-block CUTLASS moe grouped gemm (#20911)
Signed-off-by: Shixian Cui <shixian@amazon.com>
Co-authored-by: Shixian Cui <shixian@amazon.com>
2025-07-18 04:34:43 +00:00
90bd2ab6e3 [Model] Update pooling model interface (#21058)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-17 16:05:40 +00:00
9fb2d22032 [Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
2025-07-17 09:56:44 -04:00
4ef00b5cac [VLM] Add Nemotron-Nano-VL-8B-V1 support (#20349)
Signed-off-by: Kyle Huang <kylhuang@nvidia.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-17 03:07:55 -07:00
5a7fb3ab9e [Model] Add ToolParser and MoE Config for Hunyuan A13B (#20820)
Signed-off-by: Asher Zhang <asherszhang@tencent.com>
2025-07-17 09:10:09 +00:00
11dfdf21bf [Kernel] DeepGemm MoE : Integrate triton permute / unpermute kernels (#20903)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-17 08:10:37 +00:00
fdc5b43d20 [Bugfix]: Fix final_res_batch list index out of range error (#21055)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-17 00:29:09 -07:00
4fcef49ec4 [V1] [KVConnector] Fix MultiprocExecutor worker output aggregation (#21048)
Signed-off-by: David Ben-David <davidb@pliops.com>
Co-authored-by: David Ben-David <davidb@pliops.com>
2025-07-17 13:29:45 +08:00
76b494444f [Attention] Refactor attention metadata builder interface (#20466)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-07-17 04:44:25 +00:00
4e7dfbe7b4 Update PyTorch to torch==2.7.1 for CUDA (#21011)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-17 02:30:44 +00:00
18bdcf4113 feat - add a new endpoint get_tokenizer_info to provide tokenizer/chat-template information (#20575)
Signed-off-by: m-misiura <mmisiura@redhat.com>
2025-07-16 21:52:14 +08:00
d0dc4cfca4 Fix inadvertently silenced PP tests for mp, add DeepSeek V2/V3 model family to PP tests (#20831)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-07-16 00:14:49 -07:00
c11013db8b [Meta] Llama4 EAGLE Support (#20591)
Signed-off-by: qizixi <qizixi@meta.com>
Co-authored-by: qizixi <qizixi@meta.com>
2025-07-15 21:14:15 -07:00
1eb2b9c102 [CI] update typos config for CI pre-commit and fix some spells (#20919)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
2025-07-15 21:12:40 -07:00
6ebf313790 Avoid direct comparison of floating point numbers (#21002)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-07-15 21:12:14 -07:00
cfbcb9ed87 [Voxtral] Add more tests (#21010)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-15 21:11:49 -07:00
34cda778a0 [Frontend] OpenAI Responses API supports input image (#20975)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-15 18:59:36 -06:00
1e36c8687e [Deprecation] Remove nullable_kvs (#20969)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-15 17:21:50 +00:00
e7e3e6d263 Voxtral (#20970)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-15 07:35:30 -07:00
56fe4bedd6 [Deprecation] Remove TokenizerPoolConfig (#20968)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-15 14:00:50 +00:00
3534c39a20 [V1] [Hybrid] Refactor mamba state shape calculation; enable V1 via cli (#20840)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-07-15 04:04:35 -07:00
37a7d5d74a [Misc] Refactor AllReduceFusionPass. Remove parameter (#20918)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-07-15 06:57:40 +00:00
d4d309409f Implement Async Scheduling (#19970)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-14 23:01:46 -07:00
80305c1b24 [CI] Fix flaky test_streaming_response test (#20913)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-14 20:15:15 -07:00
d4170fad39 Use w8a8 quantized matmul Pallas kernel (#19170)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-07-15 03:06:33 +00:00
ba8c300018 [BugFix] VLLM_DISABLE_COMPILE_CACHE=1 should disable all reads and writes from the cache (#20942)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-07-15 01:26:18 +00:00
149f2435a5 [Misc] Relax translations tests (#20856)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-14 20:08:36 +00:00
38efa28278 [Model] Add Ling implementation (#20680)
Signed-off-by: vito.yy <vito.yy@antgroup.com>
2025-07-14 22:10:32 +08:00
dcf2a5e208 [CI/Build] Fix OOM issue in Jina-VL test (#20907)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-14 10:32:35 +00:00
1e9438e0b0 [MISC] Move bind_kv_cache to worker module (#20900)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-14 09:40:00 +00:00
66f6fbd393 [Prefix Cache] Add reproducible prefix-cache block hashing using SHA-256 + CBOR (64bit) (#20511)
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
2025-07-14 02:45:31 +00:00
8632e831ba [Core] Add update_config RPC method (#20095)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-07-14 00:49:18 +00:00
bd4c1e6fdb Support for LlamaForSequenceClassification (#20807)
Signed-off-by: thechaos16 <thechaos16@gmail.com>
2025-07-13 00:09:34 -07:00
99b4f080d8 Renable google/gemma-3-1b-it accuracy test. (#20866)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-12 21:48:56 -07:00
020f58abcd [Core] Support multiple tasks per model (#20771)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-12 19:40:11 -07:00
c1acd6d7d4 [Refactor] Change the way of import triton (#20774)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-12 19:39:55 -07:00
42d440c22b [Perf] Use Triton instead of Torch for DeepGEMM Per Token Group Quant (#20841)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-07-12 19:38:45 -07:00
f45a332886 [Sched] Enhance the logic to remove stopped requests from queues (#20739) 2025-07-12 15:33:13 -07:00
2c11a738b3 [Model] New model support for microsoft/Phi-4-mini-flash-reasoning (#20702)
Signed-off-by: Congcong Chen <congcongchen@microsoft.com>
2025-07-12 06:02:10 -07:00
147afb448b [Bugfix] Replace unavailable video url in multimodal test (#20854)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-12 05:25:39 +00:00
01cae37713 [CI/Build] Ensure compatability with Transformers v4.53 (#20541)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-07-11 20:53:07 -07:00
11c0198615 [Bugfix] Fix tensor parallel issue in Qwen3 reranker weight loading (#20682)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-07-11 20:52:43 -07:00
a8593237c0 Add pynccl all-gatherv and reducescatterv (#20154)
Signed-off-by: Trevor Morris <tmorris@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-11 18:59:23 -07:00
fc0f41d10a Integration SM100 FlashInfer fused allreduce RMSNorm (#20691)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-07-11 18:58:15 -07:00
5f0af36af5 Update kimi-k2 tool calling docs, enable unit tests (#20821)
Signed-off-by: wangzhengtao <wangzhengtao@moonshot.cn>
Co-authored-by: wangzhengtao <wangzhengtao@moonshot.cn>
Co-authored-by: wangzhengtao <wangzhengtao@msh.team>
2025-07-11 20:16:14 +00:00
0d21b2664c [Bugfix] Fix OOM in language generation test (#20814)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-07-11 11:21:52 -07:00
53fa457391 [Misc] Add unit tests for MoE ModularKernel combinations + Profiling utility (#20449)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-11 07:51:46 -07:00
b4f0b5f9aa Temporarily suspend google/gemma-3-1b-it. (#20722)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-07-11 11:21:26 +00:00