Commit Graph

8800 Commits

Author SHA1 Message Date
998720859c Migrate MiniCPMOAudioInputs to TensorSchema (#21847)
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-08-22 16:43:29 +08:00
0ba1b54ac6 [gpt-oss] add input/output usage in responses api when harmony context is leveraged (#22667)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2025-08-22 08:32:24 +00:00
53415653ff [P/D][Nixl] Make kv cache register compatible with hybrid memory allocator (#23079)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2025-08-21 22:30:48 -07:00
17373dcd93 [Attention] Refactor AttentionMetadata Preparation for Encoder-only Models (#23154)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-22 05:05:59 +00:00
5964069367 [New Model] Add Seed-Oss model (#23241)
Signed-off-by: jiabin.00 <jiabin.00@bytedance.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-22 04:58:10 +00:00
de9c085e17 [Misc] Add gemma3 chat template with pythonic-style function calling (#17149)
Signed-off-by: Philip Chung <philip.f.chung@gmail.com>
2025-08-21 21:06:50 -07:00
111692bb8c [CI] Add end-to-end V1 min_tokens test coverage (#22495)
Signed-off-by: Arjun Reddy <189282188+arjunbreddy22@users.noreply.github.com>
Co-authored-by: Arjun Reddy <189282188+arjunbreddy22@users.noreply.github.com>
2025-08-21 22:04:07 -06:00
394591e343 [Feature] Enable DeepGEMM Linear on B200; 1.5% E2E throughput improvement (#23351)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-08-21 21:01:08 -07:00
3ac849665d [CI/Build] Skip Idefics3 and SmolVLM generation test again (#23356)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-22 03:39:46 +00:00
0b9cc56fac Migrate MllamaImagePixelInputs to TensorSchema (#22020)
Signed-off-by: Benji Beck <benjibeck@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-08-22 11:28:49 +08:00
8896eb72eb [Deprecation] Remove prompt_token_ids arg fallback in LLM.generate and LLM.embed (#18800)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-22 10:56:57 +08:00
19fe1a0510 [Kernel] Add FP8 support with FlashMLA backend (#22668)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-08-22 02:26:32 +00:00
480bdf5a7b [Core] Support custom executor qualname (#23314)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-08-22 09:40:54 +08:00
5368f76855 [Feature][Responses API] Support logprobs(non-stream) (#23319)
Signed-off-by: Kebe <mail@kebe7jun.com>
2025-08-21 23:09:16 +00:00
8ef6b8a38c Always use cache mounts when installing vllm to avoid populating pip cache in the image. Also remove apt cache. (#23270)
Signed-off-by: Valentyn Tymofieiev <valentyn@google.com>
2025-08-21 18:01:03 -04:00
3bbe11cc13 [Perf] Small optimizations for silu_mul_fp8_quant_deep_gemm (#23265)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-21 17:56:15 -04:00
c5041f899f [CI] improve pr comments bot (#23380) 2025-08-21 14:49:03 -07:00
8b5fe6eb51 [CI] Clean up actions: remove helm, publish workflows and improve pr … (#23377) 2025-08-21 14:29:04 -07:00
800349c2a5 [Structured Outputs] Refactor bitmask construction into get_grammar_bitmask (#23361)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-21 20:53:33 +00:00
044931f97b Make sure that vectorize_with_alignment produced vectorized global loads (#23182) 2025-08-21 20:06:54 +00:00
1d353b6352 [Core] Always use tensor cores for Flashinfer Decode Wrapper (#23214)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-08-21 16:02:11 -04:00
3496274663 [Misc] Convert VLLM_TORCH_PROFILER_DIR path to absolute (#23191)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-08-21 15:49:09 -04:00
8a19303173 [BugFix][gpt-oss] Fix Chat Completion with Multiple Output Message (#23318)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-08-21 10:31:11 -07:00
603fbbbce0 [Misc] Misc code cleanup/simplification (#23304)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-08-21 17:22:55 +00:00
10f535c086 [Bugfix] Fix port conflict by obtaining a list of open ports upfront (#21894)
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-08-21 10:22:18 -07:00
48bfb0c9b7 [Bug] Fix R1 Accuracy 0 Bug (#23294)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-21 13:11:28 -04:00
f8ce022948 add tg-mxfp4-moe-test (#22540)
Signed-off-by: siyuanf <siyuanf@nvidia.com>
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-21 17:05:47 +00:00
0278f1ac3a Fix nvfp4 swizzling (#23140)
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-08-21 16:54:50 +00:00
a482e4e769 Migrate MolmoImageInputs to TensorSchema (#22022)
Signed-off-by: Benji Beck <benjibeck@meta.com>
2025-08-21 16:54:08 +00:00
e0b056e443 [ci/build] Fix abi tag for aarch64 (#23329)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-08-21 23:32:55 +08:00
79f05e4436 [Multimodal] Always enable hashing mm data (#23308)
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-21 07:23:28 -07:00
f8daddcc4c [Bugfix] set system_message in phi4mini chat template (#23309)
Signed-off-by: zhuangqh <zhuangqhc@gmail.com>
2025-08-21 14:22:39 +00:00
c8e33c72c6 [V1] Remove unnecessary check for main thread (#23298)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-08-21 14:08:35 +00:00
d70a16625d [Performance] V1 Pooling Models E2E Performance Optimization (#23162)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-08-21 13:26:09 +00:00
5cc54f7c5b [Doc] Fix batch-level DP example (#23325)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-08-21 06:16:38 -07:00
0c6e40bbaa [Refactor] Simplify code for MM budget (#23310)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-21 08:00:16 +00:00
2e2000f352 [Model] Add LFM2 architecture (#22845)
Signed-off-by: Paul Pak <paulpak58@gmail.com>
2025-08-21 09:35:07 +02:00
31282401b6 [BugFix] Fix Python 3.9 Support (#23306)
Signed-off-by: Jared O'Connell <46976761+jaredoconnell@users.noreply.github.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-08-20 23:23:56 -07:00
0c31e28e95 [Bugfix] Fix extra whitespace in strings caused by newline (#23272)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-20 22:03:00 -07:00
f571ff8eb6 [Sampler] Support returning final logprobs (#22387)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-20 21:28:32 -07:00
f64ee61d9e [CI] Block the cu126 wheel build while broken (#23285)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-21 04:21:05 +00:00
8993073dc1 [CI] Delete images older than 24h. (#23291)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-08-20 21:15:20 -07:00
655a09f653 [Model][VLM] Support R-4B Model (#23246)
Signed-off-by: yannqi <yannqi@qq.com>
Signed-off-by: 杨奇(yann qi) <51905299+yannqi@users.noreply.github.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: yannqiyang <yannqiyang@tencent.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-08-21 04:08:52 +00:00
f94bf9b924 [Compile] Fix Compile Warning SM100 Cutlass MLA (#23287)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-08-21 03:09:39 +00:00
3663870c72 [V1][Mamba1] - Full CUDA and Piecewise CUDA Graphs Support (#23035)
Signed-off-by: asafg <asafg@ai21.com>
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
Co-authored-by: asafg <asafg@ai21.com>
2025-08-20 20:08:51 -07:00
2461d9e562 [CI/Build] Split out mm processor tests (#23260)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-20 20:05:20 -07:00
7be5d113d8 [CPU] Refactor CPU W8A8 scaled_mm (#23071)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-08-21 09:34:24 +08:00
b029de9902 [Optimization] Make new_block_ids None if empty (#23262)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-08-20 18:25:56 -07:00
bbea1cefdd [CI Bugfix] Fix CI by fully removing --enable-prompt-adapter (#23284)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-20 17:18:12 -07:00
f5aa307d77 Remove duplicate entry in vllm.attention.__all__ (#23296)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-08-20 17:14:59 -07:00