Commit Graph

9566 Commits

Author SHA1 Message Date
90d24dee04 enable piecewise cudagraphs for eagle
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-09-17 20:48:14 +00:00
883131544f [Bugfix] Update import path for bc_linter_include (#24766)
Signed-off-by: Mohammad Miadh Angkad <mangkad.bsdsba2027@aim.edu>
2025-09-17 20:33:11 +00:00
ee5fd49150 [Misc] Update owners for KV connector and V1 offloading (#25041)
Signed-off-by: ApostaC <yihua98@uchicago.edu>
2025-09-17 12:37:29 -07:00
7ae9887542 [V1] Logits processor docs (#22919)
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com>
Co-authored-by: Joseph Marinier <Joseph.Marinier@gmail.com>
2025-09-17 11:53:12 -07:00
e3db5ebb66 [CI Bugfix] Fix failing test_model_load_with_params tests due to tokenizer refactor (#25086)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-17 11:15:05 -07:00
9d442b7c48 [V0 Deprecation] Remove V0 tests in test_sequence.py (#25088)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-17 11:08:45 -07:00
eb68c2dcd9 [CI] Revert back prepare_prompts and check_answers (#25087)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-17 11:03:16 -07:00
8b32464ac1 Change log level from info to debug for IOProcessor (#24999)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2025-09-17 10:21:28 -07:00
99cc41ad50 [V0 Deprecation] Remove unused output processor util (#25023)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-17 09:50:07 -07:00
d6a518fdde Remove unused find_cuda_init helper script (#25044)
2025-09-17 09:47:40 -07:00
4aa8c7b047 cleanup: remove adapter commons (#25045)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-17 16:46:29 +00:00
4b946d693e [V0 Deprecation] Remove V0 Core tests (#25082)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-17 09:32:42 -07:00
087c6ffc92 [CI Bugfix] Fix failing test_invalid_env (#25078)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-17 08:28:58 -07:00
4a2d33e371 [Docs] vllm/benchmarks/datasets.py fix docstring param format. (#24970)
Signed-off-by: samzong <samzong.lu@gmail.com>
2025-09-17 08:11:51 -07:00
8f3616f422 Remove old cutlass mla (#23961)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-09-17 14:31:43 +00:00
47f670b03b [Docs] improve code formatting and comments for eliminate griffe build warning. (#25010)
Signed-off-by: samzong <samzong.lu@gmail.com>
2025-09-17 07:31:20 -07:00
dd6a910aac [Bugfix][Qwen3-Next] fixes the varlen issue in qwen3-next's MTP implementation. (#24957)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
2025-09-17 21:59:09 +08:00
1b962e2457 [fix] lora benchmarks pass no_lora_flag_cpu (#23774)
Signed-off-by: Dylan Maloy <34420038+dolpm@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-17 21:22:25 +08:00
bfe9380161 Apply fixes for CUDA 13 (#24599)
Signed-off-by: Aidyn-A <aidyn.b.aitzhan@gmail.com>
2025-09-17 09:15:42 -04:00
9fccd04e30 [Bugfix] Fix Stream usage in CPU model runner and OneDNN kernel check (#25046)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-09-17 05:54:02 -07:00
252ada5559 Add RADIO Vision Encoder Support to vLLM (#24595)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
Co-authored-by: root <root@cw-dfw-h100-001-305-026.cm.cluster>
2025-09-17 05:53:30 -07:00
e120533d7a [Misc] Avoid use of deprecated AutoModelForVision2Seq (#25065)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-17 12:19:15 +00:00
2b85697031 [BugFix] enable DOTALL to match multi-line tool_call parameters in extract_tool_call_required_streaming (#24668)
Signed-off-by: Shijun Yin <shijun.yin@outlook.com>
2025-09-17 09:21:18 +00:00
544fe76b95 [Frontend] Support returning all prompt logprobs (#24956)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-09-17 09:03:52 +00:00
bb58dc8c20 [DP] Create placement groups by ray_device_key (#25026)
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2025-09-17 08:57:25 +00:00
0fb2551c23 [Docs] Fix griffe warning in base_static_graph.py (#25018)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-09-17 08:49:19 +00:00
6c47f6bfa4 [Core] Remove tokenizer group in vLLM (#24078)
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
2025-09-17 08:42:59 +00:00
whx
c15309a730 [Model] Apply SharedFusedMoE to glm4_moe. (#24849)
Signed-off-by: whx-sjtu <2952154980@qq.com>
2025-09-17 16:02:31 +08:00
whx
4a9375fe9d [Model] Pass param prefix to LLMHead (#24862)
Signed-off-by: whx-sjtu <2952154980@qq.com>
2025-09-17 16:01:27 +08:00
03191cd8f0 [Core][MultiModalHasher] Hash images without converting image mode (#24969)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-09-17 00:57:34 -07:00
b77bf34e53 [EPLB] Support EPLB for Mixtral Model (#22842)
Signed-off-by: rouchenzi <ruochenwen@gmail.com>
Signed-off-by: rouchenzi <40842833+rouchenzi@users.noreply.github.com>
Co-authored-by: Bowen Wang <abmfy@icloud.com>
2025-09-17 07:27:34 +00:00
dd39baf717 [XPU] Fix xpu model runner call torch.cuda APIs (#25011)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-09-17 06:45:25 +00:00
43a62c51be Add more documentation and improve usability of lognormal dist (benchmark_serving_multi_turn) (#23255)
Signed-off-by: daniels <daniels@pliops.com>
2025-09-17 05:53:17 +00:00
ca2d1925ef [Rocm] [quantization] Fix quark ptpc moe and add test case (#24649)
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>
Co-authored-by: Haoyang Li <haoyang.li@amd.com>
2025-09-16 22:15:13 -07:00
0f7acdd73c [Model] Support Qwen3-VL Model Series (#24727)
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Huang Jie <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: 松灵 <26085463+wulipc@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-17 05:01:04 +00:00
5801e49776 [V0 Deprecation] Remove MQLLMEngine (#25019)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-16 21:29:27 -07:00
58d4c705a8 [Core] Get num_encoder_tokens from scheduler config (#24989)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-09-16 20:59:07 -07:00
ea3de5ef0d [misc] fix typo in value error (#24995)
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
2025-09-16 20:58:38 -07:00
67532a1a68 [UX] Remove "quantization is not fully optimized yet" log (#25012)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-16 20:57:51 -07:00
5672ba90bd [Docs] fix invalid doc link (#25017)
Signed-off-by: zxw <1020938856@qq.com>
2025-09-16 20:53:23 -07:00
dd83a157f1 [UX] Enforce valid choices for envs like VLLM_ATTENTION_BACKEND, etc (#24761)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2025-09-16 20:42:23 -07:00
5a411ef6c4 [Benchmarks] Add MMVU video dataset support and clean up deprecated datasets (#24719)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-09-17 03:29:43 +00:00
eeb135eb87 [Core] Use CpuGpuBuffer for block table tensors (#24795)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-09-16 19:18:06 -07:00
3059b9cc6b [Doc] Add --force-overwrite option to generate_cmake_presets.py (#24375)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-09-16 18:45:29 -07:00
64ad551878 Removes source compilation of nixl dependency (#24874)
Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Daniele <36171005+dtrifiro@users.noreply.github.com>
2025-09-17 01:33:18 +00:00
cef32104b4 [FP8] Extend per-token-group quantization support to QuantFP8 (#24342)
Signed-off-by: Tahsin Tunan <tahsintunan@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
2025-09-16 18:31:06 -07:00
493b10f8bf [CI] GPT-OSS GPQA eval test for Blackwell (#24920)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-16 18:13:21 -07:00
d119fc8614 [CI][Bugfix] Fix failing Blackwell test (#24993)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-09-16 15:55:02 -07:00
dbebb7f812 [Perf] Reuse workspace for FP8+FP4 Marlin MoE (#20500)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-16 15:45:10 -06:00
3053a22b33 fp8 kv cache support fix for torch.compile (#22758)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
2025-09-16 21:27:11 +00:00