Commit Graph

6768 Commits

Author SHA1 Message Date
05a4324f8e Initialize the delta tool call fields explicitly (#17340)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: igmainc <igmainc@icloud.com>
2025-05-12 13:28:58 +00:00
7ea6cb28b2 [Misc] Improve modelscope import error (#17983)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-12 10:46:45 +00:00
9fbf2bfbd5 Correcting testcases in builkite job for IBM Power (#17675)
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com>
2025-05-12 08:11:55 +00:00
3a5ea75129 [Feature] Support DeepSeekV3 Function Call (#17784)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
Signed-off-by: Xu Wenqing <xuwq1993@qq.com>
2025-05-12 00:45:21 -07:00
891b9d33de [Fix] Benchmark "EngineClient" has no attribute "model_config" (#17976)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-05-11 22:55:53 -07:00
430783018c [Bugfix][TPU] Use np array when updating cache slot_mapping (#17971)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-05-12 12:58:33 +08:00
19a3c78d1f [Bugfix] Fix pydantic.errors.PydanticUserError (#17962)
Signed-off-by: wangli <wangli858794774@gmail.com>
2025-05-12 12:58:23 +08:00
ada50aa295 [bugfix] fix the wrong parser (#17958)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-12 04:58:02 +00:00
08bf784078 [Bugfix] validate grammar and throw 400 error instead of crashing the engine when xgrammar validation fails (#17623)
Signed-off-by: Jason Cheng <jasoncky96@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-05-12 09:06:10 +08:00
d45fe333fb [misc] add instructions on how to install nvshmem/pplx/deepep (#17964)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-05-11 18:02:39 -07:00
021c16c7ca [Model] Broadcast Ovis2 implementation to fit Ovis1.6 (#17861)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-11 17:56:30 -07:00
7de18d541b [BUG] [ROCm] [MLA] Fix variable name bug due to change in variable name in PR #17483 (#17961)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-05-11 09:14:30 -07:00
a810b5b088 [BugFix] [ROCm]: Bugfix and handle addition case of input for rocm_aiter_rms_norm (#17857)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-05-11 04:17:11 -07:00
009b3d5382 [Misc] not show --model in vllm serve --help (#16691)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-11 08:47:58 +00:00
e4b8713380 [New Model]: nomic-embed-text-v2-moe (#17785) 2025-05-11 00:59:43 -07:00
06c0922a69 [FP8][ROCm][Attention] Enable FP8 KV cache on ROCm for V1 (#17870)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-05-11 15:58:45 +08:00
cd3edfc908 [Misc] Add compressed-tensors NVFP4A16 emulation support (#17914)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Dipika <dipikasikka1@gmail.com>
2025-05-11 15:58:38 +08:00
9cea90eab4 [Frontend] Add /classify endpoint (#17032)
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com>
2025-05-11 07:57:07 +00:00
d1110f5b5a [doc] update lora doc (#17936)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-11 15:56:21 +08:00
8132365b74 [Bugfix]: v1 engine - consider lora adapters in allowed_token_ids (#17855)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-05-11 00:53:58 -07:00
eea22a56ab fix amd triton mla path (#17871) 2025-05-11 07:53:31 +00:00
9112155283 [Perf] Use small max_num_batched_tokens for A100 (#17885)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
2025-05-11 07:53:23 +00:00
90d0a74b60 [Bugfix] Add revision to transformers.Auto*.from_pretrained processors (#17948)
Signed-off-by: Xin Li <xin@centml.ai>
2025-05-11 07:52:44 +00:00
d74e5f37bc [Kernel] fp4 marlin kernel (#17687)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-05-10 19:58:49 -07:00
ca66a1674c [v1] Rename specialized_manager.py to single_type_kv_cache_manager.py (#17946)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-05-10 16:14:12 -07:00
950751a987 [v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders (#17483)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-05-10 16:12:04 -07:00
4c31218f80 [Misc] remove --model from vllm serve usage (#17944)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-10 13:23:31 +00:00
68311891f5 Don't default construct ModelConfig when default constructing VllmConfig (#17943)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-10 13:23:00 +00:00
fc4441a4ee Add missing content type headers to /ping and /health (#17036) (#17786)
Signed-off-by: Ximo Guanter <ximo.guanter@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-10 07:13:32 +01:00
246e3e0a36 fix broken test vllm:test_kernels - test_attention_selector.py::test_flash_attn (#17873)
Co-authored-by: Stephen Chen <tracelog@meta.com>
2025-05-10 10:46:54 +08:00
7042cc96b0 [V1][Spec Decoding] Log accumulated metrics after system goes idle (#17913)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-05-09 18:23:07 -07:00
0c0fdae84f [Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model (#16362) 2025-05-09 16:24:41 -07:00
3b602cdea7 AMD conditional all test execution // new test groups (#17556)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
2025-05-09 15:35:58 -07:00
4b2ed7926a Improve configs - the rest! (#17562)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-09 15:18:44 -07:00
7e3571134f [V1][Spec Decoding] Include bonus tokens in mean acceptance length (#17908)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-05-09 13:32:36 -07:00
ea2236bf95 Add option to use torch._inductor.standalone_compile (#17057)
Signed-off-by: rzou <zou3519@gmail.com>
2025-05-09 12:59:04 -07:00
7d4aedae7c Handle error when str passed to /v1/audio/transcriptions (#17909)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-09 19:23:59 +00:00
22481fbfa3 Update CT WNA16MarlinMoE integration (#16666)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-09 13:19:45 -04:00
5c4c08f6f1 [Misc] Auto fallback to float16 for pre-Ampere GPUs when detected bfloat16 config (#17265)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-09 17:16:12 +00:00
c44c384b1c [Misc] Add references in ray_serve_deepseek example (#17907)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-05-09 16:59:36 +00:00
85b72cb7b1 Revert "[BugFix][AMD] Compatible patch for latest AITER(05/07/2025)" (#17910) 2025-05-09 08:58:18 -07:00
6e5595ca39 [CI/Build] Automatically retry flaky tests (#17856)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-09 09:55:17 -06:00
200da9a517 [v1] Move block management logic from KVCacheManager to SpecializedManager (#17474)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-05-09 15:25:34 +00:00
9f64e93415 [BugFix][AMD] Compatible patch for latest AITER(05/07/2025) (#17864)
Signed-off-by: Qiang Li <qiang.li2@amd.com>
2025-05-09 08:59:36 -06:00
ec61ea20a8 [Misc] add dify integration (#17895)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-09 03:42:39 -07:00
c6798baa9c Change top_k to be disabled with 0 (still accept -1 for now) (#17773)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-09 10:01:49 +00:00
5b2dcbf0b8 Fix Whisper crash caused by invalid`` max_num_batched_tokens`` config (#17853)
Signed-off-by: inkcherry <mingzhi.liu@intel.com>
2025-05-09 09:16:26 +00:00
6e4a93e3f7 [Bugfix][CPU] Fix broken AVX2 CPU TP support (#17252)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-09 08:55:14 +00:00
217db4baa6 [Bugfix][ROCm] Fix AITER MLA V1 (#17880)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-05-09 08:38:21 +00:00
ff8c400502 [Doc] remove visible token in doc (#17884)
Signed-off-by: yan <yanma1@habana.ai>
2025-05-09 01:21:31 -07:00