Commit Graph

7722 Commits

Author SHA1 Message Date
b6381ced9c updated
Signed-off-by: Robert Shaw <robshaw@redhat.com>
2025-07-15 13:50:42 +00:00
d91278181d [doc] Add more details for Ray-based DP (#20948)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-07-15 05:37:12 -07:00
20149d84d9 [MISC] Add init files for python package (#20908)
Signed-off-by: wangli <wangli858794774@gmail.com>
2025-07-15 12:16:33 +00:00
3534c39a20 [V1] [Hybrid] Refactor mamba state shape calculation; enable V1 via cli (#20840)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-07-15 04:04:35 -07:00
c586b55667 [TPU] Optimize kv cache update kernel (#20415)
Signed-off-by: Yifei Teng <tengyifei88@gmail.com>
2025-07-15 03:56:43 -07:00
33d560001e [Docs] Improve documentation for ray cluster launcher helper script (#20602)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-15 03:55:45 -07:00
f148c44c6a [frontend] Refactor CLI Args for a better modular integration (#20206)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
2025-07-15 02:23:42 -07:00
235bfd5dfe [Docs] Improve documentation for RLHF example (#20598)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-15 01:54:10 -07:00
68d28e37b0 [frontend] Add --help=page option for paginated help output (#20961)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-15 00:42:00 -07:00
37a7d5d74a [Misc] Refactor AllReduceFusionPass. Remove parameter (#20918)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-07-15 06:57:40 +00:00
d4d309409f Implement Async Scheduling (#19970)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-14 23:01:46 -07:00
85bd6599e4 [Model] Add AutoWeightsLoader support for BERT, RoBERTa (#20534)
Signed-off-by: Jennifer He <islandhe@gmail.com>
Signed-off-by: <islandhe@gmail.com>
Signed-off-by: Jen H <islandhe@gmail.com>
2025-07-15 13:34:24 +08:00
91b3d190ae [cold start] replace VLLM_COMPILE_DEPYF with debug_dump_dir (#20940)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-07-15 13:02:17 +08:00
fc017915f5 [Doc] Clearer mistral3 and pixtral model support description (#20926)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-14 21:56:53 -07:00
9ad0a4588b [Bugfix] Switch bailout logic for kv-cache-dtype with SM100 Flashinfer (#20934)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-07-15 03:27:50 +00:00
016b8d1b7f Enabled BnB NF4 inference on Gaudi (#20172)
Signed-off-by: Ruheena Suhani Shaik <rsshaik@habana.ai>
2025-07-14 20:26:08 -07:00
80305c1b24 [CI] Fix flaky test_streaming_response test (#20913)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-14 20:15:15 -07:00
37e2ecace2 feat: add image zoom to improve image viewing experience (#20763)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-14 20:14:23 -07:00
054c8657e3 [Docs] Add Kuberay to deployment integrations (#20592)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-07-14 20:13:55 -07:00
d4170fad39 Use w8a8 quantized matmul Pallas kernel (#19170)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-07-15 03:06:33 +00:00
946aadb4a0 [CI/Build] Split Entrypoints Test into LLM and API Server (#20945)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-15 02:44:18 +00:00
bcdfb2a330 [Bugfix] Fix incorrect dispatch for CutlassBlockScaledGroupedGemm and DeepGEMM (#20933)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-15 01:42:17 +00:00
ba8c300018 [BugFix] VLLM_DISABLE_COMPILE_CACHE=1 should disable all reads and writes from the cache (#20942)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-07-15 01:26:18 +00:00
8cdc371217 SM100 Cutlass MLA decode with unrestricted num_heads (< 128) for DeepSeek TP (#20769)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-07-15 01:06:38 +00:00
61e20828da Fall back if flashinfer comm module not found (#20936)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-07-14 23:11:18 +00:00
55e1c66da5 [Docs] remove outdated performance benchmark (#20935)
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
2025-07-14 22:14:17 +00:00
86f3ac21ce Fix overflow indexing in causal_conv1d kernel (#20938)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-07-14 21:43:07 +00:00
149f2435a5 [Misc] Relax translations tests (#20856)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-07-14 20:08:36 +00:00
c0569dbc82 [Misc] ModularKernel : Perform WeightAndReduce inside TritonExperts & DeepGemmExperts (#20725)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-07-14 19:47:16 +00:00
8bb43b9c9e Add benchmark dataset for mlperf llama tasks (#20338)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-14 19:10:07 +00:00
559756214b Change default model to Qwen3-0.6B (#20335)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-07-14 16:54:52 +00:00
6d0cf239c6 [CI/Build] Add Transformers nightly tests in CI (#20924)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-14 16:33:17 +00:00
3fc964433a [Misc] Clean up Aimv2 config registration in Ovis config (#20921)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-07-14 15:36:43 +00:00
0caf61c08a [CI] Update codeowner for compilation code (#20929)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-07-14 08:33:19 -07:00
667624659b [CI] cc folks on changes to vllm/compilation (#20925)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-07-14 07:52:17 -07:00
38efa28278 [Model] Add Ling implementation (#20680)
Signed-off-by: vito.yy <vito.yy@antgroup.com>
2025-07-14 22:10:32 +08:00
e8cc53af5e [Misc] Log the reason for falling back to FlexAttention (#20699)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-14 04:16:51 -07:00
a4851cfe68 [Bugfix]: Fix messy code when using logprobs (#20910)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-07-14 11:06:45 +00:00
9887e8ec50 [Misc] Remove unused function (#20909)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-14 10:48:55 +00:00
f326ab9c88 [Bugfix] Bump up mistral_common to support v13 tokenizer (#20905)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-07-14 10:45:03 +00:00
dcf2a5e208 [CI/Build] Fix OOM issue in Jina-VL test (#20907)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-14 10:32:35 +00:00
1e9438e0b0 [MISC] Move bind_kv_cache to worker module (#20900)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-07-14 09:40:00 +00:00
697ef765ee [Refactor][V1] Move outlines utils for V1 imports (#20878)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-07-14 00:58:35 -07:00
a99b9f7dee [Quantization] add BNB for MixtralForCausalLM (#20893)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-14 07:34:34 +00:00
c488b928a7 [ROCm] [Bugfix] [Critical]: Fix mamba compilation bug (#20883)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-07-14 15:23:28 +08:00
2c7fa47161 Fix: Add missing EOFError handling in CLI complete command (#20896)
Signed-off-by: reidliu41 <reid201711@gmail.com>
2025-07-14 07:09:57 +00:00
88fc8a97e3 Removing redundant python version check (#20888)
Signed-off-by: Dannyso05 <dansong1177@gmail.com>
2025-07-14 06:15:05 +00:00
66f6fbd393 [Prefix Cache] Add reproducible prefix-cache block hashing using SHA-256 + CBOR (64bit) (#20511)
Signed-off-by: Maroon Ayoub <maroon.ayoub@ibm.com>
2025-07-14 02:45:31 +00:00
8632e831ba [Core] Add update_config RPC method (#20095)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-07-14 00:49:18 +00:00
4bbfc36b16 [V1] Hybrid allocator without prefix caching (#20661)
Signed-off-by: nopperl <54780682+nopperl@users.noreply.github.com>
2025-07-13 16:55:14 +00:00