Commit Graph

6000 Commits

Author SHA1 Message Date
a7b809e0f0 Merge remote-tracking branch 'upstream/main' into benchmark-output 2025-04-23 14:55:50 +00:00
7efc568418 Update convert_to_csv.py 2025-04-23 10:51:38 -04:00
8e630d680e Improve Transformers backend model loading QoL (#17039)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-23 07:33:51 -07:00
af869f6dff [CI] Update structured-output label automation (#17055)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-23 07:33:14 -07:00
53c0fa1e25 Ensure that pid passed to kill_process_tree is int for mypy (#17051)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-23 07:32:26 -07:00
f7912cba3d [Doc] Add top anchor and a note to quantization/bitblas.md (#17042)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-23 07:32:16 -07:00
6317a5174a Categorize tests/kernels/ based on kernel type (#16799)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-23 09:21:07 -04:00
aa72d9a4ea Mistral-format support for compressed-tensors (#16803)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-23 08:46:23 -04:00
ce17db8085 [CI] Run v1/test_serial_utils.py in CI (#16996)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-23 01:13:34 -07:00
8c87a9ad46 [Bugfix] Fix AssertionError: skip_special_tokens=False is not supported for Mistral tokenizers (#16964)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-04-23 07:24:09 +00:00
ec69124eb4 [Misc] Improve readability of get_open_port function. (#17024)
Signed-off-by: gitover22 <qidizou88@gmail.com>
2025-04-23 06:16:53 +00:00
d0da99fb70 [BugFix] llama4 fa3 fix - RuntimeError: scheduler_metadata must have shape (metadata_size) (#16998)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-04-22 21:49:24 -07:00
b2f195c429 [V1] Avoid socket errors during shutdown when requests are in in-flight (#16807)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-23 12:36:29 +08:00
047797ef90 [Bugfix] Triton FA function takes no keyword arguments (#16902)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-04-22 21:35:24 -07:00
eb8ef4224d [doc] add download path tips (#17013)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-23 04:06:30 +00:00
56a735261c [INTEL-HPU][v0] Port delayed sampling to upstream (#16949)
Signed-off-by: Michal Adamczyk <michal.adamczyk@intel.com>
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Michal Adamczyk <madamczyk@habana.ai>
2025-04-22 20:14:11 -07:00
e1cf90e099 [misc] tune some env vars for GB200 (#16992)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-04-23 10:59:48 +08:00
6bc1e30ef9 Revert "[Misc] Add S3 environment variables for better support of MinIO." (#17021) 2025-04-22 19:22:29 -07:00
7e081ba7ca [BugFix] Revert ROCm Custom Paged Attention Env Flag Check (#17022)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-04-22 19:17:48 -07:00
1e013fa388 [V1][DP] More robust DP/EP dummy request coordination (#16277)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-22 19:12:15 -07:00
bc7c4d206b [Kernel][ROCM] Upstream prefix prefill speed up for vLLM V1 (#13305)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com>
Signed-off-by: maleksan85 <maleksan@amd.com>
Signed-off-by: <>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: qli88 <qiang.li2@amd.com>
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>
2025-04-22 19:11:56 -07:00
f67e9e9f22 add Dockerfile build vllm against torch nightly (#16936)
Signed-off-by: Yang Wang <elainewy@meta.com>
2025-04-22 19:08:27 -07:00
36fe78769f [Bugfix] validate urls object for multimodal content parts (#16990)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2025-04-23 09:43:06 +08:00
83d933718c [Core][V1][TPU] Enable structured decoding on TPU V1 (#16499)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-04-22 18:05:23 -06:00
5175b884f7 [BugFix] Remove default multiproc executor collective_rpc timeout (#17000)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-22 23:27:14 +00:00
5536b30a4c Fencing Kernels Tests for enabling on AMD (#16929)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-04-22 09:32:40 -07:00
7f58fb9718 Add assertion for no objects while hashing hf_config (#16930)
Signed-off-by: rzou <zou3519@gmail.com>
2025-04-22 09:32:22 -07:00
30bc3e0f66 [FEAT][ROCm]: Support AITER MLA (#15893)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: qli88 <qiang.li2@amd.com>
2025-04-22 09:31:13 -07:00
f34410715f [frontend] enhance tool_calls type check (#16882)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-22 15:40:24 +00:00
68d4c33202 [Misc] Add S3 environment variables for better support of MinIO. (#16977)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-04-22 14:27:36 +00:00
f961d7f6ef [BugFix] Pass in correct VLLM config in FlashInfer backend (#13207) (#16973)
Signed-off-by: 苏政渊 <suzhengyuan@moonshot.cn>
Co-authored-by: 苏政渊 <suzhengyuan@moonshot.cn>
2025-04-22 06:44:10 -07:00
d059110498 Improve configs - SpeculativeConfig (#16971)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-22 12:55:36 +00:00
571e8dd65e [Bugfix] Fix distributed bug again in Qwen2.5-VL & Qwen2.5-Omni (#16974)
Signed-off-by: fyabc <suyang.fy@alibaba-inc.com>
2025-04-22 12:23:17 +00:00
4b91c927f6 [Misc] refactor example series (#16972)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-22 11:44:21 +00:00
0e237f0035 [FEAT][ROCm] Integrate Paged Attention Kernel from AITER (#15001)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-04-22 02:46:28 -07:00
8f7bace7c3 [Doc] Improve documentation for multimodal CLI args (#16960)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-22 08:35:35 +00:00
e4d6144232 [BugFix] Fix incremental detokenization perf issue (#16963)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-04-22 08:16:19 +00:00
8d32dc603d [Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036)
Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>
Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com>
2025-04-22 09:01:36 +01:00
c4ab9f3e71 [V1] Remove pre-allocation for KV cache (#16941)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-22 00:52:18 -07:00
2689d5c027 [Model] Use autoweightloader for mamba (#16950)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2025-04-22 07:48:15 +00:00
acba33a0f1 [Bugfix] Fix the issue where llm.generate cannot be called repeatedly after setting GuidedDecodingParams (#16767)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-04-22 06:02:20 +00:00
a114bf20a3 [Perf] Optimize _update_states for GPU model runner (#16910)
Signed-off-by: snowcharm <snowcharmqq@gmail.com>
2025-04-22 14:01:54 +08:00
3097ce3a32 [Doc] Update ai_accelerator/hpu-gaudi.inc.md (#16956)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-04-22 05:33:27 +00:00
d6da9322c8 [Bugfix] Fix f-string for Python 3.9-3.11 (#16962)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-04-21 21:45:55 -07:00
71ce44047f Support S3 Sharded loading with RunAI Model Streamer (#16317)
Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-04-21 21:21:49 -07:00
188b7f9b8c [Performance][ROCm] Add skinny gemms for unquantized linear on ROCm (#15830)
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
2025-04-21 20:46:22 -07:00
b9b4746950 [V1] Remove additional_config check (#16710)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-04-21 20:45:27 -07:00
7b8a2ab76f [Kernel] Add expert_map support to Cutlass FP8 MOE (#16861)
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>
2025-04-21 20:44:32 -07:00
c9acbf1141 [Misc] Remove the chunked prefill warning for LoRA (#16925)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-21 20:44:24 -07:00
5b794cae8d [ROCm] Add aiter tkw1 kernel for Llama4 fp8 (#16727)
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-04-21 20:42:34 -07:00