fcec8c8827
add debug cruft
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
2025-06-20 20:37:37 +00:00
850dafea92
update
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
2025-06-20 19:57:07 +00:00
b4f17e12a4
tolerances
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
2025-06-20 19:47:25 +00:00
21ffc7353a
fixup
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
2025-06-20 15:56:05 +00:00
39d5d33f8f
tweaks
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
2025-06-20 15:36:59 +00:00
7a821f0e7f
precommit
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
2025-06-20 14:41:20 +00:00
26fd8ca33c
fixes
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
2025-06-20 14:40:21 +00:00
d5f206767c
Unit test
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
2025-06-20 14:39:58 +00:00
2b5ad9f233
fixes - use-fp8-dispatch
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-06-18 21:46:22 +00:00
299f829180
DeepGEMM LL optimizations
- Quantized dispatch
- Fused act-and-mul-and-quant in the right layout for DeepGEMM
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
2025-06-18 20:09:51 +00:00
104a984e6a
Merge remote-tracking branch 'nm/varun/deepep-fp8-dispatch' into ll_deepgemm_opt
2025-06-18 19:35:02 +00:00
12575cfa7a
[Bugfix] fix RAY_CGRAPH_get_timeout is not set successfully (#19725)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-06-18 10:26:16 -07:00
8b6e1d639c
[Hardware][AMD] integrate aiter chunked prefill into vllm (#18596)
Signed-off-by: fsx950223 <fsx950223@outlook.com>
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: fsx950223 <fsx950223@outlook.com>
Co-authored-by: charlifu <charlifu@amd.com>
2025-06-18 08:46:51 -07:00
8de2fd39fc
deep_ep + use_fp8_dispatch
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-06-18 07:32:15 -07:00
735a9de71f
[Qwen] Add tagging rule for Qwen related PRs (#19799)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-18 14:26:43 +00:00
257ab95439
[Platform] Allow platform use V1 Engine by default (#19792)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-06-18 13:03:36 +00:00
cca91a7a10
[doc] fix the incorrect label (#19787)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-18 10:30:58 +00:00
f04d604567
[Minor] Zero-initialize attn output buffer (#19784)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-18 06:59:27 +00:00
19a53b2783
[V1] Decouple GPU and TPU InputBatch (#19778)
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
2025-06-18 06:38:13 +00:00
eccdc8318c
[V1][P/D] An native implementation of xPyD based on P2P NCCL (#18242)
Signed-off-by: Abatom <abzhonghua@gmail.com>
2025-06-18 06:32:36 +00:00
5f52a84685
[V1] Add API docs for EncoderCacheManager (#19294)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-06-18 13:37:01 +08:00
d4629dc43f
[Misc] Add __str__ for RequestStatus (#19780)
Signed-off-by: Linkun Chen <github@lkchen.net>
2025-06-18 03:03:01 +00:00
6e9cc73f67
[MISC] correct DeviceConfig device field static type analysis (#19699)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-17 17:21:50 -07:00
c53711bd63
[MISC] correct copy_blocks src_to_dists param type (#19696)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-17 17:21:06 -07:00
dac8cc49f4
[TPU] Update torch version to include paged attention kernel change (#19706)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-06-17 22:24:49 +00:00
a44b1c951d
[Feature][ROCm] Add full graph capture support for TritonAttentionBackend (#19158)
Signed-off-by: charlifu <charlifu@amd.com>
2025-06-17 17:03:06 -04:00
b447624ee3
[Bugfix] Fix faulty triton importing logic when using Ray for DP (#19734)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-17 20:59:29 +00:00
cda92307c1
[Misc] Update lmcache connector with the latest connector apis (#19441)
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>
2025-06-17 19:57:54 +00:00
bf57ccc5c2
Remove sm120 arch from sm100 cutlass kernel arch list (#19716)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-17 11:49:39 -07:00
ffb2cd6b54
[Perf] Optimize moe_align_block_size CUDA kernel (#19572)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-06-17 11:49:26 -07:00
ca94d7fa00
[Bugfix] Update multimodel models mapping to fit new checkpoint after Transformers v4.52 (#19151)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-17 15:58:38 +00:00
5a1c2e15d8
[Mis] remove duplicate engine status checks (#19647)
Signed-off-by: googs1025 <googs1025@gmail.com>
2025-06-17 08:17:38 -07:00
4c8f64faa7
[V1][Kernel] Flashinfer HND KV cache layout (#19280)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-06-17 09:09:22 -04:00
93aee29fdb
[doc] split "Other AI Accelerators" tabs (#19708)
2025-06-17 22:05:29 +09:00
154d063b9f
[doc][mkdocs] Add edit button to documentation (#19637)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-17 11:10:31 +00:00
ccd7c05089
[Kernel] Add Split-KV Support to Unified Triton Attention Kernel (#19152)
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
2025-06-17 10:45:07 +00:00
c48c6c4008
Add a doc on how to update PyTorch version (#19705)
2025-06-17 18:10:37 +08:00
aed8468642
[Doc] Add missing llava family multi-image examples (#19698)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-17 07:05:21 +00:00
5c76b9cdaf
[Core] add remove_seq_from_computed_blocks_tracker to BlockSpaceManager (#19686)
Signed-off-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
Co-authored-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
2025-06-17 04:40:58 +00:00
ddfed314f9
Fixes IMA for TP w/ flex-attention (#19712)
Signed-off-by: drisspg <drisspguessous@gmail.com>
2025-06-17 04:01:50 +00:00
5b3ad5ecf2
[DOC] fix doc typos (#19600)
Signed-off-by: Di Liu <liu-di@sjtu.edu.cn>
2025-06-17 11:34:53 +08:00
ede5c4ebdf
[Frontend] add chunking audio for > 30s audio (#19597)
Signed-off-by: nguyenhoangthuan99 <thuanhppro12@gmail.com>
2025-06-17 11:34:00 +08:00
07334959d8
[Wheel Size] Only build FA2 8.0+PTX (#19336)
2025-06-17 12:32:49 +09:00
119f683949
[doc] add project flag to gcloud TPU command (#19664)
Signed-off-by: David Xia <david@davidxia.com>
2025-06-17 01:00:09 +00:00
0860087aff
[Fix] Fall back to Gloo when NCCL backend is unavailable (#19641)
Signed-off-by: conroy-cheers <conroy@corncheese.org>
2025-06-17 08:42:14 +08:00
6bc7b57315
[Quantization] Remove FP4 emulation; Fall-back to marlin for device < 100 (#19563)
2025-06-16 17:33:51 -04:00
90f9c2eb5c
[V1] Change return type on get_multimodal_embeddings() (#19446)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-06-16 13:32:15 -04:00
387bdf0ab9
[Model] Add support for MiniMaxM1ForCausalLM (shares architecture with MiniMaxText01ForCausalLM) (#19677)
Signed-off-by: QscQ <qscqesze@gmail.com>
2025-06-16 09:47:14 -07:00
5e5baa91aa
[Kernels] Use empty for modular MoE workspaces (#19667)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-06-16 14:58:01 +00:00
836d4ce140
[Bugfix] fix missing 'finish_reason': null in streaming chat (#19662)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-06-16 14:10:39 +00:00
c3fec47bb7
[MISC] bump huggingface_hub pkg to 0.33.0 (#19547)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-16 05:22:28 -07:00
1173804dca
[Bugfix] Fix TP inference for Flex attention backend (#19657)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-16 11:21:37 +00:00
4d5424029b
[Feature]:Allow for Granite MoE Hybrid models with _only_ shared experts. (#19652)
Signed-off-by: Shawn Tan <shawntan@ibm.com>
2025-06-16 11:14:18 +00:00
3e7506975c
[DOC] Add reasoning capability to vLLM streamlit code (#19557)
2025-06-16 07:09:12 -04:00
ee35e96ac3
[BugFix] Don't catch BaseException when dumping execute_model errors (#19626)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-16 11:01:08 +00:00
dec66d253b
[Kernel] GGUF MMVQ kernel for multiple input vectors (#18754)
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
2025-06-16 17:33:26 +08:00
8d120701fd
[Docs] Move multiproc doc to v1 dir (#19651)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-06-16 09:10:12 +00:00
f40f763f12
[CI] Add mteb testing for rerank models (#19344)
2025-06-16 01:36:43 -07:00
26bc46ef89
[MISC] typo fix (#19672)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-16 07:18:49 +00:00
a77aea59fd
[TPU] support attention head dim smaller than 128 (#19620)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-06-16 06:40:53 +00:00
b692e9cd07
[Misc] Fix skipped max-model-len validation when deriving max model length from tokenizer config (#19660)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-06-16 06:30:29 +00:00
367871a469
[Misc][Frontend] passthrough bad_words (#19564)
Signed-off-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai>
Co-authored-by: Francesco Bertolotti <francesco.bertolotti@igenius.ai>
Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>
2025-06-16 05:05:13 +00:00
92183b41f3
[Bugfix][Core] Prefix caching causes incorrect outputs due to outdated ComputedBlocksTracker (#18957)
Signed-off-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
Co-authored-by: 刘全 <quan.liu2@dbappsecurity.com.cn>
2025-06-15 21:56:37 -07:00
c6703d1e0d
[MISC] Remove unused variableds in C++ (#19609)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-15 20:05:28 -07:00
a5e7242d5f
[Misc] Remove duplicate multiproc method setting for CPU platform (#19649)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-16 02:26:58 +00:00
91b2c17a55
[CI/Build] Fix torch nightly CI dependencies part 2 (#19589)
2025-06-15 20:01:10 +08:00
055915e6ce
Enable prefix caching with full cuda graphs (#19617)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-15 01:05:05 -07:00
3d330c4c09
[Benchmark] Refactor benchmark script for fp8 & int8 (#19627)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-06-15 15:15:37 +08:00
0b73736a0d
[Kernel] Raise verbose error and consolidate num_heads/num_kv_heads divisibility check (#19339)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-15 13:43:48 +08:00
ee1531bc38
[Bugfix][2/n] Fix speculative decoding CI - Fix test_ngram_e2e_greedy_correctness (#19644)
2025-06-14 21:15:41 -07:00
e13945f9dd
[Perf] Further tunings for SM100 FP8 CUTLASS kernel (#19566)
2025-06-14 17:25:10 -07:00
08500011d3
[Fix] Convert kv_transfer_config from dict to KVTransferConfig (#19262)
2025-06-14 12:32:07 -07:00
861a0a0a39
[Bugfix] Don't attempt to use triton if no driver is active (#19561)
2025-06-14 12:30:54 -07:00
bc956b38d0
Only build CUTLASS MoE kernels on Hopper (#19648)
2025-06-14 11:44:15 -07:00
294fc1e2c9
[Hardware][NVIDIA][kernel] Fp4 MOE quant kernel optimization (#19500)
2025-06-14 09:34:28 -07:00
2db9044ab6
[Bugfix] Fix auto dtype casting for BatchFeature (#19316)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-06-14 15:13:08 +00:00
6fa718a460
[Misc] Modularize CLI Argument Parsing in Benchmark Scripts (#19593)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-14 16:54:52 +08:00
06be858828
[Bugfix] Fix the speculative decoding test by setting the target dtype (#19633)
2025-06-13 20:57:32 -07:00
d1e34cc9ac
[V1][Metrics] Deprecate metrics with gpu_ prefix for non GPU specific metrics. (#18354)
Signed-off-by: Saheli Bhattacharjee <saheli@krai.ai>
2025-06-14 11:07:36 +08:00
bd517eb9fe
[BugFix] Fix DP Coordinator incorrect debug log message (#19624)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-14 00:18:03 +00:00
d65668b4e8
Adding "AMD: Multi-step Tests" to amdproduction. (#19508)
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-06-13 17:08:51 -07:00
aafbbd981f
[torch.compile] Use custom ops when use_inductor=False (#19618)
2025-06-13 15:05:54 -07:00
0f0874515a
[Doc] Add troubleshooting section to k8s deployment (#19377)
Signed-off-by: Anna Pendleton <pendleton@google.com>
2025-06-13 21:47:51 +00:00
3597b06a4f
[CUDA] Enable full cudagraph for FlashMLA (#18581)
Signed-off-by: luka <luka@neuralmagic.com>
2025-06-13 18:12:26 +00:00
1015296b79
[doc][mkdocs] fix the duplicate Supported features sections in GPU docs (#19606)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-13 16:25:08 +00:00
ce9dc02c93
[Refactor] Remove unused variables in moe_permute_unpermute_kernel.inl (#19573)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-06-13 06:12:15 -07:00
a24cb91600
[Model] Fix minimax model cache & lm_head precision (#19592)
Signed-off-by: qingjun <qingjun@minimaxi.com>
2025-06-13 12:08:20 +00:00
7e8d97dd3f
[BugFix] Honor enable_caching in connector-delayed kvcache load case (#19435)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-13 09:46:32 +00:00
d70bc7c029
[torch.compile] reorganize the cache directory to support compiling multiple models (#19064)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-06-13 15:23:25 +08:00
ce688ad46e
use base version for version comparison (#19587)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-06-13 15:09:34 +08:00
cefdb9962d
[Fix] The zip function in Python 3.9 does not have the strict argument (#19549)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-06-13 14:57:48 +08:00
ace5cdaff0
[Fix] bump mistral common to support magistral (#19533)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-06-12 22:28:12 -07:00
6458721108
[CPU] Refine default config for the CPU backend (#19539)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-06-13 13:27:39 +08:00
bb4a0decef
[Misc] Correct broken docs link (#19553)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-06-12 22:27:13 -07:00
c707cfc12e
[doc] fix incorrect link (#19586)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-13 04:26:09 +00:00
7b3c9ff91d
[Doc] uses absolute links for structured outputs (#19582)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-06-13 03:35:17 +00:00
c68698b326
[Bugfix] Fix EAGLE vocab embedding for multimodal target model (#19570)
Signed-off-by: qizixi <qizixi@meta.com>
2025-06-12 23:09:19 -04:00
e3b12667d4
[BugFix] : Fix Batched DeepGemm Experts (#19515)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-06-12 20:43:02 -06:00
e6aab5de29
Revert "[Build/CI] Add tracing deps to vllm container image (#15224)" (#19378)
2025-06-12 17:26:40 -07:00
c57bb199b3
[V1] Resolve failed concurrent structured output requests (#19565)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-06-12 23:30:09 +00:00
dba68f9159
[Doc] Unify structured outputs examples (#18196)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-06-12 22:50:31 +00:00
a3319f4f04
[Bugfix] Enforce contiguous input for dynamic_per_token FP8/INT8 quant (#19452)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-12 15:39:15 -04:00
9d880f594d
[Misc] Turn MOE_DP_CHUNK_SIZE into an env var (#19506)
2025-06-12 18:01:16 +00:00
017ef648e9
[Spec Decode][Benchmark] Generalize spec decode offline benchmark to more methods and datasets (#18847)
2025-06-12 10:30:56 -07:00
4b25ab14e2
[doc] Make top navigation sticky (#19540)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-12 15:48:11 +00:00
f98548b9da
[torch.compile][ROCm] Fuse quantization onto attention using a torch.compile pass (#16756)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
2025-06-12 08:31:04 -07:00
96846bb360
Fix TorchAOConfig skip layers (#19265)
Signed-off-by: mobicham <hicham@mobiuslabs.com>
2025-06-12 22:22:53 +08:00
b6efafd9e4
[Perf] Vectorize static / dynamic INT8 quant kernels (#19233)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-06-12 06:51:41 -07:00
1129e2b1ab
[V1][NixlConnector] Drop num_blocks check (#19532)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-06-12 12:36:14 +00:00
c742438f8b
[Doc] Add V1 column to supported models list (#19523)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-12 19:16:44 +08:00
73e2e0118f
[Quantization] Improve AWQ logic (#19431)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-06-12 11:02:11 +00:00
c9280e6346
[Bugfix] Respect num-gpu-blocks-override in v1 (#19503)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
2025-06-12 11:00:23 +00:00
af09b3f0a0
[Bugfix][V1] Allow manual FlashAttention for Blackwell (#19492)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-12 10:40:24 +00:00
4f6c42fa0a
[Security] Prevent new imports of (cloud)pickle (#18018)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com>
2025-06-12 10:30:17 +00:00
dff680001d
Fix typo (#19525)
Signed-off-by: 2niuhe <carlton2tang@gmail.com>
2025-06-12 09:24:45 +00:00
2e090bd5df
[AMD][Kernel][BugFix] fix test_rocm_compressed_tensors_w8a8 for rocm (#19509)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2025-06-12 07:14:24 +00:00
1b0b065eb5
[BugFix] Handle missing sep_token for Qwen3-Reranker in Score API (#19522)
Signed-off-by: strutive07 <strutive07@gmail.com>
2025-06-12 07:00:47 +00:00
d5bdf899e4
[BugFix] Work-around incremental detokenization edge case error (#19449)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-12 06:43:20 +00:00
7e3e74c97c
[Frontend] Improve error message in tool_choice validation (#19239)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-12 01:13:00 -04:00
3f6341bf7f
Add Triton Fused MoE kernel config for E=16 on B200 (#19518)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-06-12 04:31:51 +00:00
e5d35d62f5
[BugFix] Force registration of w8a8_block_fp8_matmul_deepgemm via lazy import (#19514)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-06-12 04:28:12 +00:00
2f1c19b245
[CI] change spell checker from codespell to typos (#18711)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-06-11 19:57:10 -07:00
42f52cc95b
[CI/Build] Fix torch nightly CI dependencies (#19505)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2025-06-11 14:40:42 -07:00
97a9465bbc
[UX] Add Feedback During CUDAGraph Capture (#19501)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-11 21:09:05 +00:00
c7ea0b56cd
[AMD] [Quantization] Add override flag for attention dtype instead of using kv_cache_dtype trigger (#17331)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2025-06-11 15:53:28 -04:00
29fa5cac1c
[Kernels] Add activation chunking logic to FusedMoEModularKernel (#19168)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-06-11 12:53:10 -04:00
b2d9be6f7d
[Docs] Remove WIP features in V1 guide (#19498)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-11 09:15:03 -07:00
04a55612dd
[Misc] Fix misleading ROCm warning (#19486)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-06-12 00:12:10 +08:00
89b0f84e17
[doc] fix "Other AI accelerators" getting started page (#19457)
Signed-off-by: David Xia <david@davidxia.com>
2025-06-11 16:11:17 +00:00
497a91e9f7
[CI] Update FlashInfer to 0.2.6.post1 (#19297)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-11 22:57:28 +08:00
943ffa5703
[Bugfix] Update the example code, make it work with the latest lmcache (#19453)
Signed-off-by: Runzhen Wang <wangrunzhen@gmail.com>
2025-06-11 12:42:20 +00:00
5c8d34a42c
Support no privileged mode on CPU for docker and kubernetes deployments (#19241)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2025-06-11 04:11:47 -07:00
3c8694eabe
Fix some typo (#19475)
Signed-off-by: ximing.wxm <ximing.wxm@antgroup.com>
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>
2025-06-11 10:36:04 +00:00
7484e1fce2
Add cache to cuda get_device_capability (#19436)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-11 17:37:05 +08:00
a2142f0196
Support non-string values in JSON keys from CLI (#19471)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-11 09:34:04 +00:00
871d6b7c74
[Misc] Reduce warning message introduced in env_override (#19476)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-11 17:29:54 +08:00
29a38f0352
[Doc] Support "important" and "announcement" admonitions (#19479)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-11 01:39:58 -07:00
a5115f4ff5
[Doc] Fix quantization link titles (#19478)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-11 01:27:22 -07:00
68b4a26149
[Doc] Update V1 User Guide for Hardware and Models (#19474)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-11 00:49:06 -07:00
b8e809a057
[Kernel] Support deep_gemm for linear methods (#19085)
Signed-off-by: artetaout <lulala341@gmail.com>
2025-06-11 15:14:45 +08:00
5039ec2336
[ROCm] Add rules to automatically label ROCm related PRs (#19405)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-11 15:09:18 +08:00
7c644ab6d5
Fix Typo in Documentation and Function Name (#19442)
2025-06-10 22:44:11 -07:00
2d40665fe8
Add fused MOE config for Qwen3 30B A3B on B200 (#19455)
Signed-off-by: Junhao Li <junhao@ubicloud.com>
2025-06-11 13:43:46 +08:00
96ada386b7
[Misc] Remove unused MultiModalHasher.hash_prompt_mm_data (#19422)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-06-11 05:18:57 +00:00
1e473b3010
[CI] Disable failing GGUF model test (#19454)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-11 05:12:38 +00:00
2b1e2111b0
Fix test_max_model_len in tests/entrypoints/llm/test_generate.py (#19451)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-11 12:54:59 +08:00
a45b979d9f
[BugFix] Fix docker build cpu-dev image error (#19394)
Signed-off-by: niu_he <carlton2tang@gmail.com>
2025-06-10 20:56:40 -07:00
3952731e8f
[New Model]: Support Qwen3 Embedding & Reranker (#19260)
2025-06-10 20:07:30 -07:00
77f0d465d0
[BugFix] Allow use_cudagraph to work with dynamic VLLM_USE_V1 (#19390)
Signed-off-by: rzou <zou3519@gmail.com>
2025-06-11 07:54:41 +08:00
22c3c0aa4a
Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B-FP8 (#19401)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
2025-06-11 07:23:57 +08:00
33f8dba7c6
[Model] use AutoWeightsLoader for commandr (#19399)
Signed-off-by: py-andy-c <pychen1017@gmail.com>
2025-06-10 22:42:21 +00:00
5241ca50d6
[ROCm][V1] Adding ROCm to the list of plaforms using V1 by default (#19440)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-06-10 22:06:15 +00:00
da9b523ce1
[Docs] Note that alternative structured output backends are supported (#19426)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-06-10 16:20:00 +00:00
b6553be1bc
[Misc] Slight improvement of the BNB (#19418)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-06-10 13:51:49 +00:00
64a9af5afa
Simplify ep kernels installation (#19412)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-06-10 20:06:08 +08:00
e4248849ec
[BugFix][CPU] Fix CPU CI by ignore collecting test_pixtral (#19411)
Signed-off-by: jiang.li <jiang1.li@intel.com>
2025-06-10 12:02:40 +00:00
467bef18a3
[BugFix][FlashInfer] Fix attention backend interface mismatch with unexpected keyword use_irope (#19134)
Signed-off-by: Yunqiu Guo <guorachel@meta.com>
2025-06-10 16:48:51 +08:00
5f1ac1e1d1
Revert "[v1] Add fp32 support to v1 engine through flex attn" (#19404)
2025-06-10 01:30:20 -07:00
9368cc90b2
Automatically bind CPU OMP Threads of a rank to CPU ids of a NUMA node. (#17930)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
2025-06-10 06:22:05 +00:00
32b3946bb4
Add clear documentation around the impact of debugging flag (#19369)
Signed-off-by: Anna Pendleton <pendleton@google.com>
2025-06-10 06:16:09 +00:00
6b1391ca7e
[Misc] refactor neuron_multimodal and profiling (#19397)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-10 06:12:42 +00:00
a3f66e75d1
Add security warning to bug report template (#19365)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-06-10 06:06:36 +00:00
319cb1e351
[Core] Batch multi modal input using pinned memory (#19169)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-06-10 13:44:59 +08:00
1efef71645
[Bugfix] Fix modelscope token passed in (#19389)
Signed-off-by: wangli <wangli858794774@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-06-10 13:39:37 +08:00
646d62f636
[Core] Use tuple for kv cache group block ids (#19175)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-10 07:01:17 +02:00
6cd4ae8acd
[Frontend] Add tqdm_leave_pbar to control progress bar visibility (#19357)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-10 04:55:09 +00:00
c016047ed7
Fix docs/mkdocs/hooks/remove_announcement.py (#19382)
2025-06-09 21:36:54 -07:00
9af6d22e4c
Use xla flag to improve the quantized model performance (#19303)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-06-10 01:28:45 +00:00
4589b94032
[Bugfix] Fix benchmark_moe.py (#19016)
Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>
2025-06-09 18:04:36 -07:00
cc867be19c
[V1] Reuse V0's memory_profiling util for gpu worker memory profiling (#19312)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-06-10 08:40:01 +08:00
3a7cd627a8
[Misc] Fix a config typo in disable_hybrid_kv_cache_manager configuration (#19383)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-06-09 16:41:51 -07:00
8058c91108
[HOT-FIX] Add kv_sharing_target_layer_name argument to cutlass_mla backend (#19374)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-06-09 19:00:07 -04:00
7d44c469fe
[TPU]Fix KV cache sharing tests (#19371)
2025-06-09 18:38:15 -04:00
31f58be96a
[Frontend] Make TIMEOUT_KEEP_ALIVE configurable through env var (#18472)
Signed-off-by: liusiqian <liusiqian@tal.com>
2025-06-09 21:41:21 +00:00
ebb2f383b8
[Quantization] Bump compressed-tensors version (#19295)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-06-09 14:33:15 -07:00
c1c7dbbeeb
[Bugfix][Core] Prevent token lengths exceeding max_model_len in V0 (#19348)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-09 23:01:29 +08:00
5cf2daea9a
[Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination. (#19298)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun <vsundarr@redhat.com>
2025-06-09 10:50:39 -04:00
b8089195b4
[v1] Add fp32 support to v1 engine through flex attn (#19319)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-06-09 22:10:44 +08:00
770e5dcdb8
[full_graph] Fix query_start_loc padding (#19321)
Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai>
2025-06-09 21:32:56 +08:00
c57c9415b1
[Docs] Fix a bullet list in usage/security.md (#19358)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-06-09 13:28:51 +00:00
01810f9236
[CI] Introduce rules for llama auto-label (#19323)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-09 20:05:42 +08:00
59abbd84f9
[Fix] Allow kernel compilation for CUDA capability 8.7 (#19328)
Signed-off-by: Conroy Cheers <conroy@corncheese.org>
2025-06-09 02:57:23 -07:00
95a6568b5c
[CI/Build] Fix LoRA test (#19350)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-06-09 09:52:10 +00:00
0eca5eacd0
[Doc] Fix description in the Automatic Prefix Caching design doc (#19333)
Signed-off-by: cr7258 <chengzw258@163.com>
2025-06-09 17:30:02 +08:00
12e5829221
[doc] improve ci doc (#19307)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-09 07:26:12 +00:00
3a4d417707
[Misc] Cleanup compilation tests (#19343)
Signed-off-by: rzou <zou3519@gmail.com>
2025-06-09 15:05:44 +08:00
8335667c22
[Frontend] Remove unreachable code from llm.py (#19288)
Signed-off-by: KsuParkhamchuk <k.parkhamchuk@gmail.com>
2025-06-09 10:22:10 +08:00
e1c4380d4c
[Misc] Add documentation update reminder to PR template (#19289)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-09 10:20:53 +08:00
e31ae3de36
[Deprecation] Remove inputs arg fallback in Engine classes (#18799)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-09 10:19:56 +08:00
2ffb9b6e07
[Bugfix] model_max_length should consider max_model_len in tokenizer_config (#19201)
2025-06-08 07:17:53 -07:00
cda10fa3e2
[Multi Modal] Add an env var for message queue max chunk bytes (#19242)
Signed-off-by: yZhen <yZhen@fb.com>
Co-authored-by: yZhen <yZhen@fb.com>
2025-06-08 21:39:12 +08:00
c123bc33f9
[Quantization] Add compressed-tensors NVFP4 support (#18312)
2025-06-08 09:05:55 -04:00
b9a1791e2c
[Hardware][POWER] Add IBM POWER11 Support to CPU Extension Detection (#19082)
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com>
2025-06-08 09:17:14 +00:00
989dcee981
Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B ( #19315 )
...
Signed-off-by: Xu Wenqing <xuwq1993@qq.com >
2025-06-08 16:07:02 +08:00
3d64d366e0
[Misc] Change tests/compile to use VLLM_V1 by default ( #19302 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-06-08 16:06:48 +08:00
eaa2e51088
[Bugfix] Re-enable use_cudagraph in vLLM v1 ( #19299 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2025-06-08 08:56:12 +08:00
d77f7fb871
[Bugfix]: Fix TypeError: 'float' object cannot be interpreted as an integer ( #19283 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-06-08 08:16:31 +08:00
2d8476e465
[BugFix][V1] Fix memory profiling bug ( #18974 )
...
Signed-off-by: luka <luka@neuralmagic.com >
2025-06-07 10:34:51 -07:00
88be823d57
[AMD] Update compatible packaging version ( #19309 )
...
Signed-off-by: pramkuma <Pramendra.Kumar@amd.com >
2025-06-07 20:55:09 +08:00
4e4f63ad45
[Nit][Benchmark]Fix example in benchmark_serving_structured_output.py ( #19311 )
...
Signed-off-by: Lifan Shen <lifans@meta.com >
2025-06-07 18:25:38 +08:00
d2f0e7e615
[CI/Build] Improve Llama GGUF test robustness ( #19287 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-06-07 17:23:28 +08:00
122cdca5f6
[Misc] refactor context extension ( #19246 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-06-07 05:13:21 +00:00
cf02f9b283
Add FlexAttention to V1 ( #16078 )
...
Signed-off-by: drisspg <drisspguessous@gmail.com >
2025-06-06 21:58:55 -07:00
c4296b1a27
[CI][PowerPC] Use a more appropriate way to select testcase in tests/models/language/pooling/test_embedding.py (#19253)
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com>
2025-06-07 11:52:52 +08:00
66c508b137
[TPU][Test] Add script to run benchmark on TPU for buildkite (#19039)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-06-06 20:10:24 -07:00
84166fee97
[Kernel] Integrate CUTLASS MoE kernel with PPLX (#18762)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-06-06 18:26:11 -07:00
6e0cd10f72
[Easy][Test] Simplify test_function_tool_use with multiple parametrizes (#19269)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-07 09:19:09 +08:00
e010688f50
[Build][ROCm] Update Dockerfile.rocm (#19296)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-06-06 19:35:16 -04:00
441b65d8c7
[Misc][Tools][Benchmark] Fix and improve auto tune script (#19163)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-06-06 23:31:19 +00:00
46ecc57973
[BugFix] Fix tpu_model_runner block_id concatenation (#19228)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-06 16:28:17 -07:00
b6a3a9f76d
[Core] Fix abrupt request abort (#18485)
Signed-off-by: nicklucche <nlucches@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-06-06 16:27:59 -07:00
ca27f0f9c1
[Bugfix][Core] Update cancellation logic in generate() to handle Generator exits (#19225)
Co-authored-by: Adolfo Victoria <adovi@meta.com>
2025-06-06 20:17:54 +00:00
aad30bd306
[BugFix] Fix MultiConnector test after HMA changes (#19291)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-06 20:16:24 +00:00
94ecee6282
Fixed ppc build when it runs on non-RHEL based linux distros (#18422)
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
Signed-off-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com>
Signed-off-by: npanpaliya <nishidha.panpaliya@partner.ibm.com>
Co-authored-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com>
2025-06-06 11:54:26 -07:00
8267f9916f
improve logits bias (#19041)
2025-06-06 19:59:25 +08:00
7353492a47
[Core] Raise when non-multi-instance DP clients target a DP rank (#19227)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
2025-06-06 19:03:01 +08:00
7661e92ef8
[Model] Optimize nemotron_h implementation (#19249)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-06-06 10:05:14 +00:00
f168b85725
Unit Test for run_dp_sharded_vision_model (#19103)
Signed-off-by: Siqi Yan <siqi@meta.com>
Co-authored-by: Siqi Yan <siqi@meta.com>
2025-06-06 16:24:02 +08:00
da511d54d8
Fix CompilationConfig repr (#19091)
Signed-off-by: rzou <zou3519@gmail.com>
2025-06-06 16:23:35 +08:00
65c69444b1
[Docs] Improve V1 KVConnector interface documentation (#19172)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-06 16:22:45 +08:00
94870359cd
[Quantization] Bump compressed-tensors version; update NVFP4A16 test model (#19224)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
2025-06-06 01:21:54 -07:00
0d49483ea9
[TPU] fix kv cache dtype in model runner (#19244)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-06-06 16:20:16 +08:00
90b78ec5f9
[v1][P/D] Fix a edge case in kv cache schedule (#19182)
Co-authored-by: jinghui <jinghui@fb.com>
2025-06-05 23:32:55 -07:00
91a2ef98ea
[Chore] update CODEOWNERS (#19247)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-06-06 06:09:43 +00:00
3da2313d78
Support allowed_token_ids in ChatCompletionRequest (#19143)
Signed-off-by: Xu Song <xusong.vip@gmail.com>
2025-06-06 05:06:48 +00:00
b61dc5f972
[TPU] update torch_xla pin (#19231)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-06-06 04:27:38 +00:00
f8a1a2d108
[v1] Hybrid Memory Allocator (#17996)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-05 20:47:09 -07:00
3465b87ef8
[Bugfix] Fix EAGLE vocab embedding construction for Llama 70B (#19033)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-06-05 19:10:08 -07:00
c8134bea15
Fix AOPerModuleConfig name changes (#18869)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-06-05 18:51:32 -07:00
cb6d572e85
[Model] NemotronH support (#18863)
Signed-off-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>
Co-authored-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>
2025-06-05 21:29:28 +00:00
87360308b7
[V1] Use FlashInfer by default on Blackwell GPUs (#19118)
2025-06-05 15:40:39 -04:00
aa49f14832
[Quantization] Skip Fp4 Test for compressed-tensors (#19217)
2025-06-05 18:21:53 +00:00
9ef9173cfa
[P/D][NixlConnector] Enable FlashInfer backend (#19090)
2025-06-05 17:10:15 +00:00
85e2b7bb13
[MISC][Bugfix] Use less CPU when message queue has been empty for some time (#16226)
Signed-off-by: Povilas Kanapickas <povilas@radix.lt>
2025-06-05 16:53:08 +00:00
61059bee40
[Hardware][NVIDIA] FP4 MoE kernel optimization (#19110)
Signed-off-by: Chiyue Wei <chiyuew@nvidia.com>
Co-authored-by: Chiyue Wei <chiyuew@nvidia.com>
2025-06-05 09:48:26 -07:00
ec89524f50
Add H20-3e fused MoE kernel tuning configs for DeepSeek-R1/V3 (#19205)
2025-06-05 16:38:54 +00:00
f20f9f063b
[mistral_common] Add v11 tokenizer (#19193)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
2025-06-05 08:27:41 -07:00
9bc8bb07cf
[Bugfix] properly catch PIL-related errors for vision models when incorrect data urls are provided (#19202)
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com>
2025-06-05 12:59:28 +00:00
1aeb925f34
[Frontend] improve vllm run-batch --help display (#19187)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-05 11:16:25 +00:00
188a4590d8
[Misc] Do not override NCCL_CUMEM_ENABLE if set explicitly (#19105)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-05 11:14:32 +00:00
18093084be
[Misc] Remove unnecessary fallback to prefill-decode attention (#19138)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-06-05 16:08:26 +08:00
da40380214
[Build] Annotate wheel and container path for release workflow (#19162)
Signed-off-by: simon-mo <simon.mo@hey.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-06-04 23:24:56 -07:00
8fc57501d3
[Bugfix]: Fix the incompatibility issue with stream when Thinking is disabled (#19135)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-06-05 06:24:24 +00:00
af7fc84fd2
[BugFix][Minor] Fix full cuda graph bug when max_num_seqs < 512 (#19171)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-05 13:41:25 +08:00
0678b52251
Handle non-serializable objects when dumping benchmark results (#19114)
2025-06-04 22:40:04 -07:00
25b918eee6
[Torch Nightly]add missing dependency (#18770)
Signed-off-by: Yang Wang <elainewy@meta.com>
2025-06-04 21:56:12 -07:00
a408820f2f
[Bugfix] Fix port handling in make_zmq_path (#19117)
2025-06-04 21:00:59 -06:00
c56ed8bb0e
[Bugfix][Nixl] Fix full prefix cache hit bug (#18632)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-06-05 02:07:32 +00:00
78dcf56cb3
[doc] small fix (#19167)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-05 09:13:50 +08:00
b2fac67130
[P/D] Heterogeneous TP (#18833)
Signed-off-by: nicklucche <nlucches@redhat.com>
2025-06-04 23:25:34 +00:00
23027e2daf
[Misc] refactor: simplify EngineCoreClient.make_async_mp_client in AsyncLLM (#18817)
Signed-off-by: googs1025 <googs1025@gmail.com>
2025-06-04 15:37:25 -07:00
c3fd4d669a
[Kernel] Integrate batched/masked deepgemm kernel (#19111)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun <vsundarr@redhat.com>
2025-06-04 21:59:18 +00:00
ef3f98b59f
[Bugfix] fix v1 cpu worker fails on macOS (#19121)
2025-06-04 20:17:38 +00:00
7ee2590478
[TPU] Update dynamo dump file name in compilation test (#19108)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-06-04 16:13:43 -04:00
53a5a0ce30
[Perf] Tunings for SM100 FP8 CUTLASS kernel (#18778)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-04 10:46:28 -07:00
d459fae0a2
[Bugfix][EP+DP] Fix internode check (#19112)
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
2025-06-04 23:39:23 +08:00
c8dcc15921
Allow AsyncLLMEngine.generate to target a specific DP rank (#19102)
Signed-off-by: Jon Swenson <jmswen@gmail.com>
2025-06-04 08:26:47 -07:00
8f4ffbd373
[Doc] Update V1 Guide for embedding models (#19141)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-04 22:57:55 +08:00
5f2cd251d2
Sm100 blockwise fp8 swap ab (#18564)
2025-06-04 07:48:45 -07:00
02658c2dfe
Add DeepSeek-R1-0528 function call chat template (#18874)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
2025-06-04 13:24:18 +00:00
01dc9a76db
[CI/Build][Bugfix] Ensure compatibility with transformers 4.52 (#18678)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-04 04:49:20 -07:00
35cf32df30
Improve the output precision of embedding models (#19092)
2025-06-04 11:48:57 +00:00
8711bc5e68
[Misc] Add packages for benchmark as extra dependency (#19089)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-04 04:18:48 -07:00
2669a0d7b5
Fix ValueError: Missing value for tag key(s): model_name,engine. (#19113)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-06-04 17:10:45 +08:00
8e972d9c44
[TPU] Skip hanging tests (#19115)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
2025-06-04 01:43:00 -07:00
3336c8cfbe
Fix #19130 (#19132)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-06-04 01:42:06 -07:00
b124e1085b
[Bugfix] Fix FA3 full cuda graph correctness (#19106)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-06-03 23:10:15 -07:00
41aa578428
[NVIDIA] Add Cutlass MLA backend (#17625)
2025-06-03 21:40:26 -07:00
8d646c2e53
[Cleanup][v1]:remote guided-decoding-backend for example (#19059)
Signed-off-by: calvin chen <120380290@qq.com>
2025-06-04 04:23:26 +00:00
5d6d1adf15
[KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437)
2025-06-03 21:13:01 -07:00
1409ef9134
[Core] Cast multimodal input in hf processor (#18862)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-06-03 20:24:56 -07:00
4555143ea7
[CPU] V1 support for the CPU backend (#16441)
2025-06-03 18:43:01 -07:00
52dceb172d
[Docs] Add developer doc about CI failures (#18782)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-06-04 01:09:13 +00:00
abd7df2fca
[Misc] Fix path and python alias errors in disagg_prefill exmaples (#18919)
2025-06-03 17:15:18 -07:00
b712be98c7
feat: add data parallel rank to KVEventBatch (#18925)
2025-06-03 17:14:20 -07:00
a8da78eac9
[Bugfix] Max concurrency estimation and check_enough_kv_cache_memory for models with sliding window layers (#19029)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-04 00:14:06 +00:00
5d96533e22
[Bugfix][P/D] Fix Prefix Cache Bug (#18411)
Signed-off-by: nicklucche <nlucches@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-06-03 23:53:16 +00:00
4de790fcad
[Bugfix]: Fix the incompatibility issue with tool_choice 'required' when Thinking is enabled (#19075)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-06-03 23:27:24 +00:00
b5fd9506c1
[Bugfix] get_num_blocks_to_allocate with null_block (#19031)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-03 15:30:55 -07:00
135cf55cd1
[V1][Spec Decode][Ngram] 1.35x gain -> 1.95x gain on InstructCoder with prompt fix (#18971)
2025-06-03 15:26:33 -07:00
6cac54f4d1
[v1] Re-init input batch for multiple kv cache groups (#18654)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-03 21:41:36 +00:00
6865fe0074
Fix interaction between Optional and Annotated in CLI typing (#19093)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Yikun Jiang <yikun@apache.org>
2025-06-03 21:07:19 +00:00
e31446b6c8
[Perf] Tune scaled_fp8_quant by increasing vectorization (#18844)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-03 13:48:25 -07:00
bdf13965ab
[V1] Support cross-layer KV sharing (#18212)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-06-03 20:33:07 +00:00
fa98d77773
[Kernel] DeepEP dispatch-combine kernel integration (#18434)
Signed-off-by: Varun <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-06-03 12:30:02 -07:00
01eee40536
[doc] update docker version (#19074)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-03 19:08:21 +00:00
19bdaf32b1
[Doc] Readme standardization (#18695)
Co-authored-by: Soren Dreano <soren@numind.ai>
2025-06-03 11:50:55 -07:00
02f0c7b220
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
d054da1992
[Misc] fix: add miss best_of param validation (#18555)
Signed-off-by: googs1025 <googs1025@gmail.com>
2025-06-03 11:02:07 -07:00
4b7817c119
[Misc] Add missing _Backend enums (#19081)
Signed-off-by: nicklucche <nlucches@redhat.com>
2025-06-03 16:15:16 +00:00
d00dd65cd4
[Doc] Improve the Pull Request template with key components (#19086)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-03 23:44:34 +08:00
d81edded69
[Bugfix] disable processor cache (#19068)
Signed-off-by: raushan <raushan@huggingface.co>
2025-06-03 15:06:04 +00:00
476844d44c
Fix underscores in dict keys passed via CLI (#19030)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-06-03 14:39:24 +00:00
4e68ae5e59
[CI/Build] Remove V0 LoRA test (#19066)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-06-03 14:30:18 +00:00
4e88723f32
[doc] clarify windows support (#19088)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-06-03 21:42:17 +08:00
118ff92111
[Doc] Update V1 user guide for embedding and enc-dec models (#19060)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-03 02:29:41 -07:00
ec2dcd80bc
[Misc] Update WeightsMapper for qwen2-vl/qwen2.5-vl (#19054)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-03 09:08:20 +00:00
42243fbda0
[Doc] Add InternVL LoRA support (#19055)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-06-03 09:08:03 +00:00
6d18ed2a2e
Update docker docs with ARM CUDA cross-compile (#19037)
Signed-off-by: mgoin <michael@neuralmagic.com>
2025-06-03 08:21:53 +00:00
f32fcd9444
[v1][KVCacheManager] Rename BlockHashType to BlockHash (#19015)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-06-03 08:01:48 +00:00
d32aa2e670
[Bugfix] Use cmake 3.26.1 instead of 3.26 to avoid build failure (#19019)
Signed-off-by: Lu Fang <lufang@fb.com>
2025-06-03 00:16:17 -07:00
cc977286e7
Reduce logs in CLI scripts and plugin loader (#18970)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-03 06:00:45 +00:00
17430e3653
[bugfix] small fix logic issue (#18999)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-03 05:35:12 +00:00
1282bd812e
Add tarsier model support (#18985)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
2025-06-03 13:13:13 +08:00
bdce64f236
[V1] Support DP with Ray (#18779)
2025-06-02 21:15:13 -07:00
9e6f61e8c3
[ROCm][Build] Clean up the ROCm build (#19040)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-06-02 20:47:47 -07:00
8655f47f37
[CPU][CI] Re-enable the CPU CI tests (#19046)
Signed-off-by: jiang.li <jiang1.li@intel.com>
2025-06-02 20:46:47 -07:00
4ce42f9204
Adding "LoRA Test %N" to AMD production tests (#18929)
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu>
2025-06-02 20:46:44 -07:00
8a57872b2a
[Bugfix][EP+DP] Use pplx-kernel internode instead of intranode (#19034)
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-06-03 11:36:51 +08:00
5bc1ad6cee
[Doc] Remove duplicate TOCs during MkDocs migration (#19021)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-06-02 19:49:48 -07:00
9112b443a0
[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011)
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
2025-06-03 00:06:20 +00:00
c57d577e8d
add an absolute path for run.sh (#18258)
Signed-off-by: calvin chen <120380290@qq.com>
2025-06-02 19:38:23 +00:00
ca2f6b9c30
[Bugfix][Model] Attempt to fix eagle in V0. (#18978)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-06-02 08:15:53 -07:00
20133cfee2
[Frontend] enable custom logging for the uvicorn server (OpenAI API server) (#18403)
Signed-off-by: François Paupier <francois.paupier@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-06-02 15:04:23 +00:00
ebb1ec9318
[Model] enable data parallel for Llama4 vision encoder (#18368)
Signed-off-by: yzhen <yzhen@devgpu093.cco2.facebook.com>
Co-authored-by: yZhen <yZhen@fb.com>
Co-authored-by: yzhen <yzhen@devgpu093.cco2.facebook.com>
2025-06-02 19:22:54 +08:00
5b168b6d7a
[doc] add pytest tips (#19010)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-02 11:07:26 +00:00
9760fd8f6a
[Core] Support inplace model weights loading (#18745)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-06-02 17:38:50 +08:00
b9f61e1387
[Bugfix][Nixl] Fix DP Metadata Handshake (#19008)
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com>
2025-06-02 03:30:41 +00:00
d6fd3a33b8
[Misc] reuse num_tokens_across_dp of get_dp_padding to avoid unnecessary dp all reduce in set_forward_context (#18935)
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
2025-06-01 19:41:18 +00:00
432ec9926e
[doc] wrong output (#19000)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-06-01 11:26:14 +00:00
2b102d51ad
[BugFix] Fix incorrect metrics shutdown error log message (#18992)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-06-01 11:42:23 +08:00
aa54a7bf7b
[BugFix] fix data parallel construct ipv6 url addres (#18991)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-06-01 11:42:10 +08:00
2ad6194a02
Let max_num_batched_tokens use human_readable_int for large numbers (#18968)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-01 11:41:29 +08:00
c594cbf565
[doc] small fix - mkdocs (#18996)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-31 20:23:43 -07:00
a35ca765a5
[LoRA] Support dynamically initialize packed_modules_mapping for VLM with arbitrary components (#18987)
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-06-01 11:06:57 +08:00
6aa8f9a4e7
[Core] Rework dtype resolution (#18751)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-06-01 11:04:23 +08:00
1bc86a3da1
[Bugfix] Fix EAGLE3 broken logits (#18909)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-05-31 19:58:07 -07:00
bbfa0c61d1
[Misc][Benchmark] Add support for CustomDataset (#18511)
2025-05-31 19:07:38 +00:00
20079c6e36
[Misc] add return token strs for tokenize (#18941)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-31 18:00:11 +00:00
9a1b9b99d7
[BugFix] Fix multi-node offline data-parallel (#18981)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Yizhou Liu <liu_yizhou@outlook.com>
2025-05-31 08:34:52 -07:00
8bf507d766
[P/D] NixlConnector use cache device index for memory registration (#18969)
Signed-off-by: Piotr Tarasiewicz <ptarasiewicz@nvidia.com>
2025-05-31 11:19:18 -04:00
306d60401d
[ROCm][Kernel] Add gfx950 support for skinny gemms (#18010)
Signed-off-by: charlifu <charlifu@amd.com>
2025-05-31 07:40:05 -07:00
f2c3f66d59
[Bugfix] Fix for issue 17396 (#18773)
Signed-off-by: Fred Reiss <frreiss@us.ibm.com>
2025-05-31 11:58:17 +00:00
0f5e0d567e
[FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 (#18825)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-05-31 03:39:31 -07:00
c55d804672
[BugFix] Pydantic part 2 (#18911)
Signed-off-by: luka <luka@neuralmagic.com>
2025-05-31 03:39:28 -07:00
749f5bdd38
[doc] fix the list rendering issue - security.md (#18982)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-31 10:39:21 +00:00
2a50ef5760
[Neuron] Add Multi-Modal model support for Neuron (#18921)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com>
Co-authored-by: Rohith Nallamaddi <nalrohit@amazon.com>
Co-authored-by: FeliciaLuo <luof@amazon.com>
Co-authored-by: Elaine Zhao <elaineyz@amazon.com>
2025-05-31 10:39:11 +00:00
b8b904795d
fix security issue of logging llm output (#18980)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com>
2025-05-31 10:38:56 +00:00
ba5111f237
[Bugfix]: Fix the incompatibility issue with Structured Outputs when Thinking is disabled (#18879)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-05-31 09:20:54 +00:00
1e123529d7
[Misc] Fix estimated max model len msg (#18966)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-05-31 16:43:44 +08:00
dff80b0e42
[Frontend] Add rerank support to run_batch endpoint (#16278)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
2025-05-31 07:40:01 +00:00
7782464a17
create util function for batched arange (#18937)
2025-05-31 13:50:38 +08:00
0f71e24034
[Docs] Correct multiprocessing design doc (#18964)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-05-31 01:30:15 +00:00
1dab4d5718
Tool parser regex timeout handling (#18960)
Signed-off-by: Will Eaton <weaton@redhat.com>
2025-05-30 21:02:54 +00:00
7f21e8052b
[Misc] add group_size is -1 in awq quantization (#18910)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-05-30 17:34:22 +00:00
5a8641638a
[VLM] Add PP support and fix GPTQ inference for Ovis models (#18958)
Signed-off-by: isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-30 17:11:44 +00:00
f49239cb45
Benchmark script for fp8 vs bf16 gemm (#17126)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-30 10:56:11 -06:00
2dbe8c0774
[Perf] API-server scaleout with many-to-many server-engine comms (#17546)
2025-05-30 08:17:00 -07:00
84ec470fca
Improve "failed to get the hash of the compiled graph" error (#18956)
Signed-off-by: rzou <zou3519@gmail.com>
2025-05-30 15:00:54 +00:00
b29ca5c4d5
[Docs] Update SECURITY.md with link to our security guide (#18961)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-05-30 07:37:27 -07:00
ec6833c5e9
[doc] show the count for fork and watch (#18950)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-30 06:45:59 -07:00
e1fadf1197
[Feature] minicpm eagle support (#18943)
Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com>
Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com>
2025-05-30 06:45:56 -07:00
43ff405b90
[CI/Build] remove regex from build dependencies (#18945)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-05-30 04:02:50 -07:00
fba02e3bd1
[Bugfix][TPU] Fix tpu model runner testcase failure (#18810)
Signed-off-by: Carol Zheng <cazheng@google.com>
2025-05-30 18:04:03 +08:00
4577fc9abb
[Misc]Fix typo (#18947)
2025-05-30 02:21:35 -07:00
5f1d0c8118
[Bugfix][Failing Test] Fix test_vllm_port.py (#18618)
Signed-off-by: rabi <ramishra@redhat.com>
2025-05-30 17:13:47 +08:00
c3bb9f2331
[Model] Use in-place adds in SigLIP (#18922)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-05-30 17:12:59 +08:00
8f8900cee9
[doc] add mkdocs doc (#18930)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-30 07:58:44 +00:00
6acb7a6285
[Misc]Fix benchmarks/README.md for speculative decoding (#18897)
Signed-off-by: rabi <ramishra@redhat.com>
2025-05-30 07:58:04 +00:00
4f4a6b844a
[Deprecation] Remove mean pooling default for Qwen2EmbeddingModel (#18913)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-30 06:53:37 +00:00
4d0a1541be
[Bugfix] Remove NVFP4 scales assertions to fix load_format=dummy (#18861)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-30 13:37:36 +08:00
77b6e74fe2
[ROCm] Remove unnecessary assertion of max_model_len in ROCM_AITER_MLA attention backend. (#18938)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-05-29 22:33:17 -07:00
5acf828d99
[docs] fix: fix markdown syntax (#18927)
2025-05-30 05:20:48 +00:00
3987e2ae96
[Model] Use AutoWeightsLoader for mamba2 (#18918)
Signed-off-by: iLeGend <824040212@qq.com>
2025-05-30 04:50:10 +00:00
77164dad5e
[Bugfix] Consistent ascii handling in tool parsers (#18883)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-05-30 04:44:43 +00:00
3de3eadf5b
improve the robustness of parsing vlms config in AutoRound (#18894)
Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>
2025-05-29 19:24:47 -07:00
3132290a14
[TPU][CI/CD] Clean up docker for TPU tests. (#18926)
Signed-off-by: Carol Zheng <cazheng@google.com>
2025-05-30 10:24:19 +08:00
1aa2f81b43
[Misc] Update type annotation for rotary embedding base (#18914)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-30 10:17:01 +08:00
d54af615d5
[Bugfix] Fix PP default fallback behavior for V1 (#18915)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-30 10:13:17 +08:00
a1cc9f33a3
[TPU] remove transpose ops in moe kernel (#18923)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-05-29 23:00:11 +00:00
a521ef06e5
Use standalone_compile by default in torch >= 2.8.0 (#18846)
Signed-off-by: rzou <zou3519@gmail.com>
2025-05-30 06:41:58 +08:00
64eaf5fe05
[P/D] NixlConnector DP fixes (#18903)
Signed-off-by: Will Eaton <weaton@redhat.com>
2025-05-29 18:08:40 +00:00
d1d61f3351
[BugFix] Make DP work with connector-delayed new requests (#18559)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Will Eaton <weaton@redhat.com>
2025-05-29 18:04:18 +00:00
32ce3cf7c9
[V1] Allocate kv_cache with stride order for V1 (#18775)
Signed-off-by: nicklucche <nlucches@redhat.com>
2025-05-29 17:54:16 +00:00
d58f9c7f7a
[Misc] Remove duplicate init for self.vllm_config (#18896)
Signed-off-by: googs1025 <googs1025@gmail.com>
2025-05-29 17:26:07 +00:00
c29034037d
[Deprecation] Disallow pos-args other than model when initializing LLM (#18802)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-29 09:36:58 -07:00
1b7cfd5a36
[ROCm][V0][Attention] Revert to the previous FA triton kernel (#18226)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-05-29 12:13:18 -04:00
da4b69d0b4
[Attention][V1] Toggle for v1 attention backend (#18275)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-05-29 10:48:24 -04:00
c9479b2920
[Bugfix] Fix the failing gte embedding test (#18720)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-29 07:39:25 -07:00
6f2909405e
[Doc] Fix codeblocks formatting in LoRA adapters documentation (#18907)
Signed-off-by: Zerohertz <ohg3417@gmail.com>
2025-05-29 07:38:55 -07:00
b169d5f7b6
[Misc][Tools][Benchmark] Add benchmark_serving supports for llama.cpp. (#18692)
Signed-off-by: Duyi-Wang <duyi.wang@intel.com>
2025-05-29 20:02:08 +08:00
f8977c233f
Fix an error in dummy weight loading for quantization models (#18855)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2025-05-29 03:07:20 -07:00
f274581f44
[BugFix] Update pydantic to fix error on python 3.10 (#18852)
Signed-off-by: luka <luka@neuralmagic.com>
2025-05-29 03:05:46 -07:00
0b1447f890
[Bugfix] Ensure tensors are contiguous during serialisation (#18860)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-05-29 03:05:20 -07:00
24d0ef8970
[Misc] Replace TODO in serving transcription (#18895)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-05-29 02:58:14 -07:00
7fcfd954ff
[Bugfix] Fix misleading information in the documentation (#18845)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-29 02:54:14 -07:00
e740d07f07
[doc] add CLI doc (#18871)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-29 09:51:36 +00:00
a652e71dd0
[Doc] Remove redundant spaces from compatibility_matrix.md (#18891)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-05-29 02:51:20 -07:00
34d6c447c4
[LoRA] Add LoRA support for InternVL (#18842)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-29 08:46:24 +00:00
972eddf7c9
[Neuron] Add multi-LoRA support for Neuron. (#18284)
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>
2025-05-29 16:41:22 +08:00
fd7bb88d72
Fixes a dead link in nightly benchmark readme (#18856)
Signed-off-by: Brent Salisbury <bsalisbu@redhat.com>
2025-05-29 04:41:39 +00:00
3c49dbdd03
Skip device and quant Pydantic validation to make plugin device work (#18843)
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
2025-05-28 20:12:30 -07:00
1661a9c28f
[Doc][Neuron] Update documentation for Neuron (#18868)
Signed-off-by: Elaine Zhao <elaineyz@amazon.com>
2025-05-28 19:44:01 -07:00
8e882ffdc0
[Bugfix][TPU] fix moe custom kernel import (#18853)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-05-28 19:34:19 -07:00
26b4fa45be
Add ability to use CUDAGraphs with use_inductor=False (#17345)
Signed-off-by: rzou <zou3519@gmail.com>
2025-05-29 10:16:52 +08:00
515b413ebf
Prevent the cross-encoder logic from being applied to classification tasks (#18838)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-05-28 19:16:17 -07:00
269d901734
[Bugfix][ROCm] fix the power of 2 exception from triton_unified_attention.py when running llama4 models and unit test fix (#18100)
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-05-29 07:21:46 +08:00
7951d78738
[Core] Enable CUDA graphs for DP + All2All kernels (#18724)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-05-28 22:55:30 +00:00
6dbe5b5c93
Remove checks for None for fields which should never be None (#17985)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-28 21:32:19 +00:00
643622ba46
[Hardware][TPU][V1] Multi-LoRA Optimisations for the V1 TPU backend ( #15655 )
Signed-off-by: Akshat Tripathi <akshat@krai.ai >
Signed-off-by: Chengji Yao <chengjiyao@google.com >
Signed-off-by: xihajun <junfan@krai.ai >
Signed-off-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk >
Signed-off-by: Jorge de Freitas <jorge@krai.ai >
Co-authored-by: Chengji Yao <chengjiyao@google.com >
Co-authored-by: xihajun <junfan@krai.ai >
Co-authored-by: Jorge de Freitas <jorge.de-freitas22@imperial.ac.uk >
Co-authored-by: Jorge de Freitas <jorge@krai.ai >
2025-05-28 19:59:09 +00:00
a09c7ca9f2
[Chore][Spec Decode] Update check NoneType instead of assigning variables ( #18836 )
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-05-28 18:57:19 +00:00
0e98964e94
[V1][Metrics] Remove metrics that were deprecated in 0.8 ( #18837 )
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-05-28 18:54:12 +00:00
c68b5c63eb
[Misc] fix olmoe model layer can't load in tp gt 1 ( #18828 )
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-05-28 17:36:21 +00:00
fced756923
[Chore] update ty configuration ( #18839 )
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-05-28 08:59:11 -07:00
321331b8ae
[Core] Add Lora Support to Beam Search ( #18346 )
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2025-05-28 08:58:24 -07:00
6e4cea1cc5
decrement server_load on listen for disconnect ( #18784 )
Signed-off-by: Daniel Salib <danielsalib@meta.com >
2025-05-28 22:15:12 +08:00
435fa95444
[Frontend] add run batch to CLI ( #18804 )
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-28 07:08:57 -07:00
4c2b38ce9e
Enable Pydantic mypy checks and convert configs to Pydantic dataclasses ( #17599 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-28 12:46:04 +00:00
d781930f90
[Platform][Dist] Make torch distributed process group extendable ( #18763 )
Signed-off-by: Mengqing Cao <cmq0113@163.com >
2025-05-28 10:52:34 +00:00
ce75efeecb
[BugFix] FA2 MLA Accuracy Issue ( #18807 )
Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com >
2025-05-28 08:59:39 +00:00
aa42561e40
Fix PiecewiseCompileInterpreter ( #17338 )
Signed-off-by: rzou <zou3519@gmail.com >
2025-05-28 08:40:53 +00:00
de65fc8e1e
[CI] improve embed testing ( #18747 )
2025-05-28 00:16:35 -07:00
0c492b7824
[Deprecation] Remove fallbacks for Embeddings API ( #18795 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-28 15:09:04 +08:00
0f0926b43f
[Deprecation] Remove unused sync methods in async_timeout ( #18792 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-28 15:08:48 +08:00
7f2c1a87e9
[Deprecation] Require overriding get_dummy_text and get_dummy_mm_data ( #18796 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-28 15:08:35 +08:00
b78f844a67
[Bugfix][FailingTest]Fix test_model_load_with_params.py ( #18758 )
Signed-off-by: rabi <ramishra@redhat.com >
2025-05-28 05:42:54 +00:00
5e13c07d00
[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (2) ( #18781 )
Signed-off-by: Ronald Xu <ronaldxu@amazon.com >
2025-05-28 05:09:14 +00:00
774c5fde30
[V1] fix torch profiling for V1 offline scenarios ( #18445 )
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
2025-05-28 04:16:30 +00:00
9a21e331ff
[Bugfix]: correctly propagate error messages caught at the chat_templating step to the client ( #18769 )
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com >
2025-05-28 03:35:43 +00:00
3e9ce609bd
[Bugfix] Fix nomic max_model_len ( #18755 )
2025-05-27 20:29:53 -07:00
794ae1f551
[rocm] Fix wrong attention log ( #18764 )
Signed-off-by: Felix Marty <felmarty@amd.com >
2025-05-27 19:45:41 -07:00
d73a9457a5
[Core] Improve Tensor serialisation ( #18774 )
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-05-28 09:46:21 +08:00
a3896c7f02
[Build] Fixes for CMake install ( #18570 )
2025-05-27 20:49:24 -04:00
51e98e4ffd
[Bugfix] Disable prefix caching by default for benchmark ( #18771 )
Signed-off-by: cascade812 <cascade812@outlook.com >
2025-05-28 08:18:09 +08:00
e56f44d9ec
Support datasets in vllm bench serve and sync with benchmark_[serving,datasets].py ( #18566 )
2025-05-27 19:59:48 -04:00
e0cbad4e30
[Neuron] Support quantization on neuron ( #18283 )
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com >
2025-05-27 22:10:33 +00:00
b48d5cca16
[CI/Build] [TPU] Fix TPU CI exit code ( #18282 )
Signed-off-by: Carol Zheng <cazheng@google.com >
2025-05-27 14:54:59 -07:00
5873877241
[Bugfix] Mistral tool calling when content is list ( #18729 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-27 09:05:37 -07:00
696259ca01
[Core] Automatically cast multi-modal input dtype ( #18756 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-27 23:45:48 +08:00
6b6d496114
optimize get_kv_cache_torch_dtype ( #18531 )
Signed-off-by: idellzheng <idellzheng@tencent.com >
2025-05-27 13:08:44 +00:00
aaa4ac1c95
Disable prefix cache by default for benchmark ( #18639 )
Signed-off-by: cascade812 <cascade812@outlook.com >
2025-05-27 20:06:34 +08:00
06a0338015
[V1][Metrics] Add API for accessing in-memory Prometheus metrics ( #17010 )
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-05-27 09:37:06 +00:00
4318c0559d
[CI/Build] Remove imports of built-in re ( #18750 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-27 09:19:18 +00:00
a68e293cb9
[Doc] Convert Sphinx directives ( {class}, {meth}, {attr}, ...) to MkDocs format for better documentation linking ( #18663 )
Signed-off-by: Zerohertz <ohg3417@gmail.com >
2025-05-27 01:44:20 -07:00
6881107948
[BUG FIX] minicpm ( #18739 )
Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com >
Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com >
2025-05-27 01:04:49 -07:00
e0f0ff87b8
[Build] fix cpu build missing libtbbmalloc.so ( #18744 )
Signed-off-by: Kebe <mail@kebe7jun.com >
2025-05-27 01:03:56 -07:00
c24b1572ac
Minor fix about MooncakeStoreConnector ( #18721 )
Signed-off-by: baoloongmao <baoloongmao@tencent.com >
2025-05-27 08:02:28 +00:00
4693a3438c
[Doc] cleanup deprecated flag for doc ( #18715 )
Signed-off-by: calvin chen <120380290@qq.com >
2025-05-27 07:12:02 +00:00
bbd9a84dc5
[Hardware][Intel-Gaudi] [CI/Build] Fix multiple containers using the same name in run-hpu-test.sh ( #18752 )
Signed-off-by: Lukasz Durejko <ldurejko@habana.ai >
2025-05-27 00:10:26 -07:00
a547aeb828
feat(rocm-support): support mamba2 on rocm ( #18565 )
Signed-off-by: Islam Almersawi <islam.almersawi@openinnovation.ai >
Co-authored-by: Islam Almersawi <islam.almersawi@openinnovation.ai >
2025-05-27 00:07:53 -07:00
fc6d0c290f
[Misc] improve docs ( #18734 )
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-27 07:07:01 +00:00
753944fa9b
[Doc] Update reproducibility doc and example ( #18741 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-27 07:03:13 +00:00
25a817f202
[Doc] Update OOT model docs ( #18742 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-27 06:30:31 +00:00
d260f799a9
[FEAT] [ROCm] Upgrade AITER Fused MoE kernels. ( #18271 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-05-26 23:14:07 -07:00
b50602d5f0
[Model][Gemma3] Cast image pixel values already on CPU ( #18732 )
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-05-27 05:42:54 +00:00
1f1b1bc03b
[V1][Quantization] Add CUDA graph compatible v1 GGUF support ( #18646 )
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-27 04:40:28 +00:00
1f88dbd2bb
[Misc] improve web section group title display ( #18684 )
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-27 04:35:16 +00:00
0eebd74842
[Model][Gemma3] Simplify image input validation ( #18710 )
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-05-27 11:13:37 +08:00
27bebcd897
Convert examples to ruff-format ( #18400 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-26 16:57:54 +00:00
e7523c2e03
[V1][Sampler] Improve performance of FlashInfer sampling by sampling logits instead of probs ( #18608 )
2025-05-26 11:49:36 -04:00
a869baca73
[Bugfix] Fix Llama GGUF initialization ( #18717 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 07:49:22 -07:00
82e2339b06
[Doc] Move examples and further reorganize user guide ( #18666 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 07:38:04 -07:00
9553fdb41e
[Doc] Improve API docs ( #18713 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 07:33:34 -07:00
243eb9199f
[Bugfix]: handle hf-xet CAS error when loading Qwen3 weights in vLLM ( #18701 )
2025-05-26 07:10:56 -07:00
0665e29998
[Misc] add AutoGen integration ( #18712 )
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-05-26 13:56:18 +00:00
e76be06550
[Hardware][Intel-Gaudi] [CI/Build] Add tensor parallel size = 2 test to HPU CI ( #18709 )
Signed-off-by: Lukasz Durejko <ldurejko@habana.ai >
2025-05-26 05:26:07 -07:00
0877750029
[CI/Build] Split pooling and generation extended language models tests in CI ( #18705 )
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-26 04:00:08 -07:00
6d68030f1c
[Model] Add support for YARN in NemotronNAS models ( #18427 )
Signed-off-by: Nave Assaf <nassaf@nvidia.com >
2025-05-26 10:31:49 +00:00
5a2c76cbe1
[CI] fix dump_input for str type ( #18697 )
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-26 18:23:35 +08:00
38b13dfe78
[CI/Build] Replace math.isclose with pytest.approx ( #18703 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 02:05:17 -07:00
61a45e7a72
[Bugfix] Fix Mistral-format models with sliding window ( #18693 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 01:44:04 -07:00
65523a0995
[Doc] Fix issue template format ( #18699 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 00:45:39 -07:00
4b7740a105
[GH] Add issue template for reporting CI failures ( #18696 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 00:42:04 -07:00
4ea62c0ea0
[CI] add missing argument ( #18694 )
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-26 00:22:04 -07:00
561b77a0d6
[Bugfix] Fix the lm_head in gpt_bigcode in lora mode ( #6357 )
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
Signed-off-by: Max de Bayser <maxdebayser@gmail.com >
2025-05-26 14:52:25 +08:00
abd4030d94
refactor: simplify request handler, use positive condition check for handler assignment ( #18690 )
Signed-off-by: googs1025 <googs1025@gmail.com >
2025-05-26 06:32:28 +00:00
8820821b59
[Misc] Fixed the abnormally high TTFT issue in the PD disaggregation example ( #18644 )
Signed-off-by: zhaohaidao <zhaohaidao2008@hotmail.com >
Signed-off-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com >
Co-authored-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com >
2025-05-26 13:51:27 +08:00
fba0642704
[CI/Build][Doc] Update gte-Qwen2-1.5B-instruct usage ( #18683 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
2025-05-25 20:27:50 -07:00
6071e989df
[Core][Multimodal] Convert PIL Image to array without data copy when hashing ( #18682 )
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-05-25 17:33:35 +00:00
57fd13a707
[Bugfix] Fix profiling dummy data for Pixtral ( #18677 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-25 14:05:30 +00:00
3a886bd58c
[Misc] small improvement ( #18680 )
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-25 06:05:38 -07:00
35be8fad62
[CI/build] fix no regex ( #18676 )
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-25 10:10:51 +00:00
f2faac745d
[Bugfix] Fix cpu usage and cache hit stats reporting on cpu environment ( #18674 )
Signed-off-by: zzzyq <zhangyuqi94@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-05-25 02:36:06 -07:00
279f854519
[doc] improve readability ( #18675 )
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-25 01:40:31 -07:00
624b77a2b3
[doc] fix broken links ( #18671 )
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-25 01:36:33 -07:00
503f8487c2
[Misc] Reduce logs on startup ( #18649 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-24 23:03:53 -07:00
44073a7ac3
[BUGFIX] catch subclass first for try...except ( #18672 )
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-25 05:34:24 +00:00
63934543a0
Speed up the kernels/quantization/ tests ( #18669 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-25 05:02:59 +00:00
75f81750f3
[VLM] Initialize video input support for InternVL models ( #18499 )
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-05-25 04:51:25 +00:00
6ab681bcbe
[Misc][ModelScope] Change to use runtime VLLM_USE_MODELSCOPE ( #18655 )
Signed-off-by: Mengqing Cao <cmq0113@163.com >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
2025-05-25 04:51:21 +00:00
cebc22f3b6
[Misc] Replace cuda hard code with current_platform in Ray ( #14668 )
Signed-off-by: noemotiovon <757486878@qq.com >
2025-05-24 20:26:31 -07:00
6c6dcd8611
[MISC] correct signature for LoaderFunction ( #18670 )
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-24 20:17:47 -07:00
7891fdf0c6
[V1] Fix _pickle.PicklingError: Can't pickle <class 'transformers_modules.deepseek-ai.DeepSeek-V2-Lite... ( #18640 )
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2025-05-24 20:07:20 -07:00
6825d9a998
[BugFix][Spec Decode] Improve Prefix Caching Logic in Speculative Decoding ( #18668 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-05-24 17:33:46 -07:00
b554ab736e
[CI/Build] fix permission denied issue ( #18645 )
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-24 16:09:10 +00:00
9ea7f1abf3
fix(regression): clone from reference items ( #18662 )
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-05-24 15:25:20 +00:00
2807271c86
[CI] enforce import regex instead of re ( #18665 )
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-05-24 08:04:14 -07:00
b9018a3f9f
[BugFix] Fix import error for fused_moe ( #18642 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-05-24 07:53:36 -07:00
4ceafb6299
[MISC] typo fix and clean import ( #18664 )
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-24 07:52:09 -07:00
2e6705784f
[CI/Build] chmod +x to cleanup_pr_body.sh ( #18650 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-24 07:26:45 -07:00
1cb194a018
[Doc] Reorganize user guide ( #18661 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-24 07:25:33 -07:00
2cd4d58df4
[Model] use AutoWeightsLoader for gpt2 ( #18625 )
Signed-off-by: zt2370 <ztang2370@gmail.com >
2025-05-24 13:36:13 +00:00
6d166a8d35
[Doc] Add community links ( #18657 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-24 06:06:38 -07:00
ef1dd6870f
[Doc] Fix indentation problems in V0 Paged Attention docs ( #18659 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-24 06:06:35 -07:00
e77dc4bad8
[MISC][pre-commit] Add pre-commit check for triton import ( #17716 )
Signed-off-by: Mengqing Cao <cmq0113@163.com >
2025-05-24 20:09:15 +08:00
07458a51ce
[Doc] Update README links, mark external links ( #18635 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-24 09:57:15 +00:00
c1e4a4052d
[V1][Spec Decode] Support multi-layer eagle draft model ( #18030 )
Signed-off-by: qizixi <qizixi@meta.com >
2025-05-24 09:45:34 +00:00
a859320575
[Model] Add support for Qwen2.5-Omni-7B-AWQ (Qwen2_5OmniForConditionalGeneration) ( #18647 )
2025-05-24 09:15:36 +00:00
441dc63ac7
[Frontend] improve vllm serve --help display ( #18643 )
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-24 07:53:22 +00:00
d55e446d13
[V1][Spec Decode] Small refactors to improve eagle bookkeeping performance ( #18424 )
Signed-off-by: qizixi <qizixi@meta.com >
2025-05-24 06:51:22 +00:00
ec82c3e388
FIX MOE issue in AutoRound format ( #18586 )
Signed-off-by: wenhuach21 <wenhua.cheng@intel.com >
2025-05-23 22:01:40 -07:00
45ab403a1f
config.py: Clarify that only local GGUF checkpoints are supported. ( #18623 )
Signed-off-by: Mathieu Bordere <mathieu@letmetweakit.com >
2025-05-24 08:46:34 +08:00
2b10ba7491
[Bugfix][Nixl] Fix Preemption Bug ( #18631 )
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-05-23 23:30:16 +00:00
4fc1bf813a
[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking ( #18454 )
Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com >
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com >
2025-05-23 16:16:26 -07:00
f2036734fb
[ModelOpt] Introduce VLLM_MAX_TOKENS_PER_EXPERT_FP4_MOE env var to control blockscale tensor allocation ( #18160 )
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2025-05-23 15:52:20 -07:00
7d9216495c
[Doc] Update references to doc files ( #18637 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 15:49:21 -07:00
0ddf88e16e
[CI] Enable test_initialization to run on V1 ( #16736 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-23 15:09:44 -07:00
1645b60196
Use prebuilt FlashInfer x86_64 PyTorch 2.7 CUDA 12.8 wheel for CI ( #18537 )
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-05-23 21:17:16 +00:00
2628a69e35
[V1] Support Deepseek MTP ( #18435 )
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn >
Co-authored-by: Rui Qiao <ruisearch42@gmail.com >
2025-05-23 10:26:28 -07:00
371f7e4ca2
[Doc] Fix broken links and unlinked docs, add shortcuts to home sidebar ( #18627 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 10:22:40 -07:00
15b45ffb9a
[Doc] Avoid documenting dynamic / internal modules ( #18626 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 09:58:02 -07:00
273cb3b4d9
[Doc] Fix top-level API links/docs ( #18621 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 09:46:56 -07:00
8ddd1cf26a
[Doc] fix list formatting ( #18624 )
Signed-off-by: David Xia <david@davidxia.com >
2025-05-23 09:41:17 -07:00
6550114c9c
[v1] Redo "Support multiple KV cache groups in GPU model runner ( #17945 )" ( #18593 )
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-23 09:39:47 -07:00
9520a989df
[Docs] Change mkdocs to not use directory urls ( #18622 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-23 09:33:21 -07:00
3d28ad343f
Fix figures in design doc ( #18612 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 09:09:54 -07:00
6a7988c55b
Refactor pplx init logic to make it modular (prepare for deepep) ( #18200 )
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-05-23 23:43:43 +08:00
022d8abe29
[Doc] Use a different color for the announcement ( #18616 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 08:25:03 -07:00
5221815a00
[Doc] Fix markdown list indentation for MkDocs rendering ( #18620 )
Signed-off-by: Zerohertz <ohg3417@gmail.com >
2025-05-23 08:23:21 -07:00
1068556b2c
[Bugfix][Build/CI] Fixup CUDA compiler version check for CUDA_SUPPORTED_ARCHS ( #18579 )
2025-05-23 07:43:58 -07:00
2cd1fa4556
[Misc] add Haystack integration ( #18601 )
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-23 06:21:19 -07:00
d4c2919760
Include private attributes in API documentation ( #18614 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 06:18:31 -07:00
6220f3c6b0
[Bugfix] Fix transformers model impl ignored for mixtral quant ( #18602 )
Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com >
2025-05-23 05:54:13 -07:00
52fb23f47e
Fix examples with code blocks in docs ( #18609 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 05:53:44 -07:00
6dd51c7ef1
[CI/Build] Fix V1 flag being set in entrypoints tests ( #18598 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 05:51:53 -07:00
2edb533af2
Replace {func} with mkdocs style links ( #18610 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 05:51:38 -07:00
38a95cb4a8
[Doc] Fix indent of contributing to vllm ( #18611 )
Signed-off-by: Zerohertz <ohg3417@gmail.com >
2025-05-23 05:50:07 -07:00
cd821ea5d2
[CI] fix kv_cache_type argument ( #18594 )
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-23 04:49:18 -07:00
7ab056c273
[Hardware][CPU] Update intel_extension_for_pytorch 2.7.0 and move to requirements/cpu.txt ( #18542 )
Signed-off-by: Kay Yan <kay.yan@daocloud.io >
2025-05-23 04:38:42 -07:00
6526e05111
Add myself as docs code owner ( #18605 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 04:08:31 -07:00
e493e48524
[V0][Bugfix] Fix parallel sampling performance regression when guided decoding is enabled ( #17731 )
Signed-off-by: Madeesh Kannan <shadeMe@users.noreply.github.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-05-23 03:38:23 -07:00
4ce64e2df4
[Bugfix][Model] Fix baichuan model loader for tp ( #18597 )
Signed-off-by: Mengqing Cao <cmq0113@163.com >
2025-05-23 02:39:05 -07:00
fbb13a2c15
Revert "[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal ( #18034 )" ( #18600 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 02:18:22 -07:00
a1fe24d961
Migrate docs from Sphinx to MkDocs ( #18145 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 02:09:53 -07:00
d0bc2f810b
[Bugfix] Add half type support in reshape_and_cache_cpu_impl on x86 cpu platform ( #18430 )
Signed-off-by: Yuqi Zhang <yuqizhang@google.com >
Co-authored-by: Yuqi Zhang <yuqizhang@google.com >
2025-05-23 01:41:37 -07:00
b046cf792d
[Feature][V1]: supports cached_tokens in response usage ( #18149 )
Co-authored-by: simon-mo <xmo@berkeley.edu >
2025-05-23 01:41:03 -07:00
54af915949
[Doc] Update quickstart and install for cu128 using --torch-backend=auto ( #18505 )
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-23 08:36:37 +00:00
71ea614d4a
[Feature] Add async tensor parallelism using compilation pass ( #17882 )
Signed-off-by: cascade812 <cascade812@outlook.com >
2025-05-23 01:03:34 -07:00
4c611348a7
[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal ( #18034 )
Signed-off-by: Ronald Xu <ronaldxu@amazon.com >
2025-05-23 00:37:18 -07:00
60cad94b86
[Hardware] correct method signatures for HPU,ROCm,XPU ( #18551 )
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-22 22:31:59 -07:00
9c1baa5bc6
[Misc] Replace cuda hard code with current_platform ( #16983 )
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-05-23 04:38:50 +00:00
4be2255c81
[Bugfix][Benchmarks] Fix a benchmark of deepspeed-mii backend to use api_key ( #17291 )
Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com >
2025-05-23 12:30:47 +08:00
ed5d408255
[Neuron] Remove bypass on EAGLEConfig and add a test ( #18514 )
Signed-off-by: Elaine Zhao <elaineyz@amazon.com >
2025-05-22 21:26:32 -07:00
583507d130
[Spec Decode] Make EAGLE3 draft token ID mapping optional ( #18488 )
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-05-22 20:17:39 -07:00
e44d8ce8c7
[Bugfix] Set KVTransferConfig.engine_id in post_init ( #18576 )
Signed-off-by: Linkun Chen <github@lkchen.net >
2025-05-23 02:54:42 +00:00
93ecb8139c
[BugFix] Increase TP execute_model timeout ( #18558 )
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-05-23 10:22:11 +08:00
fae453f8ce
[Misc] refactor: simplify input validation and num_requests handling in _convert_v1_inputs ( #18482 )
Signed-off-by: googs1025 <googs1025@gmail.com >
2025-05-23 10:15:32 +08:00
4b0da7b60e
Enable hybrid attention models for Transformers backend ( #18494 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 10:12:08 +08:00
c6b636f9fb
[V1][Spec Decoding] Use model_loader.get_model() to load models ( #18273 )
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-05-23 02:05:44 +00:00
04eb88dc80
Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. ( #18569 )
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
2025-05-23 01:59:18 +00:00
46791e1b4b
[AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh ( #18568 )
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2025-05-22 18:45:35 -07:00
c32e249a23
[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization ( #17926 )
Signed-off-by: Sanger Steel <sangersteel@gmail.com >
2025-05-22 18:44:18 -07:00
c91fe7b1b9
[Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_pythonic parser ( #17917 )
Signed-off-by: Kai Wu <kaiwu@meta.com >
2025-05-22 16:44:08 -07:00
a04720bc36
[V1][Spec Decode][Bugfix] Load quantize weights for EAGLE ( #18290 )
2025-05-22 15:17:33 -07:00
7b9d832c80
[Tool] Add NIXL installation script ( #18172 )
Signed-off-by: Linkun <github@lkchen.net >
2025-05-22 14:33:16 -07:00
6e588da0f4
[Build/CI] Fix CUDA 11.8 build ( #17679 )
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com >
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-22 12:13:54 -07:00
f8d2cc5f55
[Compile][Platform] Make PiecewiseBackend pluggable and extendable ( #18076 )
Signed-off-by: Mengqing Cao <cmq0113@163.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-05-22 12:11:53 -07:00
721fb9b181
[Platform] Move platform check to right place ( #18470 )
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-05-22 12:11:28 -07:00
1f3a1200e4
[Bugfix] make test_openai_schema.py pass ( #18224 )
Signed-off-by: David Xia <david@davidxia.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-22 18:34:06 +00:00
54631f8262
[Misc] Call ndarray.tobytes() directly instead of ndarray.data.tobytes() ( #18347 )
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-05-22 09:00:13 -07:00
cb506ecb5a
[Misc] improve Automatic Prefix Caching example ( #18554 )
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-22 14:50:46 +00:00
93f71673ce
[BugFix][CPU] Fix x86 SHM distributed module initialization ( #18536 )
Signed-off-by: jiang.li <jiang1.li@intel.com >
2025-05-22 07:35:00 -07:00
3f505233fd
[Doc] Add stream flag for chat completion example ( #18524 )
Signed-off-by: calvin chen <120380290@qq.com >
2025-05-22 14:07:10 +00:00
4e04eceb58
[Bugfix] Use random hidden states in dummy sampler run ( #18543 )
Signed-off-by: Bowen Wang <abmfy@icloud.com >
2025-05-22 06:48:56 -07:00
71075029f2
[Doc] Support --stream arg in openai_completion_client.py script ( #18388 )
Signed-off-by: googs1025 <googs1025@gmail.com >
2025-05-22 13:20:17 +00:00
ca86a7cf6e
[CI/Build] Update bamba test model location ( #18544 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-22 06:01:07 -07:00
a35a494745
[Bugfix] Add kwargs to RequestOutput __init__ to be forward compatible ( #18513 )
Signed-off-by: Linkun <github@lkchen.net >
2025-05-22 05:24:43 -07:00
f6037d1907
[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text ( #18526 )
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-22 05:22:53 -07:00
fa72f9a812
Order sequence ids + config update to support specifying custom quantization layers ( #18279 )
Signed-off-by: Elaine Zhao <elaineyz@amazon.com >
Co-authored-by: Tailin Pan <tailinpa@amazon.com >
Co-authored-by: Rishabh Rajesh <rishyraj@amazon.com >
Co-authored-by: Yishan McNabb <yishanm@amazon.com >
Co-authored-by: Patrick Lange <patlange@amazon.com >
Co-authored-by: Maxwell Goldberg <mgld@amazon.com >
Co-authored-by: Aakash Shetty <sheaak@amazon.com >
2025-05-22 02:20:36 -07:00
ebed81fbf5
Update default neuron config for speculation ( #18274 )
Signed-off-by: Elaine Zhao <elaineyz@amazon.com >
Co-authored-by: Shashwat Srijan <sssrijan@amazon.com >
Co-authored-by: Aakash Shetty <sheaak@amazon.com >
2025-05-22 02:18:55 -07:00
e2d7d31244
[Neuron] Update Dockerfile.neuron to use latest neuron release (2.23) ( #18512 )
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com >
2025-05-22 02:17:34 -07:00
23b67b37b2
[Doc] Fix invalid JSON in example args ( #18527 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-22 07:11:46 +00:00
db5a29ba19
[Bugfix] Fix LoRA test ( #18518 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-21 21:48:53 -07:00
51797775c3
[Bugfix][Model] Make Olmo2Model weight loading return loaded weights ( #18504 )
Signed-off-by: Shane A <shanea@allenai.org >
2025-05-21 21:17:03 -07:00
cf5984b2fe
[BugFix][DP] Send DP wave completion only from dp_rank==0 ( #18502 )
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: kourosh hakhamaneshi <kourosh@anyscale.com >
2025-05-21 20:25:25 -07:00
d022115cc6
[Bugfix] Inconsistent token calculation compared to HF in llava family ( #18479 )
Signed-off-by: jaycha <jaycha@ncsoft.com >
2025-05-21 20:21:47 -07:00
acb54ca8e1
Initialize io_thread_pool attribute in the beginning. ( #18331 )
Signed-off-by: rabi <ramishra@redhat.com >
2025-05-21 20:21:14 -07:00
6e0fd34d3c
[CI] Fix race condition with StatelessProcessGroup.barrier ( #18506 )
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-21 20:19:13 -07:00
176d62e4ea
[MISC] update project urls in pyproject.toml ( #18519 )
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-21 20:17:34 -07:00
20bd6f4d2e
[FalconH1] Fix output dtype in RMSNorm fallback path for Falcon-H1 (e.g. 0.5B) ( #18500 )
Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae >
Co-authored-by: younesbelkada <younesbelkada@gmail.com >
Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae >
Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae >
2025-05-21 19:23:59 -07:00
1f079540db
[Bugfix] Consistent ascii handling in tool parsers ( #17704 )
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com >
2025-05-21 20:41:23 +00:00
94d8ec8d2b
[FEAT][ROCm] Upgrade AITER MLA v1 backend ( #18338 )
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-05-21 10:34:28 -07:00
bb0a311213
Revert "[v1] Support multiple KV cache groups in GPU model runner ( #17945 ) ( #18459 )
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-05-21 10:25:23 -07:00
dd5fa7e04f
[ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 ( #17004 )
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com >
2025-05-21 08:35:00 -07:00
2b16104557
[Misc] Update deprecation message for --enable-reasoning ( #18404 )
2025-05-21 07:33:11 -07:00
371376f996
[Build] fix Dockerfile shell ( #18402 )
2025-05-21 07:32:06 -07:00
c6c10ca920
[Bugfix] Reduce moe_sum test size to avoid OOM ( #18484 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-05-21 06:46:39 -07:00
c154d89306
[Doc] fix arg docstring in linear layers ( #18410 )
...
Signed-off-by: giantcroc <1204449533@qq.com >
2025-05-21 06:45:57 -07:00
eca18691d2
[MODEL] FalconH1 ( #18406 )
...
Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae >
Co-authored-by: younesbelkada <younesbelkada@gmail.com >
Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae >
Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae >
2025-05-21 04:59:06 -07:00
61acfc45bc
[Bugfix][Failing Test] Fix test_events.py ( #18460 )
...
Signed-off-by: rabi <ramishra@redhat.com >
2025-05-21 04:57:28 -07:00
107f5fc4cb
[Misc] refactor disaggregated-prefill-v1 example ( #18474 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-21 11:10:14 +00:00
907f935de9
[V1] Fix general plugins not loaded in engine for multiproc ( #18326 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
2025-05-21 01:21:49 -07:00
5d7f545204
[Frontend] deprecate --device arg ( #18399 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
2025-05-21 01:21:17 -07:00
cd8dfc6dfc
[Misc] MultiConnector._connectors type ( #18423 )
...
Signed-off-by: nicklucche <nlucches@redhat.com >
2025-05-20 22:48:43 -07:00
d06dd72ba9
[Bugfix][Failing Test] Fix nixl connector test when prompt size < block size ( #18429 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-05-20 22:41:44 -07:00
ad0012a0ac
Revert "[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text ( #18407 )" ( #18456 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-20 22:39:22 -07:00
92247c522e
[Bug] Fix moe_sum signature ( #18440 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-05-20 22:37:08 -07:00
0c15c2e486
[Bugfix] config.head_dim is now explicitly set to None ( #18432 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-05-20 21:04:33 -07:00
3b17ea26e4
[TPU] Re-enable the Pallas MoE kernel ( #18025 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2025-05-20 19:52:27 -07:00
23baa2180b
fix: Build torch wheel inline rather than picking from nightly ( #18351 )
...
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com >
2025-05-20 22:22:24 +00:00
980a172474
[Kernel] update comment for KV shape in unified triton attn ( #18099 )
...
Signed-off-by: haochengxia <xhc_1007@163.com >
2025-05-20 11:19:34 -07:00
e1f5a71ed7
[Model] use AutoWeightsLoader for bloom ( #18300 )
...
Signed-off-by: calvin chen <120380290@qq.com >
2025-05-20 09:40:05 -07:00
f4a8a37465
[Minor] Rename quantization nvfp4 to modelopt_fp4 ( #18356 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-20 09:08:37 -07:00
8f55962a7f
[Misc] refactor prompt embedding examples ( #18405 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-20 15:26:12 +00:00
be48360c1f
[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text ( #18407 )
...
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com >
2025-05-20 06:59:48 -07:00
86847700d7
[CI] Add mteb testing to test the accuracy of the embedding model ( #17175 )
2025-05-20 06:51:12 -07:00
d6c86d09ae
Update cpu.txt ( #18398 )
...
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com >
2025-05-20 10:53:23 +00:00
6b35cb10a0
[Misc] Add LoRA code owner ( #18387 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-20 03:27:30 -07:00
1b1e8e05ff
[doc] update env variable export ( #18391 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-20 08:53:27 +00:00
bca55b556f
[Bugfix] fix adding bias twice in ipex GPTQ quantization ( #18363 )
...
Signed-off-by: rand-fly <randfly@outlook.com >
2025-05-20 00:54:33 -07:00
d981396778
[release] Change dockerhub username for TPU release ( #18389 )
2025-05-19 23:49:23 -07:00
9609327fa4
[Core] [Bugfix]: tensor parallel with prompt embeds ( #18171 )
...
Signed-off-by: Nan2018 <nan@protopia.ai >
Co-authored-by: Andrew Sansom <andrew@protopia.ai >
2025-05-19 20:21:27 -07:00
f07a673eb2
[Misc] Allow AutoWeightsLoader to skip loading weights with specific substr in name ( #18358 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-19 20:20:12 -07:00
d565e0976f
[neuron] fix authorization issue ( #18364 )
...
Signed-off-by: Liangfu Chen <liangfc@amazon.com >
2025-05-19 23:30:32 +00:00
258bf621d5
fix CUDA_check redefinition in #17918 ( #18287 )
...
Signed-off-by: Lucia Fang <fanglu@fb.com >
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com >
2025-05-19 13:42:35 -07:00
dc1440cf9f
Neuron up mistral ( #18222 )
...
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com >
2025-05-19 09:54:47 -07:00
8171221834
[Misc] Fix typo ( #18330 )
2025-05-19 09:51:01 -07:00
7937c2fd52
Add files via upload: Add fused MoE kernel tuning configs (fp8_w8a8) for DeepSeek V3/R1 on a single-node 8x NVIDIA H20 96GB setup ( #18337 )
2025-05-19 09:49:57 -07:00
e2ee1e8e9e
[Feature]Add support for models quantized with AutoRound ( #17850 )
...
Signed-off-by: wenhuach21 <wenhua.cheng@intel.com >
2025-05-19 09:38:53 -07:00
20d8ce81eb
[Frontend] add --quick option for vllm chat/complete ( #18297 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-19 09:36:13 -07:00
84ab4feb7e
[Doc] Fix typo ( #18355 )
2025-05-19 16:05:16 +00:00
6781af5608
[Quantization] Pool model support bitsandbytes ( #18087 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-19 09:03:43 -07:00
1b15df2546
[BugFix] Fix handling of num_computed_tokens with connector ( #18232 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
2025-05-19 09:03:25 -07:00
43b5f61dce
[Doc] Move input-related docs to Features ( #18353 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-19 15:08:39 +00:00
c5bb0ebdc6
[Doc] Fix prompt embedding examples ( #18350 )
...
Signed-off-by: wangli <wangli858794774@gmail.com >
2025-05-19 06:48:16 -07:00
d637b96099
[BugFix] [Vul] Add missing usedforsecurity=False in MD5 hashing to enable FIPS ( #18319 )
...
Signed-off-by: cascade812 <cascade812@outlook.com >
Signed-off-by: shaoyuyoung <shaoyuyoung@gmail.com >
Co-authored-by: cascade <cascade812@outlook.com >
2025-05-19 01:31:23 -07:00
275c5daeb0
fix: Add type specifications for CLI arguments in tensorizer options ( #18314 )
2025-05-18 23:42:17 -07:00
47fda6d089
[Build] Supports CUDA 12.6 and 11.8 after Blackwell Update ( #18316 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-05-18 23:19:33 -07:00
27d0952600
[Misc] extract parser.parse_args() ( #18323 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-19 04:06:26 +00:00
221cfc2fea
Feature/vllm/input embedding completion api ( #17590 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
Signed-off-by: Nan2018 <nan@protopia.ai >
Co-authored-by: 临景 <linjing.yx@alibaba-inc.com >
Co-authored-by: Bryce1010 <bryceyx@gmail.com >
Co-authored-by: Andrew Sansom <andrew@protopia.ai >
Co-authored-by: Andrew Sansom <qthequartermasterman@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-05-18 20:18:05 -07:00
9da1095daf
[Spec Decode][V0] Fix spec decode correctness test in V0 eagle/medusa ( #18175 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-05-18 19:49:46 -07:00
d1211f8794
[Doc] Add doc to explain the usage of Qwen3 thinking ( #18291 )
...
Signed-off-by: WangErXiao <863579016@qq.com >
2025-05-18 23:04:07 +00:00
b6a6e7a529
[Misc] add litellm integration ( #18320 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-18 15:32:30 +00:00
4fb349f66a
Fix copy-paste error in phi4mm image processing ( #18315 )
...
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com >
2025-05-18 07:00:12 -07:00
908733aca7
[Model] Use sigmoid for single-label classification ( #18313 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-05-18 07:00:09 -07:00
1a8f68bb90
[doc] update reasoning doc ( #18306 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-18 06:59:14 -07:00
9ab2c02ff8
Support sequence parallelism combined with pipeline parallelism ( #18243 )
...
Signed-off-by: cascade812 <cascade812@outlook.com >
2025-05-17 22:47:25 +00:00