Commit Graph

  • 17c540a993 [torch.compile] fix simple inductor graph partition test (#27050) Boyuan Feng 2025-10-16 18:09:36 -07:00
  • 4d4d6bad19 [Chore] Separate out vllm.utils.importlib (#27022) Cyrus Leung 2025-10-17 08:48:59 +08:00
  • 11ae016bd7 [torch.compile] Passing only necessary compilation config to inductor pass config (#27041) Lucia Fang 2025-10-16 17:01:52 -07:00
  • 41d3071918 [NVIDIA] [Perf] Update to leverage flashinfer trtllm FP4 MOE throughput kernel (#26714) jiahanc 2025-10-16 16:20:25 -07:00
  • fb5e10d3fb Refactor Transformers backend to use mixins (#26906) Harry Mellor 2025-10-16 22:50:39 +01:00
  • b2f78cbad4 [small][batch invariance] Rename the env and internal flags to simplify usage (#26855) Bram Wasti 2025-10-16 14:40:25 -07:00
  • 23583ee28c [Bug] Add Assertion for random-input-len / random-output-len (#26834) Wentao Ye 2025-10-16 17:36:39 -04:00
  • 01c977e96d [CI] Prune Quantization Tests and skip compilation (#27038) Michael Goin 2025-10-16 17:26:35 -04:00
  • b3dda72c23 [Feature] Migrate DeepGEMM API from get_m_alignment_for_contiguous_layout to get_mk_alignment_for_contiguous_layout (#26935) Wentao Ye 2025-10-16 16:46:48 -04:00
  • fb0571b077 [GPTOSS][DP/EP][Marlin] Enable GPTOSS Batched DP/EP using Marlin kernels (#25997) Varun Sundar Rabindranath 2025-10-16 15:53:11 -04:00
  • 2ed8b6b3d0 [Bug] Fix batch invariant test has to is (#27032) Wentao Ye 2025-10-16 15:45:14 -04:00
  • 013abde6ef Adding Warmup to Benchmark Serving (#26943) kimbochen 2025-10-16 15:44:32 -04:00
  • a5464dcf92 [Compressed Tensors] Always clone output for compile robustness (#26849) Kyle Sayers 2025-10-16 15:29:59 -04:00
  • ac3ed5a815 Support block size of 256 used by Intel HPU (#26883) Mandy Li 2025-10-16 12:10:57 -07:00
  • e6ba2000ae [gpt-oss][1/N] EZ: refactor serving_responses for modularity (#26948) Andrew Xia 2025-10-16 11:44:06 -07:00
  • aa255ff55a Support set in the CLI generation (#27031) Harry Mellor 2025-10-16 19:07:18 +01:00
  • 7bb736d00e Fix Qwen2.5 VL image grid docstring (#27033) ZiTian Zhao 2025-10-17 00:57:36 +08:00
  • 69c9a01538 disable flashinfer warmup Woosuk Kwon 2025-10-16 16:49:29 +00:00
  • 01e389cd94 fix woosuk/router-nixl Woosuk Kwon 2025-10-16 16:48:51 +00:00
  • 9f4e30904b [Model] Fix Qwen3VL mm mapping (#27027) Jee Jee Li 2025-10-17 00:45:59 +08:00
  • 9decb2a5b1 Merge remote-tracking branch 'test/nixl-ptp-gt-dtp' into woosuk/router-nixl Woosuk Kwon 2025-10-16 16:34:15 +00:00
  • 5afd3276df [Feature] Add process_weights_after_loading to AttentionImpl (#26870) rongfu.leng 2025-10-16 23:02:30 +08:00
  • 43721bc67f [CI] Replace large models with tiny alternatives in tests (#24057) Tahsin Tunan 2025-10-16 20:51:27 +06:00
  • 02d709a6f1 [docs] standardize Hugging Face env var to HF_TOKEN (deprecates HUGGING_FACE_HUB_TOKEN) (#27020) Kay Yan 2025-10-16 22:31:02 +08:00
  • 4a510ab487 [NIXL] Improve request_finished() debug logs (#25665) Mark McLoughlin 2025-10-16 14:55:17 +01:00
  • 314fa8abbf [Attention] Tune CUTLASS MLA num_splits (#26846) Matthew Bonanni 2025-10-16 09:36:09 -04:00
  • 334535b6fb [Benchmark] Show E2EL by default for pooling models (#27014) Cyrus Leung 2025-10-16 20:47:09 +08:00
  • dcbb3f1871 [Bugfix] Correct LayerNorm epsilon parameter in modernbert.py (#27008) bogdanm 2025-10-16 17:27:44 +05:00
  • 00417f4e44 [MISC] fix import violations for re and triton modules (#26654) Sungjae Lee 2025-10-16 19:38:27 +09:00
  • ed344f4116 Cleanup code after Python 3.10 upgrade (#26520) Lukas Geiger 2025-10-16 11:38:23 +01:00
  • e51928793e [Model][Bugfix] fix ernie45 vl run failed from shared experts optimization (#26885) CSWYF3634076 2025-10-16 18:37:35 +08:00
  • d2740fafbf [Chore] Separate out vllm.utils.collections (#26990) Cyrus Leung 2025-10-16 16:35:35 +08:00
  • 17838e50ef [Benchmark] Use truncation by default for pooling benchmarks (#26992) Cyrus Leung 2025-10-16 16:02:39 +08:00
  • 44c8555621 [CI/Build] Fix AMD import failures in CI (#26841) Zhewen Li 2025-10-16 00:28:20 -07:00
  • f7d318de2b [Hardware][CPU][PowerPC]Disable torch.compile() in toptopk sampling (#26987) Akash kaothalkar 2025-10-16 11:06:59 +05:30
  • 76f0d05bc6 [CI/Build] Update expected beam search output for Phi3V (#26978) Cyrus Leung 2025-10-16 13:12:44 +08:00
  • 7d8975de84 Deepseek-v3 Batch Invariant on 8xH100 (#26609) Bram Wasti 2025-10-15 22:06:02 -07:00
  • 785d8b6410 [PERF] Qwen3-next MTP speedup (change bool mask indexing to index_select / index_copy to reduce d2h) (#26437) Vadim Gimpelson 2025-10-16 08:18:31 +04:00
  • f6cdc9a02f [Chore] Rename utils submodules (#26920) Cyrus Leung 2025-10-16 11:58:13 +08:00
  • c72d44ba4a Add test for batched triton fallback behavior copilot-swe-agent[bot] 2025-10-16 03:46:02 +00:00
  • c292032b44 Add env var to control batched triton kernel fallback copilot-swe-agent[bot] 2025-10-16 03:42:58 +00:00
  • b286fba2bb Initial plan copilot-swe-agent[bot] 2025-10-16 03:37:04 +00:00
  • 509cdc0370 [DOC][XPU]update feature parity with Intel GPU (#26954) Chendi.Xue 2025-10-15 22:07:10 -05:00
  • 9b6504c307 [BugFix] Work around graph partition x torch.compile cache issue (#26956) Richard Zou 2025-10-15 23:06:11 -04:00
  • e19b16dde6 [bugfix] Fix SP + PP without specifying compile size (#26955) Angela Yi 2025-10-15 20:05:33 -07:00
  • 582f2c6be7 [BUG] Allow runai_streamer_sharded in config check (#26958) ahao-anyscale 2025-10-15 20:05:14 -07:00
  • f8a0acbdbe [CI] Enable Blackwell Llama4 MoE tests (#26731) Michael Goin 2025-10-15 23:02:57 -04:00
  • 1317034379 [ROCm][FEAT] Fuse DeepSeek shared experts into AITER fused_moe ops (#24097) kliuae 2025-10-16 10:41:34 +08:00
  • 0ecc553ee6 [Bugfix] reasoning_parser parameter handling in run_batch.py (#26225) InChang Jeong 2025-10-16 11:24:05 +09:00
  • f96bc3649c [Qwen3-Next] Add tuned MoE config for Qwen3-Next FP8 on H100 tp2 (#26887) felixzhu555 2025-10-15 18:55:05 -07:00
  • 8935ca208d Merge branch 'main' into woosuk/test-router Woosuk Kwon 2025-10-16 00:32:13 +00:00
  • 98e71a4954 enable all Zhuohan Li 2025-10-15 17:01:03 -07:00
  • 1f4472ba5f batched_deepgemm_contiguous Zhuohan Li 2025-10-15 16:54:53 -07:00
  • 938c43ea7f [ci] Adjusting AMD test composition 2025-10-14 (#26852) Alexei-V-Ivanov-AMD 2025-10-15 18:52:13 -05:00
  • 0a9ef0cfce Move query quantization to attention layer for Flashinfer & Triton. (#26534) Adrian Abeyta 2025-10-15 18:01:38 -05:00
  • 850876a183 add triton_group_gemm_masked Zhuohan Li 2025-10-15 14:52:31 -07:00
  • a608dfab45 Add contiguous triton moe example Zhuohan Li 2025-10-15 13:30:30 -07:00
  • e5b438a247 [Bug] Temporally Disable VLLM_ALLREDUCE_USE_SYMM_MEM by Default (#26925) Wentao Ye 2025-10-15 16:18:50 -04:00
  • 0b99f5d302 support flashinfer_fp4 moe for 5090 gpu (#26669) XiaobingZhang 2025-10-16 03:06:47 +08:00
  • 1f491aa0c8 Vectorize RMS norm variance using vectorize_read_with_alignment (#26234) Benji Beck 2025-10-15 11:54:41 -07:00
  • 2797adb329 cleanup update_from_kv_xfer_finished_race_fix Tyler Michael Smith 2025-10-15 18:07:49 +00:00
  • 7477823407 [Bugfix] Fix race condition when KV transfer times out before request finishes Tyler Michael Smith 2025-10-15 18:01:02 +00:00
  • de92d916fe [NVIDIA] Add support for cudnn fp4 gemm via flashinfer (#26107) Kaixi Hou 2025-10-15 10:53:00 -07:00
  • a1063628a4 [Chore] Clean up CODEOWNERS (#26923) Woosuk Kwon 2025-10-15 10:52:54 -07:00
  • d796375258 [ModelOpt] Remove NVFP4 MoE K%16==0 constraint (#26891) XiaobingZhang 2025-10-16 01:06:17 +08:00
  • 14f8456344 [Feature]: Use pydantic validation in observability.py config (#26637) Sam/Samuel 2025-10-16 01:44:03 +09:00
  • 4794c2bd92 Olmo 3 tool parser and tests (#26143) Pradeep Dasigi 2025-10-15 09:36:12 -07:00
  • d3cbaa08dc Lower sevarity of log when model info cache misses due to exception (#26917) Harry Mellor 2025-10-15 17:01:09 +01:00
  • 828523ad8e [Chore] Separate out vllm.utils.async_utils (#26913) Cyrus Leung 2025-10-15 23:33:00 +08:00
  • 136a17fe6e [Chore] Separate out vllm.utils.func (#26904) Cyrus Leung 2025-10-15 21:03:58 +08:00
  • f57438338d [BugFix] Patch inductor memory plan logic (#26878) Boyuan Feng 2025-10-15 05:51:45 -07:00
  • 5d598680e3 chore: remove unused marker (#26890) Max Wittig 2025-10-15 14:40:33 +02:00
  • 8f4b313c37 [Misc] rename torch_dtype to dtype (#26695) wangxiyuan 2025-10-15 20:11:48 +08:00
  • f93e348010 [Misc] Remove isort and yapf ignores (#26888) Cyrus Leung 2025-10-15 20:09:03 +08:00
  • f54f85129e [Model][2/N] Improve all pooling task | Support multi-vector retrieval (#25370) wang.yuqi 2025-10-15 19:14:41 +08:00
  • d4d1a6024f [Lora]Load tuned multi-lora kernel configs from json files (#26319) li2haipeng 2025-10-15 02:45:14 -07:00
  • db1764e4e0 [Platform] allow platform to init dp group (#22243) wangxiyuan 2025-10-15 17:32:17 +08:00
  • 7f83b4ee8e [Easy] Get rid of unnecessary paraenthesis in kv_cache_manager (#26842) Jialin Ouyang 2025-10-15 02:17:43 -07:00
  • 5c3bae1a6a [Fix] Remove divisibility requirement between num_kv_heads and tp_size in bailing_moe (#26876) ant-yy 2025-10-15 16:44:04 +08:00
  • 5210dc3940 [Misc] Update TritonLanguagePlaceholder to have attributes that are used by Flash Linear Attention ops. (#26853) Xudong Ma 2025-10-15 01:37:49 -07:00
  • 650b51f9f9 [doc] add Context Parallel Deployment doc (#26877) youkaichao 2025-10-15 16:33:52 +08:00
  • 6256697997 [Doc] ruff format remaining Python examples (#26795) Cyrus Leung 2025-10-15 16:25:49 +08:00
  • 71557a5f7c [CI] Fix mypy for vllm/executor (#26845) Wentao Ye 2025-10-15 04:23:33 -04:00
  • f3c378ffa7 [CI/Build] Add Qwen2.5-VL-7B-Instruct ChartQA Accuracy Tests in CI (#21810) Zhewen Li 2025-10-15 01:09:56 -07:00
  • f5ed68ef63 [Deepseek-V3.2][Kernel] Integrate cuda indexer k cache gather (#26456) Yongye Zhu 2025-10-15 04:05:01 -04:00
  • efdef57b1f [bugfix] Lazy import cv2 (#26869) Angela Yi 2025-10-15 00:47:50 -07:00
  • b8a4572157 [Misc] Use helper function to generate dummy messages in OpenAI MM tests (#26875) Cyrus Leung 2025-10-15 15:17:37 +08:00
  • 302ef403a2 [DSA][MLA] Tiny refactor on DeepSeek to make it reusable for different backends (#26656) Mengqing Cao 2025-10-15 15:16:44 +08:00
  • 8865da157b [Bugfix][Multi Modal] Fix incorrect Molmo token processing (#26873) sangho.lee 2025-10-15 02:13:59 -05:00
  • f0862eae43 [Graph Partition] pass tests for decorator (#26831) Boyuan Feng 2025-10-14 23:39:48 -07:00
  • 8c851f6d04 [Bugfix] Fix qwen3-omni audio truncation issue (#26815) Isotr0py 2025-10-15 13:38:36 +08:00
  • 7cfa420f49 [BugFix] Patch inductor partitioning logic (#26735) Angela Yi 2025-10-14 22:04:32 -07:00
  • a27b288e4a [Feature] default --extra-body param to disable thinking in vllm bench serve (#26784) rongfu.leng 2025-10-15 12:23:44 +08:00
  • e471d7ca7e [CI/Build][Bugfix] fix qutlass cmake error when set QUTLASS_SRC_DIR (#26773) zhrrr 2025-10-15 12:09:44 +08:00
  • c43ca8259e [Docs] Move build.inc into arm.inc (#26862) Michael Yao 2025-10-15 11:35:08 +08:00
  • 85a65e7f51 [Model] Add DeepSeek-V3.1 reasoning parser (split from PR #24972) (#25589) Tao Hui 2025-10-15 11:09:52 +08:00
  • a2986b3e33 [Bugfix] Fixes prefix-repetition benchmark script (#26828) kourosh hakhamaneshi 2025-10-14 19:54:43 -07:00
  • 96b9aa5aa0 [Frontend][torch.compile] CompilationConfig Overhaul (#20283): name change compilation level to compilation mode, deprecation compilation level (#26355) Morrison Turnansky 2025-10-14 22:51:16 -04:00
  • e66d787bce Disable FlashInfer sampler by default (#26859) Michael Goin 2025-10-14 22:35:18 -04:00
  • bfad142e25 [BUGFIX][NIXL] quick fix for 'assert self.connector_worker is not None' in get_kv_connector_stats (#26851) Chendi.Xue 2025-10-14 21:33:25 -05:00