Commit Graph

  • 36960501d3 [Hardware][Powerpc] Fix VLLM_CPU_OMP_THREADS_BIND="auto" low CPU utilization for Power (#27734) main Akash kaothalkar 2025-10-31 13:15:26 +05:30
  • b2e65cb4a7 [benchmark] Make request IDs unique across clients by default (#27723) Seiji Eicher 2025-10-30 19:40:35 -05:00
  • 2bf0bcc1fc [CI Test] Add Scheduled Integration Test (#27765) Wentao Ye 2025-10-30 20:29:26 -04:00
  • 697f507a8e [CI/Build][Intel] Enable performance benchmarks for Intel Gaudi 3 (#26919) Jakub Sochacki 2025-10-31 00:57:22 +01:00
  • 1c5c866559 uint64 woosuk/model-runner-v2 Woosuk Kwon 2025-10-30 16:54:10 -07:00
  • d5d2a0fe74 [Misc] Make all tool scripts executable (#27831) Matthew Bonanni 2025-10-30 19:46:02 -04:00
  • 5c8049d990 fix Woosuk Kwon 2025-10-30 16:40:09 -07:00
  • 5666a25efb fix Woosuk Kwon 2025-10-30 16:38:16 -07:00
  • 09e4b2f6eb update Woosuk Kwon 2025-10-30 16:30:06 -07:00
  • 3e2e549c49 add a small update wentao-add-batch-invariant-test yewentao256 2025-10-30 16:29:52 -07:00
  • c9791f1813 [BugFix] Fix broken import in initialize_ray_cluster() (#27838) Nick Hill 2025-10-30 16:26:13 -07:00
  • 110770170f Merge branch 'main' into woosuk/model-runner-v2 Woosuk Kwon 2025-10-30 22:19:50 +00:00
  • e1e2d61788 update time yewentao256 2025-10-30 15:09:19 -07:00
  • 467d3269a5 add batch invariant test to ci yewentao256 2025-10-30 14:47:11 -07:00
  • 620ad799ba Merge branch 'main' into wentao-refactor-batch-invariant-fp8-deepgemm wentao-refactor-batch-invariant-fp8-deepgemm yewentao256 2025-10-30 13:45:20 -07:00
  • e7acb20076 [Feature] Batch invariant torch.compile (#27660) Paul Zhang 2025-10-30 16:11:29 -04:00
  • 4b68c4a55b [Core][Perf] Only invoke save_new_computed_blocks when computed blocks are not empty (#27799) Jialin Ouyang 2025-10-30 12:47:30 -07:00
  • f0756a5b25 Merge branch 'main' into wentao-refactor-batch-invariant-fp8-deepgemm Wentao Ye 2025-10-30 15:33:10 -04:00
  • a8141fa649 [Refactor] Remove VLLM_DEEPEP_LOW_LATENCY_ALLOW_NVLINK (#27750) Wentao Ye 2025-10-30 15:32:39 -04:00
  • 4917002523 [Fix] Skip record_sleep_state logic in PrometheusStatsLogger if not in dev mode (#27789) Sumanth R Hegde 2025-10-30 12:26:27 -07:00
  • a2981c4272 [EP/DP][API Server] Enable DP-aware routing in OpenAI API requests (#24945) cong-meta 2025-10-30 12:10:16 -07:00
  • 4574d48bab [Core][Bookkeeping] Update cu_num_accepted_tokens for all req_index (#27629) Jialin Ouyang 2025-10-30 11:52:36 -07:00
  • ab98f6556f [Bugfix] Fix 2 precommit issues - (mamba_block_size, kv_cache_config) (#27811) Tyler Michael Smith 2025-10-30 14:52:18 -04:00
  • 2918c1b49c [Model] Use the same fused_moe configs for all H200 devices (#23642) v0.11.1rc5 releases/v0.11.1 Roger Meier 2025-10-31 01:36:56 +08:00
  • 1004205795 [MTP] Refactor mtp predictor to avoid d2h operation (#27643) Mengqing Cao 2025-10-31 01:27:39 +08:00
  • ba33e8830d Reapply "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" (#27768) Huy Do 2025-10-30 10:22:30 -07:00
  • 33a0ea5f32 [Docs] add Shanghai Meetup - 2025/10 (#27545) Kebe 2025-10-31 01:33:13 +09:00
  • 60f76baa66 [Misc] Replace CUDA_VISIBLE_DEVICES in DP with torch.cuda.set_device for device selection on cuda-like devices (#27564) Ilya Markov 2025-10-30 16:41:44 +01:00
  • e5e076cad7 [BugFix] Stopgap - Flashinfer Autotuner + GPT-OSS + DP/TP (#27762) Varun Sundar Rabindranath 2025-10-30 11:24:31 -04:00
  • eebf00cb0c [Bugfix][CPU] Fix MRoPE dispatch on the CPU backend (#27800) Li, Jiang 2025-10-30 23:12:05 +08:00
  • 6c5382d06e Merge branch 'main' into wentao-refactor-batch-invariant-fp8-deepgemm Wentao Ye 2025-10-30 10:42:20 -04:00
  • 9956aae4ea [Model][Ouro] Support Ouro Model (#27794) Fan Yin 2025-10-30 22:34:41 +08:00
  • 0fe0140408 [KV offload] Enable CPU KV offload on CUDA alike Platforms (#27770) Zhewen Li 2025-10-30 07:10:29 -07:00
  • 4e68cc9b6a [Model] Introduce Kimi Linear to vLLM (#27809) Zhiyuan Li 2025-10-30 21:02:27 +08:00
  • 1994de99ea [CI Failure] Fix test_kv_cache_model_load_and_run (#27717) Huamin Li 2025-10-30 05:27:53 -07:00
  • 4464723f22 [Frontend][Doc][5/N] Improve all pooling task | Polish encode (pooling) api & Document. (#25524) wang.yuqi 2025-10-30 20:13:05 +08:00
  • 74374386e2 [Bugfix] Improve GPU validation logging in Ray fallback scenarios (#25775) Sairam Pillai 2025-10-30 17:27:59 +05:30
  • c01f6e525f [CI] Fix mypy for vllm/v1/core and vllm/v1/engine (#27108) Wentao Ye 2025-10-30 07:32:17 -04:00
  • c7d2a554ba [CI Failure] fix test_default_mm_loras (#27795) Huamin Li 2025-10-30 03:13:03 -07:00
  • af826e0820 [V0 deprecation] Remove VLLM_USE_V1 usage in config module (#27784) wangxiyuan 2025-10-30 17:42:49 +08:00
  • e806178d2a [BugFix][VL] Fix FA selection on Qwen2.5-VL (#27790) Zhewen Li 2025-10-30 00:54:44 -07:00
  • 5be1bed790 [CI/Build]Add eval config for Qwen3-235B-A22B-Instruct-2507-FP8 (#27113) Huamin Li 2025-10-30 00:50:56 -07:00
  • 31b55ffc62 use stringData in secret yaml to store huggingface token (#25685) yitingdc 2025-10-30 15:47:36 +08:00
  • ded8ada86a Add more dims for batch invariant shims (#27489) Bram Wasti 2025-10-30 01:28:45 -04:00
  • 8bff831f0a [Benchmark] Cleanup deprecated nightly benchmark and adjust the docstring for performance benchmark (#25786) Kuntai Du 2025-10-29 21:43:37 -07:00
  • b5d70751d8 [BugFix] Reordering extend logic fix (#27739) Lucas Wilkinson 2025-10-30 12:39:34 +08:00
  • b8c48c5d72 kernels/moe test pruning (#27053) Fardin Hoque 2025-10-29 21:10:34 -07:00
  • 17d055f527 [Feat] Adds runai distributed streamer (#27230) Benjamin Bartels 2025-10-30 04:09:10 +00:00
  • 2ce5c5d3d6 [BugFix] Handle unscheduled requests properly when async scheduling (#27756) Nick Hill 2025-10-29 21:04:25 -07:00
  • b5bae42f91 [XPU] Update latest IPEX 2.8 release (#27735) Kunshang Ji 2025-10-30 11:17:13 +08:00
  • d7fb10c574 [Bugfix] mamba-block-size is set for vision language model (#27773) Chen Zhang 2025-10-29 19:39:57 -07:00
  • b798e39f93 [XPU][bugfix] fix rope for llama4 and deepseek (#25145) Yan Ma 2025-10-30 09:43:13 +08:00
  • 48eb8eba58 [Temp fix] Disable torch.compile for Qwen2.5 VL's VisionBlock temporarily. (#27760) Chenheli Hua 2025-10-29 16:17:48 -07:00
  • 40464dbf34 rename wentao-batch-invariance-dp yewentao256 2025-10-29 15:19:17 -07:00
  • 981cc5fdbf Merge branch 'main' into wentao-batch-invariance-dp yewentao256 2025-10-29 15:18:33 -07:00
  • b5d90f7400 [Bug] Fix DBO IMA issue for DeepEPHT (#27666) Wentao Ye 2025-10-29 16:28:27 -04:00
  • 7557a67655 precommit copilot/disable-batched-triton-kernel Tyler Michael Smith 2025-10-29 20:26:12 +00:00
  • 1af476b0e9 Merge branch 'main' into copilot/disable-batched-triton-kernel Tyler Michael Smith 2025-10-29 20:18:03 +00:00
  • 8c3b1c7c62 ditch the unit test honestly Tyler Michael Smith 2025-10-29 20:17:46 +00:00
  • d4aa144343 [BugFix] Fix handling of resumed reqs in SharedStorageConnector (#27719) Nick Hill 2025-10-29 13:16:52 -07:00
  • fcb1d570bb [Bug] Fix DeepEP low latency assert self.batched_router_logits.size(-1) == full_router_logits.size(-1) Bug (#27682) Wentao Ye 2025-10-29 14:50:39 -04:00
  • accb8fab07 [KVConnector] Add metrics to Prometheus-Grafana dashboard (#26811) Nicolò Lucchesi 2025-10-29 19:44:49 +01:00
  • 5b0448104f [Bug] Raise error explicitly if using incompatible backend (#27424) Wentao Ye 2025-10-29 13:29:20 -04:00
  • f7a6682872 [CI/Build] Test torchrun with 8 cards (#27548) 22quinn 2025-10-29 10:26:06 -07:00
  • a9fe0793f2 use_aot_compile should respect VLLM_DISABLE_COMPILE_CACHE (#27698) Boyuan Feng 2025-10-29 10:08:54 -07:00
  • 7568a282b9 [FIXBUG] Qwen3VL hallucinations without Contiguous on Torch.SDPA (#27744) JartX 2025-10-29 17:55:35 +01:00
  • 1da3309ace [Core] Exposing engine sleep & wake_up state as prometheus metrics (#24176) Braulio Dumba 2025-10-29 12:32:01 -04:00
  • 5522fb274b [Chore] Optimize P2PNCCLEngine http_address (#27488) Wentao Ye 2025-10-29 12:05:09 -04:00
  • 0f95a1c3f2 [CI] Fix flaky test_two_responses_with_same_prev_id test (#27745) Nicolò Lucchesi 2025-10-29 16:10:35 +01:00
  • ded24e3e54 [ROCm][Platform] Add MI308X device id in _ROCM_DEVICE_ID_NAME_MAP (#27623) Xiake Sun 2025-10-29 22:44:03 +08:00
  • d6704dd099 Fix MiniMax-M2 rmsnorm precision and remove useless code (#27627) Roger Young 2025-10-29 21:01:05 +08:00
  • ecca3fee76 [Frontend] Add vllm bench sweep to CLI (#27639) Cyrus Leung 2025-10-29 20:59:48 +08:00
  • 9a0d2f0d92 [CI/Build] Skip cpu offloading test on AMD (#27690) Zhewen Li 2025-10-29 05:55:51 -07:00
  • ad3ec89532 [VLM] Add Qwen3-VL generation test (#25185) Isotr0py 2025-10-29 20:19:37 +08:00
  • 3481e40743 [chore] Remove models weight on S3 logic (#27725) Kevin H. Luu 2025-10-29 03:29:49 -07:00
  • 5e72216d17 Feature/video support in random mm dataset (#25963) Eugene Khvedchenya 2025-10-29 12:24:52 +02:00
  • 1a33aacf82 [Misc] Raise error for missing video metadata in MultiModalDataParser (#27664) Isotr0py 2025-10-29 18:06:42 +08:00
  • 7ba6aa8f56 [Fix] import get_kv_cache_torch_dtype error in LMCacheConnector integration (#27670) Yue Zhang 2025-10-29 18:03:54 +08:00
  • ab2eb27b74 [Frontend] [gpt-oss] Mcp type bug (#27689) Alec S 2025-10-29 06:01:32 -04:00
  • 3c7fefdeba [Frontend] [gpt-oss] Tool json call parsing error retry (#27675) Alec S 2025-10-29 05:42:44 -04:00
  • 1891cf605a [Bugfix] Fix modular kernel tests (#27707) bnellnm 2025-10-29 04:14:33 -04:00
  • 8df98c2161 [perf] Enable concurrent execution of "shared_experts" and "selected_experts" in qwen3-next (#27578) Jiangyun Zhu 2025-10-29 16:12:54 +08:00
  • 4fb8771cc0 [CI/Build] Move pre-commit only scripts to tools/pre_commit (#27657) Cyrus Leung 2025-10-29 16:04:33 +08:00
  • 413ef7a3b4 [Speculators] Move tests + fix integration (#27308) Dipika Sikka 2025-10-29 03:54:21 -04:00
  • 8b62495076 [Bugfix] Fix non-contiguous tensor error in rocm_unquantized_gemm_impl (#27605) Zhewen Li 2025-10-29 00:00:15 -07:00
  • 83fd49b1fc [CI/Build][Bugfix]Fix Quantized Models Test on AMD (#27712) Zhewen Li 2025-10-28 23:27:30 -07:00
  • a4a4f0f617 [KV Connector] Update lmcache connector with latest compatibility (#27681) Shaoting 2025-10-28 22:38:37 -07:00
  • 0d8161b075 [Model] Fix Qwen3VL and Qwen3Omni after torch.compile changes (#27705) Lukas Geiger 2025-10-29 05:28:20 +00:00
  • d2c33c397a [NIXL][XPU] update name of nixl wheel (#27631) liuzhenwei 2025-10-29 12:43:29 +08:00
  • f6d5f5888c [Build] Revert triton_kernels requirements (#27659) Varun Sundar Rabindranath 2025-10-29 00:07:09 -04:00
  • 9007bf57e6 Revert "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" (#27714) Simon Mo 2025-10-28 20:58:01 -07:00
  • f257544709 Install pre-built xformers-0.0.32.post2 built with pt-2.9.0 (#27598) v0.11.1rc4 Huy Do 2025-10-28 19:39:15 -07:00
  • 0b51c9bd8b [Core] Early return in SlidingWindowManager.remove_skipped_blocks (#27673) Jialin Ouyang 2025-10-28 18:32:33 -07:00
  • d3ab240f39 [Bug] Fix deepep low latency use nvlink by default (#27677) Wentao Ye 2025-10-28 19:53:12 -04:00
  • 94666612a9 [Misc][qwen2_5_vl][torch.compile] Enable supports_torch_compile on generic nn.Module and demonstrate speedup on Qwen Vision model (#23207) Lucas Kabela 2025-10-28 15:36:43 -07:00
  • 4fe5895361 [AsyncScheduling] Make async overlap work with logprobs (#27615) Nick Hill 2025-10-28 15:35:54 -07:00
  • b53a65fa46 update using skip if server is not up yewentao256 2025-10-28 14:58:07 -07:00
  • 111faf1118 [Core] Scheduler: Publish connector events after output (#25875) Or Ozeri 2025-10-28 23:01:33 +02:00
  • 4f2a8d9d7f Merge branch 'main' into wentao-batch-invariance-dp yewentao256 2025-10-28 14:00:34 -07:00
  • 6afc28a9ba [Test] Batch Invariant: Unit test using parameterized backend (#27478) Wentao Ye 2025-10-28 16:51:35 -04:00