Commit Graph

  • c3a2c6ac5f [MM][Core] Decouple ViT backend from LM backend (#27061) Roger Wang 2025-10-21 00:30:10 -07:00
  • 72f431e709 [Nixl] Minor refactor to handshake related metadata (#26410) Nicolò Lucchesi 2025-10-21 09:07:47 +02:00
  • be4445072c [Fix][Spec Decode] Fix llama4 draft loading with different quantization (#27136) Zebing Lin 2025-10-21 02:19:00 -04:00
  • f381cf2302 [Bugfix] Fix broken MTP weight loading for FP8 KV Scales (#27227) Benjamin Chislett 2025-10-21 01:51:44 -04:00
  • 5ff5d94e77 [Bugfix] Fix gpt-oss w4a8 DP/EP on B200 (#26729) Varun Sundar Rabindranath 2025-10-21 01:51:14 -04:00
  • f95da13c3d [ModelOpt] Load w13/w2_input_scale for all experts, nvfp4 (#26135) Shu Wang 2025-10-21 00:50:31 -05:00
  • 99e2379b16 (zhuohan/moe-kernel-experiment) low latency combine Zhuohan Li 2025-10-20 22:00:01 -07:00
  • da26dce7b2 low latency dispatch Zhuohan Li 2025-10-20 21:49:09 -07:00
  • aef368aa08 [BugFix] GPT-OSS Attention DP + MoE TP weight loading issue (#24032) Po-Han Huang (NVIDIA) 2025-10-21 12:03:47 +08:00
  • 5f6cbf60d6 [Feature][Kernel]FusedMoE LoRA (#21229) Chen Wu 2025-10-21 11:01:37 +08:00
  • 3ada34f9cb [Frontend] Enforce tokenize=False when applying chat template (#27205) Russell Bryant 2025-10-20 22:57:34 -04:00
  • eda71c2847 Remove /generate API Woosuk Kwon 2025-10-21 02:55:24 +00:00
  • 0eb8f2b880 create is_in_the_same_node on cpu (#26832) Lunwen He 2025-10-20 19:04:14 -07:00
  • 163965d183 [cpu] Dispatch un-quantized linear to oneDNN/ACL by default for AArch64 (#27183) Fadi Arafeh 2025-10-21 03:02:58 +01:00
  • a03cf9bc70 [V0 Deprecation] Remove V0 metrics code (#27215) Nick Hill 2025-10-20 19:02:10 -07:00
  • 352c0c8a28 [Quantization] Automatically infer AWQ modules_to_not_convert field (#26909) Isotr0py 2025-10-21 09:49:28 +08:00
  • 48dcc72d7e refactor and add low latency code Zhuohan Li 2025-10-20 18:08:57 -07:00
  • e3e2bb3865 add combine example Zhuohan Li 2025-10-20 17:45:31 -07:00
  • bfe0b4bd2a [ez] add uv lock to gitignore (#27212) Andrew Xia 2025-10-20 17:37:44 -07:00
  • 58fbbcb2f5 [ROCm] enable some tests in entrypoints test groups on AMD (#26725) Concurrensee 2025-10-20 19:37:16 -05:00
  • 1bff9a59ec Add /generate API Woosuk Kwon 2025-10-20 22:29:52 +00:00
  • 87778d5f00 [Feature][Quantization] auto_round support for mixed bits quantization (#23812) Heng Guo 2025-10-21 06:23:30 +08:00
  • f9e7ad5400 [Bugfix][CI] Fix Distributed Tests (4 GPUs) async_sched+ray test (#27195) Nicolò Lucchesi 2025-10-20 18:34:54 +02:00
  • 4d0f266113 [Kernel][Model] Tune fused_moe Triton configs for Qwen3-30B A3/A3B on H100 (FP8/BF16) (#26268) shivampr 2025-10-20 07:48:01 -07:00
  • e93ff6c8b9 Nemotron Nano V2 VL + EVS Video Support (#27107) Eugene Khvedchenya 2025-10-20 17:19:11 +03:00
  • 1c691f4a71 AArch64 CPU Docker pipeline (#26931) ioana ghiban 2025-10-20 13:09:40 +02:00
  • 4e2abe99b7 minor fix Zhuohan Li 2025-10-19 23:31:03 -07:00
  • 177f5d757f complicated assert for correctness check Zhuohan Li 2025-10-19 23:30:28 -07:00
  • 9fce7bee74 [Kernel] Accelerate solve_tril with TMA (#26746) Jiangyun Zhu 2025-10-20 13:39:02 +08:00
  • b63f2143f8 [LoRA] LoRA cuda graph specialization (#25914) Andy Lo 2025-10-20 05:21:09 +01:00
  • f32bf7582e [Model][VLM] Support Bee-8B Model (#27012) Yi Zhang 2025-10-20 10:31:26 +08:00
  • 8a81d776ce Fix typo in ValueError message: use kv_role instead of kv_disagg_role (#27166) Yongtao Huang 2025-10-20 03:47:19 +08:00
  • f6fdacd82c [Bugfix] Fix error with penalties when speculative decoding and structural output are enabled (#26586) Sergei Skvortsov 2025-10-19 20:24:46 +01:00
  • d31f7844f8 [Misc] Move utils to avoid conflicts with stdlib, and move tests (#27169) Cyrus Leung 2025-10-19 20:20:55 +08:00
  • 7a6c8c3fa1 [Chore] Separate out vllm.utils.network_utils (#27164) iAmir97 2025-10-19 17:06:32 +07:00
  • 221bf72577 output type conversion fix (#27159) Jianyu Huang 2025-10-19 01:10:07 -07:00
  • b3aba04e5a [Benchmark] Convenience script for multiple parameter combinations (#27085) Cyrus Leung 2025-10-19 14:57:01 +08:00
  • 8a297115e2 [Chore] Separate out hashing utilities from vllm.utils (#27151) dongbo910220 2025-10-19 11:09:38 +08:00
  • 191eed0bb9 [BugFix] Fix lazy imports involving outlines_core (#27158) 22quinn 2025-10-18 19:35:32 -07:00
  • fb860670da [Minor] Remove unused env variable (#27161) Woosuk Kwon 2025-10-18 18:48:35 -07:00
  • 6f47333c4e (woosuk/rm-add-init-env) [Misc] Allow override VLLM_DISTRIBUTED_INIT_METHOD_OVERRIDE Woosuk Kwon 2025-10-19 01:47:13 +00:00
  • 83e760c57d [V1][Metrics][Plugin] Add plugin support for custom StatLoggerBase implementations (#22456) Tova Movshovitz 2025-10-19 01:12:46 +03:00
  • c2bba69065 [BugFix] Disable fp8 kv-cache by default for DeepSeek V3.2 (#27121) Lucas Wilkinson 2025-10-18 18:05:23 -04:00
  • e133d6d218 [BugFix] fix graph partition signature (#27139) Boyuan Feng 2025-10-18 14:34:36 -07:00
  • a1946c9f61 [Chore] Separate out profiling utilities from vllm.utils (#27150) dongbo910220 2025-10-19 03:12:01 +08:00
  • 14299bfcaf (codex/add-1-option-for-max-model-length) Derive auto max model len state from original value Michael Goin 2025-10-18 14:49:36 -04:00
  • 9f020f4f31 [BugFix] Fix failing gemma-3-1b-it test: test_lm_eval_accuracy_v1_engine[google/gemma-3-1b-it] (#27111) Lucas Wilkinson 2025-10-18 14:44:39 -04:00
  • 3b45075206 [Minor] Add some clarifying comments to recent changes (#27130) Nick Hill 2025-10-18 09:52:45 -07:00
  • 168e578efc Fix incorrect string formatting in barrier timeout exceptions (#27149) Yongtao Huang 2025-10-19 00:51:57 +08:00
  • 6ac5e06f7c [Chore] Clean up pytorch helper functions in vllm.utils (#26908) Isotr0py 2025-10-19 00:48:22 +08:00
  • 5c2acb270a [Models][QwenVL] Remove unnecessary .contiguous() calls (#27106) Lukas Geiger 2025-10-18 16:05:05 +02:00
  • b26b70bec4 [Misc] Refactor get_kv_cache_spec into AttentionLayerBase (#26587) Nicolò Lucchesi 2025-10-18 15:51:21 +02:00
  • ab4be40fc5 [fix][cpu] fix prefill attention in CPU attention backend (#27035) Fadi Arafeh 2025-10-18 14:30:21 +01:00
  • 245e4f2c01 [Feature] Batch Invariant: Support DeepGEMM and Blackwell (#27127) Wentao Ye 2025-10-18 09:28:05 -04:00
  • 1d165d6d85 [Chore] Separate out vllm.utils.mem_utils (#27143) iAmir97 2025-10-18 17:06:59 +07:00
  • 83004020fd [Test] Add test for /health endpoint on engine failure (#26074) dongbo910220 2025-10-18 17:59:05 +08:00
  • 12e21701e7 [DOC][FEATURES][CPU]update cpu feature for v1 (#27135) Chendi.Xue 2025-10-18 03:10:45 -05:00
  • 30a33b92ee [Misc] Rev DeepEP (#27122) Varun Sundar Rabindranath 2025-10-18 02:54:29 -04:00
  • 7c572544e4 [GPT-OSS] Structure_Tag support for gpt-oss tool-call in cot (#25515) Hanchenli 2025-10-17 21:55:54 -07:00
  • c312320764 [CI/Build] tests(v1): feed Triton attention the (num_blocks, 2, …) KV cache layout in backend-correctness tests (#26663) Huamin Li 2025-10-17 21:11:26 -07:00
  • c981f0ea78 [Perf] Add H100 fused MoE config (#25398) ZiTian Zhao 2025-10-18 10:21:27 +08:00
  • dcf059ab84 deepep HT dispatch no abstraction Zhuohan Li 2025-10-17 18:42:27 -07:00
  • 6367bde739 [BugFix][Core] Fix error when enable async-scheduling in multi-node env (#25887) Lehua Ding 2025-10-18 06:16:18 +08:00
  • f50cc221ea [Test] Make test_failure more stable for batch invariance (#27054) Wentao Ye 2025-10-17 16:59:08 -04:00
  • acedc74b1a [V1][Spec Decode] Fix greedy temperature detection after sampler refactor (#27077) Pradyun92 2025-10-17 16:27:47 -04:00
  • d29483b58a [Minor] Remove unnecessary error message (#27115) Zhuohan Li 2025-10-17 13:02:12 -07:00
  • 950cf9e58e [Bugfix] Use PIECEWISE cudagraphs on Blackwell if max_model_len > 131072 (#27114) Michael Goin 2025-10-17 15:47:18 -04:00
  • 3125d79950 [Chore] Remove unused PolyNorm layer (#27110) Isotr0py 2025-10-18 03:03:43 +08:00
  • e33ee23ee3 [Bugfix] [AITER] [ROCm] Fix Quark MoE Quant Config and AITER Fused MoE quant type logic (#27029) vllmellm 2025-10-18 02:51:10 +08:00
  • a2599dca0f (zhuohan/remove-virtual-engine) fix missing removal Zhuohan Li 2025-10-17 11:35:42 -07:00
  • b10c64c834 [ROCm][Bugfix][Model] Fix illegal memory access when running qwen3_moe models with rms_norm (Qwen3-235B-A22B, Qwen3-30B-A3B, etc.) (#26192) rasmith 2025-10-17 13:17:18 -05:00
  • 0925b28a8e [ROCM] MoE fp4 CK kernel (#26545) Aleksandr Malyshev 2025-10-17 11:06:33 -07:00
  • 99722d5f0e [CI] Remove forbidden slash (#27112) Nicolò Lucchesi 2025-10-17 18:38:00 +02:00
  • 4c91a28e30 [bugfix] Qwen3-VL fix video incorrect timestamp calculations while do_sample_frames=True (#27104) 2025-10-18 00:26:33 +08:00
  • b038d9c40c [Data-parallel] Allow DP>1 for world_size > num_gpus on node (8) (#26367) Patrick von Platen 2025-10-17 17:24:42 +02:00
  • 2ba60ec7fe [CI] Nixl integration tests (#27010) Nicolò Lucchesi 2025-10-17 16:13:31 +02:00
  • bd7157a071 [torch.compile] Enable attention and allreduce fusion without custom ops enabled (#24604) Luka Govedič 2025-10-17 10:10:23 -04:00
  • be429d0cfd Fix incorrect docstring for stop_profile() method (#27101) Yongtao Huang 2025-10-17 21:30:23 +08:00
  • c253745eb8 [Harware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI350 and MI355 (#25586) Reima Karhila (AMD) 2025-10-17 14:56:12 +03:00
  • daec4d2624 [Model]Improve Qwen3VLMoeForConditionalGeneration packed_modules_mapping (#27096) Jee Jee Li 2025-10-17 19:47:00 +08:00
  • 6c9fdbf725 [Docs] Replace rst style double-backtick with md single-backtick (#27091) Harry Mellor 2025-10-17 10:47:34 +01:00
  • 483ea64611 [Docs] Replace all explicit anchors with real links (#27087) Harry Mellor 2025-10-17 10:22:06 +01:00
  • e20eba753b [VLM][Refactor] Remove useless func get_input_positions in MRotaryEmbedding (#27088) Mengqing Cao 2025-10-17 17:00:30 +08:00
  • bbc1b29665 Update troubleshooting.md and remind VLLM_TRACE_FUNCTION usage (#27069) cong-meta 2025-10-17 01:53:06 -07:00
  • acb1bfa601 [CI] fix docs build failed (#27082) Chauncey 2025-10-17 15:53:40 +08:00
  • 75c7ad9918 [Kernel][Performance] Fuse float cast and renormalize to topk softmax kernel (#26717) zhrrr 2025-10-17 15:30:35 +08:00
  • 3fd66b1e73 [Misc] Remove unused virtual engine flag Zhuohan Li 2025-10-16 23:04:05 -07:00
  • 5550ff9c25 [CI/Build] Update compressed tensor test path to fix CPU CI (#27068) Li, Jiang 2025-10-17 13:34:56 +08:00
  • 3aeb19a39e [Model] Add support for LightOnOCR (#26916) Said Taghadouini 2025-10-17 07:05:24 +02:00
  • 8c017b3490 [Model] Always use Transformers backend for PaliGemma and Gemma3-MM (#26715) Cyrus Leung 2025-10-17 13:03:35 +08:00
  • 9c2c2287a0 [CI/Build] Update Llama4 eval yaml (#27070) Zhewen Li 2025-10-16 21:59:47 -07:00
  • fec2b341ad [Kernel] Lazy import FlashInfer (#26977) Jee Jee Li 2025-10-17 12:48:18 +08:00
  • 87bc0c492f [Bugfix] Fix ReplicatedLinearWithLoRA (#27065) Jee Jee Li 2025-10-17 12:43:16 +08:00
  • fe3b9372ad [Core] Change execute_model_with_error_logging() to be a ctx manager (#27060) Nick Hill 2025-10-16 20:45:32 -07:00
  • bde9e2272a [Bugfix][Qwen] fixes the weights dtype in qwen3_next: it is actually a bfloat16 (#27030) Tao He 2025-10-17 11:37:52 +08:00
  • 08405609cc disable graph partition in custom op (#26952) Boyuan Feng 2025-10-16 20:08:47 -07:00
  • ab81379ea6 [Perf] Exploit out-of-band buffers in shm_broadcast (#26961) Nick Hill 2025-10-16 20:08:03 -07:00
  • 4ffd6e8942 [Docs] Reduce custom syntax used in docs (#27009) Harry Mellor 2025-10-17 04:05:34 +01:00
  • 965c5f4914 vllm bench serve shows num of failed requests (#26478) Tomas Ruiz 2025-10-17 04:55:09 +02:00
  • 4d055ef465 Remove unused imports (#26972) Lukas Geiger 2025-10-17 03:51:17 +01:00