|
|
cfa3234a5b
|
[CI][Spec Decode] Adjust threshold for flaky ngram spec decoding test again (#24771)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-09-13 15:45:11 +08:00 |
|
|
|
59d7ffc17f
|
[CI Failure] Fix test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe (#24750)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-13 07:29:19 +00:00 |
|
|
|
98229db244
|
[Kernels][DP/EP] Optimize Silu Kernel for R1 (#24054)
Signed-off-by: elvircrn <elvircrn@gmail.com>
|
2025-09-13 00:17:27 -07:00 |
|
|
|
5febdc8750
|
[Chore] Remove unused batched RoPE op & kernel (#24789)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-09-13 00:08:20 -07:00 |
|
|
|
89e08d6d18
|
[Model] Add Olmo3 model implementation (#24534)
Signed-off-by: Shane A <shanea@allenai.org>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-13 03:26:21 +00:00 |
|
|
|
7f2ea7074e
|
[Frontend][Multimodal] Allow skipping media data when UUIDs are provided. (#23950)
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-09-13 02:16:06 +00:00 |
|
|
|
4fdd6f5cbf
|
[Core] Support async scheduling with uniproc executor (#24219)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
Co-authored-by: Ronald1995 <ronaldautomobile@163.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-09-12 16:34:28 -07:00 |
|
|
|
5fe643fc26
|
Add FLASHINFER_MLA to backend selector test (#24753)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
|
2025-09-12 22:30:07 +00:00 |
|
|
|
7ba32aa60b
|
[Attention][FlashInfer] Enable FP8 FlashInfer (TRTLLM) MLA decode (#24705)
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
|
2025-09-12 15:45:53 -06:00 |
|
|
|
c8c42597ab
|
[CI] Speed up model unit tests in CI (#24253)
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
|
2025-09-12 10:36:50 -07:00 |
|
|
|
a5b84f1cbf
|
[Core] Shared memory based object store for Multimodal data caching and IPC (#20452)
Signed-off-by: donglu <donglu@cohere.com>
|
2025-09-12 07:54:17 -07:00 |
|
|
|
0377802c20
|
[Multimodal] Remove legacy multimodal fields in favor of MultiModalFeatureSpec (#24548)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2025-09-12 21:42:23 +08:00 |
|
|
|
72fc8aa412
|
[Multi Modal] Add FA3 in VIT (#24347)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
|
2025-09-12 21:27:24 +08:00 |
|
|
|
59d5d2c736
|
[CI/Build] Skip prompt embeddings tests on V1-only CPU backend (#24721)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-09-12 18:51:01 +08:00 |
|
|
|
d21a36f5f9
|
[CI] Add ci_envs for convenient local testing (#24630)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-09-12 08:52:25 +00:00 |
|
|
|
561a0baee0
|
[CI] Fix flaky test v1/worker/test_gpu_model_runner.py::test_kv_cache_stride_order (#24640)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-09-12 07:49:09 +00:00 |
|
|
|
ddcec289c7
|
Fix implementation divergence for BLOOM models between vLLM and HuggingFace when using prompt embeds (#24686)
Signed-off-by: Andrew Sansom <andrew@protopia.ai>
|
2025-09-12 04:35:48 +00:00 |
|
|
|
e090b7b45b
|
Enable conversion of multimodal models to pooling tasks (#24451)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2025-09-12 03:30:41 +00:00 |
|
|
|
40b6c9122b
|
[V1] feat:add engine v1 tracing (#20372)
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
Signed-off-by: Ye Zhang <zhysishu@gmail.com>
Signed-off-by: RichardoMu <44485717+RichardoMrMu@users.noreply.github.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Mu Huai <tianbowen.tbw@antgroup.com>
Co-authored-by: Ye Zhang <zhysishu@gmail.com>
Co-authored-by: Benjamin Bartels <benjamin@bartels.dev>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: 瑜琮 <ly186375@antfin.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-09-11 17:10:39 -07:00 |
|
|
|
c3aea10dc8
|
[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-11 15:43:14 -07:00 |
|
|
|
074854b24f
|
[Kernel][B200] mxfp4 fused cutlass moe (#23696)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-11 17:04:56 -04:00 |
|
|
|
e26fef8397
|
fix some typos (#24616)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
|
2025-09-11 10:48:46 -07:00 |
|
|
|
c1eda615ba
|
Fix model name included in responses (#24663)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-11 10:47:51 -07:00 |
|
|
|
bcbe2a4d9e
|
[VLM] Optimize GLM4.5-V-style video processing to only decode necessary frames (#24161)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-11 09:44:34 -07:00 |
|
|
|
fd1ce98cdd
|
[CI] Split mteb test from Language Models Test (#24634)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-09-11 06:37:51 -07:00 |
|
|
|
5f5271f1ee
|
Move LoRAConfig from config/__init__.py to config/lora.py (#24644)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-11 11:01:38 +00:00 |
|
|
|
a8b0361c92
|
[CI] Split pooling from entrypoints Test (#24632)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-09-11 01:53:09 -07:00 |
|
|
|
e93f4cc9e3
|
Add the support for the qwen3 next model (a hybrid attention model). (#24526)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-11 15:32:09 +08:00 |
|
|
|
2048c4e379
|
[torchao] Support quantization configs using module swap (#21982)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
|
2025-09-10 23:53:24 -07:00 |
|
|
|
9bd831f501
|
[Model] New model support for Motif-1-Tiny (#23414)
Signed-off-by: ca1207 <ca1207zzz@gmail.com>
Signed-off-by: TaehyunKim <73943231+ca1207@users.noreply.github.com>
Co-authored-by: WyldeCat <skan1543@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-10 23:29:40 -07:00 |
|
|
|
6c8deacd72
|
[Bug] [Spec Decode] Fix model_initialization test and mismatch in aux_hidden_layers (#24613)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-09-10 21:23:18 -07:00 |
|
|
|
dcb28a332b
|
[Kernel] Flashinfer MLA (trtllm-gen) decode kernel integration (#21078)
Signed-off-by: hjjq <hanjieq@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-09-10 15:31:10 -07:00 |
|
|
|
9a161307f5
|
[torch.compile][ROCm][V1] Enable attention output FP8 fusion for V1 attention backends (#19767)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-10 13:59:55 -07:00 |
|
|
|
37e8182bfe
|
[v1] Add Whisper model support (encoder-decoder) (#21088)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
|
2025-09-10 13:53:35 -07:00 |
|
|
|
4db4426404
|
[CI] Fail subprocess tests with root-cause error (#23795)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-09-10 13:53:21 -07:00 |
|
|
|
9fb74c27a7
|
[Core] Support configuration parsing plugin (#24277)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-09-10 11:32:43 -07:00 |
|
|
|
bb3eb80d92
|
[Core] Split LoRA layers (#24574)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-09-10 07:47:51 -07:00 |
|
|
|
fcc0a3130a
|
[CI] Fix tensorizer test assertion (#24545)
Signed-off-by: Peter Schuurman <psch@google.com>
|
2025-09-10 06:57:36 -07:00 |
|
|
|
3144d90217
|
fix some typos (#24167)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-09-10 06:21:23 -07:00 |
|
|
|
bd98842c8a
|
[CI] Add PPL test for generation models (#24485)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-09-10 06:16:39 -07:00 |
|
|
|
492196ed0e
|
[CI/Build] split true unit tests to Entrypoints Unit Tests (#24418)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-09-10 06:16:07 -07:00 |
|
|
|
0b9a612fa3
|
[BugFix][easy] Fix flaky test test_gpt_oss_multi_turn_chat (#24549)
Signed-off-by: lacora2017 <yehu@meta.com>
Co-authored-by: lacora2017 <yehu@meta.com>
|
2025-09-10 21:14:55 +08:00 |
|
|
|
f36355abfd
|
Move LoadConfig from config/__init__.py to config/load.py (#24566)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-09-10 06:14:18 -07:00 |
|
|
|
6cbd41909e
|
Feature/vit attention unification# 23880 (#23978)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-09-10 06:10:14 -07:00 |
|
|
|
72d30108a0
|
Support for NemotronH Nano VLM (#23644)
Signed-off-by: Daniel Afrimi <danielafrimi8@gmail.com>
|
2025-09-10 06:10:06 -07:00 |
|
|
|
7c195d43da
|
[ROCm][Bugfix] Fix Aiter RMSNorm (#23412)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-09-10 21:08:03 +08:00 |
|
|
|
77f62613f9
|
Consolidate rendering parameters into RenderConfig dataclass (#24543)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2025-09-10 08:44:47 +00:00 |
|
|
|
feaf202e93
|
[Bugfix] Guard _may_reorder_batch for encoder-only models on CPU (#24319) (#24348)
Signed-off-by: Remy <eunhwan.shin@dtonic.io>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2025-09-10 14:24:42 +08:00 |
|
|
|
4377b1ae3b
|
[Bugfix] Update Run:AI Model Streamer Loading Integration (#23845)
Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai>
Signed-off-by: Peter Schuurman <psch@google.com>
Co-authored-by: Omer Dayan (SW-GPU) <omer@run.ai>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-09-09 21:37:17 -07:00 |
|
|
|
009d689b0c
|
[Core] Simplify and unify mm uuid handling & auto-generated mm hash overrides processing. (#24271)
Signed-off-by: Chenheli Hua <huachenheli@outlook.com>
|
2025-09-09 21:36:09 -07:00 |
|