|
|
2f7dbc9b42
|
Add batch invariant kernel override for FlashInfer backend [2/n] (#25769)
Signed-off-by: Bram Wasti <bwasti@meta.com>
Signed-off-by: Bram Wasti <bwasti@fb.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-10-03 19:49:30 -07:00 |
|
|
|
ea25a76c05
|
[BugFix] Use async Mistral Tokenizer in Chat Completions (#26134)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-10-04 09:42:08 +08:00 |
|
|
|
67bc0c003e
|
[Bugfix] Fix qwen3 vl dummy data generation with overrides (#26193)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-10-04 01:40:20 +00:00 |
|
|
|
5a05f26603
|
Fix issue of using only the part of video frame [Nemotron Nano] (#26186)
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
|
2025-10-04 00:21:00 +00:00 |
|
|
|
7ef40bb983
|
[GPTOSS][DP/EP][Marlin] Enable GPTOSS DP/EP using Marlin kernels (#25488)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-10-03 20:13:13 -04:00 |
|
|
|
767cbb011d
|
[CI] Fix Pre-commit Mypy Error (#26181)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 16:08:03 -07:00 |
|
|
|
7cfa4b24bf
|
[BugFix] Fix de-functionalization pass for rotary_embedding (#23953)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2025-10-03 15:44:18 -07:00 |
|
|
|
b71fcd4905
|
[Misc] Add penalties sampling parameters to serve tool (#25974)
Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com>
Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com>
|
2025-10-03 15:43:14 -07:00 |
|
|
|
75003f34e8
|
[CI] Push multiarch manifests as nightly builds (#25764)
Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>
|
2025-10-03 15:42:55 -07:00 |
|
|
|
78b8015a4d
|
[Bugfix] Relax tokenizer regex for mixtral to include 'tokenizer.model' (#25964)
Signed-off-by: Bowen Bao <bowenbao@amd.com>
|
2025-10-03 18:31:59 -04:00 |
|
|
|
831b124151
|
[responsesAPI] add better error messaging for long prompts (#25724)
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2025-10-03 14:33:13 -07:00 |
|
|
|
c1ffcb55da
|
[Refactor] Optimize FP8 MOE Backend Choice and Log (#26044)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 15:23:42 -06:00 |
|
|
|
0879736aab
|
[Perf] Remove hardcoded num_warps=1 (#26183)
Signed-off-by: Corey Lowman <clowman1993@gmail.com>
|
2025-10-03 20:38:50 +00:00 |
|
|
|
a26917332f
|
[Quantization/NVFP4] Speed up TRTLLM NVFP4 MOE weight loading and fix K/V scale loading for MLA Attn (#25968)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-10-03 19:35:06 +00:00 |
|
|
|
cd9e5b8340
|
Fix V1 engine serialization error with Ray distributed executor (#26148)
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
|
2025-10-03 18:39:45 +00:00 |
|
|
|
300a59c4c3
|
Avoid division by zero in cache DS MLA kernel (#26174)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-10-03 17:35:17 +00:00 |
|
|
|
d76541a6c5
|
Stop mergify from keeping stale PRs alive (#26169)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-03 16:42:34 +00:00 |
|
|
|
dd96465fd7
|
[BugFix][QWEN-VL]fix wrong apply_rotary_emb_torch selection introduced by #24642 (#26123)
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-10-03 08:52:26 -07:00 |
|
|
|
4f8f47e87e
|
Fix undefined symbol: cutlass_moe_mm_sm100 (#26098)
Signed-off-by: Jun Jiang <jasl9187@hotmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-03 15:48:32 +00:00 |
|
|
|
d78fda7cda
|
[Renderer] Move Processor out of LLMEngine (#26165)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-03 15:08:22 +00:00 |
|
|
|
73a99cc2a5
|
[Model] Fixed stream generator for gpt-oss + spec-decoding (#26027)
Signed-off-by: Aleksandr Samarin <astrlrd@nebius.com>
|
2025-10-03 13:43:41 +00:00 |
|
|
|
adae0c1f43
|
[CI/Build] do not enforce precompilation on tpu ci tests (#25992)
Signed-off-by: Xiang Si <sixiang@google.com>
|
2025-10-03 13:38:42 +00:00 |
|
|
|
cbf9221992
|
[Model] Supplement to PR 24862: Pass param prefix to LLMHead (#25805)
Signed-off-by: whx-sjtu <2952154980@qq.com>
|
2025-10-03 21:34:53 +08:00 |
|
|
|
5f42fc53b6
|
[backends][short_conv] CUDA graph piecewise edits (#24215)
Signed-off-by: Paul Pak <paulpak58@gmail.com>
|
2025-10-03 12:59:48 +00:00 |
|
|
|
8ee846c27c
|
[Bugfix] Re-enable prefill of max model length (#24446)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
|
2025-10-03 14:13:34 +02:00 |
|
|
|
812b7f54a8
|
[Renderer] Move Processor out of AsyncLLM (#24138)
Signed-off-by: Yang <lymailforjob@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-03 11:29:45 +00:00 |
|
|
|
5f2cacdb1e
|
Quick fix for IMA with the Prefix Prefill kernel during graph capture (#25983)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-10-03 11:28:22 +00:00 |
|
|
|
aa5053e3fe
|
[Doc] Fixed shape description for fused_batched_moe.py (#25668)
Signed-off-by: Egor <e.a.krivov@gmail.com>
|
2025-10-03 04:00:23 -07:00 |
|
|
|
79aa244678
|
[Multi Modal] Configurable MM Profiling (#25631)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-03 03:59:10 -07:00 |
|
|
|
2ed3f20dba
|
[openai] Fix missing tool usage check (system message) (#24768)
Signed-off-by: kyt <eluban4532@gmail.com>
|
2025-10-03 18:55:44 +08:00 |
|
|
|
48f309029a
|
[NIXL][Misc] Expose metrics from NIXL for logging to CLI (#25388)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-10-03 10:47:59 +00:00 |
|
|
|
0e93ac0b3a
|
[CI] Fix distributed hybrid tests in CI (#26155)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-10-03 09:14:18 +00:00 |
|
|
|
5446ad1d24
|
[test utils] correct wrong typing (#26159)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
|
2025-10-03 02:11:49 -07:00 |
|
|
|
f9a8084e48
|
[Model] Use merge_by_field_config for MM models (InternVL family) (#26153)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-03 01:59:06 -07:00 |
|
|
|
3e70e3d4d5
|
add(v1): RequestStatesStats to RequestOutput (#24947)
Signed-off-by: huijjj <huijong.jeong@squeezebits.com>
|
2025-10-03 08:56:25 +00:00 |
|
|
|
eb0fa43868
|
[Perf] Optimize reshape_and_cache CUDA Kernel (#25955)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Liu-congo <1502632128@qq.com>
|
2025-10-03 01:33:46 -07:00 |
|
|
|
0ad9951c41
|
[Input] Remove unused prompt field (#26097)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-03 00:23:21 -07:00 |
|
|
|
8c9117181d
|
[Misc] Remove typing.List (#26150)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-10-03 07:00:33 +00:00 |
|
|
|
c4b48d3c0f
|
[BUG] Reorder model config creation (#26124)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2025-10-03 14:59:36 +08:00 |
|
|
|
10d765482d
|
FusedMoE support for the Transformers backend (#22650)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-02 23:12:15 -07:00 |
|
|
|
39b643dc1a
|
[Model] Use merge_by_field_config for MM models (G) (#26117)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-02 22:38:29 -07:00 |
|
|
|
711f485643
|
[Bugfix] Fix import gemm_afp4wfp4 failure on AMD (#26068)
Signed-off-by: zhewenli <zhewenli@meta.com>
|
2025-10-02 22:37:25 -07:00 |
|
|
|
9c5ee91b2a
|
[ROCm] [VL] [Bugfix] Fix vit flash attn dispatcher logic for ROCm (#26104)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-10-02 22:34:53 -07:00 |
|
|
|
27edd2aeb4
|
[Build/CI] Revert back to Ubuntu 20.04, install python 3.12 with uv (#26103)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-10-02 22:21:01 -07:00 |
|
|
|
e5017cd6d6
|
[gpt-oss] disable tool server initialization if no tool in request (#25790)
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2025-10-03 05:08:35 +00:00 |
|
|
|
6a7796e871
|
[Bug]: Limit num_reqs in dummy_run when max_num_seqs is small (#26144)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-10-03 04:00:20 +00:00 |
|
|
|
47b9339546
|
[DeepSeek] Improve performance of DS MLA cache kernel (#26132)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-10-02 20:35:47 -07:00 |
|
|
|
5d5146eee3
|
[CI/Build] Conditionally register cutlass_fp4_group_mm to fix building on Hopper (#26138)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-02 20:32:38 -07:00 |
|
|
|
2aaa423842
|
[Attention] Move Backend enum into registry (#25893)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-10-02 20:32:24 -07:00 |
|
|
|
ad2d788016
|
[Bug][Benchmark] Fix duplicate req in oversampling (#26140)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-10-03 02:55:24 +00:00 |
|