|
|
eb1f43bc82
|
[gpt-oss][bugfix] remove logic to require resp_ in ResponseAPI (#25428)
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
cf0e250200
|
[V0 Deprecation] Remove placeholder attn (#25510)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
0c11617ff1
|
[Core] Use KVCacheBlock as much as possible instead of dict[block_id, KVCacheBlock] (#24830)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
930e691c65
|
[CI/Build] Fix and re-enable v1 PP test on CI (#25496)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
e49561cd91
|
Enable symmetric memory all reduce by default only enabling for TP (#25070)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
8ba3b17cc1
|
[Speculators][Speculative Decoding] Fix gpt-oss eagle3 accuracy issue (#25406)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
b672b8c3b8
|
[Performance] Move apply_w8a8_block_fp8_linear to an op class (#24666)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
56201cfb01
|
[core] add nccl symmetric memory for all reduce (#24532)
Signed-off-by: Amir Samani <asamani@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
9689be1e8e
|
[ROCm] Add skinny gemm bias support for dtypes fp16,bf16,fp8 (#24988)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
Signed-off-by: Hashem Hashemi <159079214+amd-hhashemi@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
26da2c6244
|
[V1][Kernel] Add triton implementation for reshape_and_cache_flash (#24503)
Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
9cae377a16
|
Add backward compatibility for GuidedDecodingParams (#25422)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
8c5c35c027
|
[Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support and Prefill support (#24845)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
f97da2c732
|
[V1] Remove V0 code paths for Hybrid models (#25400)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
a500f7cc09
|
[Docs] NixlConnector quickstart guide (#24249)
Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Signed-off-by: Peter Pan <peter.pan@daocloud.io>
Signed-off-by: Nicolò Lucchesi<nicolo.lucchesi@gmail.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
1b75f784b8
|
[P/D] Support NIXL connector to disconnect during a clean shutdown (#24423)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
030774abcf
|
[CI/Build] Fix disabled v1 attention backend selection test (#25471)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
59659b74c4
|
[Core] Optimize LoRA weight loading (#25403)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
fb64e67533
|
[Test]: Hermes tool parser stream output error in Qwen3 case (#25203)
Signed-off-by: Andreas Hartel <andreas.hartel@aleph-alpha.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
215da8510d
|
[Misc] Move DP for ViT code inside model executor dir (#25459)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
c4a15ee240
|
[Frontend] Add a new xml-based tool parser for qwen3-coder (#25028)
Signed-off-by: Zhikaiiii <1658973216@qq.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
921945c81e
|
[NIXL][OOT platform] support nixl_connector with oot platform and other nixl_backend (#25121)
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
a0b5617263
|
[V0 deprecation] Remove platform v1 controling interface (#25410)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
864bbe36f0
|
[Bugfix] Fix missing clear_connector_metadata (#25397)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
d96a3fc653
|
[Bugfix] fix custom op test (#25429)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:54 -07:00 |
|
|
|
aac85cc6d6
|
[Frontend] Responses API MCP tools for built in tools and to pass through headers (#24628)
Signed-off-by: Alec Solder <alecs@fb.com>
Signed-off-by: Alec S <10566873+alecsolder@users.noreply.github.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
6e9229e919
|
[CI/Build] Skip Qwen3-VL initialization tests until models are actually released (#25394)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
ff54b6bfe3
|
[KV offload][5/N] Add CPUOffloadingSpec (#24251)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
6dbbecd5b2
|
[torch.compile] Cleanup compilation tests and custom passes, add debug utils, fix DCE bug (#23091), fix test (#24376), and prep for custom op matching (#24604) (#24542)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: luka <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
d988b84e8e
|
[DP] support torchrun external launcher with Data Parallelism (#24899)
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
2a8bd2b93b
|
[CLI env var] Add VLLM_FLASH_ATTN_MAX_NUM_SPLITS_FOR_CUDA_GRAPH in env variables (#25274)
Signed-off-by: qqma <qqma@amazon.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: qqma <qqma@amazon.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
4bc6b5d2c3
|
[TPU] Deprecate xm.mark_step in favor of `torch_xla.sync (#25254)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
2f237d3df4
|
[V0 Deprecation] Remove MultiModalPlaceholderMap (#25366)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
243c358fa8
|
[V0 Deprecation] Remove V0-only methods in multi-modal registry (#25362)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
1b3aa0f297
|
[Bugfix] Fix hermes tool parser handling of non-string argument types (#22002)
Signed-off-by: wangzi <3220100013@zju.edu.cn>
Signed-off-by: David Chen <530634352@qq.com>
Co-authored-by: wangzi <3220100013@zju.edu.cn>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
dba6db9937
|
[Docs] GSM8K Accuracy Evaluation doc update (#25360)
Signed-off-by: David Chen <530634352@qq.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
5322390f1d
|
[Model] Support Dots OCR (#24645)
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: yinz-aizip <yinz@aizip.ai>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
5f6a36054a
|
Multimodal - audio tests (#25285)
Signed-off-by: Debolina Roy <debroy@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
e348e1027c
|
[Bugfix][V0 Deprecation][CI] use async mock and await for async method (#25325)
Signed-off-by: Yang <lymailforjob@gmail.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
a815d820ee
|
Remove V0 attention backends (#25351)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
b81364a7cd
|
[V0 Deprecation] Remove V0 sampling metadata (#25345)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
791089df20
|
feat: Enable engine-level arguments with speculators models (#25250)
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
71f2b5ddea
|
[V0 Deprecation] Remove async_output_proc, preemption mode, delay factor (#25334)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
81e17a1e26
|
[V0 Deprecation] Remove V0 Sequence class & Sampler (#25332)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
f6278e3065
|
[V1] Add sliding window support to Flex Attention backend (#24089)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
86fdd686be
|
[CI] Skip tests failing on main (#25326)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
171592330b
|
[Chore] Remove unused sampler in models (#25324)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
4bb2eb42d4
|
[V0 Deprecation] Remove V0 Output Processor (#25320)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
32d43a5a9e
|
[V0 Deprecation] Remove LLMEngine (#25033)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
9cfa7697c1
|
[V0 Deprecation] Enable the remaining multimodal tests in V1 (#25307)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|
|
|
9fc86d2802
|
[Core] Enable sharded state loader for V1 engine and enhance test coverage (#25308)
Signed-off-by: pengdrumli <pengdrumli@tencent.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 13:35:53 -07:00 |
|