|
|
644d57d531
|
[Model] Add Ernie4.5 VL Model Support (#22514)
Signed-off-by: wangyafeng <wangyafeng@baidu.com>
|
2025-08-26 21:02:55 -07:00 |
|
|
|
6dab89b8ec
|
[Docs] Fix math rendering in docs (#23676)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-26 18:47:08 -07:00 |
|
|
|
6421b66bf4
|
[Docs] Move quant supported hardware table to README (#23663)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-26 22:26:46 +00:00 |
|
|
|
d696f86e7b
|
[doc] Hybrid KV Cache Manager design doc (#22688)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-26 20:19:05 +00:00 |
|
|
|
9816b81f5f
|
[Model] Enable video support for InternVL3.5 models (#23658)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-26 19:46:52 +00:00 |
|
|
|
227e231b55
|
[Docs] [V1] [Hybrid] Update docs to remove FlashInfer constraint for hybrid models (#23665)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-26 18:33:16 +00:00 |
|
|
|
379f828fba
|
[Docs] Reduce requirements for docs build (#23651)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-26 15:43:28 +00:00 |
|
|
|
7c04779afa
|
[Doc]: fix various spelling issues in multiple files (#23636)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-08-26 14:05:29 +00:00 |
|
|
|
164b2273c8
|
[Docs] Fix broken links to docs/api/summary.md (#23637)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-26 13:00:18 +00:00 |
|
|
|
ff77764f86
|
Fix CLI parameter documentation inconsistency in pooling_models.md (#23630)
|
2025-08-26 01:05:37 -07:00 |
|
|
|
bfc1edc9f5
|
[Docs] Fix titles for multi-file examples that are rendered in the docs (#23573)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-26 00:16:44 -07:00 |
|
|
|
7b6a837275
|
[Docs] Update Documentation of Cohere Command-A Models (#23584)
Signed-off-by: Terrencezzj <terrence@cohere.ai>
Signed-off-by: Abatom <abzhonghua@gmail.com>
Co-authored-by: Zhonghua Deng <abzhonghua@gmail.com>
|
2025-08-25 21:53:52 +00:00 |
|
|
|
e269be2ba2
|
[Doc] Add caution for API server scale-out (#23550)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-25 06:14:15 -07:00 |
|
|
|
d0a4a3f645
|
[misc] add shanghai meetup (#23535)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-08-25 17:00:03 +08:00 |
|
|
|
47455c424f
|
[Doc: ]fix various typos in multiple files (#23487)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-25 00:04:04 +00:00 |
|
|
|
e2db1164a1
|
[Model] Enable BLOOM on V1 (#23488)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-24 13:30:47 +00:00 |
|
|
|
416f05929a
|
[New Model]Donut model (#23229)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-08-24 12:52:24 +00:00 |
|
|
|
b8f17f5d98
|
Support DeepSeek-V3.1 tool call (#23454)
Signed-off-by: Xu Wenqing <xuwq1993@qq.com>
|
2025-08-23 05:50:16 +00:00 |
|
|
|
23c939fd30
|
[Model] Support DP for ViT on MiniCPM-V-4 (#23327)
Signed-off-by: ycyaw66 <497410282@qq.com>
Co-authored-by: ycyaw66 <497410282@qq.com>
|
2025-08-23 02:14:41 +00:00 |
|
|
|
0313cf854d
|
[PERF] PyTorch Symmetric Memory All-Reduce (#20759)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-22 15:39:08 -06:00 |
|
|
|
a073be6d87
|
[Doc] Update the doc for log probs + prefix caching (#23399)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-08-22 13:20:39 +00:00 |
|
|
|
5964069367
|
[New Model] Add Seed-Oss model (#23241)
Signed-off-by: jiabin.00 <jiabin.00@bytedance.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-22 04:58:10 +00:00 |
|
|
|
8896eb72eb
|
[Deprecation] Remove prompt_token_ids arg fallback in LLM.generate and LLM.embed (#18800)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-22 10:56:57 +08:00 |
|
|
|
5cc54f7c5b
|
[Doc] Fix batch-level DP example (#23325)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-08-21 06:16:38 -07:00 |
|
|
|
2e2000f352
|
[Model] Add LFM2 architecture (#22845)
Signed-off-by: Paul Pak <paulpak58@gmail.com>
|
2025-08-21 09:35:07 +02:00 |
|
|
|
f571ff8eb6
|
[Sampler] Support returning final logprobs (#22387)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-20 21:28:32 -07:00 |
|
|
|
655a09f653
|
[Model][VLM] Support R-4B Model (#23246)
Signed-off-by: yannqi <yannqi@qq.com>
Signed-off-by: 杨奇(yann qi) <51905299+yannqi@users.noreply.github.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: yannqiyang <yannqiyang@tencent.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-08-21 04:08:52 +00:00 |
|
|
|
3663870c72
|
[V1][Mamba1] - Full CUDA and Piecewise CUDA Graphs Support (#23035)
Signed-off-by: asafg <asafg@ai21.com>
Signed-off-by: asafg <39553475+Josephasafg@users.noreply.github.com>
Co-authored-by: asafg <asafg@ai21.com>
|
2025-08-20 20:08:51 -07:00 |
|
|
|
5efd6905bc
|
[CLI][Doc] Formalize --mm-encoder-tp-mode (#23190)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-20 23:42:28 +08:00 |
|
|
|
c6d80a7a96
|
[Model] Improve olmo and olmo2 (#23228)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-20 12:47:05 +00:00 |
|
|
|
3aa8c10038
|
Fix missing quotes (#23242)
Signed-off-by: Shiming Zhang <wzshiming@hotmail.com>
|
2025-08-20 10:46:59 +00:00 |
|
|
|
64ab3c7253
|
[Doc] Update V1 status of various pooling models (#23189)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-20 10:33:41 +08:00 |
|
|
|
21dce80ea9
|
[CI/Build] Add support for Python 3.13 (#13164)
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-19 13:49:34 -07:00 |
|
|
|
b87cb97a53
|
[Model] support new model ovis2.5 (#23084)
Signed-off-by: myselvess <244285088@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-19 13:12:59 +00:00 |
|
|
|
2c3f557f08
|
[Doc] use power of 2 (#23172)
|
2025-08-19 03:16:23 -07:00 |
|
|
|
fda9537c5e
|
[Model] Support Pipeline Parallelism for moonshotai/Kimi-VL-A3B-Thinking-2506 (#23114)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-08-19 14:24:31 +08:00 |
|
|
|
27e8d1ea3e
|
[Refactor] Define MultiModalKwargsItems separate from MultiModalKwargs (#23053)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-18 09:52:00 +00:00 |
|
|
|
16bff144be
|
[Misc] fix typo in the multimodal doc (#23051)
|
2025-08-17 01:56:20 -07:00 |
|
|
|
4fc722eca4
|
[Kernel/Quant] Remove AQLM (#22943)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-08-16 19:38:21 +00:00 |
|
|
|
829bbd7882
|
[New Model]mBART model (#22883)
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
|
2025-08-16 12:16:58 +00:00 |
|
|
|
8ad7285ea2
|
[Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. (#22035)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-15 14:46:00 -04:00 |
|
|
|
a0632a3e03
|
[Frontend] Expose do_log_stats interval to env (#22905)
Signed-off-by: Csrayz <jover@cmbchina.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-15 13:00:20 +00:00 |
|
|
|
637093ae26
|
docs: update fastsafetensors usage instructions (#22891)
Signed-off-by: Nir Levy <bhr166@gmail.com>
|
2025-08-14 19:56:54 +00:00 |
|
|
|
92ff41abea
|
[Model] Modify the gate implementation of glm4_moe (#22832)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-14 05:28:50 -07:00 |
|
|
|
0783f13960
|
[Doc] fix dead link (#22898)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
|
2025-08-14 04:06:13 -07:00 |
|
|
|
00e3f9da46
|
vLLM Benchmark suite improvement (#22119)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
|
2025-08-14 07:12:17 +00:00 |
|
|
|
3f52738dce
|
[Doc] Add max_lora_rank configuration guide (#22782)
Signed-off-by: chiliu <cliu_whu@yeah.net>
|
2025-08-13 04:10:07 -07:00 |
|
|
|
e18859298d
|
Add hardware plugins to installation doc (#22732)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-12 17:14:46 -07:00 |
|
|
|
fde0b611a3
|
[Model] Decouple glm4v (#22751)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-12 17:13:17 -07:00 |
|
|
|
45c3936e94
|
[Docs] Hide the navigation and toc sidebars on home page (#22749)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-12 17:12:26 -07:00 |
|