Commit Graph

1324 Commits

Author SHA1 Message Date
8ad7285ea2 [Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. (#22035)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-15 14:46:00 -04:00
a0632a3e03 [Frontend] Expose do_log_stats interval to env (#22905)
Signed-off-by: Csrayz <jover@cmbchina.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-15 13:00:20 +00:00
Nir
637093ae26 docs: update fastsafetensors usage instructions (#22891)
Signed-off-by: Nir Levy <bhr166@gmail.com>
2025-08-14 19:56:54 +00:00
92ff41abea [Model] Modify the gate implementation of glm4_moe (#22832)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-14 05:28:50 -07:00
0783f13960 [Doc] fix dead link (#22898)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
2025-08-14 04:06:13 -07:00
00e3f9da46 vLLM Benchmark suite improvement (#22119)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
2025-08-14 07:12:17 +00:00
3f52738dce [Doc] Add max_lora_rank configuration guide (#22782)
Signed-off-by: chiliu <cliu_whu@yeah.net>
2025-08-13 04:10:07 -07:00
e18859298d Add hardware plugins to installation doc (#22732)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-12 17:14:46 -07:00
fde0b611a3 [Model] Decouple glm4v (#22751)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-12 17:13:17 -07:00
45c3936e94 [Docs] Hide the navigation and toc sidebars on home page (#22749)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-12 17:12:26 -07:00
80bb1e8afe Officially support SmolLM3 using the Transformers backend (#22665)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-12 05:38:48 -07:00
767e63b860 [Docs] Improve docs navigation (#22720)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-12 04:25:55 -07:00
9f909b8996 [New Model] Support Command-A-Vision (#22660)
Signed-off-by: donglu <donglu@cohere.com>
2025-08-12 01:39:54 -07:00
59f3b93636 [DOC] update v1_guide with INTEL HW (#22679)
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
2025-08-12 01:22:49 -07:00
2f4657952b [doc] Update x86 CPU-inference installation doc to reflect optionality of AVX512f (#22707)
Signed-off-by: Sooraj S <94284954+sooraj-satheesh@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
2025-08-12 00:21:08 -07:00
3a7e3bbdd2 [Doc] Added unmentioned required option "method" in the usage of EAGLE-3 based models (#21737)
Signed-off-by: Dilute-l <dilu2333@163.com>
Co-authored-by: Dilute-l <dilu2333@163.com>
2025-08-12 00:14:51 -07:00
3fa5b25845 Document aarch64 CPU support works (#22646)
Signed-off-by: Eric Curtin <ecurtin@redhat.com>
2025-08-11 07:22:45 -07:00
bc1d02ac85 [Docs] Add comprehensive CLI reference for all large vllm subcommands (#22601)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-11 00:13:33 -07:00
b81fe83b2c [doc] add alibaba cloud as sponsor (#22597)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-08-10 23:13:47 +08:00
0757551c96 [doc] add beijing meetup links (#22596)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-08-10 22:51:36 +08:00
00976db0c3 [Docs] Fix warnings in docs build (#22588)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-10 05:49:51 -07:00
010e0e39ea [Doc] Fix API doc link in side navigation (#22585)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-08-10 01:35:22 -07:00
c49848396d Refactor sliding window configuration to Transformers best practice (#21927)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-09 20:50:48 -07:00
5a16fa614c [Model] Gemma3n MM (#20495)
Signed-off-by: ShriKode <shrikode@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: ShriKode <shrikode@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
2025-08-09 09:56:25 -07:00
56186474f6 [Docs] Reduce noise in docs and --help from the JSON tip (#22567)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-09 08:31:32 -07:00
a6022e6fbc GLM-4.5V with new class name at transformers (#22520)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-09 00:50:21 -07:00
2be07a0db1 Update docs for Minimax-Text support (#22562)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-08-09 00:18:18 -07:00
8a0ffd6285 Remove mamba_ssm from vLLM requirements; install inside test container using --no-build-isolation (#22541)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-08-08 23:05:32 -07:00
23472ff51c [Doc] Add usage of implicit text-only mode (#22561)
Signed-off-by: Roger Wang <hey@rogerw.me>
Co-authored-by: Flora Feng <4florafeng@gmail.com>
2025-08-08 23:04:19 -07:00
baece8c3d2 [Frontend] Add unix domain socket support (#18097)
Signed-off-by: <yyweiss@gmail.com>
Signed-off-by: yyw <yyweiss@gmail.com>
2025-08-08 16:23:44 -07:00
2fcf6b27b6 [Docs] fix broken links in metrics.md (#22315)
Signed-off-by: Guy Stone <guys@spotify.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-08 16:22:35 -07:00
e290594072 [Docs] Rename “Distributed inference and serving” to “Parallelism & Scaling” (#22466)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-08-08 19:26:21 +00:00
7be7f3824a [Docs] Improve API docs (+small tweaks) (#22459)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-08 03:02:51 -07:00
099c046463 [Doc] Sleep mode documentation (#22310)
Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com>
Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com>
Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Hong Hanh <hanh.usth@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-08-08 12:25:18 +08:00
139d155781 [Frontend] Use engine argument to control MM cache size (#22441)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-07 09:47:10 -07:00
766bc8162c [Core] Store only the keys for multi-modal data in P0 (#22198)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-07 01:45:04 -07:00
289b18e670 [Docs] Update features/disagg_prefill, add v1 examples and development (#22165)
Signed-off-by: David Chen <530634352@qq.com>
2025-08-07 00:59:23 -07:00
a2c6696bfe [Docs] Factor out troubleshooting to its own guide; add section for Ray Observability (#21578)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
2025-08-07 00:29:13 -07:00
5e8398805e [Doc] Fix link to prefix caching design (#22384)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-08-07 00:28:15 -07:00
609b533cb6 [Bugfix] Add proper comparison for package versions (#22314)
Signed-off-by: Syed Muhammad Bin Asif <syedmba7@connect.hku.hk>
2025-08-06 20:31:03 -07:00
41b67f4263 [model] Support MiniCPM-V 4.0 (#22166)
Co-authored-by: imning3 <hbning@pku.edu.cn>
2025-08-06 18:35:46 -07:00
46a13949d5 [v1] - Mamba1 Attention Metadata (#21249)
Signed-off-by: asafg <asafg@ai21.com>
Co-authored-by: asafg <asafg@ai21.com>
2025-08-06 17:03:42 -07:00
54991c548a [gpt-oss] add model to supported models doc (#22336)
Signed-off-by: Roger Wang <hey@rogerw.me>
2025-08-06 01:49:44 -07:00
d1bf1b9711 [Docs][TPU] Highlight TPU Software version selection (#22242)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-08-05 02:33:46 -07:00
6fa41e0c32 self.gate dtype update for GLM-4.5 (#22203)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
2025-08-04 19:12:38 -07:00
1539ced93a [Doc] Update pooling model docs (#22186)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-04 03:37:06 -07:00
a7b8788d2c [Misc] Modify the organization of GLM series (#22171)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-03 23:51:20 -07:00
83f7bbb318 Add chat doc in quick start (#21213)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-08-03 07:47:55 -07:00
25373b6c6c for glm-4.1V update (#22000)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-08-02 01:46:57 -07:00
067c34a155 docs: remove deprecated disable-log-requests flag (#22113)
Signed-off-by: Roger Wang <hey@rogerw.me>
2025-08-02 00:19:48 -07:00