ba7703e659
[Misc] Remove qlora_adapter_name_or_path (#17699)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-06 23:10:37 -07:00

f80ae5bdcf
[Kernel] Use fused rmsnorm for some models like qwen3 series (#17735)
Signed-off-by: evian <eviantai@u.nus.edu>
Co-authored-by: evian <eviantai@u.nus.edu>
2025-05-06 23:10:02 -07:00

1a45a61387
[Kernel] GGUF MoeVec kernel (#16780)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-05-06 23:07:23 -07:00

c3e9d5060e
[Misc] Use apply_rotary_emb from vllm_flash_attn for Qwen2-VL vision RoPE (#17726)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-07 04:51:33 +00:00

822de7fb94
[Misc] Split model loader (#17712)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-05-07 12:42:26 +08:00

8d84d836d1
[BugFix][Spec Decode] Fix hidden size mismatch between target and eagle head (#17740)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-05-06 19:51:26 -07:00

950b71186f
Replace lm-eval bash script with pytest and use enforce_eager for faster CI (#17717)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-06 18:00:10 -07:00

e50a1f1a9c
[TPU] Add kernel test for moe_pallas (#17496)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2025-05-06 17:59:57 -07:00

a17cef70ea
Removed unused marlin cuda code (#17684)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-06 17:59:47 -07:00

18dd5e01f2
[Model] Mamba2 causal conv1d Refactor to Split Prefill and Decode Requests for Corresponding Kernels (#17146)
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com>
2025-05-06 17:59:30 -07:00

6de3e13413
Add logging for torch nightly version (#17669)
Signed-off-by: Yang Wang <elainewy@meta.com>
2025-05-07 00:45:51 +00:00

ed3a1d2106
[ROCm] fix num_stages for default moe config to avoid triton OutOfResource error (#17744)
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
2025-05-07 00:39:48 +00:00

022afbeb4e
Fix doc build performance (#17748)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-07 00:36:41 +00:00

2f925e5777
[Kernel] Unified Triton kernel that doesn't distinguish between prefill + decode (#16828)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-06 18:21:48 -04:00

de906b95f9
[Bugfix] Fix for the condition to accept empty encoder inputs for mllama (#17732)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-05-06 19:59:06 +00:00

d456aea71f
[Misc] Add Next Edit Prediction (NEP) datasets support in benchmark_serving.py (#16839)
Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
Signed-off-by: dtransposed <>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b.c.jetbrains-grazie.internal>
2025-05-06 15:38:45 -04:00

621ca2c0ab
[TPU] Increase block size and reset block shapes (#16458)
2025-05-06 13:55:04 -04:00

6115b11582
Make right sidebar more readable in "Supported Models" (#17723)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-06 16:48:26 +00:00

5b8c390747
[Bugfix] Fix modality limits in vision language example (#17721)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-06 16:12:28 +00:00

7525d5f3d5
[doc] Add RAG Integration example (#17692)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-05-06 16:10:23 +00:00

aabcd2cae3
[v1] Introduce KVCacheBlocks as interface between Scheduler and KVCacheManager (#17479)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-05-06 08:50:34 -07:00

0d115460a7
[Docs] Use gh-file to add links to tool_calling.md (#17709)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
2025-05-06 15:27:19 +00:00

175bda67a1
[Feat] Add deprecated=True to CLI args (#17426)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-05-06 08:11:27 -07:00

cba31c47c4
[v1] AttentionMetadata for each layer (#17394)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-05-06 07:58:37 -07:00

a6fed02068
[V1][PP] Support PP for MultiprocExecutor (#14219)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: jiang.li <jiang1.li@intel.com>
2025-05-06 07:58:05 -07:00

d419aa5dc4
[V1] Enable TPU V1 backend by default (#17673)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-06 06:49:49 -07:00

f9bc5a0693
[Bugfix] Fix triton import with local TritonPlaceholder (#17446)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
2025-05-06 17:53:09 +08:00

05e1f96419
Fix dockerfilegraph pre-commit hook (#17698)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-06 08:56:48 +00:00

6eae34533a
[Misc] Fix ScalarType float4 naming (#17690)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-06 01:07:15 -07:00

63ced7b43f
[Doc] Update notes for H2O-VL and Gemma3 (#17219)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-06 07:51:02 +00:00

dc47ba32f8
[Bugfix] Fixed prompt length for random dataset (#17408)
Signed-off-by: Mikhail Podvitskii <podvitskiymichael@gmail.com>
2025-05-06 07:00:08 +00:00

edbf2d609e
[easy] Fix logspam on PiecewiseBackend errors (#17138)
Signed-off-by: rzou <zou3519@gmail.com>
2025-05-05 23:46:11 -07:00

999328be0d
[Model] Add GraniteMoeHybrid 4.0 model (#17497)
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
2025-05-06 12:00:31 +08:00

98834fefaa
Update nm to rht in doc links + refine fp8 doc (#17678)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-06 00:41:14 +00:00

90bd2ae172
[Bugfix] LoRA - Retire unused maxnreg LoRA kernel argument (#17677)
2025-05-05 17:34:29 -07:00

5941e0b7ea
[TPU][V1] Add support for top-logprobs (#17072)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-05-05 14:20:15 -07:00

9765940824
[TPU] Enable gemma3-27b with TP>1 on multi-chips. (#17335)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2025-05-05 14:19:58 -07:00

5ea5c514da
[BugFix] Increase timeout for startup failure test (#17642)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-05-05 20:53:19 +00:00

d3efde8176
[Benchmarks] Remove invalid option under V1 engine (#17651)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-05-05 16:30:22 -04:00

aea302be6c
Use git-path commit in hook (#17616)
Signed-off-by: Thomas J. Fan <thomasjpfan@gmail.com>
2025-05-05 17:55:32 +00:00

cc05b90d86
[Doc] Fix broken cuda installation doc rendering (#17654)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-05 17:52:40 +00:00

1d0c9d6b2d
[Kernel] some optimizations for dense marlin and moe marlin (#16850)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
2025-05-05 09:39:30 -07:00

f62cad6431
[Build/CI] Upgrade CUTLASS to 3.9.2 (#17641)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-05-04 19:23:17 -07:00

5394ad7387
[Bugfix] fix KeyError on top logprobs are special tokens (#17637)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-05-04 19:22:35 -07:00

68e1ee0072
[Bugfix][Easy] Fix whitespace in shm_broadcast.py logging (#17635)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-05-04 19:20:19 -07:00

2858830c39
[Bugfix] Prioritize dtype in root config before checking text config (#17629)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-04 12:43:05 +00:00

d6484ef3c3
Add full API docs and improve the UX of navigating them (#17485)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-03 19:42:43 -07:00

46fae69cf0
[Misc] V0 fallback for --enable-prompt-embeds (#17615)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-03 22:59:24 +00:00

f66f1e0fa3
[Bugfix] Fix broken Qwen2.5-omni tests (#17613)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-03 17:08:14 +00:00

887d7af882
[Core] Gate prompt_embeds behind a feature flag (#17607)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-05-04 00:19:20 +08:00