Commit Graph

5608 Commits

Author SHA1 Message Date
84884cd9ac fix: tiny fix make format.sh excutable (#16015)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-03 15:18:05 +00:00
a43aa183dc [doc] update contribution link (#15922)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
2025-04-03 10:47:31 +00:00
463bbb1835 [Bugfix][V1] Fix bug from putting llm_engine.model_executor in a background process (#15367)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
2025-04-03 07:32:10 +00:00
5e125e74d1 [misc] improve error message for "Failed to infer device type" (#15994)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-04-03 14:45:03 +08:00
06f21ce7a5 [Benchmark] Add AIMO Dataset to Benchmark (#15955)
Signed-off-by: Ziji Shi <shi.ziji.sm@gmail.com>
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com>
2025-04-03 06:09:18 +00:00
57a810db9c [ROCM][V0] PA kennel selection when no sliding window provided (#15982)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
2025-04-03 05:28:44 +00:00
8b664706aa [bugfix] add seed in torchrun_example.py (#15980)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-04-03 12:25:01 +08:00
37bfee92bf fix: better error message for get_config close #13889 (#15943)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-03 03:53:19 +00:00
e73ff24e31 [ROCM][KERNEL] Paged attention for V1 (#15720)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com>
2025-04-02 19:48:00 -07:00
bd7599d34a [V1][TPU] Do not compile sampling more than needed (#15883)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-04-03 01:36:01 +00:00
01b6113659 [TPU] optimize the all-reduce performance (#15903)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-04-03 00:25:14 +00:00
1b84eff03a [V1][TPU] TPU-optimized top-p implementation (avoids scattering). (#15736)
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com>
Co-authored-by: root <root@t1v-n-822696b7-w-0.us-central2-b.c.tpu-prod-env-large-adhoc.internal>
2025-04-02 17:18:08 -07:00
55acf86bf8 Fix huggingface-cli[hf-xet] -> huggingface-cli[hf_xet] (#15969)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-02 23:37:30 +00:00
f021b97993 [V1] Support Mistral3 in V1 (#15950)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-04-02 15:36:24 -07:00
1cab43c2d2 [misc] instruct pytorch to use nvml-based cuda check (#15951)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-04-03 01:02:58 +08:00
8bd651b318 Restricted cmake to be less than version 4 as 4.x breaks the build of… (#15859)
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>
2025-04-02 16:19:39 +00:00
58e234a754 [Misc] V1 LoRA support CPU offload (#15843)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-02 23:04:43 +08:00
e86c414d6a [Model] use AutoWeightsLoader in model load_weights (#15770)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-04-02 07:47:31 -07:00
550b2801ad [CPU][Bugfix] Using custom allreduce for CPU backend (#15934)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-04-02 07:46:47 -07:00
cefb9e5a28 [Frontend] Implement Tool Calling with tool_choice='required' (#13483)
Signed-off-by: Liangfu Chen <liangfc@amazon.com>
Signed-off-by: Matt, Matthias <matthias.matt@tuwien.ac.at>
Co-authored-by: Liangfu Chen <liangfc@amazon.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2025-04-02 07:45:45 -07:00
98d7367b61 [Metrics] Hide deprecated metrics (#15458)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-04-02 07:37:19 -07:00
594a8b9030 [Bugfix] Fix the issue where the model name is empty string, causing no response with the model name. (#15938)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-04-02 06:33:52 -07:00
44f990515b [CI] Remove duplicate entrypoints-test (#15940)
Signed-off-by: Kay Yan <kay.yan@daocloud.io>
2025-04-02 02:44:01 -07:00
252937806c [Bugfix][Benchmarks] Ensure async_request_deepspeed_mii uses the OpenAI choices key (#15926)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-04-02 02:19:35 -07:00
51826d51fa Add minimum version for huggingface_hub to enable Xet downloads (#15873)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-02 02:03:36 -07:00
14e53ed11f [V1] Fix json_object support with xgrammar (#15488)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-04-02 02:00:08 -07:00
ddb94c2605 [core] Add tags parameter to wake_up() (#15500)
Signed-off-by: Eric <erictang000@gmail.com>
2025-04-02 01:59:27 -07:00
90969fb39a [Kernel] Add more dtype support for GGUF dequantization (#15879)
Signed-off-by: lukas.bluebaum <lukas.bluebaum@aleph-alpha.com>
2025-04-02 01:58:48 -07:00
101f1481f9 [Build/CI] Update lm-eval to 0.4.8 (#15912)
Signed-off-by: Chris Thi <chris.c.thi@gmail.com>
2025-04-02 01:47:57 -07:00
2edc87b161 [Bugfix] Fix cache block size calculation for CPU MLA (#15848)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-04-02 01:45:02 -07:00
4203926f10 [CI/Build] Further clean up LoRA tests (#15920)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-02 01:39:09 -07:00
cdb57015a7 [Misc] Replace print with logger (#15923)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-04-02 01:37:38 -07:00
aa557e6422 [Benchmark]Fix error message (#15866)
Signed-off-by: wangli <wangli858794774@gmail.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2025-04-02 01:32:24 -07:00
0e00d40e4f [V1][Bugfix] Fix typo in MoE TPU checking (#15927)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-04-01 23:46:42 -07:00
c920e01242 [Doc] Update rocm.inc.md (#15917)
Signed-off-by: chun37 <chun.jb.37@gmail.com>
2025-04-01 23:38:26 -07:00
274d8e8818 [V1][Minor] Enhance SpecDecoding Metrics Log in V1 (#15902)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-01 23:38:02 -07:00
2039c6305b [Bugfix] Fix imports for MoE on CPU (#15841)
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-04-02 03:33:55 +00:00
6efb195a6e [V1] Fix: make sure k_index is int64 for apply_top_k_only (#15907)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-04-01 19:06:44 -07:00
24b7fb455a [Spec Decode] Fix input triton kernel for eagle (#15909) 2025-04-01 18:15:14 -07:00
58f5a59769 [Docs] Add Intel as Sponsor (#15913)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-04-01 17:16:55 -07:00
db9dfcfa6a [Docs] Add Ollama meetup slides (#15905)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-04-01 13:58:59 -07:00
9ef98d527e [Model][MiniMaxText01] Support MiniMaxText01 model inference (#13454)
Signed-off-by: qscqesze <475517977@qq.com>
Co-authored-by: qingjun <qingjun@minimaxi.com>
Co-authored-by: qscqesze <475517977@qq.com>
2025-04-01 16:23:55 -04:00
93491aefc7 [BugFix] make sure socket close (#15875)
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-01 13:10:24 -07:00
7acd539cd7 [Docs] update usage stats language (#15898)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-04-01 12:54:13 -07:00
e75a6301bd [V1][Spec Decode] Implement Eagle Proposer [1/N] (#15729)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-01 12:33:16 -07:00
a79cc68b3a [V1][Metrics] Initial speculative decoding metrics (#15151)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-04-01 10:45:04 -07:00
7e3f7a4ee7 [CI] Disable flaky structure decoding test temporarily. (#15892)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-04-01 17:42:34 +00:00
9ec8257914 [Model] Add module name prefixes to gemma3 (#15889)
Signed-off-by: Bartholomew Sabat <bartek@recursal.ai>
Co-authored-by: Bartholomew Sabat <bartek@recursal.ai>
2025-04-01 10:13:40 -07:00
38327cf454 [Model] Aya Vision (#15441)
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-04-01 16:30:43 +00:00
dfa82e2a3d [CI/Build] Clean up LoRA tests (#15867)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-04-01 16:28:50 +00:00