Commit Graph

21 Commits

Author SHA1 Message Date
5febdc8750 [Chore] Remove unused batched RoPE op & kernel (#24789)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-13 00:08:20 -07:00
d93c976a0d [Kernel] Have rotary embeddings support tensors (#18046)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-14 15:43:55 -07:00
98c89e16ff Make key optional for rotary embedding (#17566)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-05-07 00:11:46 -07:00
85ac82d228 [Kernel] Make rotary_embedding ops more flexible with input shape (#12777) 2025-02-06 08:46:13 -08:00
5467ac3196 [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) 2024-06-09 16:23:30 -04:00
5f6d10c14c [CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722) 2024-05-22 07:18:41 +00:00
7e9bd08f60 Add batched RoPE kernel (#3095) 2024-03-13 13:45:26 -07:00
77af974b40 [FIX] Support non-zero CUDA devices in custom kernels (#1959) 2024-01-02 19:09:59 -08:00
76a7983b23 [BugFix] Fix RoPE kernel on long sequences(#2164) 2023-12-17 17:09:10 -08:00
6ccc0bfffb Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836)
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Amir Balwel <amoooori04@gmail.com>
Co-authored-by: root <kuanfu.liu@akirakan.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com>
Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>
2023-12-07 23:16:52 -08:00
9f669a9a7c Support YaRN models (#1264)
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Co-authored-by: Viktor Ferenczi <viktor@ferenczi.eu>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2023-11-03 14:12:48 -07:00
c1376e0f82 Change scheduler & input tensor shape (#1381) 2023-10-16 17:48:42 -07:00
320a622ec4 [BugFix] Implement RoPE for GPT-J (#941) 2023-09-06 11:54:33 +09:00
8ce9c50d40 Avoid compiling kernels for double data type (#933) 2023-09-02 14:59:47 +09:00
1b0bd0fe8a Add Falcon support (new) (#592) 2023-08-02 14:04:39 -07:00
6fc2a38b11 Add support for LLaMA-2 (#505) 2023-07-20 11:38:27 -07:00
0b98ba15c7 Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00
e070829ae8 Support bfloat16 data type (#54) 2023-05-03 14:09:44 -07:00
a96d63c21d Add support for GPT-NeoX (Pythia) (#50) 2023-04-28 00:32:10 -07:00
897cb2ae28 Optimize data movement (#20) 2023-04-02 00:30:17 -07:00
88c0268a18 Implement custom kernel for LLaMA rotary embedding (#14) 2023-03-30 11:04:21 -07:00