6342adc438
fix: support clang17 for macos and fix the real libomp (#16086)
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com>
2025-04-05 11:00:12 +00:00
550b2801ad
[CPU][Bugfix] Using custom allreduce for CPU backend (#15934)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-04-02 07:46:47 -07:00
4f044b1d67
[Kernel][CPU] CPU MLA (#14744)
...
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg>
2025-03-25 09:34:59 +00:00
a597a57595
[Attention] Flash Attention 3 - fp8 (#14570)
...
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
2025-03-20 01:14:20 -04:00
ff47aab056
[CPU] Upgrade CPU backend to torch-2.6 (#13381)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-03-12 10:41:13 +00:00
ada19210a3
Adding cpu inference with VXE ISA for s390x architecture (#12613)
...
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com>
Signed-off-by: Rishika Kedia <rishika.kedia@in.ibm.com>
Co-authored-by: Rishika Kedia <rishika.kedia@in.ibm.com>
2025-03-06 08:40:53 -08:00
ed6ea06577
[Hardware] Update the flash attn tag to support Blackwell (#14244)
2025-03-05 22:01:37 -08:00
f95903909f
[Kernel] FlashMLA integration (#13747)
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-02-27 10:35:08 +08:00
4fc5c23bb6
[NVIDIA] Support nvfp4 quantization (#12784)
2025-02-12 19:51:51 -08:00
33e0602e59
[Misc] Fix improper placement of SPDX header in scripts (#12694)
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-03 11:16:59 -08:00
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files (#12628)
...
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**
commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:18:24 2025 -0500
Add SPDX license headers to python source files
This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance.
The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job.
More information can be found on the SPDX site:
- https://spdx.dev/learn/handling-license-info/
Signed-off-by: Russell Bryant <rbryant@redhat.com>
commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:36:32 2025 -0500
Check for SPDX headers using pre-commit
Signed-off-by: Russell Bryant <rbryant@redhat.com>
---------
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
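The header and the pre-commit check described in e489ad7a21 can be sketched roughly as follows. This is a minimal illustration, not the script the commit added: the header text assumes the project's Apache-2.0 license, and the names `SPDX_HEADER`, `has_spdx_header` and `main` are hypothetical. pre-commit passes the matched file names to a hook as command-line arguments, so the checker reads them from `sys.argv`.

```python
import sys
from pathlib import Path

# Assumed header text (hypothetical); the project's actual identifier may differ.
SPDX_HEADER = "# SPDX-License-Identifier: Apache-2.0"


def has_spdx_header(text: str, max_lines: int = 5) -> bool:
    """Return True if the SPDX header appears within the first few lines.

    Executable scripts must keep a shebang on line 1, so the header may sit
    on line 2; checking only the first line would wrongly reject them.
    """
    for line in text.splitlines()[:max_lines]:
        if line.strip() == SPDX_HEADER:
            return True
    return False


def main(paths: list[str]) -> int:
    """Report files missing the header; a non-zero exit fails the hook."""
    missing = [
        p for p in paths
        if not has_spdx_header(Path(p).read_text(encoding="utf-8"))
    ]
    for p in missing:
        print(f"Missing SPDX header: {p}")
    return 1 if missing else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Scanning a small window instead of only the first line keeps the check compatible with scripts whose shebang has to stay on line 1.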
103bd17ac5
[Build] Only build 9.0a for scaled_mm and sparse kernels (#12339)
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-01-27 10:40:00 -05:00
cd9d06fb8d
Allow hip sources to be directly included when compiling for rocm. (#12087)
2025-01-15 16:46:03 -05:00
cfd3219f58
[Hardware][Apple] Native support for macOS Apple Silicon (#11696)
...
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2025-01-08 16:35:49 +08:00
a6760f6456
[Feature] vLLM ARM Enablement for AARCH64 CPUs (#9228)
...
Signed-off-by: Sanket Kale <sanketk.kale@fujitsu.com>
Co-authored-by: Sanket Kale <sanketk.kale@fujitsu.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
2024-11-25 18:32:39 -08:00
1ea291a417
Fix: Build error seen on Power Architecture (#10421)
...
Signed-off-by: Manjul Mohan <manjul.mohan@ibm.com>
Signed-off-by: B-201 <Joy25810@foxmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: ismael-dm <ismaeldm99@gmail.com>
Signed-off-by: Andrew Nesbitt <andrewnez@gmail.com>
Signed-off-by: mgoin <michael@neuralmagic.com>
Signed-off-by: yan ma <yan.ma@intel.com>
Signed-off-by: Angus Wang <wangjadehao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: rickyx <rickyx@anyscale.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Manjul Mohan manjul.mohan@ibm.com <manjulmohan@ltcd97-lp2.aus.stglabs.ibm.com>
Co-authored-by: B-201 <Joy25810@foxmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: ismael-dm <ismaeldm99@gmail.com>
Co-authored-by: Andrew Nesbitt <andrewnez@gmail.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Yan Ma <yan.ma@intel.com>
Co-authored-by: Angus Wang <wangjadehao@gmail.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Ricky Xu <rickyx@anyscale.com>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2024-11-19 09:34:57 -08:00
a6f332d0d9
[Hardware][CPU][bugfix] Fix half dtype support on AVX2-only target (#10108)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2024-11-07 18:42:50 +08:00
a4b3e0c1e9
[Hardware][CPU] Update torch 2.5 (#9911)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2024-11-07 04:43:08 +00:00
3cb07a36a2
[Misc] Upgrade to pytorch 2.5 (#9588)
...
Signed-off-by: Bill Nell <bill@neuralmagic.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-10-27 09:44:24 +00:00
5eda21e773
[Hardware][CPU] compressed-tensor INT8 W8A8 AZP support (#9344)
2024-10-17 12:21:04 -04:00
e5dc713c23
[Hardware][PowerPC] Make oneDNN dependency optional for Power (#9039)
...
Signed-off-by: Varad Ahirwadkar <varad.ahirwadkar1@ibm.com>
2024-10-04 17:24:42 +00:00
aeb37c2a72
[CI/Build] Per file CUDA Archs (improve wheel size and dev build times) (#8845)
2024-10-03 22:55:25 -04:00
57a0702e63
[Bugfix] Fix CPU CMake build (#8723)
...
Co-authored-by: Yuan <yuan.zhou@intel.com>
2024-09-22 20:40:46 -07:00
71c60491f2
[Kernel] Build flash-attn from source (#8245)
2024-09-20 23:27:10 -07:00
de6f90a13d
[Misc] guard against change in cuda library name (#8609)
2024-09-20 06:36:30 +08:00
73202dbe77
[Kernel][Misc] register ops to prevent graph breaks (#6917)
...
Co-authored-by: Sage Moore <sage@neuralmagic.com>
2024-09-11 12:52:19 -07:00
0b952af458
[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257)
2024-09-11 09:46:46 -07:00
f80ab3521c
Clean up remaining Punica C information (#7027)
2024-08-04 15:37:08 -07:00
a8d604ca2a
[Misc] Disambiguate quantized types via a new ScalarType (#6396)
2024-08-02 13:51:58 -07:00
3bbb4936dc
[Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125)
2024-07-26 13:50:10 -07:00
38a1674abb
Support CPU inference with VSX PowerPC ISA (#5652)
2024-06-26 21:53:04 +00:00
dd793d1de5
[Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422)
2024-06-25 15:56:15 -07:00
f758aed0e8
[Bugfix][CI/Build][AMD][ROCm]Fixed the cmake build bug which generate garbage on certain devices (#5641)
2024-06-18 23:21:29 -07:00
ab66536dbf
[CI/BUILD] Support non-AVX512 vLLM building and testing (#5574)
2024-06-17 14:36:10 -04:00
cd9c0d65d9
[Hardware][Intel] Support CPU inference with AVX2 ISA (#5452)
2024-06-13 17:22:24 -06:00
5467ac3196
[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)
2024-06-09 16:23:30 -04:00
c833101740
[Kernel] Refactor FP8 kv-cache with NVIDIA float8_e4m3 support (#4535)
2024-05-09 18:04:17 -06:00
59a6abf3c9
[Hotfix][CI/Build][Kernel] CUDA 11.8 does not support layernorm optimizations (#3782)
2024-04-08 14:31:02 -07:00
2ff767b513
Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
...
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: HaiShaw <hixiao@gmail.com>
Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu>
Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com>
Co-authored-by: guofangze <guofangze@kuaishou.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-04-03 14:15:55 -07:00
0e3f06fe9c
[Hardware][Intel] Add CPU inference backend (#3634)
...
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Yuan Zhou <yuan.zhou@intel.com>
2024-04-01 22:07:30 -07:00
b6d103542c
[Kernel] Layernorm performance optimization (#3662)
2024-03-30 14:26:38 -07:00
51c31bc10c
CMake build elf without PTX (#3739)
2024-03-30 01:53:08 +00:00
3ad438c66f
Fix build when nvtools is missing (#3698)
2024-03-29 18:52:39 -07:00
01bfb22b41
[CI] Try introducing isort. (#3495)
2024-03-25 07:59:47 -07:00
9fdf3de346
Cmake based build system (#2830)
2024-03-18 15:38:33 -07:00