d6953beb91  2025-10-05 07:06:22 -07:00
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

d5e0fca264  2025-09-22 12:30:05 -07:00
[torch.compile] Cleanup compilation tests and custom passes, add debug utils, fix DCE bug (#23091), fix test (#24376), and prep for custom op matching (#24604) (#24542)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: luka <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>

e6585ddb45  2025-09-17 16:37:23 -07:00
[Bugfix] Fix accuracy issue for silu_mul + nvfp4 quant fusion kernel (#24833)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

e68dc2f014  2025-09-06 20:39:34 +00:00
[Bugfix] Fix unstable silu_mul+nvfp4 quant fusion test (#24370)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>

eedb2a2a10  2025-09-05 20:13:42 +00:00
[Bugfix] Fix silu_mul+quant fusion test (#24341)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>

37241077d5  2025-09-04 09:25:40 -04:00
[Misc] Removed force_fp8_e4m3fnuz from FP8LinearOp (#23725)
Signed-off-by: Julien Lin <jullin@nvidia.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>

16a45b3a28  2025-08-28 19:36:50 +00:00
[NVIDIA] Support SiluMul + NVFP4 quant fusion (#23671)
Signed-off-by: jindih <jindih@nvidia.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: jindih <jindih@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedic <lgovedic@redhat.com>

f66673a39d  2025-08-26 06:54:04 -07:00
[Kernel] Added flashinfer fp8 per-tensor gemms (#22895)
Signed-off-by: Julien Lin <jullin@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>

31d5c1797f  2025-07-11 04:56:28 +00:00
[Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf (#19830)
Signed-off-by: Luka Govedic <lgovedic@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>

02f0c7b220  2025-06-03 11:20:17 -07:00
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>

7b2f28deba  2025-05-13 22:13:56 -07:00
[AMD][torch.compile] Enable silu+fp8_quant fusion for rocm (#18082)
Signed-off-by: charlifu <charlifu@amd.com>

4b2ed7926a  2025-05-09 15:18:44 -07:00
Improve configs - the rest! (#17562)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

460a2b1100  2025-05-01 07:59:28 -07:00
[torch.compile] Add torch inductor pass for fusing silu_and_mul with subsequent scaled_fp8_quant operations (#10867)
Signed-off-by: Sage Moore <sage@neuralmagic.com>