d42613686f
Fix issue with fp8 ops on some models. ( #8045 )
...
torch._scaled_mm errors when an input is non-contiguous.
2025-05-10 07:52:56 -04:00
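A minimal sketch of the guard this fix implies: torch._scaled_mm rejects non-contiguous operands, so make the input contiguous first. The fp8_linear name and its parameters are hypothetical, not the actual ComfyUI code.

```python
import torch

def fp8_linear(x, w_fp8, scale_x, scale_w, out_dtype=torch.bfloat16):
    if not x.is_contiguous():
        x = x.contiguous()  # _scaled_mm errors on non-contiguous input
    # w_fp8 is (out_features, in_features); .t() gives the column-major
    # layout _scaled_mm expects for its second operand
    return torch._scaled_mm(x, w_fp8.t(), scale_a=scale_x,
                            scale_b=scale_w, out_dtype=out_dtype)
```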
ac10a0d69e
Make loras work with --async-offload ( #7824 )
2025-04-26 19:56:22 -04:00
0dcc75ca54
Add experimental --async-offload lowvram weight offloading. ( #7820 )
...
This should speed up lowvram mode a bit. It is currently only enabled when --async-offload is used, but it will be enabled by default in the future if no problems show up.
2025-04-26 16:11:21 -04:00
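A rough sketch of the async offload idea: prefetch the next layer's weights on a separate CUDA stream so the host-to-device copy overlaps with compute. All names below are illustrative, not the actual ComfyUI implementation.

```python
import torch

offload_stream = torch.cuda.Stream()

def prefetch(weight_cpu):
    # the source must be in pinned memory for the copy to be truly async
    with torch.cuda.stream(offload_stream):
        return weight_cpu.pin_memory().to("cuda", non_blocking=True)

def wait_for_weight(weight_gpu):
    # make the compute stream wait until the async copy has finished
    torch.cuda.current_stream().wait_stream(offload_stream)
    return weight_gpu
```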
9ad792f927
Basic support for hidream i1 model.
2025-04-15 17:35:05 -04:00
8a438115fb
Add RMSNorm to comfy.ops
2025-04-14 18:00:33 -04:00
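For reference, a minimal RMSNorm sketch in the standard formulation (x / sqrt(mean(x^2) + eps) * weight); the comfy.ops version is presumably a cast-aware op class rather than a bare function.

```python
import torch

def rms_norm(x, weight, eps=1e-6):
    # normalize by the root mean square over the last dimension
    rms = torch.sqrt(torch.mean(x.float() ** 2, dim=-1, keepdim=True) + eps)
    return (x.float() / rms).to(x.dtype) * weight
```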
1714a4c158
Add CublasOps support ( #7574 )
...
* CublasOps support
* Guard CublasOps behind --fast arg
2025-04-12 18:29:15 -04:00
70e15fd743
No need for scale_input when fp8 matrix mult is disabled.
2025-03-07 04:49:20 -05:00
e1474150de
Support fp8_scaled diffusion models that don't use fp8 matrix mult.
2025-03-07 04:39:21 -05:00
4dc6709307
Rename argument in last commit and document the options.
2025-03-01 02:43:49 -05:00
4d55f16ae8
Use enum list for --fast options ( #7024 )
2025-03-01 02:37:35 -05:00
cf0b549d48
--fast now takes a number as an argument to indicate how fast you want it.
...
The idea is that you can indicate how much quality you are willing to trade for speed.
At the moment:
--fast 2 enables fp16 accumulation if your PyTorch supports it.
--fast 5 enables fp8 matrix mult on fp8 models as well as the optimization above.
--fast without a number enables all optimizations.
2025-02-28 02:48:20 -05:00
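A sketch of what the --fast 2 level toggles: fp16 accumulation in matmuls, which is faster but less precise than fp32 accumulation. The flag only exists on sufficiently new PyTorch builds, hence the guard.

```python
import torch

try:
    torch.backends.cuda.matmul.allow_fp16_accumulation = True
except (AttributeError, RuntimeError):
    pass  # older PyTorch: the option is not available
```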
ab888e1e0b
Add add_weight_wrapper function to model patcher.
...
Functions can now easily be added to wrap/modify model weights.
2025-02-12 05:55:35 -05:00
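A hypothetical illustration of the wrapper pattern: a callable that receives a weight tensor and returns a modified one, registered on the model patcher. The exact add_weight_wrapper signature and key name are assumptions here.

```python
def halve_weight(weight):
    # example modification applied when the weight is fetched
    return weight * 0.5

# assumed usage; the real ModelPatcher API may differ:
# patcher.add_weight_wrapper("diffusion_model.some_layer.weight", halve_weight)
```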
99a1fb6027
Make fast fp8 take a bit less peak memory.
2024-12-24 18:05:19 -05:00
fbf68c4e52
Clamp input ( #5928 )
2024-12-07 14:00:31 -05:00
915fdb5745
Fix lowvram edge case.
2024-10-22 16:34:50 -04:00
8ce2a1052c
Optimizations to --fast and scaled fp8.
2024-10-22 02:12:28 -04:00
0075c6d096
Mixed precision diffusion models with scaled fp8.
...
This change adds support for diffusion models where all the linears are
scaled fp8 while the other weights stay in their original precision.
2024-10-21 18:12:51 -04:00
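A minimal sketch of what "scaled fp8" means for a linear weight: the fp8 tensor carries a separate scale, and dequantizing is a cast plus a multiply. Tensor names and the per-tensor scale are illustrative.

```python
import torch

w_fp8 = torch.randn(128, 128).to(torch.float8_e4m3fn)
scale = torch.tensor(0.03)  # scale stored alongside the fp8 weight

def dequantize(w_fp8, scale, dtype=torch.float16):
    # recover a usable weight in the compute dtype
    return w_fp8.to(dtype) * scale.to(dtype)
```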
83ca891118
Support scaled fp8 t5xxl model.
2024-10-20 22:27:00 -04:00
f9f9faface
Fix model merging issue with scaled fp8.
2024-10-20 06:24:31 -04:00
a68bbafddb
Support diffusion models with scaled fp8 weights.
2024-10-19 23:47:42 -04:00
67158994a4
Use the lowvram cast_to function for everything.
2024-10-17 17:25:56 -04:00
e38c94228b
Add a weight_dtype fp8_e4m3fn_fast to the Diffusion Model Loader node.
...
This is used to load weights in fp8 and use fp8 matrix multiplication.
2024-10-09 19:43:17 -04:00
9c41bc8d10
Remove useless line.
2024-09-23 02:32:29 -04:00
dc96a1ae19
Load controlnet in fp8 if weights are in fp8.
2024-09-21 04:50:12 -04:00
8ae23d8e80
Fix ONNX export.
2024-08-23 17:52:47 -04:00
c7ee4b37a1
Try to fix some lora issues.
2024-08-22 15:32:18 -04:00
904bf58e7d
Make --fast work on PyTorch nightly.
2024-08-21 14:01:41 -04:00
5f50263088
Replace use of .view with .reshape ( #4522 )
...
When generating images with an fp8_e4m3fn Flux model and batch size >1 using --fast, ComfyUI throws a "view size is not compatible with input tensor's size and stride" error pointing at the first of these two calls to view.
As reshape is semantically equivalent to view except that it accepts a broader set of inputs, there should be no downside to this change. The only difference is that reshape clones the underlying data in cases where .view would error out. I have confirmed that the output still looks as expected, but cannot confirm that no mutable use is made of the tensors anywhere.
Note that --fast is only marginally faster than the default.
2024-08-21 11:21:48 -04:00
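The difference this commit relies on, in a runnable form: .view requires compatible strides, while .reshape falls back to a copy when they are not.

```python
import torch

x = torch.arange(6).reshape(2, 3).t()  # transpose makes x non-contiguous
try:
    x.view(6)
except RuntimeError as e:
    print("view fails:", e)  # incompatible size and stride
y = x.reshape(6)  # succeeds by silently copying the data
```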
03ec517afb
Remove useless line, adjust Windows default reserved VRAM.
2024-08-21 00:47:19 -04:00
510f3438c1
Speed up fp8 matrix mult by using better code.
2024-08-20 22:53:26 -04:00
9953f22fce
Add --fast argument to enable experimental optimizations.
...
Optimizations that might break things or lower quality will be put behind
this flag first and might be enabled by default in the future.
Currently the only optimization is float8_e4m3fn matrix multiplication on
4000/Ada series Nvidia cards or later. If you have one of these cards you
will see a speed boost when using an fp8_e4m3fn Flux model, for example.
2024-08-20 11:55:51 -04:00
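A hedged sketch of the float8_e4m3fn matmul behind --fast, using torch._scaled_mm. This is a private PyTorch API whose signature has changed across releases (some older builds return an (out, amax) tuple); it needs an Ada (RTX 4000 series) or newer GPU and dimensions that are multiples of 16.

```python
import torch

a = torch.randn(32, 64, device="cuda").to(torch.float8_e4m3fn)
b = torch.randn(16, 64, device="cuda").to(torch.float8_e4m3fn)
one = torch.ones((), device="cuda")  # trivial per-tensor scales
# b.t() provides the column-major second operand _scaled_mm expects
out = torch._scaled_mm(a, b.t(), scale_a=one, scale_b=one,
                       out_dtype=torch.bfloat16)
print(out.shape)  # torch.Size([32, 16])
```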
538cb068bc
Make cast_to a no-op if the weight is already good.
2024-08-20 10:46:36 -04:00
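A minimal sketch of the idea: return the weight unchanged when it already has the requested dtype and device. The function name matches the log; the body is an assumption, not the actual code.

```python
import torch

def cast_to(weight, dtype=None, device=None, non_blocking=False):
    if (dtype is None or weight.dtype == dtype) and \
       (device is None or weight.device == device):
        return weight  # already good: no copy, no allocation
    return weight.to(device=device, dtype=dtype, non_blocking=non_blocking)
```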
39f114c44b
Less broken non-blocking?
2024-08-18 16:53:17 -04:00
6730f3e1a3
Disable non-blocking.
...
It fixed some perf issues but caused other issues that need to be debugged.
2024-08-18 14:38:09 -04:00
73332160c8
Enable non-blocking transfers in lowvram mode.
2024-08-18 10:29:33 -04:00
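A sketch of a non-blocking host-to-device transfer, the mechanism this commit enables. non_blocking=True only overlaps with compute when the source lives in pinned (page-locked) host memory.

```python
import torch

w_cpu = torch.randn(4096, 4096).pin_memory()  # page-locked host tensor
w_gpu = w_cpu.to("cuda", non_blocking=True)   # returns before copy finishes
torch.cuda.synchronize()  # wait before relying on w_gpu's contents
```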
b85216a3c0
Lower T5 memory usage by a few hundred MB.
2024-07-31 00:52:34 -04:00
25853d0be8
Use common function for casting weights to input.
2024-07-30 10:49:14 -04:00
bb1969cab7
Initial support for the Stable Audio Open model.
2024-06-15 12:14:56 -04:00
6c23854f54
Fix OSX latent2rgb previews.
2024-05-22 13:56:28 -04:00
448d9263a2
Fix control loras breaking.
2024-03-14 09:30:21 -04:00
db8b59ecff
Lower memory usage for loras in lowvram mode at the cost of perf.
2024-03-13 20:07:27 -04:00
667c92814e
Stable Cascade Stage B.
2024-02-16 13:02:03 -05:00
78a70fda87
Remove useless import.
2024-01-19 15:38:05 -05:00
36a7953142
Greatly improve lowvram sampling speed by getting rid of accelerate.
...
Let me know if this breaks anything.
2023-12-22 14:38:45 -05:00
77755ab8db
Refactor comfy.ops
...
comfy.ops -> comfy.ops.disable_weight_init
This should make it clearer what they actually do.
Some unused code has also been removed.
2023-12-11 23:27:13 -05:00
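A sketch of the disable_weight_init pattern the rename points at: subclass the op so reset_parameters does nothing, skipping the random initialization of weights that the checkpoint load will overwrite anyway. The structure below is assumed, not copied from the repo.

```python
import torch

class disable_weight_init:
    class Linear(torch.nn.Linear):
        def reset_parameters(self):
            return None  # skip init; checkpoint load fills the weights
```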
ba07cb748e
Use faster manual cast for fp8 in unet.
2023-12-11 18:24:44 -05:00
57926635e8
Switch text encoder to manual cast.
...
Use fp16 text encoder weights for CPU inference to lower memory usage.
2023-12-10 23:00:54 -05:00
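A minimal sketch of the manual-cast idea: keep weights stored in a low-memory dtype and cast them to the compute dtype on each forward pass. The class name is hypothetical.

```python
import torch

class ManualCastLinear(torch.nn.Linear):
    def forward(self, x):
        # cast stored weights to the input's dtype/device per call
        w = self.weight.to(dtype=x.dtype, device=x.device)
        b = self.bias.to(dtype=x.dtype, device=x.device) \
            if self.bias is not None else None
        return torch.nn.functional.linear(x, w, b)
```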
af365e4dd1
All the unet ops with weights are now handled by comfy.ops
2023-12-04 03:12:18 -05:00
412d3ff57d
Refactor.
2023-11-11 01:11:06 -05:00
00c0b2c507
Initialize text encoder to target dtype.
2023-08-23 21:01:15 -04:00