Commit Graph

1062 Commits

Author SHA1 Message Date
2ca8f6e23d Make the stochastic fp8 rounding reproducible. 2024-08-26 15:12:06 -04:00
7985ff88b9 Use less memory in float8 lora patching by doing calculations in fp16. 2024-08-26 14:45:58 -04:00
c6812947e9 Fix potential memory leak. 2024-08-26 02:07:32 -04:00
9230f65823 Fix some controlnets OOMing when loading. 2024-08-25 05:54:29 -04:00
8ae23d8e80 Fix onnx export. 2024-08-23 17:52:47 -04:00
7df42b9a23 Fix dora. 2024-08-23 04:58:59 -04:00
5d8bbb7281 Cleanup. 2024-08-23 04:06:27 -04:00
2c1d2375d6 Fix. 2024-08-23 04:04:55 -04:00
64ccb3c7e3 Rework IPEX check for future inclusion of XPU into Pytorch upstream and do a bit more optimization of ipex.optimize(). (#4562) 2024-08-23 03:59:57 -04:00
9465b23432 Added SD15_Inpaint_Diffusers model support for unet_config_from_diffusers_unet function (#4565) 2024-08-23 03:57:08 -04:00
c0b0da264b Missing imports. 2024-08-22 17:20:51 -04:00
c26ca27207 Move calculate function to comfy.lora 2024-08-22 17:12:00 -04:00
7c6bb84016 Code cleanups. 2024-08-22 17:05:12 -04:00
c54d3ed5e6 Fix issue with models staying loaded in memory. 2024-08-22 15:58:20 -04:00
c7ee4b37a1 Try to fix some lora issues. 2024-08-22 15:32:18 -04:00
7b70b266d8 Generalize MacOS version check for force-upcast-attention (#4548)
This code automatically forces upcasting attention for MacOS versions 14.5 and 14.6. My computer returns the string "14.6.1" for `platform.mac_ver()[0]`, so this generalizes the comparison to catch more versions.

I am running MacOS Sonoma 14.6.1 (latest version) and was seeing black image generation on previously functional workflows after recent software updates. This PR solved the issue for me.

See comfyanonymous/ComfyUI#3521
2024-08-22 13:24:21 -04:00
8f60d093ba Fix issue. 2024-08-22 10:38:24 -04:00
843a7ff70c fp16 is actually faster than fp32 on a GTX 1080. 2024-08-21 23:23:50 -04:00
a60620dcea Fix slow performance on 10 series Nvidia GPUs. 2024-08-21 16:39:02 -04:00
015f73dc49 Try a different type of flux fp16 fix. 2024-08-21 16:17:15 -04:00
904bf58e7d Make --fast work on pytorch nightly. 2024-08-21 14:01:41 -04:00
5f50263088 Replace use of .view with .reshape (#4522)
When generating images with fp8_e4_m3 Flux and batch size >1, using --fast, ComfyUI throws a "view size is not compatible with input tensor's size and stride" error pointing at the first of these two calls to view.

As reshape is semantically equivalent to view except for working on a broader set of inputs, there should be no downside to changing this. The only difference is that it clones the underlying data in cases where .view would error out. I have confirmed that the output still looks as expected, but cannot confirm that no mutable use is made of the tensors anywhere.

Note that --fast is only marginally faster than the default.
2024-08-21 11:21:48 -04:00
03ec517afb Remove useless line, adjust windows default reserved vram. 2024-08-21 00:47:19 -04:00
510f3438c1 Speed up fp8 matrix mult by using better code. 2024-08-20 22:53:26 -04:00
ea63b1c092 Simpletrainer lycoris format. 2024-08-20 12:05:13 -04:00
9953f22fce Add --fast argument to enable experimental optimizations.
Optimizations that might break things/lower quality will be put behind
this flag first and might be enabled by default in the future.

Currently the only optimization is float8_e4m3fn matrix multiplication on
4000/ADA series Nvidia cards or later. If you have one of these cards you
will see a speed boost when using fp8_e4m3fn flux for example.
2024-08-20 11:55:51 -04:00
d1a6bd6845 Support loading long clipl model with the CLIP loader node. 2024-08-20 10:46:36 -04:00
83dbac28eb Properly set if clip text pooled projection instead of using hack. 2024-08-20 10:46:36 -04:00
538cb068bc Make cast_to a nop if weight is already good. 2024-08-20 10:46:36 -04:00
1b3eee672c Fix potential issue with multi devices. 2024-08-20 10:46:36 -04:00
9eee470244 New load_text_encoder_state_dicts function.
Now you can load text encoders straight from a list of state dicts.
2024-08-19 17:36:35 -04:00
045377ea89 Add a --reserve-vram argument if you don't want comfy to use all of it.
--reserve-vram 1.0 for example will make ComfyUI try to keep 1GB vram free.

This can also be useful if workflows are failing because of OOM errors but
in that case please report it if --reserve-vram improves your situation.
2024-08-19 17:16:18 -04:00
4d341b78e8 Bug fixes. 2024-08-19 16:28:55 -04:00
6138f92084 Use better dtype for the lowvram lora system. 2024-08-19 15:35:25 -04:00
be0726c1ed Remove duplication. 2024-08-19 15:26:50 -04:00
4506ddc86a Better subnormal fp8 stochastic rounding. Thanks Ashen. 2024-08-19 13:38:03 -04:00
20ace7c853 Code cleanup. 2024-08-19 12:48:59 -04:00
22ec02afc0 Handle subnormal numbers in float8 rounding. 2024-08-19 05:51:08 -04:00
39f114c44b Less broken non blocking? 2024-08-18 16:53:17 -04:00
6730f3e1a3 Disable non blocking.
It fixed some perf issues but caused other issues that need to be debugged.
2024-08-18 14:38:09 -04:00
73332160c8 Enable non blocking transfers in lowvram mode. 2024-08-18 10:29:33 -04:00
2622c55aff Automatically use RF variant of dpmpp_2s_ancestral if RF model. 2024-08-18 00:47:25 -04:00
1beb348ee2 dpmpp_2s_ancestral_RF for rectified flow (Flux, SD3 and Auraflow). 2024-08-18 00:33:30 -04:00
d31df04c8a Indentation. 2024-08-17 23:00:44 -04:00
e68763f40c Add Flux model support for InstantX style controlnet residuals (#4444)
* Add Flux model support for InstantX style controlnet residuals

* Refactor Flux controlnet residual step to a separate method

* Rollback minor change

* New format for applying controlnet residuals: input->double_blocks, output->single_blocks

* Adjust XLabs Flux controlnet to fit new syntax of applying Flux controlnet residuals

* Remove unnecessary import and minor style change
2024-08-17 22:58:23 -04:00
4f7a3cb6fb unet -> diffusion_models. 2024-08-17 21:31:04 -04:00
bb222ceddb Fix loras having a weak effect when applied on fp8. 2024-08-17 15:20:17 -04:00
fca42836f2 Add model_options for text encoder. 2024-08-17 11:17:20 -04:00
cd5017c1c9 calculate_weight function to use a different dtype. 2024-08-17 01:06:08 -04:00
83f343146a Fix potential lowvram issue. 2024-08-16 17:12:42 -04:00