Commit Graph

343 Commits

SHA1 Message Date
5d8898c056 Fix some performance issues with weight loading and unloading.
Lower peak memory usage when changing models.

Fix a case where model weights would be unloaded and reloaded unnecessarily.
2024-03-28 18:04:42 -04:00
c6de09b02e Optimize the memory unload strategy for better performance. 2024-03-24 02:36:30 -04:00
4b9005e949 Fix regression with model merging. 2024-03-20 13:56:12 -04:00
c18a203a8a Don't unload model weights for non-weight patches. 2024-03-20 02:27:58 -04:00
db8b59ecff Lower memory usage for LoRAs in lowvram mode at the cost of performance. 2024-03-13 20:07:27 -04:00
0ed72befe1 Change log levels.
Logging level now defaults to info. --verbose sets it to debug.
2024-03-11 13:54:56 -04:00
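
A minimal sketch of how such a switch is typically wired up with Python's logging module (the parser and format string here are illustrative, not ComfyUI's actual code):

    import argparse
    import logging

    parser = argparse.ArgumentParser()
    parser.add_argument("--verbose", action="store_true",
                        help="Set the logging level to debug.")
    args = parser.parse_args()

    # Default to INFO; --verbose lowers the threshold to DEBUG.
    logging.basicConfig(
        format="%(levelname)s: %(message)s",
        level=logging.DEBUG if args.verbose else logging.INFO,
    )
    logging.info("always shown")
    logging.debug("shown only with --verbose")
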
65397ce601 Replace prints with logging and add --verbose argument. 2024-03-10 12:14:23 -04:00
dce3555339 Add some Tesla Pascal GPUs to the list of cards where fp16 works but is slower. 2024-03-02 17:16:31 -05:00
88f300401c Enable fp16 by default on mps. 2024-02-19 12:00:48 -05:00
929e266f3e Manual cast for bf16 on older GPUs. 2024-02-17 09:01:17 -05:00
0b3c50480c Make --force-fp32 disable loading models in bf16. 2024-02-16 23:01:54 -05:00
f83109f09b Stable Cascade Stage C. 2024-02-16 10:55:08 -05:00
aeaeca10bd Small refactor of is_device_* functions. 2024-02-15 21:10:10 -05:00
66e28ef45c Don't use is_bf16_supported to check for fp16 support. 2024-02-04 20:53:35 -05:00
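
Since bf16 support (Ampere and newer) is a much stricter condition than fp16 support, a direct check goes through the compute capability instead. A sketch, assuming the commonly cited >= 5.3 threshold for native fp16 arithmetic:

    import torch

    def supports_fp16(device: torch.device) -> bool:
        # Hypothetical helper, not ComfyUI's actual function.
        if device.type != "cuda":
            return False
        major, minor = torch.cuda.get_device_capability(device)
        return (major, minor) >= (5, 3)
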
24129d78e6 Speed up SDXL on 16xx series with fp16 weights and manual cast. 2024-02-04 13:23:43 -05:00
4b0239066d Always use fp16 for the text encoders. 2024-02-02 10:02:49 -05:00
f9e55d8463 Only auto-enable the bf16 VAE on Nvidia GPUs that actually support it. 2024-01-15 03:10:22 -05:00
1b103e0cb2 Add argument to run the VAE on the CPU. 2023-12-30 05:49:07 -05:00
e1e322cf69 Load weights that can't be lowvramed to the target device. 2023-12-28 21:41:10 -05:00
a252963f95 --disable-smart-memory now unloads everything like it did originally. 2023-12-23 04:25:06 -05:00
36a7953142 Greatly improve lowvram sampling speed by getting rid of accelerate.
Let me know if this breaks anything.
2023-12-22 14:38:45 -05:00
2f9d6a97ec Add --deterministic option to make pytorch use deterministic algorithms. 2023-12-17 16:59:21 -05:00
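
What such an option typically toggles in PyTorch (a sketch; the function name is hypothetical):

    import os
    import torch

    def enable_determinism() -> None:
        # Some cuBLAS ops need this workspace setting to have a
        # deterministic implementation at all.
        os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
        # Raise an error when only a nondeterministic kernel exists.
        torch.use_deterministic_algorithms(True)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
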
b0aab1e4ea Add an option --fp16-unet to force using fp16 for the unet. 2023-12-11 18:36:29 -05:00
ba07cb748e Use faster manual cast for fp8 in unet. 2023-12-11 18:24:44 -05:00
57926635e8 Switch text encoder to manual cast.
Use fp16 text encoder weights for CPU inference to lower memory usage.
2023-12-10 23:00:54 -05:00
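
The idea behind manual casting, sketched for a single layer: weights stay stored in a compact dtype (fp16 here, halving text encoder memory on CPU) and are cast to the compute dtype on the fly inside forward, instead of relying on autocast. Class and attribute names are illustrative:

    import torch
    import torch.nn.functional as F

    class ManualCastLinear(torch.nn.Linear):
        compute_dtype = torch.float32  # fp32 compute for CPU inference

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Weights are stored in fp16; cast per call, compute in fp32.
            w = self.weight.to(self.compute_dtype)
            b = self.bias.to(self.compute_dtype) if self.bias is not None else None
            return F.linear(x.to(self.compute_dtype), w, b)
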
340177e6e8 Disable non-blocking transfers on mps. 2023-12-10 01:30:35 -05:00
9ac0b487ac Make --gpu-only put intermediate values in GPU memory instead of CPU memory. 2023-12-08 02:35:45 -05:00
2db86b4676 Slightly faster LoRA application. 2023-12-06 05:13:14 -05:00
ca82ade765 Use .itemsize to get dtype size for fp8. 2023-12-04 11:52:06 -05:00
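
A sketch of the point: reading the element size off the dtype avoids a hard-coded dtype-to-bytes table that would miss the fp8 formats (torch.dtype.itemsize is available in recent PyTorch releases):

    import torch

    def tensor_bytes(t: torch.Tensor) -> int:
        # Hypothetical helper: memory footprint of a tensor in bytes.
        return t.numel() * t.dtype.itemsize

    print(torch.float8_e4m3fn.itemsize)  # 1
    print(torch.float16.itemsize)        # 2
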
31b0f6f3d8 UNET weights can now be stored in fp8.
--fp8_e4m3fn-unet and --fp8_e5m2-unet are the two different formats
supported by pytorch.
2023-12-04 11:10:00 -05:00
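
A sketch of fp8 storage: parameters are converted to one of PyTorch's two fp8 dtypes and cast back up before compute, since most kernels don't accept fp8 inputs directly. The flag-to-dtype mapping mirrors the options named above; the helper is illustrative:

    import torch

    FP8_UNET_FLAGS = {
        "--fp8_e4m3fn-unet": torch.float8_e4m3fn,
        "--fp8_e5m2-unet": torch.float8_e5m2,
    }

    def store_weights_fp8(module: torch.nn.Module, dtype: torch.dtype) -> None:
        # Halves storage vs fp16; forward passes must manually cast
        # these weights to fp16/bf16 before use.
        for p in module.parameters():
            p.data = p.data.to(dtype)
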
0cf4e86939 Add some command line arguments to store text encoder weights in fp8.
Pytorch supports two variants of fp8:
--fp8_e4m3fn-text-enc (the one that seems to give better results)
--fp8_e5m2-text-enc
2023-11-17 02:56:59 -05:00
7339479b10 Disable xformers when it can't load properly. 2023-11-13 12:31:10 -05:00
dd4ba68b6e Allow different models to estimate memory usage differently. 2023-11-12 04:03:52 -05:00
8594c8be4d Empty the cache when the torch cache exceeds 25% of free memory. 2023-10-22 13:58:12 -04:00
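
One plausible reading of the heuristic, sketched: when the allocator holds cached-but-unused blocks worth more than 25% of the device's free memory, hand them back to the driver. The function name is hypothetical:

    import torch

    def maybe_empty_cache(device: torch.device) -> None:
        free, _total = torch.cuda.mem_get_info(device)
        cached = (torch.cuda.memory_reserved(device)
                  - torch.cuda.memory_allocated(device))
        if cached > 0.25 * free:
            torch.cuda.empty_cache()
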
c8013f73e5 Add some Quadro cards to the list of cards with broken fp16. 2023-10-16 16:48:46 -04:00
fd4c5f07e7 Add a --bf16-unet option to test running the unet in bf16. 2023-10-13 14:51:10 -04:00
9a55dadb4c Refactor code so model can be a dtype other than fp32 or fp16. 2023-10-13 14:41:17 -04:00
88733c997f pytorch_attention_enabled can now return True when xformers is enabled. 2023-10-11 21:30:57 -04:00
20d3852aa1 Pull some small changes from the other repo. 2023-10-11 20:38:48 -04:00
eec449ca8e Allow Intel GPUs to cast LoRAs on the GPU since they support BF16 natively. 2023-09-22 21:11:27 -07:00
1cdfb3dba4 Only do the cast on the device if the device supports it. 2023-09-20 17:52:41 -04:00
321c5fa295 Enable pytorch attention by default on xpu. 2023-09-17 04:09:19 -04:00
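
"Pytorch attention" here presumably refers to the fused scaled-dot-product attention built into PyTorch 2.x, which removes the xformers dependency. A minimal sketch:

    import torch.nn.functional as F

    def attention(q, k, v):
        # q, k, v: (batch, heads, seq_len, head_dim)
        return F.scaled_dot_product_attention(q, k, v)
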
0966d3ce82 Don't run text encoders on xpu because there are issues. 2023-09-14 12:16:07 -04:00
1938f5c5fe Add a force argument to soft_empty_cache to force a cache empty. 2023-09-04 00:58:18 -04:00
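
soft_empty_cache is ComfyUI's wrapper around the per-backend cache calls; a sketch of what a force argument changes (the gating condition here is illustrative):

    import torch

    def soft_empty_cache(force: bool = False) -> None:
        if torch.cuda.is_available():
            # Without force, only empty when it is likely worthwhile.
            if force or torch.cuda.memory_reserved() > 0:
                torch.cuda.empty_cache()
                torch.cuda.ipc_collect()
        elif torch.backends.mps.is_available():
            torch.mps.empty_cache()
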
4a0c4ce4ef Some fixes to generalize CUDA-specific functionality to Intel and other GPUs. 2023-09-02 18:22:10 -07:00
b8c7c770d3 Enable bf16-vae by default on Ampere and up. 2023-08-27 23:06:19 -04:00
a57b0c797b Fix lowvram model merging. 2023-08-26 11:52:07 -04:00
f72780a7e3 The new smart memory management makes this unnecessary. 2023-08-25 18:02:15 -04:00
30eb92c3cb Code cleanups. 2023-08-24 19:39:18 -04:00
51dde87e97 Try to free enough vram for Control-LoRA inference. 2023-08-24 17:20:54 -04:00