Compare commits

...

1752 Commits

Author SHA1 Message Date
f1c2301697 fix typo in stale-issues (#4735) 2024-09-01 17:44:49 -04:00
8d31a6632f Speed up inference on nvidia 10 series on Linux. 2024-09-01 17:29:31 -04:00
b643eae08b Make minimum_inference_memory() depend on --reserve-vram 2024-09-01 01:18:34 -04:00
baa6b4dc36 Update manual install instructions. 2024-08-31 04:37:23 -04:00
d4aeefc297 add github action to automatically handle stale user support issues (#4683)
* add github action to automatically handle stale user support issues

* improve stale message

* remove token part
2024-08-31 01:57:18 -04:00
587e7ca654 Remove github buttons. 2024-08-31 01:53:10 -04:00
c90459eba0 Update ComfyUI_frontend to 1.2.40 (#4691)
* Update ComfyUI_frontend to 1.2.40

* Add files
2024-08-30 19:32:10 -04:00
04278afb10 feat: return import_failed from init_extra_nodes function (#4694) 2024-08-30 19:26:47 -04:00
935ae153e1 Cleanup. 2024-08-30 12:53:59 -04:00
e91662e784 Get logs endpoint & system_stats additions (#4690)
* Add route for getting output logs

* Include ComfyUI version

* Move to own function

* Changed to memory logger

* Unify logger setup logic

* Fix get version git fallback

---------

Co-authored-by: pythongosssss <125205205+pythongosssss@users.noreply.github.com>
2024-08-30 12:46:37 -04:00
63fafaef45 Fix potential issue with hydit controlnets. 2024-08-30 04:58:41 -04:00
ec28cd9136 swap legacy sdv15 link (#4682)
* swap legacy sdv15 link

* swap v15 ckpt examples to safetensors

* link the fp16 copy of the model by default
2024-08-29 19:48:48 -04:00
6eb5d64522 Fix glora lowvram issue. 2024-08-29 19:07:23 -04:00
10a79e9898 Implement model part of flux union controlnet. 2024-08-29 18:41:22 -04:00
ea3f39bd69 InstantX depth flux controlnet. 2024-08-29 02:14:19 -04:00
b33cd61070 InstantX canny controlnet. 2024-08-28 19:02:50 -04:00
34eda0f853 fix: remove redundant useless loop (#4656)
fix: potential error of undefined variable

https://github.com/comfyanonymous/ComfyUI/discussions/4650
2024-08-28 17:46:30 -04:00
d31e226650 Unify RMSNorm code. 2024-08-28 16:56:38 -04:00
b79fd7d92c ComfyUI supports more than just stable diffusion. 2024-08-28 16:12:24 -04:00
38c22e631a Fix case where model was not properly unloaded in merging workflows. 2024-08-27 19:03:51 -04:00
6bbdcd28ae Support weight padding on diff weight patch (#4576) 2024-08-27 13:55:37 -04:00
ab130001a8 Do RMSNorm in native type. 2024-08-27 02:41:56 -04:00
ca4b8f30e0 Cleanup empty dir if frontend zip download failed (#4574) 2024-08-27 02:07:25 -04:00
70b84058c1 Add relative file path to the progress report. (#4621) 2024-08-27 02:06:12 -04:00
2ca8f6e23d Make the stochastic fp8 rounding reproducible. 2024-08-26 15:12:06 -04:00
7985ff88b9 Use less memory in float8 lora patching by doing calculations in fp16. 2024-08-26 14:45:58 -04:00
c6812947e9 Fix potential memory leak. 2024-08-26 02:07:32 -04:00
9230f65823 Fix some controlnets OOMing when loading. 2024-08-25 05:54:29 -04:00
6ab1e6fd4a [Bug #4529] Fix graph partial validation failure (#4588)
Currently, if a graph partially fails validation (i.e. some outputs are
valid while others have links from missing nodes), the execution loop
could get an exception resulting in server lockup.

This isn't actually possible to reproduce via the default UI, but is a
potential issue for people using the API to construct invalid graphs.
2024-08-24 15:34:58 -04:00
07dcbc3a3e Clarify how to use high quality previews. 2024-08-24 02:31:03 -04:00
8ae23d8e80 Fix onnx export. 2024-08-23 17:52:47 -04:00
7df42b9a23 Fix dora. 2024-08-23 04:58:59 -04:00
5d8bbb7281 Cleanup. 2024-08-23 04:06:27 -04:00
2c1d2375d6 Fix. 2024-08-23 04:04:55 -04:00
64ccb3c7e3 Rework IPEX check for future inclusion of XPU into Pytorch upstream and do a bit more optimization of ipex.optimize(). (#4562) 2024-08-23 03:59:57 -04:00
9465b23432 Added SD15_Inpaint_Diffusers model support for unet_config_from_diffusers_unet function (#4565) 2024-08-23 03:57:08 -04:00
bb4416dd5b Fix task.status.status_str caused by #2666 (#4551)
* Fix task.status.status_str caused by 2666 regression

* fix

* fix
2024-08-22 17:38:30 -04:00
c0b0da264b Missing imports. 2024-08-22 17:20:51 -04:00
c26ca27207 Move calculate function to comfy.lora 2024-08-22 17:12:00 -04:00
7c6bb84016 Code cleanups. 2024-08-22 17:05:12 -04:00
c54d3ed5e6 Fix issue with models staying loaded in memory. 2024-08-22 15:58:20 -04:00
c7ee4b37a1 Try to fix some lora issues. 2024-08-22 15:32:18 -04:00
7b70b266d8 Generalize MacOS version check for force-upcast-attention (#4548)
This code automatically forces upcasting attention for MacOS versions 14.5 and 14.6. My computer returns the string "14.6.1" for `platform.mac_ver()[0]`, so this generalizes the comparison to catch more versions.

I am running MacOS Sonoma 14.6.1 (latest version) and was seeing black image generation on previously functional workflows after recent software updates. This PR solved the issue for me.

See comfyanonymous/ComfyUI#3521
2024-08-22 13:24:21 -04:00
8f60d093ba Fix issue. 2024-08-22 10:38:24 -04:00
dafbe321d2 Fix a bug where cached outputs affected IS_CHANGED (#4535)
This change fixes a bug where non-constant values could be passed to the
IS_CHANGED function. This would result in workflows taking an extra
execution before they acted as if they were cached.

The actual change is like 4 characters -- the rest is adding unit tests.
2024-08-21 23:38:46 -04:00
5f84ea63e8 Add a shortcut to the nightly package to run with --fast. 2024-08-21 23:36:58 -04:00
843a7ff70c fp16 is actually faster than fp32 on a GTX 1080. 2024-08-21 23:23:50 -04:00
a60620dcea Fix slow performance on 10 series Nvidia GPUs. 2024-08-21 16:39:02 -04:00
015f73dc49 Try a different type of flux fp16 fix. 2024-08-21 16:17:15 -04:00
904bf58e7d Make --fast work on pytorch nightly. 2024-08-21 14:01:41 -04:00
5f50263088 Replace use of .view with .reshape (#4522)
When generating images with fp8_e4_m3 Flux and batch size >1, using --fast, ComfyUI throws a "view size is not compatible with input tensor's size and stride" error pointing at the first of these two calls to view.

As reshape is semantically equivalent to view except for working on a broader set of inputs, there should be no downside to changing this. The only difference is that it clones the underlying data in cases where .view would error out. I have confirmed that the output still looks as expected, but cannot confirm that no mutable use is made of the tensors anywhere.

Note that --fast is only marginally faster than the default.
2024-08-21 11:21:48 -04:00
5e806f555d add a get models list api route (#4519)
* get models list api route

* remove copypasta
2024-08-21 02:04:42 -04:00
f07e5bb522 Add GET /internal/files. (#4295)
* Create internal route table.

* List files.

* Add GET /internal/files.

Retrieves list of files in models, output, and user directories.

* Refactor file names.

* Use typing_extensions for Python 3.8

* Fix tests.

* Remove print statements.

* Update README.

* Add output and user to valid directory test.

* Add missing type hints.
2024-08-21 01:25:06 -04:00
03ec517afb Remove useless line, adjust windows default reserved vram. 2024-08-21 00:47:19 -04:00
f257fc999f Add optional deprecated/experimental flag to node class (#4506)
* Add optional deprecated flag to node class

* nit

* Add experimental flag
2024-08-21 00:01:34 -04:00
bb50e69839 Update frontend to 1.2.30 (#4513) 2024-08-21 00:00:49 -04:00
510f3438c1 Speed up fp8 matrix mult by using better code. 2024-08-20 22:53:26 -04:00
ea63b1c092 Simpletrainer lycoris format. 2024-08-20 12:05:13 -04:00
9953f22fce Add --fast argument to enable experimental optimizations.
Optimizations that might break things/lower quality will be put behind
this flag first and might be enabled by default in the future.

Currently the only optimization is float8_e4m3fn matrix multiplication on
4000/ADA series Nvidia cards or later. If you have one of these cards you
will see a speed boost when using fp8_e4m3fn flux for example.
2024-08-20 11:55:51 -04:00
d1a6bd6845 Support loading long clipl model with the CLIP loader node. 2024-08-20 10:46:36 -04:00
83dbac28eb Properly set if clip text pooled projection instead of using hack. 2024-08-20 10:46:36 -04:00
538cb068bc Make cast_to a nop if weight is already good. 2024-08-20 10:46:36 -04:00
1b3eee672c Fix potential issue with multi devices. 2024-08-20 10:46:36 -04:00
5a69f84c3c Update README.md (Add shield badges) (#4490) 2024-08-19 18:25:20 -04:00
9eee470244 New load_text_encoder_state_dicts function.
Now you can load text encoders straight from a list of state dicts.
2024-08-19 17:36:35 -04:00
045377ea89 Add a --reserve-vram argument if you don't want comfy to use all of it.
--reserve-vram 1.0 for example will make ComfyUI try to keep 1GB vram free.

This can also be useful if workflows are failing because of OOM errors but
in that case please report it if --reserve-vram improves your situation.
2024-08-19 17:16:18 -04:00
4d341b78e8 Bug fixes. 2024-08-19 16:28:55 -04:00
6138f92084 Use better dtype for the lowvram lora system. 2024-08-19 15:35:25 -04:00
be0726c1ed Remove duplication. 2024-08-19 15:26:50 -04:00
766ae119a8 CheckpointSave node name. 2024-08-19 15:06:12 -04:00
fc90ceb6ba Update issue template config.yml to direct frontend issues to frontend repos (#4486)
* Update config.yml

* Typos
2024-08-19 13:41:30 -04:00
4506ddc86a Better subnormal fp8 stochastic rounding. Thanks Ashen. 2024-08-19 13:38:03 -04:00
20ace7c853 Code cleanup. 2024-08-19 12:48:59 -04:00
b29b3b86c5 Update README to include frontend section (#4468)
* Update README to include frontend section

* nit
2024-08-19 07:12:32 -04:00
22ec02afc0 Handle subnormal numbers in float8 rounding. 2024-08-19 05:51:08 -04:00
39f114c44b Less broken non blocking? 2024-08-18 16:53:17 -04:00
6730f3e1a3 Disable non blocking.
It fixed some perf issues but caused other issues that need to be debugged.
2024-08-18 14:38:09 -04:00
73332160c8 Enable non blocking transfers in lowvram mode. 2024-08-18 10:29:33 -04:00
2622c55aff Automatically use RF variant of dpmpp_2s_ancestral if RF model. 2024-08-18 00:47:25 -04:00
1beb348ee2 dpmpp_2s_ancestral_RF for rectified flow (Flux, SD3 and Auraflow). 2024-08-18 00:33:30 -04:00
9aa39e743c Add new shortcuts to readme (#4442) 2024-08-17 23:52:56 -04:00
d31df04c8a Indentation. 2024-08-17 23:00:44 -04:00
e68763f40c Add Flux model support for InstantX style controlnet residuals (#4444)
* Add Flux model support for InstantX style controlnet residuals

* Refactor Flux controlnet residual step to a separate method

* Rollback minor change

* New format for applying controlnet residuals: input->double_blocks, output->single_blocks

* Adjust XLabs Flux controlnet to fit new syntax of applying Flux controlnet residuals

* Remove unnecessary import and minor style change
2024-08-17 22:58:23 -04:00
310ad09258 Add a ModelSave node. 2024-08-17 21:43:07 -04:00
4f7a3cb6fb unet -> diffusion_models. 2024-08-17 21:31:04 -04:00
bb222ceddb Fix loras having a weak effect when applied on fp8. 2024-08-17 15:20:17 -04:00
14af129c55 Improve execution UX.
Some branches with VAELoader -> VAEDecode -> Preview were being executed
last. With this change they will be executed earlier.
2024-08-17 11:37:21 -04:00
fca42836f2 Add model_options for text encoder. 2024-08-17 11:17:20 -04:00
858d51f91a Fix VAEDecode -> Preview not being executed first. 2024-08-17 04:08:54 -04:00
cd5017c1c9 calculate_weight function to use a different dtype. 2024-08-17 01:06:08 -04:00
83f343146a Fix potential lowvram issue. 2024-08-16 17:12:42 -04:00
b021cf67c7 Update frontend to 1.2.26 (#4415) 2024-08-16 15:25:02 -04:00
1770fc77ed Implement support for taef1 latent previews (#4409)
* add taef1 handling to several places

* remove guess_latent_channels and add latent_channels info directly to flux model

* remove TODO

* fix numbers
2024-08-16 12:53:13 -04:00
05a9f3faa1 Log a warning when there's an issue with IS_CHANGED. 2024-08-16 08:50:17 -04:00
86c5970ac0 Fix custom nodes hooking the map_node_over_list and breaking things. 2024-08-16 08:40:31 -04:00
bfc214f434 Use new TS frontend uncompressed (#4379)
* Swap frontend uncompressed

* Add uncompressed files
2024-08-15 16:50:25 -04:00
3f5939add6 Tell github not to count the web directory in language stats. 2024-08-15 13:48:56 -04:00
5960f946a9 Move a few files from comfy -> comfy_execution.
Python code in the comfy folder should not import things from outside it.
2024-08-15 11:21:14 -04:00
5cfe38f41c Execution Model Inversion (#2666)
* Execution Model Inversion

This PR inverts the execution model -- from recursively calling nodes to
using a topological sort of the nodes. This change allows for
modification of the node graph during execution. This allows for two
major advantages:

    1. The implementation of lazy evaluation in nodes. For example, if a
    "Mix Images" node has a mix factor of exactly 0.0, the second image
    input doesn't even need to be evaluated (and visa-versa if the mix
    factor is 1.0).

    2. Dynamic expansion of nodes. This allows for the creation of dynamic
    "node groups". Specifically, custom nodes can return subgraphs that
    replace the original node in the graph. This is an incredibly
    powerful concept. Using this functionality, it was easy to
    implement:
        a. Components (a.k.a. node groups)
        b. Flow control (i.e. while loops) via tail recursion
        c. All-in-one nodes that replicate the WebUI functionality
        d. and more
    All of those were able to be implemented entirely via custom nodes,
    so those features are *not* a part of this PR. (There are some
    front-end changes that should occur before that functionality is
    made widely available, particularly around variant sockets.)

The custom nodes associated with this PR can be found at:
https://github.com/BadCafeCode/execution-inversion-demo-comfyui

Note that some of them require that variant socket types ("*") be
enabled.

* Allow `input_info` to be of type `None`

* Handle errors (like OOM) more gracefully

* Add a command-line argument to enable variants

This allows the use of nodes that have sockets of type '*' without
applying a patch to the code.

* Fix an overly aggressive assertion.

This could happen when attempting to evaluate `IS_CHANGED` for a node
during the creation of the cache (in order to create the cache key).

* Fix Pyright warnings

* Add execution model unit tests

* Fix issue with unused literals

Behavior should now match the master branch with regard to undeclared
inputs. Undeclared inputs that are socket connections will be used while
undeclared inputs that are literals will be ignored.

* Make custom VALIDATE_INPUTS skip normal validation

Additionally, if `VALIDATE_INPUTS` takes an argument named `input_types`,
that variable will be a dictionary of the socket type of all incoming
connections. If that argument exists, normal socket type validation will
not occur. This removes the last hurdle for enabling variant types
entirely from custom nodes, so I've removed that command-line option.

I've added appropriate unit tests for these changes.

* Fix example in unit test

This wouldn't have caused any issues in the unit test, but it would have
bugged the UI if someone copy+pasted it into their own node pack.

* Use fstrings instead of '%' formatting syntax

* Use custom exception types.

* Display an error for dependency cycles

Previously, dependency cycles that were created during node expansion
would cause the application to quit (due to an uncaught exception). Now,
we'll throw a proper error to the UI. We also make an attempt to 'blame'
the most relevant node in the UI.

* Add docs on when ExecutionBlocker should be used

* Remove unused functionality

* Rename ExecutionResult.SLEEPING to PENDING

* Remove superfluous function parameter

* Pass None for uneval inputs instead of default

This applies to `VALIDATE_INPUTS`, `check_lazy_status`, and lazy values
in evaluation functions.

* Add a test for mixed node expansion

This test ensures that a node that returns a combination of expanded
subgraphs and literal values functions correctly.

* Raise exception for bad get_node calls.

* Minor refactor of IsChangedCache.get

* Refactor `map_node_over_list` function

* Fix ui output for duplicated nodes

* Add documentation on `check_lazy_status`

* Add file for execution model unit tests

* Clean up Javascript code as per review

* Improve documentation

Converted some comments to docstrings as per review

* Add a new unit test for mixed lazy results

This test validates that when an output list is fed to a lazy node, the
node will properly evaluate previous nodes that are needed by any inputs
to the lazy node.

No code in the execution model has been changed. The test already
passes.

* Allow kwargs in VALIDATE_INPUTS functions

When kwargs are used, validation is skipped for all inputs as if they
had been mentioned explicitly.

* List cached nodes in `execution_cached` message

This was previously just bugged in this PR.
2024-08-15 11:21:11 -04:00
0f9c2a7822 Try to fix SDXL OOM issue on some configurations. 2024-08-14 23:08:54 -04:00
153d0a8142 Add a update/update_comfyui_stable.bat to the standalones. 2024-08-14 22:29:23 -04:00
ab4dd19b91 Remove legacy ui test files (#4316) 2024-08-14 21:01:06 -04:00
f1d6cef71c Revert "Disable cuda malloc by default."
This reverts commit 50bf66e5c4.
2024-08-14 08:38:07 -04:00
33fb282d5c Fix issue. 2024-08-14 02:51:47 -04:00
50bf66e5c4 Disable cuda malloc by default. 2024-08-14 02:49:25 -04:00
e60e19b175 Add support for simple tooltips (#3842)
* Add support for simple tooltips

* Fix overflow

* Add tooltips for nodes in the default workflow

* new line

* Prevent potential crash

* PR feedback

* Hide tooltip when clicking (e.g. combo widget)

* Refactor tooltips, add node level support

* Fix

* move

* Fix test (and undo last change)

* Fixed indent

* Fix dom widgets, dont show tooltip if not over canvas
2024-08-14 01:22:10 -04:00
a5af64d3ce Revert "Not sure if this actually changes anything but it can't hurt."
This reverts commit 34608de2e9.
2024-08-14 01:05:17 -04:00
3e52e0364c Add model downloading endpoint. (#4248)
* Add model downloading endpoint.

* Move client session init to async function.

* Break up large function.

* Send "download_progress" as websocket event.

* Fixed

* Fixed.

* Use async mock.

* Move server set up to right before run call.

* Validate that model subdirectory cannot contain relative paths.

* Add download_model test checking for invalid paths.

* Remove DS_Store.

* Consolidate DownloadStatus and DownloadModelResult

* Add progress_interval as an optional parameter.

* Use tuple type from annotations.

* Use pydantic.

* Update comment.

* Revert "Use pydantic."

This reverts commit 7461e8eb00.

* Add new line.

* Add newline EOF.

* Validate model filename as well.

* Add comment to not reply on internal.

* Restrict downloading to safetensor files only.
2024-08-13 15:48:52 -04:00
34608de2e9 Not sure if this actually changes anything but it can't hurt. 2024-08-13 13:29:16 -04:00
39fb74c5bd Fix bug when model cannot be partially unloaded. 2024-08-13 03:57:55 -04:00
74e124f4d7 Fix some issues with TE being in lowvram mode. 2024-08-12 23:42:21 -04:00
a562c17e8a load_unet -> load_diffusion_model with a model_options argument. 2024-08-12 23:20:57 -04:00
5942c17d55 Order of operations matters. 2024-08-12 21:56:18 -04:00
c032b11e07 xlabs Flux controlnet implementation. (#4260)
* xlabs Flux controlnet.

* Fix not working on old python.

* Remove comment.
2024-08-12 21:22:22 -04:00
b8ffb2937f Memory tweaks. 2024-08-12 15:07:11 -04:00
ce37c11164 add DS_Store to gitignore (#4324) 2024-08-12 12:32:34 -04:00
b5c3906b38 Automatically link the Comfy CI page on PRs (#4326)
also use_prior_commit so it doesn't get a janked merge commit instead of the real one
2024-08-12 12:32:16 -04:00
5d43e75e5b Fix some issues with the model sometimes not getting patched. 2024-08-12 12:27:54 -04:00
517f4a94e4 Fix some lora loading slowdowns. 2024-08-12 11:50:32 -04:00
52a471c5c7 Change name of log. 2024-08-12 10:35:06 -04:00
ad76574cb8 Fix some potential issues with the previous commits. 2024-08-12 00:23:29 -04:00
9acfe4df41 Support loading directly to vram with CLIPLoader node. 2024-08-12 00:06:01 -04:00
9829b013ea Fix mistake in last commit. 2024-08-12 00:00:17 -04:00
5c69cde037 Load TE model straight to vram if certain conditions are met. 2024-08-11 23:52:43 -04:00
e9589d6d92 Add a way to set model dtype and ops from load_checkpoint_guess_config. 2024-08-11 08:50:34 -04:00
0d82a798a5 Remove the ckpt_path from load_state_dict_guess_config. 2024-08-11 08:37:35 -04:00
925fff26fd alternative to load_checkpoint_guess_config that accepts a loaded state dict (#4249)
* make alternative fn

* add back ckpt path as 2nd argument?
2024-08-11 08:36:52 -04:00
75b9b55b22 Fix issues with #4302 and support loading diffusers format flux. 2024-08-10 21:28:24 -04:00
1765f1c60c FLUX: Added full diffusers mapping for FLUX.1 schnell and dev. Adds full LoRA support from diffusers LoRAs. (#4302) 2024-08-10 21:26:41 -04:00
1de69fe4d5 Fix some issues with inference slowing down. 2024-08-10 16:21:25 -04:00
ae197f651b Speed up hunyuan dit inference a bit. 2024-08-10 07:36:27 -04:00
1b5b8ca81a Fix regression. 2024-08-09 21:45:21 -04:00
6678d5cf65 Fix regression. 2024-08-09 14:02:38 -04:00
e172564eea Update controlnet.py to fix the default controlnet weight as constant (#4285) 2024-08-09 13:40:05 -04:00
a3cc326748 Better fix for lowvram issue. 2024-08-09 12:16:25 -04:00
86a97e91fc Fix controlnet regression. 2024-08-09 12:08:58 -04:00
5acdadc9f3 Fix issue with some lowvram weights. 2024-08-09 03:58:28 -04:00
55ad9d5f8c Fix regression. 2024-08-09 03:36:40 -04:00
a9f04edc58 Implement text encoder part of HunyuanDiT loras. 2024-08-09 03:21:10 -04:00
a475ec2300 Cleanup HunyuanDit controlnets.
Use the: ControlNetApply SD3 and HunyuanDiT node.
2024-08-09 02:59:34 -04:00
06eb9fb426 feat: add support for HunYuanDit ControlNet (#4245)
* add support for HunYuanDit ControlNet

* fix hunyuandit controlnet

* fix typo in hunyuandit controlnet

* fix typo in hunyuandit controlnet

* fix code format style

* add control_weight support for HunyuanDit Controlnet

* use control_weights in HunyuanDit Controlnet

* fix typo
2024-08-09 02:59:24 -04:00
413322645e Raw torch is faster than einops? 2024-08-08 22:09:29 -04:00
11200de970 Cleaner code. 2024-08-08 20:07:09 -04:00
037c38eb0f Try to improve inference speed on some machines. 2024-08-08 17:29:27 -04:00
1e11d2d1f5 Better prints. 2024-08-08 17:29:27 -04:00
65ea6be38f PullRequest CI Run: use pull_request_target to allow the CI Dashboard to work (#4277)
'_target' allows secrets to pass through, and we're just using the secret that allows uploading to the dashboard and are manually vetting PRs before running this workflow anyway
2024-08-08 17:20:48 -04:00
5df6f57b5d minor fix on copypasta action name (#4276)
my bad sorry
2024-08-08 16:30:59 -04:00
6588bfdef9 add GitHub workflow for CI tests of PRs (#4275)
When the 'Run-CI-Test' label is added to a PR, it will be tested by the CI, on a small matrix of stable versions.
2024-08-08 16:24:49 -04:00
50ed2879ef Add full CI test matrix GitHub Workflow (#4274)
automatically runs a matrix of full GPU-enabled tests on all new commits to the ComfyUI master branch
2024-08-08 15:40:07 -04:00
66d4233210 Fix. 2024-08-08 15:16:51 -04:00
591010b7ef Support diffusers text attention flux loras. 2024-08-08 14:45:52 -04:00
08f92d55e9 Partial model shift support. 2024-08-08 14:45:06 -04:00
8115d8cce9 Add Flux fp16 support hack. 2024-08-07 15:08:39 -04:00
6969fc9ba4 Make supported_dtypes a priority list. 2024-08-07 15:00:06 -04:00
cb7c4b4be3 Workaround for lora OOM on lowvram mode. 2024-08-07 14:30:54 -04:00
1208863eca Fix "Comfy" lora keys.
They are in this format now:
diffusion_model.full.model.key.name.lora_up.weight
2024-08-07 13:49:31 -04:00
e1c528196e Fix bundled embed. 2024-08-07 13:30:45 -04:00
17030fd4c0 Support for "Comfy" lora format.
The keys are just: model.full.model.key.name.lora_up.weight

It is supported by all comfyui supported models.

Now people can just convert loras to this format instead of having to ask
for me to implement them.
2024-08-07 13:18:32 -04:00
c19dcd362f Controlnet code refactor. 2024-08-07 12:59:28 -04:00
1c08bf35b4 Support format for embeddings bundled in loras. 2024-08-07 03:45:25 -04:00
2a02546e20 Add type hints to folder_paths.py (#4191)
* add type hints to folder_paths.py

* replace deprecated standard collections type hints

* fix type error when using Python 3.8
2024-08-06 21:59:34 -04:00
b334605a66 Fix OOMs happening in some cases.
A cloned model patcher sometimes reported a model was loaded on a device
when it wasn't.
2024-08-06 13:36:04 -04:00
de17a9755e Unload all models if there's an OOM error. 2024-08-06 03:30:28 -04:00
c14ac98fed Unload models and load them back in lowvram mode no free vram. 2024-08-06 03:22:39 -04:00
2894511893 Clone taesd with depth of 1 to reduce download size. (#4232) 2024-08-06 01:46:09 -04:00
f3bc40223a Add format metadata to CLIP save to make compatible with diffusers safetensors loading (#4233) 2024-08-06 01:45:24 -04:00
841e74ac40 Change browser test CI python to 3.8 (#4234) 2024-08-06 01:27:28 -04:00
2d75df45e6 Flux tweak memory usage. 2024-08-05 21:58:28 -04:00
1abc9c8703 Stable release uses cached dependencies (#4231)
* Release stable based on existing tag.

* Update default cuda to 12.1.
2024-08-05 20:07:16 -04:00
8edbcf5209 Improve performance on some lowend GPUs. 2024-08-05 16:24:04 -04:00
e545a636ba This probably doesn't work anymore. 2024-08-05 12:31:42 -04:00
33e5203a2a Don't cache index.html (#4211) 2024-08-05 12:25:28 -04:00
a178e25912 Fix Flux FP64 math on XPU (#4210) 2024-08-05 01:26:20 -04:00
78e133d041 Support simple diffusers Flux loras. 2024-08-04 22:05:48 -04:00
7afa985fba Correct spelling 'token_weight_pars_t5' to 'token_weight_pairs_t5' (#4200) 2024-08-04 17:10:02 -04:00
ddb6a9f47c Set the step in EmptySD3LatentImage to 16.
These models work better when the res is a multiple of 16.
2024-08-04 15:59:02 -04:00
3b71f84b50 ONNX tracing fixes. 2024-08-04 15:45:43 -04:00
0a6b008117 Fix issue with some custom nodes. 2024-08-04 10:03:33 -04:00
56f3c660bf ModelSamplingFlux now takes a resolution and adjusts the shift with it.
If you want to sample Flux dev exactly how the reference code does use
the same resolution as your image in this node.
2024-08-04 04:06:00 -04:00
f7a5107784 Fix crash. 2024-08-03 16:55:38 -04:00
91be9c2867 Tweak lowvram memory formula. 2024-08-03 16:44:50 -04:00
03c5018c98 Lower lowvram memory to 1/3 of free memory. 2024-08-03 15:14:07 -04:00
2ba5cc8b86 Fix some issues. 2024-08-03 15:06:40 -04:00
1e68002b87 Cap lowvram to half of free memory. 2024-08-03 14:50:20 -04:00
ba9095e5bd Automatically use fp8 for diffusion model weights if:
Checkpoint contains weights in fp8.

There isn't enough memory to load the diffusion model in GPU vram.
2024-08-03 13:45:19 -04:00
f123328b82 Load T5 in fp8 if it's in fp8 in the Flux checkpoint. 2024-08-03 12:39:33 -04:00
63a7e8edba More aggressive batch splitting. 2024-08-03 11:53:30 -04:00
0eea47d580 Add ModelSamplingFlux to experiment with the shift value.
Default shift on Flux Schnell is 0.0
2024-08-03 03:54:38 -04:00
7cd0cdfce6 Add advanced model merge node for Flux model. 2024-08-02 23:20:53 -04:00
ea03c9dcd2 Better per model memory usage estimations. 2024-08-02 18:09:24 -04:00
3a9ee995cf Tweak regular SD memory formula. 2024-08-02 17:34:30 -04:00
47da42d928 Better Flux vram estimation. 2024-08-02 17:02:35 -04:00
17bbd83176 Fix bug loading flac workflow when it contains = character. 2024-08-02 13:14:28 -04:00
bfb52de866 Lower SAG scale step for finer control (#4158)
* Lower SAG step for finer control

Since the introduction of cfg++ which uses very low cfg value, a step of 0.1 in SAG might be too high for finer control. Even SAG of 0.1 can be too high when cfg is only 0.6, so I change the step to 0.01.

* Lower PAG step as well.

* Update nodes_sag.py
2024-08-02 10:29:03 -04:00
eca962c6da Add FluxGuidance node.
This lets you adjust the guidance on the dev model which is a parameter
that is passed to the diffusion model.
2024-08-02 10:25:49 -04:00
c1696cd1b5 Add missing import (#4174) 2024-08-02 09:34:12 -04:00
369f459b20 Fix no longer working on old pytorch. 2024-08-01 22:20:24 -04:00
ce9ac2fe05 Fix clip_g/clip_l mixup (#4168) 2024-08-01 21:40:56 -04:00
e638f2858a Hack to make all resolutions work on Flux models. 2024-08-01 21:39:18 -04:00
a531001cc7 Add CLIPTextEncodeFlux. 2024-08-01 18:53:25 -04:00
d420bc792a Tweak the memory usage formulas for Flux and SD. 2024-08-01 17:53:45 -04:00
d965474aaa Make ComfyUI split batches a higher priority than weight offload. 2024-08-01 16:39:59 -04:00
1c61361fd2 Fast preview support for Flux. 2024-08-01 16:28:11 -04:00
a6decf1e62 Fix bfloat16 potentially not being enabled on mps. 2024-08-01 16:18:44 -04:00
48eb1399c0 Try to fix mac issue. 2024-08-01 13:41:27 -04:00
b4f6ebb2e8 Rename UNETLoader node to "Load Diffusion Model". 2024-08-01 13:33:30 -04:00
d7430a1651 Add a way to load the diffusion model in fp8 with UNETLoader node. 2024-08-01 13:30:51 -04:00
f2b80f95d2 Better Mac support on flux model. 2024-08-01 13:10:50 -04:00
1aa9cf3292 Make lowvram more aggressive on low memory machines. 2024-08-01 12:11:57 -04:00
2f88d19ef3 Add link to Flux examples to readme. 2024-08-01 11:48:19 -04:00
eb96c3bd82 Fix .sft file loading (they are safetensors files). 2024-08-01 11:32:58 -04:00
5f98de7697 Load flux t5 in fp8 if weights are in fp8. 2024-08-01 11:05:56 -04:00
8d34211a7a Fix old python versions no longer working. 2024-08-01 09:57:20 -04:00
1589b58d3e Basic Flux Schnell and Flux Dev model implementation. 2024-08-01 09:49:29 -04:00
7ad574bffd Mac supports bf16 just make sure you are using the latest pytorch. 2024-08-01 09:42:17 -04:00
e2382b6adb Make lowvram less aggressive when there are large amounts of free memory. 2024-08-01 03:58:58 -04:00
c24f897352 Fix to get fp8 working on T5 base. 2024-07-31 02:00:19 -04:00
a5991a7aa6 Fix hunyuan dit text encoder weights always being in fp32. 2024-07-31 01:34:57 -04:00
2c038ccef0 Lower CLIP memory usage by a bit. 2024-07-31 01:32:35 -04:00
b85216a3c0 Lower T5 memory usage by a few hundred MB. 2024-07-31 00:52:34 -04:00
82cae45d44 Fix potential issue with non clip text embeddings. 2024-07-30 14:41:13 -04:00
25853d0be8 Use common function for casting weights to input. 2024-07-30 10:49:14 -04:00
79040635da Remove unnecessary code. 2024-07-30 05:01:34 -04:00
66d35c07ce Improve artifacts on hydit, auraflow and SD3 on specific resolutions.
This breaks seeds for resolutions that are not a multiple of 16 in pixel
resolution by using circular padding instead of reflection padding but
should lower the amount of artifacts when doing img2img at those
resolutions.
2024-07-29 20:48:50 -04:00
c75b50607b Less confusing exception if pillow() function fails. 2024-07-29 11:15:37 -04:00
4ba7fa0244 Refactor: Move sd2_clip.py to text_encoders folder. 2024-07-28 01:19:20 -04:00
ab76abc767 Active workflow use primary fg color (#4090) 2024-07-27 23:34:19 -04:00
9300058026 Add dpmpp_2s_ancestral as custom sampler (#4101)
Adding dpmpp_2s_ancestral as custom sampler node to enable its use with eta and s_noise when using custom sampling.
2024-07-27 16:19:50 -04:00
f82d09c9b4 Update packaging workflow. 2024-07-27 04:48:19 -04:00
e6829e7ac5 Add a way to set custom dependencies in the release workflow. 2024-07-27 04:41:46 -04:00
07f6a1a685 Handle case in the updater when master branch is not in local repo. 2024-07-27 03:15:22 -04:00
e746965c50 Update nightly package workflow. 2024-07-27 01:20:18 -04:00
45a2842d7f Set stable releases as a prerelease initially.
This should give time to test the standalone package before making it live.
2024-07-26 14:52:20 -04:00
17b41f622e Change windows standalone URL to stable release. (#4065) 2024-07-26 14:37:40 -04:00
cf4418b806 Don't treat Bert model like CLIP.
Bert can accept up to 512 tokens so any prompt with more than 77 should
just be passed to it as is instead of splitting it up like CLIP.
2024-07-26 13:08:12 -04:00
6225a7827c Add CLIPTextEncodeHunyuanDiT.
Useful for testing what each text encoder does.
2024-07-26 13:08:06 -04:00
b6779d8df3 Fix undo incorrectly undoing text input (#4114)
Fixes an issue where under certain conditions, the ComfyUI custom undo / redo functions would not run when intended to.

When trying to undo an action like deleting several nodes, instead the native browser undo runs - e.g. a textarea gets focus and the last typed text is undone.  Clicking outside the text area and typing again just keeps doing the same thing.
2024-07-26 12:25:42 -04:00
8328a2d8cd Let hunyuan dit work with all prompt lengths. 2024-07-26 12:11:32 -04:00
afe732bef9 Hunyuan dit can now accept longer prompts. 2024-07-26 11:52:58 -04:00
a9ac56fc0d Own BertModel implementation that works with lowvram. 2024-07-26 04:47:17 -04:00
25b51b1a8b Hunyuan DiT lora support. 2024-07-25 22:42:54 -04:00
61a2b00bc2 Add HunyuanDiT support to readme. 2024-07-25 19:06:43 -04:00
a5f4292f9f Basic hunyuan dit implementation. (#4102)
* Let tokenizers return weights to be stored in the saved checkpoint.

* Basic hunyuan dit implementation.

* Fix some resolutions not working.

* Support hydit checkpoint save.

* Init with right dtype.

* Switch to optimized attention in pooler.

* Fix black images on hunyuan dit.
2024-07-25 18:21:08 -04:00
f87810cd3e Let tokenizers return weights to be stored in the saved checkpoint. 2024-07-25 10:52:09 -04:00
10c919f4c7 Make it possible to load tokenizer data from checkpoints. 2024-07-24 16:43:53 -04:00
ce80e69fb8 Avoid loading the dll when it's not necessary. 2024-07-24 13:50:34 -04:00
19944ad252 Add code to fix issues with new pytorch version on the standalone. 2024-07-24 12:49:29 -04:00
10b43ceea5 Remove duplicate code. 2024-07-24 01:12:59 -04:00
0a4c49c57c Support MT5. 2024-07-23 15:35:28 -04:00
88ed893034 Allow SPieceTokenizer to load model from a byte string. 2024-07-23 14:17:42 -04:00
334ba48cea More generic unet prefix detection code. 2024-07-23 14:13:32 -04:00
14764aa2e2 Rename LLAMATokenizer to SPieceTokenizer. 2024-07-22 12:21:45 -04:00
b2c995f623 "auto" type is only relevant to the SetUnionControlNetType node. 2024-07-22 11:30:38 -04:00
4151fbfa8a Add error message on union controlnet (#4081) 2024-07-22 11:27:32 -04:00
6045ed31f8 Supress frontend exception on unhandled message type (#4078)
* Supress frontend exception on unhandled message type

* nit
2024-07-21 21:15:01 -04:00
f836e69346 Fix bug with SaveAudio node with --gpu-only 2024-07-21 16:16:45 -04:00
5b69cfe7c3 Add timestamp to execution messages (#4076)
* Add timestamp to execution messages

* Add execution_end message

* Rename to execution_success
2024-07-21 15:29:10 -04:00
95fa9545f1 Only append zero to noise schedule if last sigma isn't zero. 2024-07-20 12:37:30 -04:00
11b74147ee Fix/webp exif little endian (#4061)
* Fix for isLittleEndian flag in parseExifData.

* Add break after reading first exif chunk in getWebpMetadata.
2024-07-19 18:39:04 -04:00
6ab8cad22e Implement beta sampling scheduler.
It is based on: https://arxiv.org/abs/2407.12173

Add "beta" to the list of schedulers and the BetaSamplingScheduler node.
2024-07-19 18:05:09 -04:00
011b11d8d7 LoadAudio restores file value from workflow (#4043)
* LoadAudio restores file value from workflow

* use onAfterGraphConfigured

* Don't use anonnymous function
2024-07-18 21:59:18 -04:00
ff6ca2a892 Move PAG to model_patches/unet section.
Move other unet model_patches nodes to model_patches/unet section.
2024-07-18 17:22:51 -04:00
374e093e09 Disable audio widget trying to get previews (#4044) 2024-07-17 16:11:10 -04:00
855789403b support clip-vit-large-patch14-336 (#4042)
* support clip-vit-large-patch14-336

* support clip-vit-large-patch14-336
2024-07-17 13:12:50 -04:00
6f7869f365 Get clip vision image size from config. 2024-07-17 13:05:38 -04:00
281ad42df4 Fix lowvram union controlnet bug. 2024-07-17 10:16:31 -04:00
1cde6b2eff Disallow use of eval with pylint (#4033) 2024-07-16 21:15:08 -04:00
c5a48b15bd Make default hash lib configurable without code changes via CLI argument (#3947)
* cli_args: Add --duplicate-check-hash-function.

* server.py: compare_image_hash configurable hash function

Uses an argument added in cli_args to specify the type of hashing to default to for duplicate hash checking.  Uses an `eval()` to identify the specific hashlib class to utilize, but ultimately safely operates because we have specific options and only those options/choices in the arg parser.  So we don't have any unsafe input there.

* Add hasher() to node_helpers

* hashlib selection moved to node_helpers

* default-hashing-function instead of dupe checking hasher

This makes a default-hashing-function option instead of previous selected option.

* Use args.default_hashing_function

* Use safer handling for node_helpers.hasher()

Uses a safer handling method than `eval` to evaluate default hashing function.

* Stray parentheses are evil.

* Indentation fix.

Somehow when I hit save I didn't notice I missed a space to make indentation work proper.  Oops!
2024-07-16 18:27:09 -04:00
f2298799ba Fix annotation (#4035) 2024-07-16 18:20:39 -04:00
60383f3b64 Move controlnet nodes to conditioning/controlnet. 2024-07-16 17:08:25 -04:00
8270c62530 Add SetUnionControlNetType to set the type of the union controlnet model. 2024-07-16 17:04:53 -04:00
821f93872e Allow model sampling to set number of timesteps. 2024-07-16 15:18:40 -04:00
e1630391d6 Allow version names like v0.0.1 for the FrontendManager. 2024-07-16 11:29:38 -04:00
99458e8aca Add FrontendManager to manage non-default front-end impl (#3897)
* Add frontend manager

* Add tests

* nit

* Add unit test to github CI

* Fix path

* nit

* ignore

* Add logging

* Install test deps

* Remove 'stable' keyword support

* Update test

* Add web-root arg

* Rename web-root to front-end-root

* Add test on non-exist version number

* Use repo owner/name to replace hard coded provider list

* Inline cmd args

* nit

* Fix unit test
2024-07-16 11:26:11 -04:00
33346fd9b8 Fix bug with custom nodes on other drives. 2024-07-15 20:38:26 -04:00
136c93cb47 Fix bug with workflow not registering change.
There was an issue when only the class type of a node changed with all the
inputs staying the same.
2024-07-15 20:01:49 -04:00
1305fb294c Refactor: Move some code to the comfy/text_encoders folder. 2024-07-15 17:36:24 -04:00
7914c47d5a Quick fix for the promax controlnet. 2024-07-14 10:07:36 -04:00
79547efb65 New menu fixes - fix send to workflow (#3909)
* Fix send to workflow
Fix center align of close workflow dialog
Better support for elements around canvas

* More resilent to extra elements added to body
2024-07-14 02:04:40 -04:00
a3dffc447a Support AuraFlow Lora and loading model weights in diffusers format.
You can load model weights in diffusers format using the UNETLoader node.
2024-07-13 13:51:40 -04:00
ce2473bb01 Add link to AuraFlow example in Readme. 2024-07-12 15:25:07 -04:00
4ca9b9cc29 Add Github Workflow for releasing stable versions and standalone bundle. (#3949)
* Add stable release.

* Only build CUDA 12.1 + 3.11 Python.

* Upgrade checkout and setup-python to latest version.

* lzma2

* Update artifact name to be ComfyUI_windows_portable_nvidia.7z
2024-07-12 13:33:57 -04:00
29c2e26724 Better tokenizing code for AuraFlow. 2024-07-12 01:15:25 -04:00
b6f09cf649 Add sentencepiece dependency. 2024-07-11 22:58:03 -04:00
8e012043a9 Add a ModelSamplingAuraFlow node to change the shift value.
Set the default AuraFlow shift value to 1.73 (sqrt(3)).
2024-07-11 17:57:36 -04:00
9f291d75b3 AuraFlow model implementation. 2024-07-11 16:52:26 -04:00
f45157e3ac Fix error message never being shown. 2024-07-11 11:46:51 -04:00
5e1fced639 Cleaner support for loading different diffusion model types. 2024-07-11 11:37:31 -04:00
ffe0bb0a33 Remove useless code. 2024-07-10 20:33:12 -04:00
391c1046cf More flexibility with text encoder return values.
Text encoders can now return other values to the CONDITIONING than the cond
and pooled output.
2024-07-10 20:06:50 -04:00
e44fa5667f Support returning text encoder attention masks. 2024-07-10 19:31:22 -04:00
90389b3b8a Update bug issue template (#3996)
* Update issue template

* nit
2024-07-10 11:28:15 -04:00
8d3f979b63 Check unhandled exception in test log in test action (#3987)
* Upload console logs

* Check unhandled exception
2024-07-09 17:12:57 -04:00
83f70a88fb Add __module__ to node info (#3936)
Use more explicit name 'python_module'

Parse abs ath

Move parse to nodes.py
2024-07-09 17:07:15 -04:00
f1a01c2c7e Add sampler_pre_cfg_function (#3979)
* Update samplers.py

* Update model_patcher.py
2024-07-09 16:20:49 -04:00
c3db344746 Fix ConditioningZeroOut when there is no pooled output. 2024-07-09 11:52:31 -04:00
d160073829 Fix loadGraphData call during restore (#3976) 2024-07-09 11:23:26 -04:00
ade7aa1b0c Remove useless import. 2024-07-09 11:05:05 -04:00
faa57430b0 Controlnet union model basic implementation.
This is only the model code itself, it currently defaults to an empty
embedding [0] * 6 which seems to work better than treating it like a
regular controlnet.

TODO: Add nodes to select the image type.
2024-07-08 23:49:02 -04:00
bb663bcd6c Rename clip_t5base to t5base for stable audio text encoder. 2024-07-08 08:53:55 -04:00
628f0b8ebc Move audio nodes out of _for_testing. 2024-07-07 09:22:32 -04:00
2dc84d1444 Add a way to set the timestep multiplier in the flow sampling. 2024-07-06 04:06:03 -04:00
ff63893d10 Support other types of T5 models. 2024-07-06 02:42:53 -04:00
4040491149 Better T5xxl detection. 2024-07-06 00:53:33 -04:00
b8e58a9394 Cleanup T5 code a bit. 2024-07-06 00:36:49 -04:00
80c4590998 Allow specifying the padding token for the tokenizer. 2024-07-06 00:06:49 -04:00
ce649d61c0 Allow zeroing out of embeds with unused attention mask. 2024-07-05 23:48:17 -04:00
b4c2d03d47 Remove duplicate import. 2024-07-05 12:10:22 -04:00
1dc87df4c5 Readme changes. 2024-07-04 22:03:37 -04:00
cedbc94cc0 Forgot this in last commit. 2024-07-04 21:49:50 -04:00
bd2d3e27d7 Show comfy_extras warning at the end.
Remove code.
2024-07-04 21:44:27 -04:00
720b17442d Temporary revert. 2024-07-04 21:09:58 -04:00
0e3dfd9e34 Use relative path for custom/extra node module name (#3944)
* Fix module name for comfy extra nodes

* Use module name relative to root dir
2024-07-04 20:49:07 -04:00
739b76630e Remove useless code. 2024-07-04 15:14:13 -04:00
24b969d3da Skip state check hook on first load (#3915) 2024-07-03 20:30:07 -04:00
086ac75228 3.8 Compatible type annotation (#3938) 2024-07-03 19:31:46 -04:00
d7484ef30c Support loading checkpoints with the UNETLoader node. 2024-07-03 11:34:32 -04:00
537f35c7bc Don't update dict if contiguous. 2024-07-02 20:21:51 -04:00
3f46362d22 fix non-contiguous tensor saving (from channels-last) (#3932) 2024-07-02 20:16:33 -04:00
01991f72ce Fix SamplerEulerCFGpp node. 2024-07-02 12:21:08 -04:00
2f03201690 Remove some empty lines. 2024-07-02 01:32:23 -04:00
52aaee251f Fix to #3465. Prevent, resaving of duplicate images if overwrite not specified (#3472)
* Fix to #3465. Prevent the, resaving of duplicate images if overwrite not specified

This is a fix to #3465 

Adds function compare_image_hash to do a sha256 hash comparison between an uploaded image and existing images with matching file names. 

This changes the behavior so that only images having the same filename that are actually different are saved to input, existing images are instead now opened instead of resaved with increment. 

Currently, exact duplicates with the same filename are resave saved with an incremented filename in the format:

<filename> (n).ext 

with the code: 

```
while os.path.exists(filepath): 
                        filename = f"{split[0]} ({i}){split[1]}"
                        filepath = os.path.join(full_output_folder, filename)
                        i += 1
```

This commit changes this to: 

```
while os.path.exists(filepath): 
                        if compare_image_hash(filepath, image):
                            image_is_duplicate = True
                            break
                        filename = f"{split[0]} ({i}){split[1]}"
                        filepath = os.path.join(full_output_folder, filename)
                        i += 1
```

a check for if image_is_duplicate = False is done before saving the file. 

Currently, if you load the same image of a cat named cat.jpg into the LoadImage node 3 times, you will get 3 new files in your input folder with incremented file names.

With this change, you will now only have the single copy of cat.jpg, that will be re-opened instead of re-saved. 

However if you load 3 different images of cats named cat.jpg, you will get the expected behavior of having:
cat.jpg
cat (1).jpg
cat (2).jpg

This saves space and clutter. After checking my own input folder, I have 800+ images that are duplicates that were resaved with incremented file names amounting to more than 5GB of duplicated data.

* fixed typo in expression
2024-07-02 01:30:33 -04:00
1ef66b0955 Add example for how to add custom API routes (#3597) 2024-07-01 18:02:42 -04:00
9dd549e253 Add --no-custom-node cmd flag (#3903)
* Add --no-custom-node cmd flag

* nit
2024-07-01 17:54:03 -04:00
b82d67d5bf Add SamplerEulerAncestralCFG++ custom sampler node (#3901)
(for eta and s_noise)
2024-07-01 17:42:17 -04:00
755c48d78e Fix several typos in example_node.py.example (#3204)
This change includes corrections for several spelling errors in the
documentation of example_node.py.example file.

These were previously raised by #3157, but they missed a few.
2024-07-01 17:21:12 -04:00
5dccfefe8d Switch nightly pytorch standalone package to lzma2. 2024-07-01 17:17:25 -04:00
0cd4a6a5e5 Fine-tuning GitHub Actions (#3169)
* Bumping GitHub Actions versions

* Using LZMA2 for 7zip compression in Windows packaging
2024-07-01 17:15:49 -04:00
601b4b63e1 Add CONTRIBUTING.md (#3910)
* Create CONTRIBUTING.md

* Add feature-request channel link.

* Remove discord links for channels.
2024-07-01 13:51:00 -04:00
e53b1592ba enable cmd shortcuts for mac (mute & bypass) (#3792) 2024-07-01 13:45:34 -04:00
7c5fa7f4a2 Fix loadGraphData func call (#3918) 2024-07-01 12:10:44 -04:00
521421f53e Fix workflow not importing from flac files on some systems. 2024-06-30 15:51:54 -04:00
dbb7dd3b5e Add to readme that Stable Audio is supported. 2024-06-30 00:15:49 -04:00
05e831697a Switch to the real cfg++ method in the samplers.
The old _pp ones will be updated automatically to the regular ones with 2x
the cfg.

My fault for not checking what the "_pp" samplers actually did.
2024-06-29 11:59:48 -04:00
fbb7a1f1b6 PreviewAudio node. 2024-06-29 01:33:22 -04:00
c39cf7fff0 Revert "Add integration test for Linux with Nvidia GPU. #3884 (#3895)" (#3905)
This reverts commit 449bf52923.
2024-06-28 16:09:55 -04:00
02cac1d487 Revert "Add macOs integration test for default workflow. (#3898)" (#3904)
This reverts commit 97b409cd48.
2024-06-28 16:09:39 -04:00
7ecb2ec169 Audio second setting in EmptyLatentAudio. 2024-06-28 02:55:36 -04:00
0d9009c96e New menu/workflows fixes (#3900)
* Fix auto queue

* Detect added nodes via search

* Fix loading workflows

* Add button click style
2024-06-28 01:07:19 -04:00
264caca20e ControlNetApplySD3 node can now be used to use SD3 controlnets. 2024-06-27 18:43:11 -04:00
f8f7568d03 Basic SD3 controlnet implementation.
Still missing the node to properly use it.
2024-06-27 18:43:11 -04:00
66aaa14001 Controlnet refactor. 2024-06-27 18:43:11 -04:00
97b409cd48 Add macOs integration test for default workflow. (#3898) 2024-06-27 16:10:16 -04:00
449bf52923 Add integration test for Linux with Nvidia GPU. #3884 (#3895)
* Add linux integration test.

* Fix directory path.

* Add paths ignore.

* Fix conda env directory path.
2024-06-27 16:08:26 -04:00
8ceb5a02a3 Support saving stable audio checkpoint that can be loaded back. 2024-06-27 11:06:52 -04:00
5ff3d4eb3a Fix audio upload when no audio in input dir (#3891) 2024-06-27 09:13:52 -04:00
4f9d2b057c Remove print. 2024-06-27 02:54:15 -04:00
4650e7d6e9 Save and load workflow from the flac files output by SaveAudio. 2024-06-27 02:07:29 -04:00
3b423afcca Add audio widget (#3863)
* Add audio widget

* Fix audio bugs

* Add CSS

* Populate audio widget when load history
2024-06-27 00:22:55 -04:00
44947e7ad4 Add DEIS order 3 sampler.
Order 4 seems to give bad results.
2024-06-26 22:40:05 -04:00
175fe02522 Ignore .vscode/ (#3879) 2024-06-26 19:59:19 -04:00
bc5a0f10db Ignore *.log (#3880) 2024-06-26 19:59:09 -04:00
a3e83f695d Update test ref (#3882)
* Update ref

* Disable some tests
2024-06-26 19:58:56 -04:00
f12fa1d8d7 Enable browser tests on push (#3878) 2024-06-26 09:09:21 -04:00
e3579f3360 Fix merge issue breaking api json loading (#3876) 2024-06-26 09:08:48 -04:00
edfce78c86 add issue templates for ComfyUI Issues Page (#3868) 2024-06-26 01:37:27 -04:00
e99d97a9d9 Remove duplicated Reset View button (#3865)
* Remove duplicated Reset View button

* Disable flaky test
2024-06-26 01:23:55 -04:00
69d710e40f Implement my alternative take on CFG++ as the euler_pp sampler.
Add euler_ancestral_pp which is the ancestral version of euler with the
same modification.
2024-06-25 07:41:52 -04:00
90aebb6c86 New Menu & Workflow Management (#3112)
* menu

* wip

* wip

* wip

* wip

* wip

* workflow saving/loading

* Support inserting workflows
Move buttosn to top of lists

* fix session storage
implement renaming

* temp

* refactor, better workflow instance management

* wip

* progress on progress

* added send to workflow
various fixes

* Support multiple image loaders

* Support dynamic size breakpoints based on content

* various fixes
add close unsaved warning

* Add filtering tree

* prevent renaming unsaved

* fix zindex on hover

* fix top offset

* use filename as workflow name

* resize on setting change

* hide element until it is drawn

* remove glow

* Fix export name

* Fix test, revert accidental changes to groupNode

* Fix colors on all themes

* show hover items on smaller screen (mobile)

* remove debugging code

* dialog fix

* Dont reorder open workflows
Allow elements around canvas

* Toggle body display on setting change

* Fix menu disappearing on chrome

* Increase delay when typing, remove margin on Safari, fix dialog location

* Fix overflow issue on iOS

* Add reset view button
Prevent view changes causing history entries

* Bottom menu wip

* Various fixes

* Fix merge

* Fix breaking old menu position

* Fix merge adding restore view to loadGraphData
2024-06-25 06:49:25 -04:00
eab211bb1e Resample audio to 44100 when VAE encoding it. 2024-06-24 16:55:20 -04:00
866f54da8d Add browser test action synced with TS repo (#3852)
* Add browser test action

* Add npm install task
2024-06-24 14:47:28 -04:00
73ca780019 Add SamplerEulerCFG++ node.
This node should match the DDIM implementation of CFG++ when "regular" is
selected.

"alternative" is a slightly different take on CFG++
2024-06-23 13:21:18 -04:00
2f360ae898 Support OneTrainer SD3 lora format. 2024-06-22 13:08:04 -04:00
4ef1479dcd Multi dimension tiled scale function and tiled VAE audio encoding fallback. 2024-06-22 11:57:49 -04:00
887a6341ed Proper ModelMergeSD3_2B node. 2024-06-21 08:41:31 -04:00
1e2839f4d9 More proper tiled audio decoding. 2024-06-20 16:50:31 -04:00
d5efde89b7 Add ipndm_v sampler, works best with the exponential scheduler. 2024-06-20 08:51:49 -04:00
45e10cac19 feat: add gits scheduler (#3769) 2024-06-20 08:12:15 -04:00
d7f0964266 Fix routes (#3790) 2024-06-19 22:36:31 -04:00
028a583bef Fix issue with full diffusers SD3 loras. 2024-06-19 22:32:04 -04:00
0d6a57938e Support loading diffusers SD3 model format with UNETLoader node. 2024-06-19 22:21:18 -04:00
b08a9dd04b Remove empty line. 2024-06-19 20:20:35 -04:00
eee815ec99 Update sd1_clip.py (#3684)
Made token instance check more flexible so it also works with integers from numpy arrays or long tensors
2024-06-19 16:42:41 -04:00
e11052afcf Add ipndm sampler. 2024-06-19 16:32:30 -04:00
97ae6ef460 Add api/ prefix to api endpoints (#3779) 2024-06-19 10:39:17 -04:00
3914d5a2ae Support full SD3 loras. 2024-06-19 10:13:33 -04:00
55f0dc124e Add soundfile dependency so that windows can save audio. 2024-06-18 09:57:40 -04:00
a45df69570 Basic tiled decoding for audio VAE. 2024-06-17 22:48:23 -04:00
379ff92e9e fix app.js no graph defined (#3754)
* local test

* fix "graph" not found

* fix

---------

Co-authored-by: Xiujuan Li <xiujuali@amazon.com>
2024-06-17 07:56:53 -04:00
b7c473d1ab Fix lora keys for SimpleTuner (#3759) 2024-06-17 07:55:06 -04:00
6425252c4f Use fp16 as the default vae dtype for the audio VAE. 2024-06-16 13:12:54 -04:00
8ddc151a4c Squash depreciation warning on new pytorch. 2024-06-16 13:06:23 -04:00
ca9d300a80 Better estimation for memory usage during audio VAE encoding/decoding. 2024-06-16 11:47:32 -04:00
746a0410d4 Fix VAEEncode with taesd3. 2024-06-16 03:10:04 -04:00
04e8798c37 Improvements to the TAESD3 implementation. 2024-06-16 02:04:24 -04:00
df7db0e027 support TAESD3 (#3738) 2024-06-16 02:03:53 -04:00
bb1969cab7 Initial support for the stable audio open model. 2024-06-15 12:14:56 -04:00
1281f933c1 Small optimization. 2024-06-15 02:44:38 -04:00
f2e844e054 Optimize some unneeded if conditions in the sampling code. 2024-06-15 02:26:19 -04:00
0ec513d877 Add a --force-channels-last to inference models in channel last mode. 2024-06-15 01:08:12 -04:00
0e06b370db Print key names for easier debugging. 2024-06-14 18:18:53 -04:00
5eb98f0092 Exempt IPEX from non_blocking previews fixing segmentation faults. (#3708) 2024-06-13 18:51:14 -04:00
ac151ac169 Support SD3 diffusers lora. 2024-06-13 18:26:10 -04:00
37a08a41b3 Support setting weight offsets in weight patcher. 2024-06-13 17:21:26 -04:00
605e64f6d3 Fix lowvram issue. 2024-06-12 10:39:33 -04:00
0eaa34ec5b Fix regular empty latent image not working with SD3 and custom sampler. 2024-06-12 10:32:34 -04:00
321e509e0a Add link to SD3 example page to README. 2024-06-12 09:48:27 -04:00
c8b5e08dc3 Default shift value on SD3 is 3.0 2024-06-12 02:24:39 -04:00
1ddf512fdc Don't auto convert clip and vae weights to fp16 when saving checkpoint. 2024-06-12 01:07:58 -04:00
32be358213 Save SD3 modelspec.architecture in CheckpointSave node. 2024-06-12 01:02:07 -04:00
694e0b48e0 SD3 better memory usage estimation. 2024-06-12 00:49:00 -04:00
69c8d6d8a6 Single and dual clip loader nodes support SD3.
You can use the CLIPLoader to use the t5xxl only or the DualCLIPLoader to
use CLIP-L and CLIP-G only for sd3.
2024-06-11 23:27:39 -04:00
0e49211a11 Load the SD3 T5xxl model in the same dtype stored in the checkpoint. 2024-06-11 17:03:26 -04:00
5889b7ca0a Support multiple text encoder configurations on SD3. 2024-06-11 13:14:43 -04:00
1c34d338d7 Update EmptySD3LatentImage to use 1024 resolution by default. 2024-06-11 07:37:22 -04:00
9424522ead Reuse code. 2024-06-11 07:20:26 -04:00
73ce178021 Remove redundancy in mmdit.py (#3685) 2024-06-11 06:30:25 -04:00
4134564dc1 Require safetensors library to be at least 0.4.2 for fp8 support. 2024-06-11 06:26:13 -04:00
a82fae2375 Fix bug with cosxl edit model. 2024-06-10 16:00:03 -04:00
8c4a9befa7 SD3 Support. 2024-06-10 14:06:23 -04:00
a5e6a632f9 Support sampling non 2D latents. 2024-06-10 01:31:09 -04:00
742d5720d1 Support zeroing out text embeddings with the attention mask. 2024-06-09 16:51:58 -04:00
6cd8ffc465 Reshape the empty latent image to the right amount of channels if needed. 2024-06-08 02:35:08 -04:00
56333d4850 Use the end token for the text encoder attention mask. 2024-06-07 03:05:23 -04:00
0dccb4617d Remove some unnecessary arguments. 2024-06-06 14:49:45 -04:00
104fcea0c8 Add function to get the list of currently loaded models. 2024-06-05 23:25:16 -04:00
b1fd26fe9e pytorch xpu should be flash or mem efficient attention? 2024-06-04 17:44:14 -04:00
20447e9ec9 Fix alpha in PorterDuffImageComposite. (#3411)
There were two bugs in PorterDuffImageComposite.

The first one is the fact that it uses the mask input directly as alpha, missing the conversion (`1-a`). The fix is similar to c16f5744.

The second one is that all color composition formulas assume alpha premultiplied values, while the input is not premultiplied.

This change fixes both of these issue.
2024-06-04 16:37:11 -04:00
cb8d0ebccc Don't load the view coordinates when loading a workflow from the history.
I think this makes things slightly less annoying for some users.
2024-06-03 19:48:27 -04:00
809cc85a8e Remove useless code. 2024-06-02 19:23:37 -04:00
b249862080 Add an annoying print to a function I want to remove. 2024-06-01 12:47:31 -04:00
e2c585f3be Fix to allow use of PerpNegGuider with cfg_function_post hooks (like PAG) (#3618) 2024-06-01 12:36:08 -04:00
04b308229e Small refactor of preview code. 2024-05-31 11:18:37 -04:00
bf3e334d46 Disable non_blocking when --deterministic or directml. 2024-05-30 11:07:38 -04:00
71ec5b144e Update commands to install nightly pytorch in readme. 2024-05-29 00:20:02 -04:00
91542d4f8b Import spandrel_extra_arches if present.
I will not add this dependency to the default ones because models in the
spandrel_extra_arches package are non commercial and therefore not
compatible with free software licenses like the one ComfyUI uses.

If you don't mind this you can install it manually yourself.
2024-05-28 01:42:11 -04:00
b26da2245f Fix UnetParams annotation typo (#3589) 2024-05-27 19:30:35 -04:00
0920e0e5fe Remove some unused imports. 2024-05-27 19:08:27 -04:00
34030fed92 improve dom widget performance (#3584) 2024-05-27 14:26:07 -04:00
f6a203951f Extend core snapToGrid to LiteGraph Groups. (#3393)
Extends the core Comfy.SnapToGrid behavior for nodes to apply to LiteGraph's LGraphGroup with the same behavior. Also, pulls out redundant rounding code into util function.
2024-05-27 14:05:51 -04:00
16a493a190 Keep compatibility with some custom nodes. 2024-05-26 15:37:41 -04:00
9a151b7def Fix issue and unpin spandrel package. 2024-05-26 13:44:47 -04:00
8cfd677cc0 Replace chainner_models with Spandrel package (#2146)
* Replace chainner_models with Spandrel

* Update to latest spandrel

* Use spandrel_foss instead

* update spandrel to new FOSS-compliant version
2024-05-26 13:44:17 -04:00
ffc4b7c30e Fix DORA strength.
This is a different version of #3298 with more correct behavior.
2024-05-25 02:50:11 -04:00
5b87369474 Load titles from API format JSON (#3563) 2024-05-24 23:53:15 -04:00
efa5a711b2 Reduce memory usage when applying DORA: #3557 2024-05-24 23:36:48 -04:00
58c9838274 Speed up TAESD preview. 2024-05-24 02:37:57 -04:00
b02bcced05 Fix FreeU not working when shape is tensor. 2024-05-23 11:48:04 -04:00
6507a9c716 Remove the CTRL-Delete keybind.
On some keyboards it's apparently too easy to accidentally do CTRL-Delete
when pressing CTRL-Enter repeatedly.

CTRL-Backspace can still be used to clear the workflow.
2024-05-23 01:29:22 -04:00
6c23854f54 Fix OSX latent2rgb previews. 2024-05-22 13:56:28 -04:00
7718ada4ed Add type annotation UnetWrapperFunction (#3531)
* Add type annotation UnetWrapperFunction

* nit

* Add types.py
2024-05-22 02:07:27 -04:00
8508df2569 Work around black image bug on Mac 14.5 by forcing attention upcasting. 2024-05-21 16:56:33 -04:00
83d969e397 Disable xformers when tracing model. 2024-05-21 13:55:49 -04:00
1900e5119f Fix potential issue. 2024-05-20 08:19:54 -04:00
276f8fce9f Print error when node is missing. 2024-05-20 07:04:08 -04:00
4bc1884478 Provide a better error message when attempting to execute the workflow with a missing node. (#3517) 2024-05-20 06:58:46 -04:00
09e069ae6c Log the pytorch version. 2024-05-20 06:22:29 -04:00
11a2ad5110 Fix controlnet not upcasting on models that have it enabled. 2024-05-19 17:58:03 -04:00
4ae1515f14 Slightly faster latent2rgb previews. 2024-05-19 17:42:35 -04:00
f37a47110b Make --preview-method auto default to the fast latent2rgb previews. 2024-05-19 11:45:36 -04:00
0bdc2b15c7 Cleanup. 2024-05-18 10:11:44 -04:00
98f828fad9 Remove unnecessary code. 2024-05-18 09:36:44 -04:00
1c4af5918a Better error message if the webcam node doesn't work. 2024-05-17 14:02:09 -04:00
91590adf04 Add webcam node (#3497)
* Add webcam node

* unused import
2024-05-17 13:16:08 -04:00
19300655dd Don't automatically switch to lowvram mode on GPUs with low memory. 2024-05-17 00:31:32 -04:00
46daf0a9a7 Add debug options to force on and off attention upcasting. 2024-05-16 04:09:41 -04:00
58f8388020 More proper fix for #3484. 2024-05-16 00:11:01 -04:00
2d41642716 Fix lowvram dora issue. 2024-05-15 02:47:40 -04:00
ec6f16adb6 Fix SAG. 2024-05-14 18:02:27 -04:00
bb4940d837 Only enable attention upcasting on models that actually need it. 2024-05-14 17:00:50 -04:00
b0ab31d06c Refactor attention upcasting code part 1. 2024-05-14 12:47:31 -04:00
2de3b69b30 Support saving some more modelspec types. 2024-05-13 21:54:11 -04:00
cf6e1efb69 Show message on error when loading wf from file (works on drag and drop) (#3466) 2024-05-13 15:22:22 -04:00
ece5acb8e8 Fix nightly package workflow. 2024-05-12 16:05:10 -04:00
794a357f7a Update the nightly workflow. 2024-05-12 07:24:12 -04:00
22edd3add5 Fix to LoadImage Node for #3416 HDR images loading additional smaller… (#3454)
* Fix to LoadImage Node for #3416 HDR images loading additional smaller images. 

Added a blocking if statement  in the ImageSequence.Iterator that checks if subsequent images after the first match dimensionally, and prevent them from being appended to output_images if they do not match. 

This does not fix or change current behavior for PIL 10.2.0 where the images are loaded at the same size, but it does for 10.3.0 where they are loaded at their correct smaller sizes.

* added list of excluded formats that should return 1 image

added an explicit check for the image format so that additional formats can be added to the list that have problematic behavior.
2024-05-12 07:07:38 -04:00
f509c6fe21 Fix Intel GPU memory allocation accuracy and documentation update. (#3459)
* Change calculation of memory total to be more accurate, allocated is actually smaller than reserved.

* Update README.md install documentation for Intel GPUs.
2024-05-12 06:36:30 -04:00
fa6dd7e5bb Fix lowvram issue with saving checkpoints.
The previous fix didn't cover the case where the model was loaded in
lowvram mode right before.
2024-05-12 06:13:45 -04:00
49c20cdc70 No longer necessary. 2024-05-12 05:34:43 -04:00
e1489ad257 Fix issue with lowvram mode breaking model saving. 2024-05-11 21:55:20 -04:00
4f63ee99f1 Add a button to reset the view. 2024-05-10 17:30:52 -04:00
f374ea714d Setting for saving and restoring canvas position and zoom level (#3437) 2024-05-10 17:07:46 -04:00
0fecfd2b1a Added generic wrapper function node_helpers.pillow to fix PIL issues #4472 and #2445 (#3422)
* Update node_helpers.py to use generic pillow wrapper to resolve multiple meta-data related issues.

replaced open_image function with a generic pillow function that takes Pil functions as a dependency injection and applies the ImageFile.LOAD_TRUNCATED_IMAGES try except fix to them. 

This provides an extensible function to handle related errors that can wrap offending functions when discovered without the need to repeat code.

* Update a few Pil functions to use node_helpers.pillow wrapper

Update a Pil function calls in a few locations to use the generic node_helpers.pillow wrapper that takes the function as a dependency injection and uses the try except method with ImageFIle.LOAD_TRUNCATED_IMAGES solution

* Corrected comment in issue #s fixed.

* Update node_helpers.py to remove import of Image from PIL

import of Image is no longer required as functions are Injected
2024-05-09 05:38:00 -04:00
93e876a3be Remove warnings that confuse people. 2024-05-09 05:29:42 -04:00
cd07340d96 Typo fix. 2024-05-08 18:36:56 -04:00
c33412288f Fix issue with loading some JPG: #3416 2024-05-07 05:41:06 -04:00
d7fa417bfa feat: shortcuts for zoom in/out (#3410)
* feat: shortcuts for zoom in/out

* feat: pen support for canvas zoom

ctrl + LMB + vertical drag

* Ctrl+LMB+Drag -> ctrl+Shift+LMB+Drag

---------

Co-authored-by: Lt.Dr.Data <lt.dr.data@gmail.com>
2024-05-07 04:40:56 -04:00
c61eadf69a Make the load checkpoint with config function call the regular one.
I was going to completely remove this function because it is unmaintainable
but I think this is the best compromise.

The clip skip and v_prediction parts of the configs should still work but
not the fp16 vs fp32.
2024-05-06 20:04:39 -04:00
Pam
3787b4f246 Use get_model_object in Deep Shrink node (#3408) 2024-05-06 18:39:39 -04:00
565eb6d176 Add a SplitSigmasDenoise node as an alternative to SplitSigmas. 2024-05-05 05:24:36 -04:00
9a70b70de4 add opacity slider in maskeditor (#3404)
Co-authored-by: vilanele <nomail@email.com>
2024-05-05 05:01:06 -04:00
72508a8d19 Only set LOAD_TRUNCATED_IMAGES when if the Image open fails.
Document which PIL issues this works around.
2024-05-04 03:51:03 -04:00
0d45efb7d6 Fixed Issue with LoadImage node when loading PNG files with embedded ICC profiles. (#3316)
* Fix issue with how PIL loads small PNG files nodes.py

Added flag to prevent ValueError: Decompressed Data Too Large
when loading PNG images with large meta data such as large embedded color profiles

* Update LoadImage node to fix error when loading PNG's in nodes.py

Fixed Value Error: Decompressed Data Too Large thrown by PIL when attempting to opening PNG files with large embedded ICC colorspaces by setting the follow flag to true when loading png images:  ImageFile.LOAD_TRUNCATED_IMAGES = True

* Update node_helpers.py to include open_image helper function

open_image includes try except to catch Pillow Value Errors that occur when large ICC profiles are embedded in images.

* Update LoadImage node to use open_image helper function inplace of Image.open

open_image helper function in node_helpers.py  fixes a Pillow error when attempting to open images with large embedded ICC profiles by adding an exception handler to load the image with truncated meta data if regular loading is not possible.
2024-05-04 03:32:41 -04:00
daa92a8ff4 Fix potential issues with the int rounding fix. 2024-05-03 05:49:21 -04:00
89d0e9abeb Fix int widgets rounding. 2024-05-02 03:34:19 -04:00
a56d02efc7 Change torch.xpu to ipex.optimize, xpu device initialization and remove workaround for text node issue from older IPEX. (#3388) 2024-05-02 03:26:50 -04:00
f81a6fade8 Fix some edge cases with samplers and arrays with a single sigma. 2024-05-01 17:05:30 -04:00
94d5a12801 Don't load the model in SDTurboScheduler 2024-05-01 16:57:10 -04:00
2aed53c4ac Workaround xformers bug. 2024-04-30 21:23:40 -04:00
bacce529fb Add TLS Support (#3312)
* Add TLS Support

* Add to readme

* Add guidance for windows users on generating certificates

* Add guidance for windows users on generating certificates

* Fix typo
2024-04-30 20:17:02 -04:00
bb8b48a260 Update Readme. 2024-04-30 20:11:34 -04:00
eecd69b53a Add a SamplerLCMUpscale node.
This sampler is an LCM sampler that upscales the latent during sampling.

It can be used to generate at a higher resolution with an LCM model very
quickly.

To try it use it with a basic 5 step LCM workflow with scale_ratio 1.5 or
2.0
2024-04-29 20:00:47 -04:00
059773a6df Add some nodes to multiply the attention in UNet and Clip models. 2024-04-28 13:03:43 -04:00
10fcd09f4a Add a denoise value to AlignYourStepsScheduler. 2024-04-27 00:48:41 -04:00
8cab3be673 Update command for AMD stable pytorch install in README. 2024-04-26 15:44:12 -04:00
7990ae18c1 Fix error when more cond masks passed in than batch size (#3353) 2024-04-26 12:51:12 -04:00
16eabdf70d Free more vram for upscale models. 2024-04-25 17:04:19 -04:00
8dc19e40d1 Don't init a VAE model when there are no VAE weights. 2024-04-24 09:20:31 -04:00
27d5808fc4 Increase max lora strength to 100.0 2024-04-23 13:07:39 -04:00
Pam
b8218522f1 Increase sigma_min/sigma_max range for custom schedulers (#3317) 2024-04-23 09:40:10 -04:00
d09b5ef4ef Free some memory before loading upscale models. 2024-04-22 18:51:15 -04:00
4ee9aad6ca Speed up Sharpen node. 2024-04-21 09:02:06 -04:00
644a3ae58d Implement Align Your Steps as a AlignYourStepsScheduler node. 2024-04-20 04:34:12 -04:00
133dc3351b Faster blur. 2024-04-19 03:52:02 -04:00
5d08802f78 Sync some minor changes from the other repo. 2024-04-19 03:43:09 -04:00
c59fe9f254 Support VAE without quant_conv. 2024-04-18 21:05:33 -04:00
a88b0ebc2d Improve node input/widget conversion sub-menus (#3281)
* Make input/widget conversion sub-menus optional

* Improve input/widget conversion sub-menu text

- Fix incorrect text for conversion from widget to input, previously it
  effectively said "convert input to input"
- Use "input" instead of "🔘".  The former is clearer and consistent
  with the rest of the application.
- Use title case (consistent with the rest of the menu entries).
- Strip the trailing periods. There is already a visual indicator for
  sub-menus, and no other sub-menus use trailing periods.
2024-04-18 16:41:23 -04:00
d64e217427 Fix annoying float issue causing the value to be rounded to above the max. 2024-04-17 17:34:02 -04:00
072e3bd2b5 Fixed an issue where the main menu disappears intermittently as the coordinates become negative. (#3269) 2024-04-17 16:36:49 -04:00
abc69cab45 Add a helpful warning for links that don't point anywhere. 2024-04-17 12:28:05 -04:00
45ec1cbe96 Implement PerpNeg as a guider. 2024-04-16 02:57:34 -04:00
8903dce862 This can be removed since PAG doesn't use the uncond. 2024-04-15 12:14:00 -04:00
719fb2c81d Add basic PAG node. 2024-04-14 23:49:50 -04:00
258dbc06c3 Fix some memory related issues. 2024-04-14 12:08:58 -04:00
744ac944db Don't make dynamicPrompts the default on multiline string inputs.
This should be less confusing to those who want to use multiline input
without them.
2024-04-13 16:18:00 -04:00
58812ab8ca Support SDXS 512 model. 2024-04-12 22:12:35 -04:00
0256e7f769 Fix tests. 2024-04-12 20:02:53 -04:00
2bef134ebf change Convert.. input (#3246) 2024-04-12 17:02:17 -04:00
4bd7d55b90 Add some colors to SamplerCustom links.
If you don't like them I am open to a PR.
2024-04-11 22:43:05 -04:00
fd7c636680 Add an AddNoise node to add noise depending on the sigma. 2024-04-10 23:40:31 -04:00
831511a1ee Fix issue with sampling_settings persisting across models. 2024-04-09 23:20:43 -04:00
4201181b35 Add ModelMergeSD1, ModelMergeSD2 and ModelMergeSDXL. 2024-04-09 04:31:14 -04:00
30abc324c2 Support properly saving CosXL checkpoints. 2024-04-08 00:36:22 -04:00
d644b6bcd8 Cleanup some more conditioning nodes. 2024-04-07 14:40:43 -04:00
c9fc242e2c The middle prompt should be treated more as a negative prompt. 2024-04-07 14:34:43 -04:00
80bda6c163 Cleanup a few conditioning nodes. 2024-04-07 14:27:40 -04:00
0a03009808 Fix issue with controlnet models getting loaded multiple times. 2024-04-06 18:38:39 -04:00
de172f8be7 Improve A1111 metadata parsing (#3216)
* A1111 import: Set VAE name

This patch sets the VAE name for the `VAELoader` when present in the png
metadata.

* A1111 import: Skip all hashes

When importing from A1111 the parsing assumes that values of a key will
never contain a ":", which is not correct.

There are 2 cases where we can have ":" in the value:

- Inside a string. E.g.:
  Lora hashes: "xl_more_art-full_v1: fe3b4816be83, add-detail-xl: 9c783c8ce46c"

- When the value is a json dictionary. E.g.:
  Hashes: {"vae": "63aeecb90f", "embed:negativeXL_D": "fff5d51ab6"}

This patch changes how we parse the metadata to take those 2 cases into
account and also skips the following additional keys that are present in
some Forge images:

- Version
- VAE hash
- TI hashes
- Lora hashes
- Hashes

* A1111 import: Parse Hires steps

This patch parses the `Hires steps` parameter that is part of the High
Resolution Upscale configuration when it  is present, and fallbacks to
the one from the `samplerNode` (like the code currently does) if it's
not present.
2024-04-06 12:10:17 -04:00
d8dea4cdb8 Fix DisableNoise node. 2024-04-05 21:36:23 -04:00
a7dd82e668 Fix copy paste issue with litegraph. 2024-04-05 14:59:05 -04:00
38ed2da2dd Fix typo in lowvram patcher (#3209) 2024-04-05 12:02:13 -04:00
ea9ac9d30b Fix PerpNeg node. 2024-04-05 11:46:54 -04:00
1088d1850f Support for CosXL models. 2024-04-05 10:53:41 -04:00
41ed7e85ea Fix object_patches_backup not being the same object across clones. 2024-04-05 00:22:44 -04:00
0f5768e038 Fix missing arguments in cfg_function. 2024-04-04 23:38:57 -04:00
1f4fc9ea0c Fix issue with get_model_object on patched model. 2024-04-04 23:01:02 -04:00
1a0486bb96 Fix model needing to be loaded on GPU to generate the sigmas. 2024-04-04 22:08:49 -04:00
1f8d8e6c77 Add InstructPixToPixConditioning node. 2024-04-04 15:06:17 -04:00
5272fd4b03 Add DualCFGGuider used in IP2P models for example. 2024-04-04 14:57:44 -04:00
cfbf3be54b Add basic guider for models with no cfg. 2024-04-04 13:57:32 -04:00
c6bd456c45 Make zero denoise a NOP. 2024-04-04 11:41:27 -04:00
fcfd2bdf8a Small cleanup. 2024-04-04 11:16:49 -04:00
f117566299 SamplerCustomAdvanced node.
This node enables the creation of nodes to change the guider/denoiser and
the noise algorithm.
2024-04-04 01:32:25 -04:00
0542088ef8 Refactor sampler code for more advanced sampler nodes part 2. 2024-04-04 01:26:41 -04:00
57753c964a Refactor sampling code for more advanced sampler nodes. 2024-04-03 22:09:51 -04:00
6c6a39251f Fix saving text encoder in fp8. 2024-04-02 11:46:34 -04:00
e6482fbbfc Refactor calc_cond_uncond_batch into calc_cond_batch.
calc_cond_batch can take an arbitrary amount of cond inputs.

Added a calc_cond_uncond_batch wrapper with a warning so custom nodes
won't break.
2024-04-01 18:07:47 -04:00
1306464538 --force-fp16 is no longer necessary on Mac. 2024-03-31 12:50:28 -04:00
575acb69e4 IP2P model loading support.
This is the code to load the model and inference it with only a text
prompt. This commit does not contain the nodes to properly use it with an
image input.

This supports both the original SD1 instructpix2pix model and the
diffusers SDXL one.
2024-03-31 03:10:28 -04:00
96b4c757cf Add log to debug custom nodes that hang when imported. 2024-03-30 11:52:11 -04:00
94a5a67c32 Cleanup to support different types of inpaint models. 2024-03-29 14:44:13 -04:00
9bf6061dfc Switch prints to logging in folder_paths and add some extra debug. 2024-03-29 03:07:13 -04:00
5d8898c056 Fix some performance issues with weight loading and unloading.
Lower peak memory usage when changing model.

Fix case where model weights would be unloaded and reloaded.
2024-03-28 18:04:42 -04:00
327ca1313d Support SDXS 0.9 2024-03-27 23:58:58 -04:00
8ae1e4d125 Make step on sharpen node smaller. 2024-03-27 01:28:31 -04:00
2f93b91646 Add Tesla GPUs to cuda malloc blacklist. 2024-03-26 23:09:28 -04:00
c9673926fb Fix test. 2024-03-26 04:07:30 -04:00
11838e60f4 Increase the max resolution. 2024-03-26 04:00:53 -04:00
ae77590b4e dora_scale support for lora file. 2024-03-25 18:09:23 -04:00
c6de09b02e Optimize memory unload strategy for more optimized performance. 2024-03-24 02:36:30 -04:00
6a32c06f06 Move cleanup_models to improve performance. 2024-03-23 17:27:10 -04:00
a28a9dc836 Add an example to use the SaveImageWebsocket node and enable it. 2024-03-22 12:56:48 -04:00
0624838237 Add inverse noise scaling function. 2024-03-21 14:49:11 -04:00
5d875d77fe Fix regression with lcm not working with batches. 2024-03-20 20:48:54 -04:00
4b9005e949 Fix regression with model merging. 2024-03-20 13:56:12 -04:00
c18a203a8a Don't unload model weights for non weight patches. 2024-03-20 02:27:58 -04:00
150a3e946f Make LCM sampler use the model noise scaling function. 2024-03-20 01:35:59 -04:00
d14bdb1896 Revert, NOTE: this will be removed again soon please fix your nodes. 2024-03-19 11:17:49 -04:00
0c55f16c9e Remove code that should be useless now. 2024-03-19 09:47:14 -04:00
40e124c6be SV3D support. 2024-03-18 16:54:13 -04:00
0b78213bda Fix neg scale step. 2024-03-18 15:51:23 -04:00
b1a16d4500 Fix stable cascade img2img not working with all resolutions. 2024-03-18 13:51:38 -04:00
cacb022c4a Make saved SD1 checkpoints match more closely the official one. 2024-03-18 00:26:23 -04:00
d3406d8d58 Increase image batch nodes maximum values. 2024-03-17 08:57:49 -04:00
d7897fff2c Move cascade scale factor from stage_a to latent_formats.py 2024-03-16 14:49:35 -04:00
f2fe635c9f SamplerDPMAdaptative node to test the different options. 2024-03-15 22:36:10 -04:00
448d9263a2 Fix control loras breaking. 2024-03-14 09:30:21 -04:00
db8b59ecff Lower memory usage for loras in lowvram mode at the cost of perf. 2024-03-13 20:07:27 -04:00
eda8704386 Add SamplerDPMPP_3M_SDE node. 2024-03-12 12:16:37 -04:00
e7b8e240f7 Add SamplerLMS node. 2024-03-12 04:34:34 -04:00
2a813c3b09 Switch some more prints to logging. 2024-03-11 16:34:58 -04:00
0ed72befe1 Change log levels.
Logging level now defaults to info. --verbose sets it to debug.
2024-03-11 13:54:56 -04:00
dc6d4151a2 Not needed anymore. 2024-03-11 12:30:11 -04:00
03f4cfb7cd Replace more prints with logging. 2024-03-11 00:58:49 -04:00
65397ce601 Replace prints with logging and add --verbose argument. 2024-03-10 12:14:23 -04:00
4656273e72 Added additional nodes for CLIP merging 2024-03-09 19:32:33 +01:00
a9ee9589b7 Add SamplerEulerAncestral node. 2024-03-09 08:21:43 -05:00
0a4675266e Make message about missing dependencies more clear. 2024-03-08 18:43:13 -05:00
314d28c251 Pass extra_pnginfo as None when not in input data. 2024-03-07 15:07:47 -05:00
55f37baae8 Move some stable cascade nodes outside of _for_testing. 2024-03-07 01:49:20 -05:00
3f75419e2e Add a node to use the super resolution controlnet. 2024-03-07 01:48:31 -05:00
5f60ee246e Support loading the sr cascade controlnet. 2024-03-07 01:22:48 -05:00
03e6e81629 Set upscale algorithm to bilinear for stable cascade controlnet. 2024-03-06 02:59:40 -05:00
03e83bb5d0 Support stable cascade canny controlnet. 2024-03-06 02:25:42 -05:00
10860bcd28 Add compression_ratio to controlnet code. 2024-03-05 15:15:20 -05:00
a38b9b3ac1 Add debugging info for when comfy_extra nodes fail to import. 2024-03-04 13:24:08 -05:00
b7b5593166 Fix nightly workflow and update other workflows. 2024-03-04 13:06:13 -05:00
6d8834f08f Add Morphology nodes from kornia (#2781)
* import kornia

* Added morphology nodexs

* Add kornia to requirements

* fix choices

* options, also move to postprocessors

* fix placing and step
2024-03-04 12:50:28 -05:00
caddef8d88 Auto disable cuda malloc on unsupported GPUs on Linux. 2024-03-04 09:03:59 -05:00
478f71a249 Remove useless check. 2024-03-04 08:51:25 -05:00
0490ce8244 Fix differential diffusion node for batches. 2024-03-04 00:43:09 -05:00
b2e1744a16 Add a ThresholdMask node. 2024-03-04 00:31:59 -05:00
0db3111b5f Disable site dir in updater when doing pip install. 2024-03-03 16:25:16 -05:00
12c1080ebc Simplify differential diffusion code. 2024-03-03 15:34:42 -05:00
727021bdea Implement Differential Diffusion (#2876)
* Implement Differential Diffusion

* Cleanup.

* Fix.

* Masks should be applied at full strength.

* Fix colors.

* Register the node.

* Cleaner code.

* Fix issue with getting unipc sampler.

* Adjust thresholds.

* Switch to linear thresholds.

* Only calculate nearest_idx on valid thresholds.
2024-03-03 15:34:13 -05:00
1abf8374ec utils.set_attr can now be used to set any attribute.
The old set_attr has been renamed to set_attr_param.
2024-03-02 17:27:23 -05:00
dce3555339 Add some tesla pascal GPUs to the fp16 working but slower list. 2024-03-02 17:16:31 -05:00
51df846598 Let conditioning specify custom concat conds. 2024-03-02 11:44:06 -05:00
9f71e4b62d Let model patches patch sub objects. 2024-03-02 11:43:27 -05:00
00425563c0 Cleanup: Use sampling noise scaling function for inpainting. 2024-03-01 14:24:41 -05:00
c62e836167 Move noise scaling to object with sampling math. 2024-03-01 12:54:38 -05:00
cb7c3a2921 Allow image_only_indicator to be None. 2024-02-29 13:11:30 -05:00
b3e97fc714 Koala 700M and 1B support.
Use the UNET Loader node to load the unet file to use them.
2024-02-28 12:10:11 -05:00
37a86e4618 Remove duplicate text_projection key from some saved models. 2024-02-28 03:57:41 -05:00
8daedc5bf2 Auto detect playground v2.5 model. 2024-02-27 18:03:03 -05:00
d46583ecec Playground V2.5 support with ModelSamplingContinuousEDM node.
Use ModelSamplingContinuousEDM with edm_playground_v2.5 selected.
2024-02-27 15:12:33 -05:00
1e0fcc9a65 Make XL checkpoints save in a more standard format. 2024-02-27 02:07:40 -05:00
b416be7d78 Make the text projection saved in the checkpoint the right format. 2024-02-27 01:52:23 -05:00
03c47fc0f2 Add a min_length property to tokenizer class. 2024-02-26 21:36:37 -05:00
e61755ead0 Update the old updater if present when running on the windows standalone. 2024-02-26 13:32:14 -05:00
36f7face37 Update the standalone package updater so it can self update. 2024-02-26 08:51:16 -05:00
8ac69f62e5 Make return_projected_pooled setable from the __init__ 2024-02-25 14:49:13 -05:00
ca7c310a0e Support loading old CLIP models saved with CLIPSave. 2024-02-25 08:29:12 -05:00
8d7910cee9 disable follow_symlinks in static serving for security reason (#2902) 2024-02-25 07:43:26 -05:00
4a7e751ce6 Add example for how to use WEB_DIRECTORY to add frontend extensions. 2024-02-25 07:34:22 -05:00
c2cb8e889b Always return unprojected pooled output for gligen. 2024-02-25 07:33:13 -05:00
1cb3f6a83b Move text projection into the CLIP model code.
Fix issue with not loading the SSD1B clip correctly.
2024-02-25 01:41:08 -05:00
6533b172c1 Support text encoder text_projection in lora. 2024-02-24 23:50:46 -05:00
1e5f0f66be Support lora keys with lora_prior_unet_ and lora_prior_te_ 2024-02-23 12:21:20 -05:00
e1cb93c383 Fix model and cond transformer options merge 2024-02-23 01:19:43 -07:00
10847dfafe Cleanup uni_pc inpainting.
This causes some small changes to the uni pc inpainting behavior but it
seems to improve results slightly.
2024-02-23 02:39:35 -05:00
877a8f7a3c Merge branch 'patch-1' of https://github.com/feffy380/ComfyUI 2024-02-22 16:23:50 -05:00
f81dbe26e2 FIX recursive_will_execute performance (simple ~300x performance increase} (#2852)
* FIX recursive_will_execute performance

* Minimize code changes

* memo must be created outside lambda
2024-02-21 20:21:24 -05:00
7faa4507ec ModelSamplingDiscrete: x0 model support that predict a denoised image. 2024-02-21 08:05:43 -05:00
820807c8ed Fix Perp-Neg math
adjust perp-neg implementation to match the paper
2024-02-21 10:33:03 +01:00
18c151b3e3 Add some latent2rgb matrices for previews. 2024-02-20 10:57:24 -05:00
0d0fbabd1d Pass pooled CLIP to stage b. 2024-02-20 04:24:45 -05:00
c6b7a157ed Align simple scheduling closer to official stable cascade scheduler. 2024-02-20 04:24:39 -05:00
ec4d89cee9 Add to Readme that stable cascade is supported. 2024-02-19 13:41:55 -05:00
a311524969 Node to make stable cascade image to image easier. 2024-02-19 13:36:20 -05:00
88f300401c Enable fp16 by default on mps. 2024-02-19 12:00:48 -05:00
e93cdd0ad0 Remove print. 2024-02-19 11:47:26 -05:00
3711b31dff Support Stable Cascade in checkpoint format. 2024-02-19 11:20:48 -05:00
d91f45ef28 Some cleanups to how the text encoders are loaded. 2024-02-19 10:46:30 -05:00
dbe0979b3f Larger range for min/max compression for StableCascade_EmptyLatentImage. 2024-02-19 08:59:53 -05:00
a7b5eaa7e3 Forgot to commit this. 2024-02-19 04:25:46 -05:00
3b2e579926 Support loading the Stable Cascade effnet and previewer as a VAE.
The effnet can be used to encode images for img2img with Stage C.
2024-02-19 04:10:01 -05:00
2e4628ac8d Merge branch 'iTXt-png-metadata-support' of https://github.com/shiimizu/ComfyUI 2024-02-18 23:44:58 -05:00
5171414143 Support additional PNG info. 2024-02-18 17:57:53 -08:00
dccca1daa5 Fix gligen lowvram mode. 2024-02-18 02:20:23 -05:00
8b60d33bb7 Add ModelSamplingStableCascade to control the shift sampling parameter.
shift is 2.0 by default on Stage C and 1.0 by default on Stage B.
2024-02-18 00:55:23 -05:00
6bcf57ff10 Fix attention masks properly for multiple batches. 2024-02-17 16:15:18 -05:00
11e3221f1f fp8 weight support for Stable Cascade. 2024-02-17 15:27:31 -05:00
f8706546f3 Fix attention mask batch size in some attention functions. 2024-02-17 15:22:21 -05:00
3b9969c1c5 Properly fix attention masks in CLIP with batches. 2024-02-17 12:13:13 -05:00
5b40e7a5ed Implement shift schedule for cascade stage C. 2024-02-17 11:38:47 -05:00
929e266f3e Manual cast for bf16 on older GPUs. 2024-02-17 09:01:17 -05:00
6c875d846b Fix clip attention mask issues on some hardware. 2024-02-17 07:53:52 -05:00
805c36ac9c Make Stable Cascade work on old pytorch 2.0 2024-02-17 00:42:30 -05:00
f2d1d16f4f Support Stable Cascade Stage B lite. 2024-02-16 23:41:23 -05:00
0b3c50480c Make --force-fp32 disable loading models in bf16. 2024-02-16 23:01:54 -05:00
97d03ae04a StableCascade CLIP model support. 2024-02-16 13:29:04 -05:00
667c92814e Stable Cascade Stage B. 2024-02-16 13:02:03 -05:00
f83109f09b Stable Cascade Stage C. 2024-02-16 10:55:08 -05:00
5e06baf112 Stable Cascade Stage A. 2024-02-16 06:30:39 -05:00
c2c885261a Merge branch 'batch-number-in-filename' of https://github.com/freakabcd/ComfyUI 2024-02-16 05:45:48 -05:00
aeaeca10bd Small refactor of is_device_* functions. 2024-02-15 21:10:10 -05:00
7f89cb48bf Add a disabled SaveImageWebsocket custom node.
This node can be used to efficiently get images without saving them to
disk when using ComfyUI as a backend.
2024-02-14 03:01:25 -05:00
38b7ac6e26 Don't init the CLIP model when the checkpoint has no CLIP weights. 2024-02-13 00:01:08 -05:00
0c9bc19768 Add ImageFromBatch. 2024-02-12 12:46:15 -05:00
cf4910a3a4 Prevent hideWidget being called twice for same widget
Fix for #2766
2024-02-12 08:59:25 +11:00
02409c30d9 Safari: Draws certain elements on CPU. In case of search popup, can cause 10 seconds+ main thread lock due to painting. (#2763)
* lets toggle this setting first.

* also makes it easier for debug. I'll be honest this is generally preferred behavior as well for me but I ain't no power user shrug.

* attempting trick to put the work for filter: brightness on GPU as a first attempt before falling back to not using filter for large lists!

* revert litegraph.core.js changes from branch

* oops
2024-02-12 03:44:53 +09:00
7dd352cbd7 Merge branch 'feature_expose_discard_penultimate_sigma' of https://github.com/blepping/ComfyUI 2024-02-11 12:23:30 -05:00
20e3da6b31 Add a node to give the controlnet a prompt different from the unet. 2024-02-10 08:27:05 -05:00
f44225fd5f Fix infinite while loop being possible in ddim_scheduler 2024-02-09 17:11:34 -06:00
25a4805e51 Add a way to set different conditioning for the controlnet. 2024-02-09 14:13:31 -05:00
2ccc0be28f Add batch number to filename with %batch_num%
Allow configurable addition of batch number to output file name.
2024-02-08 22:03:11 +10:00
a352c021ec Allow custom samplers to request discard penultimate sigma 2024-02-08 02:24:23 -07:00
fd73b5ee3a Merge branch 'improved-mobile-support' of https://github.com/pythongosssss/ComfyUI 2024-02-08 01:06:33 -05:00
c661a8b118 Don't use numpy for calculating sigmas. 2024-02-07 18:52:51 -05:00
7daad468ec Sync litegraph to repo.
https://github.com/comfyanonymous/litegraph.js/pull/6
2024-02-06 12:43:06 -05:00
d2e7f1b04b Support linking converted inputs from api json 2024-02-06 16:55:55 +00:00
236bda2683 Make minimum tile size the size of the overlap. 2024-02-05 01:29:26 -05:00
74b7233f57 Document IS_CHANGED in the example custom node. 2024-02-04 23:15:49 -05:00
66e28ef45c Don't use is_bf16_supported to check for fp16 support. 2024-02-04 20:53:35 -05:00
24129d78e6 Speed up SDXL on 16xx series with fp16 weights and manual cast. 2024-02-04 13:23:43 -05:00
98b80ad1f5 Merge branch 'feature/maskeditor_brush_modes' of https://github.com/UltimaBeaR/ComfyUI 2024-02-03 15:06:10 -05:00
5f3dbede58 Mask editor: semitransparent brush, brush color modes 2024-02-03 10:29:44 +03:00
4b0239066d Always use fp16 for the text encoders. 2024-02-02 10:02:49 -05:00
d0e2354c28 Merge branch 'LatentSeed_update' of https://github.com/FizzleDorf/ComfyUI 2024-02-02 04:38:18 -05:00
f2bae7463e changed default of LatentBatchSeedBehavior to fixed 2024-02-02 18:31:35 +09:00
951a2064a3 Fix frontend webp prompt handling 2024-02-02 13:27:03 +08:00
4c54c2ec0f Merge branch 'increment-wrap' of https://github.com/pksebben/ComfyUI 2024-02-01 17:01:21 -05:00
53a22e1ab9 add increment-wrap as option to ValueControlWidget when isCombo, which loops back to 0 when at end of list 2024-01-31 16:14:50 -08:00
6ab4205422 feat: better pen support for mask editor
- alt-drag: erase
- shift-drag(up/down): zoom in/out
2024-01-31 18:28:36 +09:00
c5a369a33d Update readme for new pytorch 2.2 release. 2024-01-31 02:27:12 -05:00
6565c9ad4d Litegraph node search improvements.
See: https://github.com/comfyanonymous/litegraph.js/pull/5
2024-01-31 02:26:27 -05:00
eeca72488b Merge branch 'group-manage-fixes' of https://github.com/pythongosssss/ComfyUI 2024-01-31 00:25:03 -05:00
4ce587bcd3 Merge branch 'fix/mask-editor-inpaint' of https://github.com/Meowu/ComfyUI 2024-01-30 23:15:31 -05:00
af6165ab69 Fix scrolling with lots of nodes 2024-01-30 18:00:01 +00:00
29558fb3ac Fix crash when no widgets on customized group node 2024-01-30 17:59:47 +00:00
da7a8df0d2 Put VAE key name in model config. 2024-01-30 02:24:38 -05:00
364ef19354 fix: inpaint on mask editor bottom area 2024-01-30 14:23:01 +08:00
ed2fa105ae Make auto saved workflow stored per tab 2024-01-29 18:43:59 +00:00
9321198da6 Add node to set only the conditioning area strength. 2024-01-29 00:24:53 -05:00
079dbf9198 Remove useless code. 2024-01-28 19:36:32 -05:00
7f4725f6b3 Fix some issues with --gpu-only 2024-01-27 02:51:27 -05:00
fc196aac80 Add a LatentBatchSeedBehavior node.
This lets you set it so the latents can use the same seed for the sampling
on every image in the batch.
2024-01-26 23:13:02 -05:00
2d105066df Cleanups. 2024-01-26 21:31:13 -05:00
89507f8adf Remove some unused imports. 2024-01-25 23:42:37 -05:00
d1533d9c0f Add experimental photomaker nodes.
Put the model file in models/photomaker and use PhotoMakerLoader.

Then use PhotoMakerEncode with the keyword "photomaker" to apply the image
2024-01-24 09:51:42 -05:00
b9911dcb2f Sync litegraph with repo.
https://github.com/comfyanonymous/litegraph.js/pull/4
2024-01-23 20:01:37 -05:00
3762e676a9 Support refresh on group node combos (#2625)
* Support refresh on group node combos

* fix check
2024-01-23 14:15:52 -05:00
05cd00695a typo fix - calculate_sigmas_scheduler (#2619)
self.scheduler -> scheduler_name

Co-authored-by: Lt.Dr.Data <lt.dr.data@gmail.com>
2024-01-23 03:47:01 -05:00
8a92ac2120 Ability to hide menu
Responsive setting screen
Touch events for zooming/context menu
2024-01-22 18:56:43 +00:00
f2d432f9a7 Fix potential turbo scheduler model patching issue. 2024-01-22 00:28:13 -05:00
4871a36458 Cleanup some unused imports. 2024-01-21 21:51:22 -05:00
45bf88d8ef Fix queue on change to respect auto queue checkbox (#2608)
* Fix render on change not respecting auto queue checkbox

Fix issue where autoQueueEnabled checkbox is ignored for changes if autoQueueMode is left on `change`

* Make check more specific
2024-01-21 21:34:39 -05:00
ef5a28b597 Merge branch 'patch-1' of https://github.com/TFWol/ComfyUI 2024-01-20 20:17:57 -05:00
5823f18a79 Fix for the extracting issue on windows. 2024-01-19 23:08:15 -05:00
78a70fda87 Remove useless import. 2024-01-19 15:38:05 -05:00
9fff3c46b4 Move some nodes to model_patches section. 2024-01-18 15:57:35 -05:00
d76a04b6ea Add unfinished ImageOnlyCheckpointSave node to save a SVD checkpoint.
This node is unfinished, SVD checkpoints saved with this node will
work with ComfyUI but not with anything else.
2024-01-17 19:46:21 -05:00
fad02dc2df Don't use PEP 604 type hints, to stay compatible with Python<3.10. 2024-01-17 17:16:34 -05:00
ee2c5fa72d Fix renaming upload widget (#2554)
* Fix renaming upload widget

* Allow custom name
2024-01-16 08:58:54 -05:00
818d0c01b2 Merge branch 'fix-logging-setting' of https://github.com/pythongosssss/ComfyUI 2024-01-16 08:29:38 -05:00
93bbe3f4c0 Auto queue on change (#2542)
* Add toggle to enable auto queue when graph is changed

* type fix

* better

* better alignment

* Change undoredo to not ignore inputs when autoqueue in change mode
2024-01-16 08:27:40 -05:00
23687da9a9 Fix logging not checking onChange 2024-01-15 17:45:48 +00:00
f9e55d8463 Only auto enable bf16 VAE on nvidia GPUs that actually support it. 2024-01-15 03:10:22 -05:00
1dab412c79 Add error handling to initial fix to keep cache intact 2024-01-14 15:06:33 -08:00
2395ae740a Make unclip more deterministic.
Pass a seed argument note that this might make old unclip images different.
2024-01-14 17:28:31 -05:00
270daa02a8 Adds copy image option if browser feature available (#2544)
* Adds copy image option if browser feature available

* refactor
2024-01-14 14:53:52 -05:00
432ba1c179 Merge branch 'control_before_generate' of https://github.com/pythongosssss/ComfyUI 2024-01-13 16:06:43 -05:00
b5ece6354d Merge branch 'undoredo-fix-modifiers' of https://github.com/pythongosssss/ComfyUI 2024-01-13 16:03:44 -05:00
9bddc9d94b Fix crash on group render 2024-01-13 21:02:51 +00:00
18511dd581 Manage group nodes (#2455)
* wip group manage

* prototyping ui

* tweaks

* wip

* wip

* more wip

* fixes
add deletion

* Fix tests

* fixes

* Remove test code

* typo

* fix crash when link is invalid
2024-01-13 15:43:20 -05:00
8e916735c0 export function 2024-01-13 18:57:59 +00:00
32034217ae add setting to change control after generate to run before 2024-01-13 18:57:47 +00:00
df49a727ff Fix modifiers triggering key down checks 2024-01-13 17:00:30 +00:00
56d9496b18 Rename status notes to status messages.
I think message describes them better.
2024-01-12 18:17:06 -05:00
bcc0bde2af Clear status notes on execution start. 2024-01-12 17:21:22 -05:00
1805cb2d69 Merge branch 'enhanced-history-status' of https://github.com/realazthat/ComfyUI 2024-01-12 16:36:56 -05:00
53c8a99e6c Make server storage the default.
Remove --server-storage argument.
2024-01-11 17:21:40 -05:00
d4edd9bfa8 Fix hypertile issue with high depths. 2024-01-11 15:13:38 -05:00
1b3d65bd84 Add error, status to /history endpoint 2024-01-11 10:16:42 -05:00
4ab0392f70 Resolved crashing nodes caused by FileNotFoundError during directory traversal
- Implemented a `try-except` block in the `recursive_search` function to handle `FileNotFoundError` gracefully.
- When encountering a file or directory path that cannot be accessed (causing `FileNotFoundError`), the code now logs a warning and skips processing for that specific path instead of crashing the node (CheckpointLoaderSimple was usually the first to break). This allows the rest of the directory traversal to proceed without interruption.
2024-01-11 06:34:33 -08:00
977eda19a6 Don't round noise mask. 2024-01-11 03:29:58 -05:00
10f2609fdd Add InpaintModelConditioning node.
This is an alternative to VAE Encode for inpaint that should work with
lower denoise.

This is a different take on #2501
2024-01-11 03:15:27 -05:00
b4e915e745 Skip SAG when latent is too small. 2024-01-10 04:08:43 -05:00
1a57423d30 Fix issue when using multiple t2i adapters with batched images. 2024-01-10 04:00:49 -05:00
2c80d9acb9 Round up to nearest power of 2 in SAG node to fix some resolution issues. 2024-01-09 15:12:12 -05:00
6a7bc35db8 Use basic attention implementation for small inputs on old pytorch. 2024-01-09 13:46:52 -05:00
b3b5ddb07a Support I mode images in LoadImageMask. 2024-01-08 17:08:17 -05:00
2d74fc4360 Fix issue with user manager parent dir not being created. 2024-01-08 17:08:00 -05:00
235727fed7 Store user settings/data on the server and multi user support (#2160)
* wip per user data

* Rename, hide menu

* better error
rework default user

* store pretty

* Add userdata endpoints
Change nodetemplates to userdata

* add multi user message

* make normal arg

* Fix tests

* Ignore user dir

* user tests

* Changed to default to browser storage and add server-storage arg

* fix crash on empty templates

* fix settings added before load

* ignore parse errors
2024-01-08 17:06:44 -05:00
6a10640f0d Support properly loading images with mode I. 2024-01-08 03:46:36 -05:00
c6951548cf Update optimized_attention_for_device function for new functions that
support masked attention.
2024-01-07 13:52:08 -05:00
aaa9017302 Add attention mask support to sub quad attention. 2024-01-07 04:13:58 -05:00
0c2c9fbdfa Support attention mask in split attention. 2024-01-06 13:16:48 -05:00
3ad0191bfb Implement attention mask on xformers. 2024-01-06 04:33:03 -05:00
af94eb14e3 fix: /free handler function name 2024-01-06 04:27:09 +02:00
7c9a0f7e0a Fix BasicScheduler issue with Loras. 2024-01-05 12:31:13 -05:00
35322a3766 StableZero123_Conditioning_Batched node.
This node lets you generate a batch of images with different elevations or
azimuths by setting the elevation_batch_increment and/or
azimuth_batch_increment.

It also sets the batch index for the latents so that the same init noise is
used on each frame.
2024-01-05 04:20:03 -05:00
6d281b4ff4 Add a /free route to unload models or free all memory.
A POST request to /free with: {"unload_models":true}
will unload models from vram.

A POST request to /free with: {"free_memory":true}
will unload models and free all cached data from the last run workflow.
2024-01-04 17:15:22 -05:00
8c6493578b Implement noise augmentation for SD 4X upscale model. 2024-01-03 14:27:11 -05:00
ef4f6037cb Fix model patches not working in custom sampling scheduler nodes. 2024-01-03 12:16:30 -05:00
a7874d1a8b Add support for the stable diffusion x4 upscaling model.
This is an old model.

Load the checkpoint like a regular one and use the new
SD_4XUpscale_Conditioning node.
2024-01-03 03:37:56 -05:00
2c4e92a98b Fix regression. 2024-01-02 14:41:33 -05:00
5eddfdd80c Refactor VAE code.
Replace constants with downscale_ratio and latent_channels.
2024-01-02 13:24:34 -05:00
8e2c99e3cf Fix issue when websocket is deleted when data is being sent. 2024-01-02 11:50:00 -05:00
a47f609f90 Auto detect out_channels from model. 2024-01-02 01:50:57 -05:00
79f73a4b33 Remove useless code. 2024-01-02 01:50:29 -05:00
66831eb6e9 Add node id and prompt id to websocket progress packet. 2024-01-01 14:27:56 -05:00
d1f3637a5a Add a denoise parameter to BasicScheduler node. 2023-12-31 15:37:20 -05:00
36e15f2507 Reregister nodes when pressing refresh button. 2023-12-31 05:05:14 -05:00
1b103e0cb2 Add argument to run the VAE on the CPU. 2023-12-30 05:49:07 -05:00
144e6580a4 This cache timeout is pretty useless in practice. 2023-12-29 17:47:24 -05:00
04b713dda1 Fix VALIDATE_INPUTS getting called multiple times.
Allow VALIDATE_INPUTS to only validate specific inputs.
2023-12-29 17:36:40 -05:00
12e822c6c8 Use function to calculate model size in model patcher. 2023-12-28 21:46:20 -05:00
e1e322cf69 Load weights that can't be lowvramed to target device. 2023-12-28 21:41:10 -05:00
a8baa40d85 Cleanup. 2023-12-28 12:23:07 -05:00
c782144433 Fix clip vision lowvram mode not working. 2023-12-27 13:50:57 -05:00
e478b1794e Only add _meta title to api prompt when dev mode is enabled in UI. 2023-12-27 01:07:02 -05:00
AYF
f15dce71fd Add title to the API workflow json. (#2380)
* Add `title` to the API workflow json.

* API: Move `title` to `_meta` dictionary, imply unused.
2023-12-27 00:55:11 -05:00
f21bb41787 Fix taesd VAE in lowvram mode. 2023-12-26 12:52:21 -05:00
61b3f15f8f Fix lowvram mode not working with unCLIP and Revision code. 2023-12-26 05:02:02 -05:00
392878a262 Fix hiding dom widgets. 2023-12-25 19:17:40 -08:00
257c2eaaa4 Merge branch 'patch-1' of https://github.com/savolla/ComfyUI 2023-12-25 12:24:31 -05:00
d0165d819a Fix SVD lowvram mode. 2023-12-24 07:13:18 -05:00
a252963f95 --disable-smart-memory now unloads everything like it did originally. 2023-12-23 04:25:06 -05:00
36a7953142 Greatly improve lowvram sampling speed by getting rid of accelerate.
Let me know if this breaks anything.
2023-12-22 14:38:45 -05:00
261bcbb0d9 A few missing comfy ops in the VAE. 2023-12-22 04:05:42 -05:00
d35267e85a Litegraph updates.
Update from upstream repo.

Auto select value in prompt.

Increase maximum number of nodes to 10k.
2023-12-21 13:21:25 -05:00
6781b181ef Fix potential tensor device issue with ImageCompositeMasked. 2023-12-21 02:35:01 -05:00
a1e1c69f7d LoadImage now loads all the frames from animated images as a batch. 2023-12-20 16:39:09 -05:00
5f54614e7f Add a RebatchImages node. 2023-12-20 16:22:18 -05:00
e82942cc29 Add a denoise parameter to the SDTurboScheduler. 2023-12-20 02:54:25 -05:00
ba3f3aa1ca Merge branch 'test-reliability' of https://github.com/pythongosssss/ComfyUI 2023-12-19 16:32:53 -05:00
8680ac3dfd try to improve test reliability 2023-12-19 20:38:07 +00:00
e65110fd93 Fix dom widgets not being hidden 2023-12-19 20:22:01 +00:00
40ea2bd011 Update requirements.txt
the UI launches with one missing module `torchvision`. spits out a `ModuleNotFoundError`. installing `torchvision` module fixed it.
2023-12-19 17:07:55 +03:00
9a7619b72d Fix regression with inpaint model. 2023-12-19 02:32:59 -05:00
571ea8cdcc Fix SAG not working with cfg 1.0 2023-12-18 17:03:32 -05:00
8cf1daa108 Fix SDXL area composition sometimes not using the right pooled output. 2023-12-18 12:54:23 -05:00
d2f322902c Fix wrong Stable Zero123 node name. 2023-12-18 03:59:50 -05:00
2258f85159 Support stable zero 123 model.
To use it use the ImageOnlyCheckpointLoader to load the checkpoint and
the new Stable_Zero123 node.
2023-12-18 03:48:04 -05:00
2f9d6a97ec Add --deterministic option to make pytorch use deterministic algorithms. 2023-12-17 16:59:21 -05:00
a036b94075 Move SaveAnimated nodes to image->animation. 2023-12-17 02:37:22 -05:00
6453dc1ca2 Fix name counter preventing more than 3 of the same node
Fix linked widget offset when populating values
2023-12-16 14:16:12 +00:00
e45d920ae3 Don't resize clip vision image when the size is already good. 2023-12-16 03:06:10 -05:00
13e6d5366e Switch clip vision to manual cast.
Make it use the same dtype as the text encoder.
2023-12-16 02:47:26 -05:00
574efd3782 Fix perpneg not working on SDXL. 2023-12-16 02:30:16 -05:00
172984db01 Fix SAG not working on certain resolutions. 2023-12-16 01:29:57 -05:00
6596654d47 Add a LatentBatch node. 2023-12-16 01:21:00 -05:00
719fa0866f Set clip vision model in eval mode so it works without inference mode. 2023-12-15 18:53:08 -05:00
adc40e3d7b Forgot this. 2023-12-15 15:46:23 -05:00
014c8bf2f2 Refactor LCM to support more model types. 2023-12-15 15:26:12 -05:00
9cad2f06ff Make perp neg take a conditioning input instead of a CLIP one. 2023-12-15 14:40:57 -05:00
574363a8a6 Implement Perp-Neg 2023-12-16 00:28:16 +05:30
a5056cfb1f Remove useless code. 2023-12-15 01:28:16 -05:00
b12b48e170 cleanup. 2023-12-14 20:11:46 -05:00
329c571993 Improve code legibility. 2023-12-14 11:41:49 -05:00
6c5990f7db Fix cfg being calculated more than once if sampler_cfg_function. 2023-12-13 20:28:04 -05:00
ba04a87d10 Refactor and improve the sag node.
Moved all the sag related code to comfy_extras/nodes_sag.py
2023-12-13 16:11:26 -05:00
6761233e9d Implement Self-Attention Guidance (#2201)
* First SAG test

* need to put extra options on the model instead of patcher

* no errors and results seem not-broken

* Use @ashen-uncensored formula, which works better!!!

* Fix a crash when using weird resolutions. Remove an unnecessary UNet call

* Improve comments, optimize memory in blur routine

* SAG works with sampler_cfg_function
2023-12-13 15:52:11 -05:00
390078904c Group node fixes (#2259)
* Prevent cleaning graph state on undo/redo

* Remove pause rendering due to LG bug

* Fix crash on disconnected internal reroutes

* Fix widget inputs being incorrect order and value

* Fix initial primitive values on connect

* basic support for basic rerouted converted inputs

* Populate primitive to reroute input

* dont crash on bad primitive links

* Fix convert to group changing control value

* reduce restrictions

* fix random crash in tests
2023-12-13 00:56:39 -05:00
b454a67bb9 Support segmind vega model. 2023-12-12 19:09:53 -05:00
824e4935f5 Add dtype parameter to VAE object. 2023-12-12 12:03:29 -05:00
32b7e7e769 Add manual cast to controlnet. 2023-12-12 11:32:42 -05:00
3152023fbc Use inference dtype for unet memory usage estimation. 2023-12-11 23:50:38 -05:00
77755ab8db Refactor comfy.ops
comfy.ops -> comfy.ops.disable_weight_init

This should make it more clear what they actually do.

Some unused code has also been removed.
2023-12-11 23:27:13 -05:00
b0aab1e4ea Add an option --fp16-unet to force using fp16 for the unet. 2023-12-11 18:36:29 -05:00
ba07cb748e Use faster manual cast for fp8 in unet. 2023-12-11 18:24:44 -05:00
ab93abd4b2 Prevent cleaning graph state on undo/redo (#2255)
* Prevent cleaning graph state on undo/redo

* Remove pause rendering due to LG bug
2023-12-11 12:33:35 -05:00
57926635e8 Switch text encoder to manual cast.
Use fp16 text encoder weights for CPU inference to lower memory usage.
2023-12-10 23:00:54 -05:00
69033081c5 mask editor bugfix
- Addressing the issue where an unnecessary hidden panel disrupts the drawing.
2023-12-11 00:24:28 +09:00
340177e6e8 Disable non blocking on mps. 2023-12-10 01:30:35 -05:00
614b7e731f Implement GLora. 2023-12-09 18:15:26 -05:00
cb63e230b4 Make lora code a bit cleaner. 2023-12-09 14:15:09 -05:00
9e411073e9 Add instructions for those that have python 3.12 2023-12-09 13:41:30 -05:00
eccc9e64a6 Merge branch 'group-reroute-fix' of https://github.com/pythongosssss/ComfyUI 2023-12-09 12:01:26 -05:00
da74e3bbe3 Update pytorch nightly packaging workflow. 2023-12-09 12:01:17 -05:00
174eba8e95 Use own clip vision model implementation. 2023-12-09 11:56:31 -05:00
080ef75c31 fix 2023-12-09 13:19:21 +00:00
9aaf368a41 Fix internal reroutes connected to other groups 2023-12-09 13:04:35 +00:00
97015b6b38 Cleanup. 2023-12-08 16:02:08 -05:00
a4ec54a40d Add linear_start and linear_end to model_config.sampling_settings 2023-12-08 02:49:30 -05:00
9ac0b487ac Make --gpu-only put intermediate values in GPU memory instead of cpu. 2023-12-08 02:35:45 -05:00
cdff081023 Fix hypertile. 2023-12-07 15:22:35 -05:00
efb704c758 Support attention masking in CLIP implementation. 2023-12-07 02:51:02 -05:00
248d9125b0 Merge branch 'ht_deterministic' of https://github.com/asagi4/ComfyUI 2023-12-07 01:45:11 -05:00
fbdb14d4c4 Cleaner CLIP text encoder implementation.
Use a simple CLIP model implementation instead of the one from
transformers.

This will allow some interesting things that would too hackish to implement
using the transformers implementation.
2023-12-06 23:50:03 -05:00
03eadbb53c Make HyperTile deterministic 2023-12-06 21:17:56 +02:00
2db86b4676 Slightly faster lora applying. 2023-12-06 05:13:14 -05:00
e134547341 Merge branch 'reroute-converted-inputs' of https://github.com/pythongosssss/ComfyUI
# Conflicts:
#	web/extensions/core/widgetInputs.js
2023-12-06 03:01:35 -05:00
8112a0d9fc improve: Mask Editor (#2171)
* renewal mask editor

* fix: ignoring keydown when 2nd open
2023-12-06 01:56:03 -05:00
ef29542030 Merge branch 'primitive-text-replacement' of https://github.com/pythongosssss/ComfyUI 2023-12-05 23:11:03 -05:00
8de6f94f5c Allow widget placeholder replacement on primitives 2023-12-05 21:02:10 +00:00
bcc469a2c9 try to stop test failing 2023-12-05 20:28:52 +00:00
a99da6667f reroute + primitive tests 2023-12-05 20:28:05 +00:00
44265e0810 Allow connecting primitivenode to reroutes 2023-12-05 20:27:13 +00:00
1bbd65ab30 Missed this one. 2023-12-05 12:48:41 -05:00
9b655d4fd7 Fix memory issue with control loras. 2023-12-04 21:55:19 -05:00
26b1c0a771 Fix control lora on fp8. 2023-12-04 13:47:41 -05:00
be3468ddd5 Less useless downcasting. 2023-12-04 12:53:46 -05:00
ca82ade765 Use .itemsize to get dtype size for fp8. 2023-12-04 11:52:06 -05:00
31b0f6f3d8 UNET weights can now be stored in fp8.
--fp8_e4m3fn-unet and --fp8_e5m2-unet are the two different formats
supported by pytorch.
2023-12-04 11:10:00 -05:00
af365e4dd1 All the unet ops with weights are now handled by comfy.ops 2023-12-04 03:12:18 -05:00
6efe561c2a Merge branch 'fix-template-sorting' of https://github.com/pythongosssss/ComfyUI 2023-12-03 22:51:23 -05:00
77ab2c3f69 fix template sorting 2023-12-03 17:17:23 +00:00
44d8abadf0 allow muting group node 2023-12-03 17:04:16 +00:00
496de0891d Allow removing erroring embedded groups
Unregister group nodes on workflow change
2023-12-03 16:49:48 +00:00
61a123a1e0 A different way of handling multiple images passed to SVD.
Previously when a list of 3 images [0, 1, 2] was used for a 6 frame video
they were concated like this:
[0, 1, 2, 0, 1, 2]

now they are concated like this:
[0, 0, 1, 1, 2, 2]
2023-12-03 03:31:47 -05:00
b2517b4ceb Load api workflow if regular workflow isn't in loaded image. 2023-12-02 13:56:11 -05:00
88e2c9746b Merge branch 'image-cache' of https://github.com/jn-jairo/ComfyUI 2023-12-02 13:02:33 -05:00
28220fa839 Fix node growing with DOM widgets when adding image even if enough space 2023-12-02 12:02:03 +00:00
c92f3dca73 Merge branch 'master' into image-cache 2023-12-02 05:16:21 -03:00
2995a24725 Update readme. 2023-12-01 18:29:33 -05:00
8491280504 Add Extension tests (#2125)
* Add test for extension hooks
Add afterConfigureGraph callback

* fix comment
2023-12-01 17:24:20 -05:00
ec7a00aa96 Fix extension widgets not working. 2023-12-01 04:13:04 -05:00
5d5c320054 Fix right click not working for some users. 2023-12-01 02:03:34 -05:00
c97be4db91 Support SD2.1 turbo checkpoint. 2023-11-30 19:27:03 -05:00
6b769bca01 Do a garbage collect after the interval even if nothing is running. 2023-11-30 15:22:32 -05:00
7f469203b7 Group nodes (#1776)
* setup ui unit tests

* Refactoring, adding connections

* Few tweaks

* Fix type

* Add general test

* Refactored and extended test

* move to describe

* for groups

* wip group nodes

* Relink nodes
Fixed widget values
Convert to nodes

* Reconnect on convert back

* add via node menu + canvas
refactor

* Add ws event handling

* fix using wrong node on widget serialize

* allow reroute pipe
fix control_after_generate configure

* allow multiple images

* Add test for converted widgets on missing nodes + fix crash

* tidy

* mores tests + refactor

* throw earlier to get less confusing error

* support outputs

* more test

* add ci action

* use lts node

* Fix?

* Prevent connecting non matching combos

* update

* accidently removed npm i

* Disable logging extension

* fix naming
allow control_after_generate custom name
allow convert from reroutes

* group node tests

* Add executing info, custom node icon
Tidy

* internal reroute just works

* Fix crash on virtual nodes e.g. note

* Save group nodes to templates

* Fix template nodes not being stored

* Fix aborting convert

* tidy

* Fix reconnecting output links on convert to group

* Fix links on convert to nodes

* Handle missing internal nodes

* Trigger callback on text change

* Apply value on connect

* Fix converted widgets not reconnecting

* Group node updates
- persist internal ids in current session
- copy widget values when converting to nodes
- fix issue serializing converted inputs

* Resolve issue with sanitized node name

* Fix internal id

* allow outputs to be used internally and externally

* order widgets on group node
various fixes

* fix imageupload widget requiring a specific name

* groupnode imageupload test
give widget unique name

* Fix issue with external node links

* Add VAE model

* Fix internal node id check

* fix potential crash

* wip widget input support

* more wip group widget inputs

* Group node refactor
Support for primitives/converted widgets

* Fix convert to nodes with internal reroutes

* fix applying primitive

* Fix control widget values

* fix test
2023-11-30 14:13:27 -05:00
d19de2753e Merge branch 'fix_folders_handling' of https://github.com/fazo96/ComfyUI 2023-11-29 14:10:30 -05:00
777f6b1522 Add to README that SDXL Turbo is supported. 2023-11-28 14:45:00 -05:00
b911eefc42 Limit gc.collect() to once every 10 seconds. 2023-11-28 14:20:56 -05:00
57d7f4464f Add SDTurboScheduler node. 2023-11-28 13:35:32 -05:00
21063fa35b Lower compress level of png sent on websocket. 2023-11-28 11:01:05 -05:00
983ebc5792 Use smart model management for VAE to decrease latency. 2023-11-28 04:58:51 -05:00
798a34d009 Lower compress level for image preview. 2023-11-28 04:57:59 -05:00
a667638442 Merge branch 'undo-redo' of https://github.com/pythongosssss/ComfyUI 2023-11-27 22:29:46 -05:00
c45d1b9b67 Add a function to load a unet from a state dict. 2023-11-27 17:41:29 -05:00
f30b992b18 .sigma and .timestep now return tensors on the same device as the input. 2023-11-27 16:41:33 -05:00
488de0b4df ModelSamplingDiscreteLCM -> ModelSamplingDiscreteDistilled 2023-11-27 16:32:03 -05:00
13fdee6abf Try to free memory for both cond+uncond before inference. 2023-11-27 14:55:40 -05:00
be71bb5e13 Tweak memory inference calculations a bit. 2023-11-27 14:04:16 -05:00
9be0b30cf1 fix formatting 2023-11-27 14:02:50 +00:00
34eccd863b Add simple undo redo history 2023-11-27 14:00:15 +00:00
96c2deeefb Merge branch 'path_error_fix' of https://github.com/jeske/ComfyUI 2023-11-27 02:06:08 -05:00
edd6f75d3a better error for invalid output paths 2023-11-26 13:10:31 -07:00
6aa1bcd601 Remove hard coded max_items in history API 2023-11-26 17:23:11 +04:00
39e75862b2 Fix regression from last commit. 2023-11-26 03:43:02 -05:00
50dc39d6ec Clean up the extra_options dict for the transformer patches.
Now everything in transformer_options gets put in extra_options.
2023-11-26 03:13:56 -05:00
5b37270d3a Add a lora loader node for models with no CLIP. 2023-11-25 02:26:50 -05:00
5d6dfce548 Fix importing diffusers unets. 2023-11-24 20:35:29 -05:00
e020ab61f9 Fix output APNG not working with ffmpeg. 2023-11-24 18:24:19 -05:00
8ad5d494d5 Fix APNG not working in ffmpeg. 2023-11-24 18:14:17 -05:00
916e9c998c Use same default fps as webp node. 2023-11-24 11:19:23 -05:00
eff24ea6aa Add a node to save animated PNG files. These work in ffpmeg unlike webp. 2023-11-24 11:12:10 -05:00
3e5ea74ad3 Make buggy xformers fall back on pytorch attention. 2023-11-24 03:55:35 -05:00
982338b9bb Fix issue loading webp files in UI. 2023-11-24 02:08:08 -05:00
c782cf3ea9 Add to Readme that Stable Video Diffusion is supported. 2023-11-24 00:27:08 -05:00
02ffbb2de3 Fix typo. 2023-11-23 23:20:07 -05:00
42dfae6331 Nodes to properly use the SDV img2vid checkpoint.
The img2vid model is conditioned on clip vision output only which means
there's no CLIP model which is why I added a ImageOnlyCheckpointLoader to
load it. Note that the unClipCheckpointLoader can also load it because it
also has a CLIP_VISION output.

SDV_img2vid_Conditioning is the node used to pass the right conditioning
to the img2vid model.

VideoLinearCFGGuidance applies a linearly decreasing CFG scale to each
video frame from the cfg set in the sampler node to min_cfg.

SDV_img2vid_Conditioning can be found in conditioning->video_models
ImageOnlyCheckpointLoader can be found in loaders->video_models
VideoLinearCFGGuidance can be found in sampling->video_models
2023-11-23 19:48:49 -05:00
871cc20e13 Support SVD img2vid model. 2023-11-23 19:41:33 -05:00
1964bf1e78 fix: folder handling issues 2023-11-23 22:24:58 +01:00
022033a0e7 Fix SaveAnimatedWEBP not working when metadata is disabled. 2023-11-23 15:39:35 -05:00
4d2437e681 Call widget onRemove to remove element 2023-11-23 19:43:55 +00:00
a657f96c5c Add a node to save animated webp. 2023-11-23 14:28:41 -05:00
87031a1945 Update readme with link to LCM example page. 2023-11-23 11:59:11 -05:00
d03d8aa2e3 Fix loading groups. 2023-11-23 01:09:15 -05:00
410bf07771 Make VAE memory estimation take dtype into account. 2023-11-22 18:17:19 -05:00
32447f0c39 Add sampling_settings so models can specify specific sampling settings. 2023-11-22 17:24:00 -05:00
70d2ea0faa Control filter list (#2009)
* Add control_filter_list to filter items after queue

* fix regex

* backwards compatibility

* formatting

* revert

* Add and fix test
2023-11-22 12:52:20 -05:00
1ca4802e8c Merge branch 'hide-if-collapsed' of https://github.com/pythongosssss/ComfyUI 2023-11-22 11:46:21 -05:00
ab7d4f7848 Handle collapsing to hide element 2023-11-22 13:53:30 +00:00
c3ae99a749 Allow controlling downscale and upscale methods in PatchModelAddDownscale. 2023-11-22 03:23:16 -05:00
72741105a6 Remove useless code. 2023-11-21 17:27:28 -05:00
6a491ebe27 Allow model config to preprocess the vae state dict on load. 2023-11-21 16:29:18 -05:00
d66b631d74 Merge branch 'fix-collapsed-clip' of https://github.com/pythongosssss/ComfyUI 2023-11-21 13:26:26 -05:00
cd4fc77d5f Add taesd and taesdxl to VAELoader node.
They will show up if both the taesd_encoder and taesd_decoder or taesdxl
model files are present in the models/vae_approx directory.
2023-11-21 12:54:19 -05:00
89e31abc46 Fix clipping of collapsed nodes 2023-11-21 17:54:01 +00:00
6ff06fa796 Animated image output support (#2008)
* Refactor multiline widget into generic DOM widget

* wip webp preview

* webp support

* fix check

* fix sizing

* show image when zoomed out

* Swap webp checkto generic animated image flag

* remove duplicate

* Fix falsy check
2023-11-21 01:33:58 -05:00
ce67dcbcda Make it easy for models to process the unet state dict on load. 2023-11-20 23:17:53 -05:00
2dd5b4dd78 Only show last 200 elements in the UI history tab. 2023-11-20 16:56:29 -05:00
a03dde190e Cap maximum history size at 10000. Delete oldest entry when reached. 2023-11-20 16:38:39 -05:00
31c5ea7b2c Add LatentInterpolate to interpolate between latents. 2023-11-20 03:55:51 -05:00
dba4f3b4fc Add a RepeatImageBatch node. 2023-11-19 06:09:01 -05:00
d9d8702d8d percent_to_sigma now returns a float instead of a tensor. 2023-11-18 23:20:29 -05:00
8a451234b3 Add ImageCrop node. 2023-11-18 04:44:17 -05:00
0cf4e86939 Add some command line arguments to store text encoder weights in fp8.
Pytorch supports two variants of fp8:
--fp8_e4m3fn-text-enc (the one that seems to give better results)
--fp8_e5m2-text-enc
2023-11-17 02:56:59 -05:00
107e78b1cb Add support for loading SSD1B diffusers unet version.
Improve diffusers model detection.
2023-11-16 23:12:55 -05:00
7e3fe3ad28 Make deep shrink behave like it should. 2023-11-16 15:26:28 -05:00
9f00a18095 Fix potential issues. 2023-11-16 14:59:54 -05:00
bd07ad1861 Add PatchModelAddDownscale (Kohya Deep Shrink) node.
By adding a downscale to the unet in the first timesteps this node lets
you generate images at higher resolutions with less consistency issues.
2023-11-16 13:25:46 -05:00
7ea6bb038c Print warning when controlnet can't be applied instead of crashing. 2023-11-16 12:57:12 -05:00
dcec1047e6 Invert the start and end percentages in the code.
This doesn't affect how percentages behave in the frontend but breaks
things if you relied on them in the backend.

percent_to_sigma goes from 0 to 1.0 instead of 1.0 to 0 for less confusion.

Make percent 0 return an extremely large sigma and percent 1.0 return a
zero one to fix imprecision.
2023-11-16 04:23:44 -05:00
7114cfec0e Always clone graph data when loading to fix some load issues. 2023-11-15 15:55:02 -05:00
629e4c552c Merge branch 'master' of https://github.com/42lux/ComfyUI 2023-11-15 01:47:21 -05:00
57eea0efbb heunpp2 sampler. 2023-11-14 23:50:55 -05:00
7b87c825a3 Added Colorschemes. Arc, North and Github. 2023-11-15 02:37:35 +01:00
728613bb3e Fix last pr. 2023-11-14 14:41:31 -05:00
ec3d0ab432 Merge branch 'master' of https://github.com/Jannchie/ComfyUI 2023-11-14 14:38:07 -05:00
c962884a5c Make bislerp work on GPU. 2023-11-14 11:38:36 -05:00
420beeeb05 Clean up and refactor sampler code.
This should make it much easier to write custom nodes with kdiffusion type
samplers.
2023-11-14 00:39:34 -05:00
f2e49b1d57 fix: adaptation to older versions of pytroch 2023-11-14 14:32:05 +09:00
94cc718e9c Add a way to add patches to the input block. 2023-11-14 00:08:12 -05:00
8509bd58b4 Reorganize custom_sampling nodes. 2023-11-13 21:45:23 -05:00
61112c81b9 Add a node to flip the sigmas for unsampling. 2023-11-13 21:45:08 -05:00
eb0407e806 Update litegraph to latest. 2023-11-13 16:26:28 -05:00
7339479b10 Disable xformers when it can't load properly. 2023-11-13 12:31:10 -05:00
f12ec55983 Allow boolean widgets to have no options dict. 2023-11-13 00:42:34 -05:00
4aeef781a3 Support number/text ids when importing API JSON (#1952)
* support numeric/text ids
2023-11-12 14:49:23 -05:00
4781819a85 Make memory estimation aware of model dtype. 2023-11-12 04:28:26 -05:00
dd4ba68b6e Allow different models to estimate memory usage differently. 2023-11-12 04:03:52 -05:00
2c9dba8dc0 sampling_function now has the model object as the argument. 2023-11-12 03:45:10 -05:00
8d80584f6a Remove useless argument from uni_pc sampler. 2023-11-12 01:25:33 -05:00
006b24cc32 Prevent image cache 2023-11-11 15:56:14 -03:00
248aa3e563 Fix bug. 2023-11-11 12:20:16 -05:00
4a8a839b40 Add option to use in place weight updating in ModelPatcher. 2023-11-11 01:11:12 -05:00
412d3ff57d Refactor. 2023-11-11 01:11:06 -05:00
ca2812bae0 Fix RescaleCFG for batch size > 1. 2023-11-10 22:05:25 -05:00
58d5d71a93 Working RescaleCFG node.
This was broken because of recent changes so I fixed it and moved it from
the experiments repo.
2023-11-10 20:52:10 -05:00
3e0033ef30 Fix model merge bug.
Unload models before getting weights for model patching.
2023-11-10 03:19:05 -05:00
002aefa382 Support lcm models.
Use the "lcm" sampler to sample them, you also have to use the
ModelSamplingDiscrete node to set them as lcm models to use them properly.
2023-11-09 18:30:22 -05:00
ca71e542d2 Lower cfg step to 0.1 in sampler nodes. 2023-11-09 17:35:17 -05:00
72e3feb573 Load API JSON (#1932)
* added loading api json

* revert async change

* reorder
2023-11-09 13:33:43 -05:00
cd6df8b323 Fix sanitize node name removing the "/" character. 2023-11-09 13:10:19 -05:00
ec12000136 Add support for full diff lora keys. 2023-11-08 22:05:31 -05:00
064d7583eb Add a CONDConstant for passing non tensor conds to unet. 2023-11-08 01:59:09 -05:00
794dd2064d Fix typo. 2023-11-07 23:41:55 -05:00
0a6fd49a3e Print leftover keys when using the UNETLoader. 2023-11-07 22:15:55 -05:00
fe40109b57 Fix issue with object patches not being copied with patcher. 2023-11-07 22:15:15 -05:00
a527d0c795 Code refactor. 2023-11-07 19:33:40 -05:00
2a23ba0b8c Fix unet ops not entirely on GPU. 2023-11-07 04:30:37 -05:00
844dbf97a7 Add: advanced->model->ModelSamplingDiscrete node.
This allows changing the sampling parameters of the model (eps or vpred)
or set the model to use zsnr.
2023-11-07 03:28:53 -05:00
d07cd44272 Merge branch 'master' of https://github.com/cubiq/ComfyUI 2023-11-07 01:52:13 -05:00
656c0b5d90 CLIP code refactor and improvements.
More generic clip model class that can be used on more types of text
encoders.

Don't apply weighting algorithm when weight is 1.0

Don't compute an empty token output when it's not needed.
2023-11-06 14:17:41 -05:00
b3fcd64c6c Make SDTokenizer class work with more types of tokenizers. 2023-11-06 01:09:18 -05:00
4acfc11a80 add difference blend mode 2023-11-05 19:00:23 +01:00
a6c83b3cd0 Merge branch 'fix_unet_wrapper_function_name' of https://github.com/gameltb/ComfyUI 2023-11-05 12:41:38 -05:00
02f062b5b7 Sanitize unknown node types on load to prevent XSS. 2023-11-05 12:29:28 -05:00
7e455adc07 fix unet_wrapper_function name in ModelPatcher 2023-11-05 17:11:44 +08:00
1ffa8858e7 Move model sampling code to comfy/model_sampling.py 2023-11-04 01:32:23 -04:00
ae2acfc21b Don't convert Nan to zero.
Converting Nan to zero is a bad idea because it makes it hard to tell when
something went wrong.
2023-11-03 13:13:15 -04:00
ee74ef5c9e Increase maximum batch size in LatentRebatch. 2023-11-02 13:07:41 -04:00
6e84a01ecc Refactor the template manager (#1878)
* add drag-drop to node template manager

* better dnd, save field on change

* actually save templates

---------

Co-authored-by: matt3o <matt3o@gmail.com>
2023-11-02 12:29:57 -04:00
dd116abfc4 Merge branch 'quantize-dither' of https://github.com/tsone/ComfyUI 2023-11-02 00:57:00 -04:00
d2e27b48f1 sampler_cfg_function now gets the noisy output as argument again.
This should make things that use sampler_cfg_function behave like before.

Added an input argument for those that want the denoised output.

This means you can calculate the x0 prediction of the model by doing:
(input - cond) for example.
2023-11-01 21:24:08 -04:00
2455aaed8a Allow model or clip to be None in load_lora_for_models. 2023-11-01 20:27:20 -04:00
45a3df1cde Merge branch 'filter-widgets-crash-fix' of https://github.com/Jantolick/ComfyUI 2023-11-01 20:17:25 -04:00
ecb80abb58 Allow ModelSamplingDiscrete to be instantiated without a model config. 2023-11-01 19:13:03 -04:00
88410ace9b fix: handle null case for currentNode widgets to prevent scroll error 2023-11-01 16:52:51 -04:00
e73ec8c4da Not used anymore. 2023-11-01 00:01:30 -04:00
111f1b5255 Fix some issues with sampling precision. 2023-10-31 23:49:29 -04:00
7c0f255de1 Clean up percent start/end and make controlnets work with sigmas. 2023-10-31 22:14:32 -04:00
a268a574fa Remove a bunch of useless code.
DDIM is the same as euler with a small difference in the inpaint code.
DDIM uses randn_like but I set a fixed seed instead.

I'm keeping it in because I'm sure if I remove it people are going to
complain.
2023-10-31 18:11:29 -04:00
1777b54d02 Sampling code changes.
apply_model in model_base now returns the denoised output.

This means that sampling_function now computes things on the denoised
output instead of the model output. This should make things more consistent
across current and future models.
2023-10-31 17:33:43 -04:00
23c5d17837 Added Bayer dithering to Quantize node. 2023-10-31 22:22:40 +01:00
c837a173fa Fix some memory issues in sub quad attention. 2023-10-30 15:30:49 -04:00
125b03eead Fix some OOM issues with split attention. 2023-10-30 13:14:11 -04:00
41b07ff8d7 Fix TAESD preview to only decode first latent, instead of all 2023-10-29 13:30:23 -05:00
a12cc05323 Add --max-upload-size argument, the default is 100MB. 2023-10-29 03:55:46 -04:00
aac8fc99d6 Cleanup webp import code a bit. 2023-10-28 12:24:50 -04:00
2a134bfab9 Fix checkpoint loader with config. 2023-10-27 22:13:55 -04:00
e60ca6929a SD1 and SD2 clip and tokenizer code is now more similar to the SDXL one. 2023-10-27 15:54:04 -04:00
6ec3f12c6e Support SSD1B model and make it easier to support asymmetric unets. 2023-10-27 14:45:15 -04:00
434ce25ec0 Restrict loading embeddings from embedding folders. 2023-10-27 02:54:13 -04:00
40963b5a16 Apply primitive nodes to graph before serializing workflow. 2023-10-26 19:52:41 -04:00
723847f6b3 Faster clip image processing. 2023-10-26 01:53:01 -04:00
a373367b0c Fix some OOM issues with split and sub quad attention. 2023-10-25 20:17:28 -04:00
7fbb217d3a Fix uni_pc returning noisy image when steps <= 3 2023-10-25 16:08:30 -04:00
3783cb8bfd change 'c_adm' to 'y' in ControlNet.get_control 2023-10-25 08:24:32 -05:00
d1d2fea806 Pass extra conds directly to unet. 2023-10-25 00:07:53 -04:00
036f88c621 Refactor to make it easier to add custom conds to models. 2023-10-24 23:31:12 -04:00
3fce8881ca Sampling code refactor to make it easier to add more conds. 2023-10-24 03:38:41 -04:00
5c65da312a Remove prints. 2023-10-23 23:39:22 -04:00
b935bea3a0 The frontend can now load workflows from webp exif. 2023-10-23 21:13:50 -04:00
2ec6158e9e Call widget callback on value control to fix primitive node issue. 2023-10-22 23:38:18 -04:00
8594c8be4d Empty the cache when torch cache is more than 25% free mem. 2023-10-22 13:58:12 -04:00
8b65f5de54 attention_basic now works with hypertile. 2023-10-22 03:59:53 -04:00
e6bc42df46 Make sub_quad and split work with hypertile. 2023-10-22 03:51:29 -04:00
8cfce083c4 Fix primitive node control value not getting loaded. 2023-10-21 22:36:04 -04:00
a0690f9df9 Fix t2i adapter issue. 2023-10-21 20:31:24 -04:00
9906e3efe3 Make xformers work with hypertile. 2023-10-21 13:23:03 -04:00
1443caf373 HyperTile node, can be found in: _for_testing->HyperTile 2023-10-21 05:16:38 -04:00
8d50f0890d Merge branch 'templates-export-import' of https://github.com/jn-jairo/ComfyUI 2023-10-21 01:29:24 -04:00
77c893350a Fix previous commit that broke tests. 2023-10-20 23:13:54 -04:00
e0c0029fc1 Try to speed up the test-ui workflow. 2023-10-20 23:00:05 -04:00
25e3e5af68 Use npm ci for ci instead of npm install in tests. 2023-10-20 22:52:12 -04:00
5818ca83a2 Unit tests + widget input fixes (#1760)
* setup ui unit tests

* Refactoring, adding connections

* Few tweaks

* Fix type

* Add general test

* Refactored and extended test

* move to describe

* for groups

* Add test for converted widgets on missing nodes + fix crash

* tidy

* mores tests + refactor

* throw earlier to get less confusing error

* support outputs

* more test

* add ci action

* use lts node

* Fix?

* Prevent connecting non matching combos

* update

* accidently removed npm i

* Disable logging extension

* added step to generate object_info

* fix python

* install python

* install deps

* fix cwd?

* logging

* Fix double resolve

* create dir

* update pkg
2023-10-20 22:49:04 -04:00
484bfe46c2 Clear importInput after import so change event works with same file 2023-10-20 15:19:29 -03:00
4185324a1d Fix uni_pc sampler math. This changes the images this sampler produces. 2023-10-20 04:16:53 -04:00
f1062be622 fix: Fixing intermittent crashes with undefined graphs in the Firefox browser. 2023-10-20 00:07:08 +09:00
e6962120c6 Make sure cond_concat is on the right device. 2023-10-19 01:14:25 -04:00
45c972aba8 Refactor cond_concat into conditioning. 2023-10-18 20:36:58 -04:00
430a8334c5 Fix some potential issues. 2023-10-18 19:48:36 -04:00
782a24fce6 Refactor cond_concat into model object. 2023-10-18 16:48:37 -04:00
0d45a565da Fix memory issue related to control loras.
The cleanup function was not getting called.
2023-10-18 02:43:01 -04:00
c2bb34d865 Implement updated FreeU as _for_testing->FreeU_V2 node 2023-10-18 02:06:49 -04:00
a555074737 Use name from input to export single node template 2023-10-17 19:44:26 -03:00
6dbb18df92 Export and import templates 2023-10-17 17:53:57 -03:00
d44a2de49f Make VAE code closer to sgm. 2023-10-17 15:18:51 -04:00
f8caa24bcc Support hypernetwork with mish activation function and layer norm. 2023-10-17 12:08:03 -04:00
92f0318630 Try to fix notebook. 2023-10-17 11:39:15 -04:00
88ceeb3f29 Merge branch 'fix-node-bounding' of https://github.com/jn-jairo/ComfyUI 2023-10-17 03:23:49 -04:00
23680a9155 Refactor the attention stuff in the VAE. 2023-10-17 03:19:29 -04:00
c8013f73e5 Add some Quadro cards to the list of cards with broken fp16. 2023-10-16 16:48:46 -04:00
5a608aa37c Fix node getBounding for collapsed nodes 2023-10-16 17:29:23 -03:00
142aac3003 Merge branch 'group-options' of https://github.com/jn-jairo/ComfyUI 2023-10-16 16:18:32 -04:00
682c84ccf3 Fix fit group to nodes with reroute and collapsed nodes 2023-10-16 16:00:01 -03:00
e8c02219ee Fix add selected nodes to empty group 2023-10-16 15:26:36 -03:00
7d5d0fd577 Group options
- Add Group For Selected Nodes
- Add Selected Nodes To Group
- Fit Group To Nodes
2023-10-16 15:12:40 -03:00
bb064c9796 Add a separate optimized_attention_masked function. 2023-10-16 02:31:24 -04:00
7e09e889e3 Make clear that the old CheckpointLoader is deprecated. 2023-10-15 02:22:22 -04:00
2231edec21 Merge branch 'filter-files-extensions' of https://github.com/jn-jairo/ComfyUI 2023-10-14 14:30:24 -04:00
1b782f2494 Merge branch 'group-select-nodes' of https://github.com/jn-jairo/ComfyUI 2023-10-14 14:28:59 -04:00
a0ce8a443e Merge branch 'shortcut-collapse' of https://github.com/jn-jairo/ComfyUI 2023-10-14 14:28:17 -04:00
a7b65b9505 Group menu option select nodes 2023-10-14 12:11:49 -03:00
8d04978298 Allow all extensions if extension list is empty 2023-10-14 11:59:35 -03:00
2e6270e328 Stop auto queue on error 2023-10-14 11:56:44 -03:00
25f0f4e9c8 Shortcut Alt + C to collapse/uncollapse selected nodes 2023-10-14 11:54:33 -03:00
3fcab0c642 Merge branch 'fix-mask-nodes' of https://github.com/jn-jairo/ComfyUI 2023-10-14 02:42:06 -04:00
fd4c5f07e7 Add a --bf16-unet to test running the unet in bf16. 2023-10-13 14:51:10 -04:00
9a55dadb4c Refactor code so model can be a dtype other than fp32 or fp16. 2023-10-13 14:41:17 -04:00
b5fa3d28d7 Fix MaskComposite 2023-10-13 13:40:53 -03:00
87097a11c3 Fix FeatherMask 2023-10-13 12:26:54 -03:00
fee3b0c070 Move and comment out. 2023-10-12 20:54:43 -04:00
851a4bdb80 Update extra_model_paths.yaml.example with comfy specific example 2023-10-12 21:26:27 +08:00
536799d172 Merge branch 'fix-1723' of https://github.com/chrisgoringe/ComfyUI 2023-10-11 23:35:24 -04:00
41d2c5660d add query 2023-10-12 14:26:53 +11:00
88733c997f pytorch_attention_enabled can now return True when xformers is enabled. 2023-10-11 21:30:57 -04:00
20d3852aa1 Pull some small changes from the other repo. 2023-10-11 20:38:48 -04:00
ac7d8cfa87 Allow attn_mask in attention_pytorch. 2023-10-11 20:38:48 -04:00
1a4bd9e9a6 Refactor the attention functions.
There's no reason for the whole CrossAttention object to be repeated when
only the operation in the middle changes.
2023-10-11 20:38:48 -04:00
8cc75c64ff Let unet wrapper functions have .to attributes. 2023-10-11 01:34:38 -04:00
5e885bd9c8 Cleanup. 2023-10-10 21:46:53 -04:00
851bb87ca9 Merge branch 'taesd_safetensors' of https://github.com/mochiya98/ComfyUI 2023-10-10 21:42:35 -04:00
be903eb2e2 Add default CheckpointSave, CLIPSave and VAESave paths to model paths. 2023-10-10 01:25:47 -04:00
877553843f Add a CLIPSave node to save CLIP model weights. 2023-10-10 01:24:49 -04:00
9eb621c95a Supports TAESD models in safetensors format 2023-10-10 13:21:44 +09:00
d1a0abd40b Merge branch 'input-directory' of https://github.com/jn-jairo/ComfyUI 2023-10-09 01:53:29 -04:00
4308862ce0 Add a note to README about pytorch 3.12 not being supported. 2023-10-09 01:51:01 -04:00
7bb9f6b7e8 Add a VAESave node. 2023-10-09 01:42:15 -04:00
c16f5744e3 Fix SplitImageWithAlpha and JoinImageWithAlpha. 2023-10-08 15:52:10 -04:00
1f2f4eaa6f Fix bug when copying node with converted input. 2023-10-08 04:04:25 -04:00
69a824e9a4 Move _for_testing/custom_sampling nodes to sampling/custom_sampling. 2023-10-08 03:20:35 -04:00
a0b1d4f21d improve: image preview (#1683)
* improve image preview
- grid mode: align in rectangle instead of first image, show cell border
- individual mode: proper ratio handling

* improve: fix preview button position instead of relative

* improve: image preview - compact mode for same aspect ratio
2023-10-08 03:00:33 -04:00
1c5d6663fa Update standalone download link. 2023-10-07 16:13:35 -04:00
0986cc7c38 Fix issues with the packaging. 2023-10-07 11:57:32 -04:00
ae3e4e9ad8 access getConfig via a symbol so structuredClone works (#1677) 2023-10-06 16:48:30 -04:00
72188dffc3 load_checkpoint_guess_config can now optionally output the model. 2023-10-06 13:48:18 -04:00
5b828258f1 Merge branch 'widget-input-updates' of https://github.com/pythongosssss/ComfyUI 2023-10-06 12:51:08 -04:00
0134d7ab49 Generate update script with right settings. 2023-10-06 12:49:40 -04:00
d761eaa486 if the output type is an array, use combo 2023-10-06 17:47:46 +01:00
1497528de8 Fix workflow. 2023-10-06 10:43:12 -04:00
640d5080e5 Make xformers optional in packaging. 2023-10-06 10:29:52 -04:00
34b36e3207 More configurable workflows to package windows release. 2023-10-06 10:26:51 -04:00
6f464f801f Update nightly workflow to python 3.11.6 2023-10-06 03:32:00 -04:00
11b404766e Merge branch 'widget-input-updates' of https://github.com/pythongosssss/ComfyUI 2023-10-05 14:20:47 -04:00
b9b178b839 More cleanup of old type data
Fix connecting combos of same type from different types of node
2023-10-05 19:16:39 +01:00
80932ddf40 updated messages 2023-10-05 17:13:13 +01:00
48242be508 Update readme for pytorch 2.1 2023-10-05 08:25:15 -04:00
63e5fd1790 Option to input directory 2023-10-04 19:45:15 -03:00
0e763e880f JoinImageWithAlpha now works with any mask shape. 2023-10-04 15:54:34 -04:00
0b9246d9fa allow connecting numbers merging config 2023-10-04 20:48:55 +01:00
9212bea87c Change a few things in #1578. 2023-10-04 15:43:41 -04:00
214ca7197e Corrected joining images with alpha (for RGBA input), and checking scaling conditions 2023-10-04 19:04:52 +02:00
585fb0475b Adding default alpha when splitting RGB images 2023-10-04 19:04:52 +02:00
ece69bf28c Change channel type to MASK (reduced redundancy, increased usability) 2023-10-04 19:04:52 +02:00
d06cd2805d Added support for Porter-Duff image compositing 2023-10-04 19:04:48 +02:00
9bfec2bdbf Fix quality loss due to low precision 2023-10-04 15:40:59 +02:00
6fc7314393 support refreshing primitive combos
no longer uses combo list as type name
2023-10-03 20:19:12 +01:00
4103f7fad5 Merge branch 'fix/robust_object_info' of https://github.com/ltdrdata/ComfyUI 2023-10-03 11:14:58 -04:00
1f38de1fb3 If an error occurs while retrieving object_info, only the node that encountered the error should be handled as an exception, while the information for the other nodes should continue to be processed normally. 2023-10-03 18:30:38 +09:00
fe1e2dbe90 pytorch nightly is now ROCm 5.7 2023-10-03 00:01:49 -04:00
ec454c771b Refactor with code from comment of #1588 2023-10-02 17:26:59 -04:00
2ef459b1d4 Add VPScheduler node 2023-10-01 03:48:07 -04:00
8ab49dc0a4 DPMPP_SDE node. 2023-09-30 01:51:22 -04:00
213976f8c3 Add ExponentialScheduler and PolyexponentialScheduler nodes. 2023-09-29 09:05:30 -04:00
0f17993d05 fix: typo in extra sampler 2023-09-29 06:09:59 +03:00
1c8ae9dbb2 Allow GrowMask node to work with batches (for AnimateDiff) (#1623)
* Allow mask batches

This allows LatentCompositeMasked -node to work with AnimateDiff. I tried to keep old functionality too, unsure if it's correct, but both single mask and batch of masks seems to work with this change.

* Update nodes_mask.py
2023-09-28 22:01:19 -04:00
66756de100 Add SamplerDPMPP_2M_SDE node. 2023-09-28 21:56:23 -04:00
26b7372805 Fix SplitSigmas. 2023-09-28 01:11:22 -04:00
71713888c4 Print missing VAE keys. 2023-09-28 00:54:57 -04:00
76e0f8fc8f Add function to split sigmas. 2023-09-28 00:40:09 -04:00
2bf051fda8 Add a basic node to generate sigmas from scheduler. 2023-09-28 00:30:45 -04:00
d234ca558a Add missing samplers to KSamplerSelect. 2023-09-28 00:17:03 -04:00
1d7dfc07d5 Make add_noise in SamplerCustom a boolean. 2023-09-27 22:42:23 -04:00
1adcc4c3a2 Add a SamplerCustom Node.
This node takes a list of sigmas and a sampler object as input.

This lets people easily implement custom schedulers and samplers as nodes.

More nodes will be added to it in the future.
2023-09-27 22:21:18 -04:00
bf3fc2f1b7 Refactor sampling related code. 2023-09-27 16:45:22 -04:00
fff491b032 Model patches can now know which batch is positive and negative. 2023-09-27 12:04:07 -04:00
1d6dd83184 Scheduler code refactor. 2023-09-26 17:07:07 -04:00
446caf711c Sampling code refactor. 2023-09-26 13:45:15 -04:00
aeba1cc2a0 Merge branch 'chore/update-actions-versions' of https://github.com/M1kep/ComfyUI 2023-09-26 02:58:55 -04:00
9546a798fb Make LoadImage and LoadImageMask return masks in batch format. 2023-09-26 02:56:40 -04:00
1d36dfb9fe GrowMask now works with mask batches. 2023-09-26 02:53:57 -04:00
d76d71de3f GrowMask can now be used with negative numbers to erode it. 2023-09-26 02:45:31 -04:00
e0efa78b71 chore(CI): Update test-build to use updated version of actions 2023-09-25 21:20:51 -07:00
d2cec6cdbf Make mask functions work with batches of masks and images. 2023-09-25 16:19:37 -04:00
046b4fe0ee Support batches of masks in mask composite nodes. 2023-09-25 16:02:21 -04:00
ba7dfd60f2 Merge branch 'proportional-scale' of https://github.com/jn-jairo/ComfyUI 2023-09-25 12:39:53 -04:00
2381d36e6d 1024 wasn't enough. 2023-09-25 01:46:44 -04:00
42f6d1ebe2 Increase maximum batch sizes of empty image nodes. 2023-09-25 01:22:37 -04:00
f00471cdc8 Do FreeU fft on CPU if the device doesn't support fft functions. 2023-09-24 18:09:44 -04:00
77c124c5a1 Fix typo. 2023-09-24 13:27:57 -04:00
593b7069e7 Proportional scale latent and image 2023-09-24 12:08:54 -03:00
76cdc809bf Support more controlnet models. 2023-09-23 18:47:46 -04:00
05e661e5ef FreeU now works with the refiner. 2023-09-23 12:19:08 -04:00
ae87543653 Merge branch 'cast_intel' of https://github.com/simonlui/ComfyUI 2023-09-23 00:57:17 -04:00
fd93c759e2 Implement FreeU: Free Lunch in Diffusion U-Net node.
_for_testing->FreeU
2023-09-23 00:56:09 -04:00
eec449ca8e Allow Intel GPUs to LoRA cast on GPU since it supports BF16 natively. 2023-09-22 21:11:27 -07:00
afa2399f79 Add a way to set output block patches to modify the h and hsp. 2023-09-22 20:26:47 -04:00
29ccf9f471 Fix typo. 2023-09-22 01:33:46 -04:00
422d16c027 Add some nodes to add, subtract and multiply latents. 2023-09-21 22:23:01 -04:00
492db2de8d Allow having a different pooled output for each image in a batch. 2023-09-21 01:14:42 -04:00
0793eb9269 Only clear clipboard when copying nodes. 2023-09-20 23:16:01 -04:00
4d41bd595c Fix loading group titles. 2023-09-20 21:46:41 -04:00
1122df1a20 Increase range of lora strengths. 2023-09-20 17:58:54 -04:00
1cdfb3dba4 Only do the cast on the device if the device supports it. 2023-09-20 17:52:41 -04:00
b92a86d737 Update litegraph to upstream. 2023-09-20 13:24:08 -04:00
f895260e5e Merge branch 'escape-glob' of https://github.com/seanlynch/ComfyUI 2023-09-19 13:13:40 -04:00
7c9a92f552 Don't depend on torchvision. 2023-09-19 13:12:47 -04:00
8321592408 Escape paths when passing them to globs
Try to prevent JS search from breaking on pathnames with square
brackets.
2023-09-19 08:18:29 -04:00
2b6b178173 Added support for lanczos scaling 2023-09-19 10:40:38 +02:00
6d3dee9d16 Clean up #1541. 2023-09-18 23:33:52 -04:00
f32463936d Unhardcode sampler and scheduler list in test. 2023-09-18 23:24:14 -04:00
7c93afd2cd Manual float precision, toggle for old behavior (#1541)
* Add toggle for float rounding

* Add manual precision override
2023-09-18 23:20:00 -04:00
26cd8405dd Ci quality workflows (#1423)
* Add inference tests

* Clean up

* Rename test graph file

* Add readme for tests

* Separate server fixture

* test file name change

* Assert images are generated

* Clean up comments

* Add __init__.py so tests can run with command line `pytest`

* Fix command line args for pytest

* Loop all samplers/schedulers in test_inference.py

* Ci quality workflows compare (#1)

* Add image comparison tests

* Comparison tests do not pass with empty metadata

* Ensure tests are run in correct order

* Save image files  with test name

* Update tests readme

* Reduce step counts in tests to ~halve runtime

* Ci quality workflows build (#2)

* Add build test github workflow
2023-09-18 23:18:06 -04:00
b92bf8196e Do lora cast on GPU instead of CPU for higher performance. 2023-09-18 23:04:49 -04:00
0109431626 Lower the minimum resolution of EmptyLatentImage. 2023-09-18 16:20:03 -04:00
db63aa7e53 Nodes can now control the rounding in the UI. 2023-09-17 12:49:06 -04:00
321c5fa295 Enable pytorch attention by default on xpu. 2023-09-17 04:09:19 -04:00
0665749b1a Move ModelSubtract and ModelAdd to advanced/model_merging 2023-09-17 02:10:06 -04:00
d6d9b83447 Merge branch 'fix/preview_ratio' of https://github.com/ltdrdata/ComfyUI 2023-09-16 15:43:42 -04:00
61b1f67734 Support models without previews. 2023-09-16 12:59:54 -04:00
4d5e057bb2 fix indent 2023-09-16 20:37:42 +09:00
69680fede7 fix: thumbnail ratio fix for mixed ratio images 2023-09-16 20:36:00 +09:00
43d4935a1d Add cond_or_uncond array to transformer_options so hooks can check what is
cond and what is uncond.
2023-09-15 22:21:14 -04:00
415abb275f Add DDPM sampler. 2023-09-15 19:22:47 -04:00
099226015e Merge branch 'Fix-structuredClone-error-with-early-chrome-version-browser' of https://github.com/KarryCharon/ComfyUI 2023-09-15 15:48:22 -04:00
94e4fe39d8 This isn't used anywhere. 2023-09-15 12:03:03 -04:00
076f3e6310 fix structuredClone undefined error; 2023-09-15 16:37:58 +08:00
44361f6344 Support for text encoder models that need attention_mask. 2023-09-15 02:02:05 -04:00
0d8f376446 Set last layer on SD2.x models uses the proper indexes now.
Before I had made the last layer the penultimate layer because some
checkpoints don't have them but it's not consistent with the others models.

TLDR: for SD2.x models only: CLIPSetLastLayer -1 is now -2.
2023-09-14 20:28:22 -04:00
0966d3ce82 Don't run text encoders on xpu because there are issues. 2023-09-14 12:16:07 -04:00
0e4395a8a3 Allow pasting nodes with connections in firefox 2023-09-13 18:42:44 +01:00
3039b08eb1 Only parse command line args when main.py is called. 2023-09-13 11:38:20 -04:00
30de95e4b4 Add some nodes to subtract and add model weights. 2023-09-13 01:10:31 -04:00
0b829fe35b .gitignore refactor. 2023-09-12 18:44:05 -04:00
ed58730658 Don't leave very large hidden states in the clip vision output. 2023-09-12 15:09:10 -04:00
fb3b728203 Fix issue where autocast fp32 CLIP gave different results from regular. 2023-09-11 21:49:56 -04:00
7d401ed1d0 Add ldm format support to UNETLoader. 2023-09-11 16:36:50 -04:00
9562a6b49e Fix a few clipboard issues. 2023-09-10 11:19:31 -04:00
d4b2bc0964 Merge branch 'master' of https://github.com/miabrahams/ComfyUI 2023-09-10 10:15:02 -04:00
122fd5d37f Merge branch 'add-defaultInput' of https://github.com/chrisgoringe/ComfyUI 2023-09-10 03:18:05 -04:00
7df822212f Allow checkpoints with .pt and .bin extensions. 2023-09-10 02:36:04 -04:00
07691e80c3 Does it make sense to allow configuring the round and precision? 2023-09-09 03:15:31 -04:00
5c8b7ea03c Merge branch 'round-float-widgets' of https://github.com/chrisgoringe/ComfyUI 2023-09-09 03:07:57 -04:00
7372255e49 Specify the precision and rounding based on step 2023-09-09 15:21:38 +10:00
cc2fa311dd fix(server): Disable access logs 2023-09-08 21:11:53 -07:00
e85be36bd2 Add a penultimate_hidden_states to the clip vision output. 2023-09-08 14:06:58 -04:00
10de64af7f Google doesn't want people to use ComfyUI on colab anymore. 2023-09-08 14:02:03 -04:00
264867bf87 Clear clipboard on copy 2023-09-08 12:42:13 -04:00
1e6b67101c Support diffusers format t2i adapters. 2023-09-08 11:36:51 -04:00
3ebe6b539a round float widgets (by default to 0.001) 2023-09-08 20:40:27 +10:00
ff962098fd Fixed Load Image preview not displaying some files (issue #1158) 2023-09-08 08:43:17 +02:00
0782ac2a96 defaultInput 2023-09-08 14:53:59 +10:00
326577d04c Allow cancelling of everything with a progress bar. 2023-09-07 23:37:03 -04:00
9261587d89 Small refactor. 2023-09-07 18:14:30 -04:00
d6d1a8998f Properly check upload filename for directory transversal. 2023-09-07 18:06:22 -04:00
e464fa8f04 Merge branch 'fix-validate' of https://github.com/pythongosssss/ComfyUI 2023-09-07 15:15:52 -04:00
62799c8585 fix crash on node with VALIDATE_INPUTS and actual inputs 2023-09-07 18:42:21 +01:00
f65db2981b Merge branch 'description' of https://github.com/chrisgoringe/ComfyUI 2023-09-07 12:50:46 -04:00
8be46438be Support DiffBIR SwinIR models. 2023-09-07 03:31:43 -04:00
694c705f52 get class description 2023-09-07 12:22:39 +10:00
adb9eb94b0 Send class description if any 2023-09-07 12:22:39 +10:00
cb080e771e Lower refresh timeout for search in litegraph. 2023-09-06 16:18:02 -04:00
f88f7f413a Add a ConditioningSetAreaPercentage node. 2023-09-06 03:28:27 -04:00
21a563d385 Remove prints. 2023-09-05 23:46:37 -04:00
eb2349822b Merge branch 'folder_paths_ignore_git' of https://github.com/M1kep/ComfyUI 2023-09-05 23:37:22 -04:00
bc1f6e2185 fix(ui/widgets): Only set widget forceInput option if a widget is added 2023-09-05 15:06:46 -07:00
f368e5ac7d Don't paste nodes when target is a textarea or a text box. 2023-09-05 01:22:26 -04:00
3e00fa4332 feat: Exclude .git when retrieving filename lists
In the future could support user provided excluded dirs via config file
2023-09-04 17:50:32 -07:00
d196847079 feat: Add support for excluded_dirs to folder_paths.recursive_search
Refactored variable names to better match what they represent
2023-09-04 17:50:32 -07:00
2d9d3ca38b Merge branch 'master' of https://github.com/miabrahams/ComfyUI 2023-09-04 14:51:19 -04:00
1938f5c5fe Add a force argument to soft_empty_cache to force a cache empty. 2023-09-04 00:58:18 -04:00
7746bdf7b0 Merge branch 'generalize_fixes' of https://github.com/simonlui/ComfyUI 2023-09-04 00:43:11 -04:00
2419901e6c Merge branch 'addOnExecutionStart' of https://github.com/chrisgoringe/ComfyUI 2023-09-03 16:59:41 -04:00
6f70227b8c Add support for pasting images into the graph
It can be useful to paste images from the clipboard directly into the node graph.
This commit modifies copy and paste handling to support this.

When an image file is found in the clipboard, we check whether an image node is selected.
If so, paste the image into that node. Otherwise, a new node is created.
If no image data are found in the clipboard, we call the original Litegraph paste.
To ensure that onCopy and onPaste events are fired, we override Litegraph's ctrl+c and ctrl+v handling.

Try to detect whether the pasted image is a real file on disk, or just pixel data copied from e.g. Photoshop.
Pasted pixel data will be called 'image.png' and have a creation time of now.
If it is simply pasted data, we store it in the subfolder /input/clipboard/.

This also adds support for the subfolder property in the IMAGEUPLOAD widget.
2023-09-03 12:08:04 -04:00
2da73b7073 Revert changes in comfy/ldm/modules/diffusionmodules/util.py, which is unused. 2023-09-02 20:07:52 -07:00
a74c5dbf37 Move some functions to utils.py 2023-09-02 22:33:37 -04:00
766c7b3815 Update upscale model code to latest Chainner model code.
Don't add SRFormer because the code license is incompatible with the GPL.

Remove MAT because it's unused and the license is incompatible with GPL.
2023-09-02 22:27:40 -04:00
4a0c4ce4ef Some fixes to generalize CUDA specific functionality to Intel or other GPUs. 2023-09-02 18:22:10 -07:00
dfd6489c96 onExecutionStart 2023-09-03 07:53:02 +10:00
62efc78a4b Display history in reverse order to make it easier to load last gen. 2023-09-02 15:49:16 -04:00
6962cb46a9 Fix issue when node_input is undefined. 2023-09-02 12:17:30 -04:00
7291e303f6 Fix issue with some workflows not getting serialized. 2023-09-02 11:48:44 -04:00
77a176f9e0 Use common function to reshape batch to. 2023-09-02 03:42:49 -04:00
36ea8784a8 Only return tuple of 3 args in CheckpointLoaderSimple. 2023-09-02 03:34:57 -04:00
7891d13329 Added label for autoQueueCheckbox. (#1295)
* Added label for autoQueueCheckbox.

* Menu gets behind of some custom nodes.

* Edited extraOptions.
Options divided in to different divs to manage them with ease.
2023-09-02 02:58:23 -04:00
7931ff0fd9 Support SDXL inpaint models. 2023-09-01 15:22:52 -04:00
c335fdf200 Merge branch 'pixelass-patch-1' of https://github.com/pixelass/ComfyUI 2023-09-01 11:48:11 -04:00
43f2505389 Merge branch 'fix/widget-wonkyness' of https://github.com/M1kep/ComfyUI 2023-09-01 03:07:10 -04:00
0e3b641172 Remove xformers related print. 2023-09-01 02:12:03 -04:00
5c363a9d86 Fix controlnet bug. 2023-09-01 02:01:08 -04:00
69c5e6de85 fix(widgets): Add options object if not present when forceInput: true 2023-08-31 17:58:43 -07:00
9a7a52f8b5 refactor/fix: Treat forceInput widgets as standard widgets 2023-08-31 17:58:43 -07:00
cfe1c54de8 Fix controlnet issue. 2023-08-31 15:16:58 -04:00
57beace324 Fix VAEDecodeTiled minimum. 2023-08-31 14:26:16 -04:00
1c012d69af It doesn't make sense for c_crossattn and c_concat to be lists. 2023-08-31 13:25:00 -04:00
5f101f4da1 Update litegraph with upstream: middle mouse dragging. 2023-08-31 02:39:34 -04:00
2cd3980199 Remove forced lowercase on embeddings endpoint 2023-08-30 20:48:55 +02:00
7e941f9f24 Clean up DiffusersLoader node. 2023-08-30 12:57:07 -04:00
18617967e5 Fix error message in model_patcher.py
Found while tinkering.
2023-08-30 00:25:04 -07:00
fe4c07400c Fix "Load Checkpoint with config" node. 2023-08-29 23:58:32 -04:00
d70b0bc43c Use the GPU for the canny preprocessor when available. 2023-08-29 17:58:40 -04:00
81d9200e18 Add node to convert a specific colour in an image to a mask. 2023-08-29 17:55:42 -04:00
f2f5e5dcbb Support SDXL t2i adapters with 3 channel input. 2023-08-29 16:44:57 -04:00
15adc3699f Move beta_schedule to model_config and allow disabling unet creation. 2023-08-29 14:22:53 -04:00
968078b149 Merge branch 'feat/mute_bypass_nodes_in_group' of https://github.com/M1kep/ComfyUI 2023-08-29 11:33:40 -04:00
66c690e698 Merge branch 'preserve-pnginfo' of https://github.com/chrisgoringe/ComfyUI 2023-08-29 11:32:58 -04:00
bed116a1f9 Remove optimization that caused border. 2023-08-29 11:21:36 -04:00
18379dea36 check for text attr and save 2023-08-29 18:50:28 +10:00
edcff9ab8a copy metadata into modified image 2023-08-29 18:50:28 +10:00
6944288aff refactor(ui): Switch statement, and handle other modes in group actions 2023-08-29 00:24:31 -07:00
e30d546e38 feat(ui): Add node mode toggles to group context menu 2023-08-28 23:49:25 -07:00
8ddd081b09 Use the same units for tile size in VAEDecodeTiled and VAEEncodeTiled. 2023-08-29 01:51:35 -04:00
fbf375f161 Merge branch 'master' of https://github.com/bvhari/ComfyUI 2023-08-29 01:42:00 -04:00
65cae62c71 No need to check filename extensions to detect shuffle controlnet. 2023-08-28 16:49:06 -04:00
4e89b2c25a Put clip vision outputs on the CPU. 2023-08-28 16:26:11 -04:00
a094b45c93 Load clipvision model to GPU for faster performance. 2023-08-28 15:29:27 -04:00
1300a1bb4c Text encoder should initially load on the offload_device not the regular. 2023-08-28 15:08:45 -04:00
f92074b84f Move ModelPatcher to model_patcher.py 2023-08-28 14:51:31 -04:00
BVH
d86b222fe9 Reduce min tile size for encode 2023-08-28 22:39:09 +05:30
4798cf5a62 Implement loras with norm keys. 2023-08-28 11:20:06 -04:00
BVH
9196588088 Make tile size in Tiled VAE encode/decode user configurable 2023-08-28 19:57:22 +05:30
0faee1186f support on prompt event handler (#765)
Co-authored-by: Lt.Dr.Data <lt.dr.data@gmail.com>
2023-08-28 00:52:22 -04:00
b8c7c770d3 Enable bf16-vae by default on ampere and up. 2023-08-27 23:06:19 -04:00
1c794a2161 Fallback to slice attention if xformers doesn't support the operation. 2023-08-27 22:24:42 -04:00
d935ba50c4 Make --bf16-vae work on torch 2.0 2023-08-27 21:33:53 -04:00
412596d325 Merge branch 'increase_client_max_size' of https://github.com/ramyma/ComfyUI 2023-08-27 13:12:39 -04:00
d9f4922993 fix: cannot disable dynamicPrompts (#1327)
* fix: cannot disable dynamicPrompts

* indent fix

---------

Co-authored-by: Lt.Dr.Data <lt.dr.data@gmail.com>
2023-08-27 12:34:24 -04:00
0b6cf7a558 Increase client_max_size to allow bigger request bodies 2023-08-26 19:48:20 +03:00
a57b0c797b Fix lowvram model merging. 2023-08-26 11:52:07 -04:00
f72780a7e3 The new smart memory management makes this unnecessary. 2023-08-25 18:02:15 -04:00
c77f02e1c6 Move controlnet code to comfy/controlnet.py 2023-08-25 17:33:04 -04:00
15a7716fa6 Move lora code to comfy/lora.py 2023-08-25 17:11:51 -04:00
ec96f6d03a Move text_projection to base clip model. 2023-08-24 23:43:48 -04:00
30eb92c3cb Code cleanups. 2023-08-24 19:39:18 -04:00
51dde87e97 Try to free enough vram for control lora inference. 2023-08-24 17:20:54 -04:00
e3d0a9a490 Fix potential issue with text projection matrix multiplication. 2023-08-24 00:54:16 -04:00
cc44ade79e Always shift text encoder to GPU when the device supports fp16. 2023-08-23 21:45:00 -04:00
a6ef08a46a Even with forced fp16 the cpu device should never use it. 2023-08-23 21:38:28 -04:00
00c0b2c507 Initialize text encoder to target dtype. 2023-08-23 21:01:15 -04:00
f081017c1a Save memory by storing text encoder weights in fp16 in most situations.
Do inference in fp32 to make sure quality stays the exact same.
2023-08-23 01:08:51 -04:00
d7b3b0f8c1 Don't hardcode node names for image upload widget. 2023-08-22 19:41:49 -04:00
afcb9cb1df All resolutions now work with t2i adapter for SDXL. 2023-08-22 16:23:54 -04:00
85fde89d7f T2I adapter SDXL. 2023-08-22 14:40:43 -04:00
f2a7cc9121 Add control lora links to colab notebook. 2023-08-22 01:55:09 -04:00
e2256b4087 Add clip_vision_g download command to colab notebook for ReVision. 2023-08-22 01:44:31 -04:00
cf5ae46928 Controlnet/t2iadapter cleanup. 2023-08-22 01:06:26 -04:00
763b0cf024 Fix control lora not working in fp32. 2023-08-21 20:38:31 -04:00
bc76b3829f Merge branch 'custom-node-js' of https://github.com/pythongosssss/ComfyUI 2023-08-21 00:58:38 -04:00
199d73364a Fix ControlLora on lowvram. 2023-08-21 00:54:04 -04:00
d08e53de2e Remove autocast from controlnet code. 2023-08-20 21:47:32 -04:00
cdaf65ceb1 remove log 2023-08-20 20:01:25 +01:00
0d7b0a4dc7 Small cleanups. 2023-08-20 14:56:47 -04:00
9b1d5a587c Allow loading js extensions without copying to /web folder 2023-08-20 19:55:48 +01:00
9225465975 Further tuning and fix mem_free_total. 2023-08-20 14:19:53 -04:00
2c096e4260 Add ipex optimize and other enhancements for Intel GPUs based on recent memory changes. 2023-08-20 14:19:51 -04:00
8ee0473687 Merge branch 'parallel-extensions-load' of https://github.com/NoCrypt/ComfyUI 2023-08-20 14:14:01 -04:00
e9469e732d --disable-smart-memory now disables loading model directly to vram. 2023-08-20 04:00:53 -04:00
c9b562aed1 Free more memory before VAE encode/decode. 2023-08-19 12:13:13 -04:00
81ccacaa7c Make the extensions loads in parallel instead of waiting one by one 2023-08-19 17:36:13 +07:00
b80c3276dc Fix issue with gligen. 2023-08-18 16:32:23 -04:00
d6e4b342e6 Support for Control Loras.
Control loras are controlnets where some of the weights are stored in
"lora" format: an up and a down low rank matrice that when multiplied
together and added to the unet weight give the controlnet weight.

This allows a much smaller memory footprint depending on the rank of the
matrices.

These controlnets are used just like regular ones.
2023-08-18 11:59:51 -04:00
39ac856a33 ReVision support: unclip nodes can now be used with SDXL. 2023-08-18 11:59:36 -04:00
76d53c4622 Add support for clip g vision model to CLIPVisionLoader. 2023-08-18 11:13:29 -04:00
fc99fa56a9 Add node to scale image to a total amount of pixels keeping aspect. 2023-08-18 02:32:39 -04:00
eb5c991a8c Merge branch 'add-user-css' of https://github.com/pythongosssss/ComfyUI 2023-08-17 16:41:54 -04:00
bd7321c8ac Update aiohttp in nightly workflow. 2023-08-17 16:41:24 -04:00
e59fe0537a Fix referenced before assignment
For https://github.com/BlenderNeko/ComfyUI_TiledKSampler/issues/13
2023-08-17 22:30:07 +02:00
be9c5e25bc Fix issue with not freeing enough memory when sampling. 2023-08-17 15:59:56 -04:00
ac0758a1a4 Fix bug with lowvram and controlnet advanced node. 2023-08-17 13:38:51 -04:00
c28db1f315 Fix potential issues with patching models when saving checkpoints. 2023-08-17 11:07:08 -04:00
c828543a77 Allow user customizable css 2023-08-17 13:36:55 +01:00
1498f1a342 Merge branch 'add-growmask-node' of https://github.com/coreyryanhanson/ComfyUI 2023-08-17 03:21:20 -04:00
3aee33b54e Add --disable-smart-memory for those that want the old behaviour. 2023-08-17 03:12:37 -04:00
2be2742711 Fix issue with regular torch version. 2023-08-17 01:58:54 -04:00
89a0767abf Smarter memory management.
Try to keep models on the vram when possible.

Better lowvram mode for controlnets.
2023-08-17 01:06:34 -04:00
2c97c30256 Support small diffusers controlnet so both types are now supported. 2023-08-16 12:45:56 -04:00
53f326a3d8 Support diffusers mini controlnets. 2023-08-16 12:28:01 -04:00
58f0c616ed Fix clip vision issue with old transformers versions. 2023-08-16 11:36:22 -04:00
ae270f79bc Fix potential issue with batch size and clip vision. 2023-08-16 11:05:11 -04:00
18e86a4010 add a node to allow growing of masks through dilation 2023-08-16 10:57:14 -04:00
27b87c25a1 Add an EmptyImage node.
TODO: implement color picker in the frontend.
2023-08-15 17:53:10 -04:00
6dc02c7bac Add a "resize_source" option to Image and Latent CompositeMasked. 2023-08-15 17:51:52 -04:00
7567c4ac8f Add bypass to readme and add a Bypass menu option to the nodes. 2023-08-15 13:28:34 -04:00
a2ce9655ca Refactor unclip code. 2023-08-14 23:48:47 -04:00
94fceb8700 Make Blur node use the image device for processing. 2023-08-14 21:08:45 -04:00
e7d88855f4 Add node to batch images together. 2023-08-14 20:23:38 -04:00
d4380f3aa3 Add option to use different xformers version in the github workflow. 2023-08-14 18:13:11 -04:00
06681ee035 Add codeowners file. 2023-08-14 16:54:30 -04:00
9cc12c833d CLIPVisionEncode can now encode multiple images. 2023-08-14 16:54:05 -04:00
0cb6dac943 Remove 3m from PR #1213 because of some small issues. 2023-08-14 00:48:45 -04:00
e244b2df83 Add sgm_uniform scheduler that acts like the default one in sgm. 2023-08-14 00:29:03 -04:00
58c7da3665 Gpu variant of dpmpp_3m_sde. Note: use 3m with exponential or karras. 2023-08-14 00:28:50 -04:00
ba319a34e4 Merge branch 'dpmpp3m' of https://github.com/FizzleDorf/ComfyUI 2023-08-14 00:23:15 -04:00
3cfad03a68 dpmpp 3m + dpmpp 3m sde added 2023-08-13 22:29:04 -04:00
192ca0676c Add some more cards to the cuda malloc blacklist. 2023-08-13 16:08:11 -04:00
861fd58819 Add a warning if a card that doesn't support cuda malloc has it enabled. 2023-08-13 12:37:53 -04:00
585a062910 Print unet config when model isn't detected. 2023-08-13 01:39:48 -04:00
8c730dc4a7 Add an ImageCompositeMasked node. 2023-08-12 01:02:36 -04:00
c8a23ce9e8 Support for yet another lora type based on diffusers. 2023-08-11 13:04:21 -04:00
2bc12d3d22 Add --temp-directory argument to set temp directory. 2023-08-11 05:13:03 -04:00
00877b0363 Don't ignore extra paths that don't exist. 2023-08-11 02:41:04 -04:00
c20583286f Support diffuser text encoder loras. 2023-08-10 20:28:28 -04:00
f7e6a5ed07 Fix litegraph button being black on light theme. 2023-08-10 12:29:56 -04:00
cf10c5592c Disable calculating uncond when CFG is 1.0 2023-08-09 20:55:03 -04:00
5ac96897e9 Images can now be uploaded by dragging from another window in chromium. 2023-08-09 11:31:27 -04:00
af32197067 feat(extensions): Allow hiding link connectors
Thank you for adding this feature (linksRenderMode) to core. I would like to add the "Hidden" option (invalid number 3 will just hide the connector lines), so that I can remove that extension from my extension pack to prevent conflicts

https://github.com/failfa-st/failfast-comfyui-extensions
2023-08-09 13:03:30 +02:00
a5599ed42c Add missing direct dep that gets pulled in by another. 2023-08-08 10:45:35 -04:00
5e2b4893da Fix path issue. 2023-08-07 19:29:36 -04:00
285ea7b790 Add "display" to custom node example. 2023-08-07 08:29:50 -04:00
1f0f4cc0bd Add argument to disable auto launching the browser. 2023-08-07 02:25:12 -04:00
0ce8a540ce Update litegraph to latest. 2023-08-06 14:36:43 -04:00
d8e58f0a7e Detect hint_channels from controlnet. 2023-08-06 14:08:59 -04:00
0cb14a33f6 Fix issue with logging missing nodes. 2023-08-05 21:54:58 -04:00
fc71cf656e Add some 800M gpus to cuda malloc blacklist. 2023-08-05 21:54:52 -04:00
c9ef919e29 Formatting issue. 2023-08-05 17:20:35 -04:00
435577457a Add a way to use cloudflared tunnel to the colab notebook. 2023-08-05 17:18:45 -04:00
b948b2cf41 handle value missing 2023-08-05 11:04:04 +01:00
32e115b818 prevent crashing if the widget cant be found 2023-08-05 11:00:18 +01:00
c5d7593ccf Support loras in diffusers format. 2023-08-05 01:40:24 -04:00
5a90d3cea5 GeForce MX110 + MX130 are maxwell. 2023-08-04 21:44:37 -04:00
8918f1085c Add setting to change link render mode
Add support for combo settings
2023-08-04 21:26:11 +01:00
cb25b88329 Merge branch 'logging' of https://github.com/pythongosssss/ComfyUI 2023-08-04 12:12:39 -04:00
1ce0d8ad68 Add CMP 30HX card to the nvidia_16_series list. 2023-08-04 12:08:45 -04:00
3d614dde49 Fix bug with reroutes and bypass. 2023-08-04 03:47:45 -04:00
b2ea0cbd5c add logging 2023-08-04 08:30:01 +01:00
43ae9fe721 add system stats function 2023-08-04 08:29:51 +01:00
0bbd9dd4d9 add system info to stats endpoint 2023-08-04 08:29:25 +01:00
d7638c47fc Fix ui inconsistency. 2023-08-04 03:22:47 -04:00
fa962e86c1 Make LatentBlend more consistent with other nodes. 2023-08-04 02:51:28 -04:00
11ad6060fc Merge branch 'LatentBlend' of https://github.com/fuami/ComfyUI 2023-08-04 02:35:53 -04:00
c99d8002f8 Make sure the pooled output stays at the EOS token with added embeddings. 2023-08-03 20:27:50 -04:00
9534f0f8a5 allows convert to widget for boolean type (#1063) 2023-08-03 20:24:52 -04:00
d1347544bc Make context menu filter import from relative path. 2023-08-03 16:51:37 -04:00
077617e8c9 Fix bypassed nodes with no inputs. 2023-08-03 02:57:40 -04:00
19fbab6ce3 Fix reroute nodes not working with bypassed nodes. 2023-08-03 02:38:11 -04:00
05321fd947 Add an experimental CTRL-B shortcut to bypass nodes. 2023-08-03 01:57:00 -04:00
9ccc965899 Merge branch 'fix/no-required-input' of https://github.com/M1kep/ComfyUI into prs 2023-08-02 15:06:09 -04:00
e4a3e9e54c Add an option in the UI to disable sliders. 2023-08-01 18:50:06 -04:00
90b0163524 fix(execution): Fix support for input-less nodes 2023-08-01 12:29:01 -07:00
7785d073f0 chore: Fix typo 2023-08-01 12:27:50 -07:00
834ab278d2 Update instructions for mac. 2023-08-01 03:17:04 -04:00
38cfba0430 Rename toggle to boolean. 2023-08-01 03:08:35 -04:00
d712193885 Add LatentBlend node, allowing for blending between two Latent inputs. 2023-08-01 01:23:14 -05:00
eb5191f911 0.0.0.0 doesn't work on windows. 2023-08-01 01:15:18 -04:00
076d2db60f display_as -> display. 2023-07-31 22:41:54 -04:00
730a5d170f Merge branch 'slider_toggle' of https://github.com/Guillaume-Fgt/ComfyUI into prs 2023-07-31 15:24:09 -04:00
41cf43f89e Merge branch 'SaveLatent_outputs' of https://github.com/fuami/ComfyUI 2023-07-31 15:23:02 -04:00
6cdc9afc7c pass slider type as option 2023-07-31 08:48:44 +02:00
4a77fcd6ab Only shift text encoder to vram when CPU cores are under 8. 2023-07-31 00:08:54 -04:00
3dcad78fe1 SaveLatent reports its outputs so they are visible to API 2023-07-30 16:36:55 -05:00
3cd31d0e24 Lower CPU thread check for running the text encoder on the CPU vs GPU. 2023-07-30 17:18:24 -04:00
2b13939044 Remove some useless code. 2023-07-30 14:13:33 -04:00
95d796fc85 Faster VAE loading. 2023-07-29 16:28:30 -04:00
4b957a0010 Initialize the unet directly on the target device. 2023-07-29 14:51:56 -04:00
ad5866b02b Fix ROCm nightly install command. 2023-07-29 14:48:29 -04:00
d3d9ad00d8 added slider and toggle widget 2023-07-29 14:48:00 +02:00
c910b4a01c Remove unused code and torchdiffeq dependency. 2023-07-28 21:32:27 -04:00
1141029a4a Add --disable-metadata argument to disable saving metadata in files. 2023-07-28 12:31:41 -04:00
fbf5c51c1c Merge branch 'fix_batch_timesteps' of https://github.com/asagi4/ComfyUI 2023-07-27 16:13:48 -04:00
68be24eead Remove some prints. 2023-07-27 16:12:43 -04:00
1ea4d84691 Fix timestep ranges when batch_size > 1 2023-07-27 21:14:09 +03:00
4ab75d9cb8 Update colab notebook with SDXL links. 2023-07-26 21:50:44 -04:00
5379051d16 Fix diffusers VAE loading. 2023-07-26 18:26:39 -04:00
00da9b3268 Merge branch 'fix/types' of https://github.com/melMass/ComfyUI 2023-07-26 01:55:55 -04:00
5e3ac1928a Implement modelspec metadata in CheckpointSave for SDXL and refiner. 2023-07-25 22:02:34 -04:00
727588d076 Fix some new loras. 2023-07-25 16:39:15 -04:00
315ba30c81 Update nightly ROCm pytorch command in readme to 5.6 2023-07-25 15:48:26 -04:00
4f9b6f39d1 Fix potential issue with Save Checkpoint. 2023-07-25 00:45:20 -04:00
7c0a5a3e0e Disable cuda malloc on a bunch of quadro cards. 2023-07-25 00:09:01 -04:00
a51f33ee49 Use bigger tiles when upscaling with model and fallback on OOM. 2023-07-24 19:47:32 -04:00
5f75d784a1 Start is now 0.0 and end is now 1.0 for the timestep ranges. 2023-07-24 18:38:17 -04:00
7ff14b62f8 ControlNetApplyAdvanced can now define when controlnet gets applied. 2023-07-24 17:50:49 -04:00
d191c4f9ed Add a ControlNetApplyAdvanced node.
The controlnet can be applied to the positive or negative prompt only by
connecting it correctly.
2023-07-24 13:35:20 -04:00
0240946ecf Add a way to set which range of timesteps the cond gets applied to. 2023-07-24 09:25:02 -04:00
30de083dd0 Disable cuda malloc on all the 9xx series. 2023-07-23 13:29:14 -04:00
22f29d66ca Try to fix memory issue with lora. 2023-07-22 21:38:56 -04:00
67be7eb81d Nodes can now patch the unet function. 2023-07-22 17:01:12 -04:00
12a6e93171 Del the right object when applying lora. 2023-07-22 11:25:49 -04:00
85a8900a14 Disable cuda malloc on regular GTX 960. 2023-07-22 11:05:33 -04:00
78e7958d17 Support controlnet in diffusers format. 2023-07-21 22:58:16 -04:00
09386a3697 Fix issue with lora in some cases when combined with model merging. 2023-07-21 21:27:27 -04:00
58b2364f58 Properly support SDXL diffusers unet with UNETLoader node. 2023-07-21 14:38:56 -04:00
5190aa284d fix: ️ small type fix
getCustomWidgets expects a plain record and not an array of records
2023-07-21 13:19:05 +02:00
0115018695 Print errors and continue when lora weights are not compatible. 2023-07-20 19:56:22 -04:00
4760c29380 Merge branch 'fix-AttributeError-module-'torch'-has-no-attribute-'mps'' of https://github.com/KarryCharon/ComfyUI 2023-07-20 00:34:54 -04:00
ccb6b70de1 Move image encoding outside of sampling loop for better preview perf. 2023-07-19 18:06:58 -04:00
39c58b227f Disable cuda malloc on GTX 750 Ti. 2023-07-19 15:14:10 -04:00
d5c0765f4e Update how to get the prompt in api format in the example. 2023-07-19 15:07:12 -04:00
799c08a4ce Auto disable cuda malloc on some GPUs on windows. 2023-07-19 14:43:55 -04:00
0b284f650b Fix typo. 2023-07-19 10:20:32 -04:00
e032ca6138 Fix ddim issue with older torch versions. 2023-07-19 10:16:00 -04:00
18885f803a Add MX450 and MX550 to list of cards with broken fp16. 2023-07-19 03:08:30 -04:00
9ba440995a It's actually possible to torch.compile the unet now. 2023-07-18 21:36:35 -04:00
51d5477579 Add key to indicate checkpoint is v_prediction when saving. 2023-07-18 00:25:53 -04:00
ff6b047a74 Fix device print on old torch version. 2023-07-17 15:18:58 -04:00
9871a15cf9 Enable --cuda-malloc by default on torch 2.0 and up.
Add --disable-cuda-malloc to disable it.
2023-07-17 15:12:10 -04:00
55d0fca9fa --windows-standalone-build now enables --cuda-malloc 2023-07-17 14:10:36 -04:00
1679abd86d Add a command line argument to enable backend:cudaMallocAsync 2023-07-17 11:00:14 -04:00
3a150bad15 Only calculate randn in some samplers when it's actually being used. 2023-07-17 10:11:08 -04:00
ee8f8ee07f Fix regression with ddim and uni_pc when batch size > 1. 2023-07-17 09:35:19 -04:00
3ded1a3a04 Refactor of sampler code to deal more easily with different model types. 2023-07-17 01:22:12 -04:00
ac9c038ac2 Merge branch 'master' of https://github.com/ComfyUI-Community/ComfyUI 2023-07-16 03:04:45 -04:00
5f57362613 Lower lora ram usage when in normal vram mode. 2023-07-16 02:59:04 -04:00
a8f3bbc35d Patch del self.loaded_lora to prevent error with persistent lora_name swapping 2023-07-15 17:11:12 -07:00
490771b7f4 Speed up lora loading a bit. 2023-07-15 13:25:22 -04:00
50b1180dde Fix CLIPSetLastLayer not reverting when removed. 2023-07-15 01:41:21 -04:00
6fb084f39d Reduce floating point rounding errors in loras. 2023-07-15 00:53:00 -04:00
91ed2815d5 Add a node to merge CLIP models. 2023-07-14 02:41:18 -04:00
907c9fbf0d Refactor to make it easier to set the api path. 2023-07-14 00:50:49 -04:00
30ea187160 Merge branch 'use-relative-paths' of https://github.com/mcmonkey4eva/ComfyUI 2023-07-13 23:56:29 -04:00
eed3042830 Move conditioning concat node to conditioning section. 2023-07-13 21:44:56 -04:00
8a577966c5 Enables a way to save workflows in api format in frontend.
Enable the dev mode in the settings to see it.
2023-07-13 21:08:54 -04:00
bdba394290 Add a canny preprocessor node. 2023-07-13 13:26:48 -04:00
6f914fb77d Print prestartup times for custom nodes. 2023-07-13 13:01:45 -04:00
3bc8be33e4 Don't let custom nodes overwrite base nodes. 2023-07-13 12:56:38 -04:00
876dadca84 Highlight nodes with errors in red even when workflow works fine. 2023-07-13 10:07:50 -04:00
b2f03164c7 Prevent the clip_g position_ids key from being saved in the checkpoint.
This is to make it match the official checkpoint.
2023-07-12 20:15:02 -04:00
46dc050c9f Fix potential tensors being on different devices issues. 2023-07-12 19:29:27 -04:00
90aa597099 Add back roundRect to fix issue on firefox ESR. 2023-07-12 02:07:48 -04:00
3e2309f149 fix mps miss import 2023-07-12 10:06:34 +08:00
f4b9390623 Add a random string to the temp prefix for PreviewImage. 2023-07-11 17:35:55 -04:00
2b2a1474f7 Move to litegraph. 2023-07-11 03:12:00 -04:00
cef30cc6b6 Merge branch 'hidpi-canvas' of https://github.com/EHfive/ComfyUI 2023-07-11 03:04:10 -04:00
880c9b928b Update litegraph to latest. 2023-07-11 03:00:52 -04:00
05e6eac7b3 Scale graph canvas based on DPI factor
Similar to fixes in litegraph.js editor demo:
3ef215cf11/editor/js/code.js (L19-L28)

Also workarounds to address viewpoint problem of lightgrapgh.js in DPI scaling scenario.

Fixes #161
2023-07-11 14:47:58 +08:00
99abcbef41 feat/startup-script: Feature to avoid package installation errors when installing custom nodes. (#856)
* support startup script for installation without locking on windows

* modified: Instead of executing scripts from the startup-scripts directory, I will change it to execute the prestartup_script.py for each custom node.
2023-07-11 02:33:21 -04:00
606a537090 Support SDXL embedding format with 2 CLIP. 2023-07-10 10:34:59 -04:00
5797ff89b0 use relative paths for all web connections
This enables local reverse-proxies to host ComfyUI on a path, eg "http://example.com/ComfyUI/" in such a way that at least everything I tested works. Without this patch, proxying ComfyUI in this way will yield errors.
2023-07-10 02:09:03 -07:00
6ad0a6d7e2 Don't patch weights when multiplier is zero. 2023-07-09 17:46:56 -04:00
af15add967 Fix annoyance with textbox unselecting in chromium. 2023-07-09 15:41:19 -04:00
d5323d16e0 latent2rgb matrix for SDXL. 2023-07-09 13:59:09 -04:00
0ae81c03bb Empty cache after model unloading for normal vram and lower. 2023-07-09 09:56:03 -04:00
d3f5998218 Support loading clip_g from diffusers in CLIP Loader nodes. 2023-07-09 09:33:53 -04:00
a9a4ba7574 Fix merging not working when model2 of model merge node was a merge. 2023-07-08 22:31:10 -04:00
febea8c101 Merge branch 'bugfix/img-offset' of https://github.com/ltdrdata/ComfyUI 2023-07-08 03:45:37 -04:00
9caab9380d fix: Image.ANTIALIAS is no longer available. (#847)
* modify deprecated api call

* prevent breaking old Pillow users

* change LANCZOS to BILINEAR
2023-07-08 02:36:48 -04:00
d43cff2105 bugfix: image widget's was mis-aligned when node has multiline widget 2023-07-08 01:42:33 +09:00
c2d407b0f7 Merge branch 'Yaruze66-patch-1' of https://github.com/Yaruze66/ComfyUI 2023-07-07 01:55:10 -04:00
bb5fbd29e9 Merge branch 'condmask-fix' of https://github.com/vmedea/ComfyUI 2023-07-07 01:52:25 -04:00
2c9d98f3e6 CLIPTextEncodeSDXL now works when prompts are of very different sizes. 2023-07-06 23:23:54 -04:00
e7bee85df8 Add arguments to run the VAE in fp16 or bf16 for testing. 2023-07-06 23:23:46 -04:00
f5232c4869 Fix 7z error when extracting package. 2023-07-06 04:18:36 -04:00
608fcc2591 Fix bug with weights when prompt is long. 2023-07-06 02:43:40 -04:00
ddc6f12ad5 Disable autocast in unet for increased speed. 2023-07-05 21:58:29 -04:00
603f02d613 Fix loras not working when loading checkpoint with config. 2023-07-05 19:42:24 -04:00
ccb1b25908 Add a conditioning concat node. 2023-07-05 17:40:22 -04:00
af7a49916b Support loading unet files in diffusers format. 2023-07-05 17:38:59 -04:00
e57cba4c61 Add gpu variations of the sde samplers that are less deterministic
but faster.
2023-07-05 01:39:38 -04:00
f81b192944 Add logit scale parameter so it's present when saving the checkpoint. 2023-07-04 23:01:28 -04:00
acf95191ff Properly support SDXL diffusers loras for unet. 2023-07-04 21:15:23 -04:00
c61a95f9f7 Fix size check for conditioning mask
The wrong dimensions were being checked, [1] and [2] are the image size.
not [2] and [3]. This results in an out-of-bounds error if one of them
actually matches.
2023-07-04 16:34:42 +02:00
8d694cc450 Fix issue with OSX. 2023-07-04 02:09:02 -04:00
c02f3baeaf Now the model merge blocks node will use the longest match. 2023-07-04 00:51:17 -04:00
3a09fac835 ConditioningAverage now also averages the pooled output. 2023-07-03 21:44:37 -04:00
d94ddd8548 Add text encode nodes to control the extra parameters in SDXL. 2023-07-03 19:11:36 -04:00
c3e96e637d Pass device to CLIP model. 2023-07-03 16:09:37 -04:00
5e6bc824aa Allow passing custom path to clip-g and clip-h. 2023-07-03 15:45:04 -04:00
dc9d1f31c8 Improvements for OSX. 2023-07-03 00:08:30 -04:00
9ae6ff65bc Update extra_model_paths.yaml.example: add RealESRGAN path 2023-07-02 22:59:55 +05:00
103c487a89 Cleanup. 2023-07-02 11:58:23 -04:00
ae948b42fa Add taesd weights to standalones. 2023-07-02 11:47:30 -04:00
2c4e0b49b7 Switch to fp16 on some cards when the model is too big. 2023-07-02 10:00:57 -04:00
6f3d9f52db Add a --force-fp16 argument to force fp16 for testing. 2023-07-01 22:42:35 -04:00
1c1b0e7299 --gpu-only now keeps the VAE on the device. 2023-07-01 15:22:40 -04:00
ce35d8c659 Lower latency by batching some text encoder inputs. 2023-07-01 15:07:39 -04:00
3b6fe51c1d Leave text_encoder on the CPU when it can handle it. 2023-07-01 14:38:51 -04:00
b6a60fa696 Try to keep text encoders loaded and patched to increase speed.
load_model_gpu() is now used with the text encoder models instead of just
the unet.
2023-07-01 13:28:07 -04:00
97ee230682 Make highvram and normalvram shift the text encoders to vram and back.
This is faster on big text encoder models than running it on the CPU.
2023-07-01 12:37:23 -04:00
fa1959e3ef Fix nightly packaging. 2023-07-01 01:31:03 -04:00
9f2986318f Move model merging nodes to advanced and add to readme. 2023-06-30 15:21:55 -04:00
5a9ddf94eb LoraLoader node now caches the lora file between executions. 2023-06-29 23:40:51 -04:00
6e9f28401f Persist node instances between executions instead of deleting them.
If the same node id with the same class exists between two executions the
same instance will be used.

This means you can now cache things in nodes for more efficiency.
2023-06-29 23:38:56 -04:00
9920367d3c Fix embeddings not working with --gpu-only 2023-06-29 20:43:06 -04:00
62db11683b Move unet to device right after loading on highvram mode. 2023-06-29 20:43:06 -04:00
e7ed507d3d Add link to 7z in README (#809)
* Add link to 7z in README

* Change 7z to 7-Zip
2023-06-29 04:09:59 -04:00
4376b125eb Remove useless code. 2023-06-29 00:26:33 -04:00
89120f1fbe This is unused but it should be 1280. 2023-06-28 18:04:23 -04:00
2c7c14de56 Support for SDXL text encoder lora. 2023-06-28 02:22:49 -04:00
fcef47f06e Fix bug. 2023-06-28 00:38:07 -04:00
2d880fec3a Add a node to zero out the cond to advanced/conditioning
The stability streamlit example passes a zero cond as the negative input
so using this for the negative input makes outputs match the streamlit.
2023-06-27 23:30:52 -04:00
50abf7c938 Merge branch 'patch-1' of https://github.com/jjangga0214/ComfyUI 2023-06-27 01:42:16 -04:00
8248babd44 Use pytorch attention by default on nvidia when xformers isn't present.
Add a new argument --use-quad-cross-attention
2023-06-26 13:03:44 -04:00
9b93b920be Add CheckpointSave node to save checkpoints.
The created checkpoints contain workflow metadata that can be loaded by
dragging them on top of the UI or loading them with the "Load" button.

Checkpoints will be saved in fp16 or fp32 depending on the format ComfyUI
is using for inference on your hardware. To force fp32 use: --force-fp32

Anything that patches the model weights like merging or loras will be
saved.

The output directory is currently set to: output/checkpoints but that might
change in the future.
2023-06-26 12:22:27 -04:00
b72a7a835a Support loras based on the stability unet implementation. 2023-06-26 02:56:11 -04:00
c71a7e6b20 Fix ddim + inpainting not working. 2023-06-26 00:48:48 -04:00
530e408ab8 docs(extra model paths): add LyCORIS path 2023-06-25 20:11:28 +09:00
4eab00e14b Set the seed in the SDE samplers to make them more reproducible. 2023-06-25 03:04:57 -04:00
cef6aa62b2 Add support for TAESD decoder for SDXL. 2023-06-25 02:38:14 -04:00
20f579d91d Add DualClipLoader to load clip models for SDXL.
Update LoadClip to load clip models for SDXL refiner.
2023-06-25 01:40:38 -04:00
b7933960bb Fix CLIPLoader node. 2023-06-24 13:56:46 -04:00
78d8035f73 Fix bug with controlnet. 2023-06-24 11:02:38 -04:00
c9f5d5b2e1 optimize: support preview mode for mask editor. (#755)
* support preview mode for mask editor.
* use original file reference instead of loaded frontend blob

bugfix:
* prevent file open dialog when save to load image

* bugfix: cannot clear previous mask painted image's alpha

* bugfix

* bugfix

---------

Co-authored-by: Lt.Dr.Data <lt.dr.data@gmail.com>
2023-06-24 03:45:41 -04:00
05676942b7 Add some more transformer hooks and move tomesd to comfy_extras.
Tomesd now uses q instead of x to decide which tokens to merge because
it seems to give better results.
2023-06-24 03:30:22 -04:00
fa28d7334b Remove useless code. 2023-06-23 12:35:26 -04:00
8607c2d42d Move latent scale factor from VAE to model. 2023-06-23 02:33:31 -04:00
30a3861946 Fix bug when yaml config has no clip params. 2023-06-23 01:12:59 -04:00
9e37f4c7d5 Fix error with ClipVision loader node. 2023-06-23 01:08:05 -04:00
3e0686ce94 Add SDXL support to readme and improve the Running section. 2023-06-22 19:33:48 -04:00
7573897a3e Merge branch 'master' of https://github.com/VladislavNekto/ComfyUI 2023-06-22 19:28:18 -04:00
9f83b098c9 Don't merge weights when shapes don't match and print a warning. 2023-06-22 19:08:31 -04:00
f87ec10a97 Support base SDXL and SDXL refiner models.
Large refactor of the model detection and loading code.
2023-06-22 13:03:50 -04:00
ca485d2328 Update README.md
Information about running at RX7600
2023-06-22 22:23:47 +06:00
9fccf4aa03 Add original_shape parameter to transformer patch extra_options. 2023-06-21 13:22:01 -04:00
852cf4db99 Merge branch 'widget-input-overlapping' of https://github.com/ssitu/ComfyUI 2023-06-21 02:45:59 -04:00
6f0f8aa7aa Merge branch 'reroute-disconnect-fix' of https://github.com/ssitu/ComfyUI 2023-06-21 02:45:11 -04:00
51581dbfa9 Fix last commits causing an issue with the text encoder lora. 2023-06-20 19:44:39 -04:00
bf3f271775 Add some nodes for basic model merging. 2023-06-20 19:17:03 -04:00
8125b51a62 Keep a set of model_keys for faster add_patches. 2023-06-20 19:08:48 -04:00
45beebd33c Add a type of model patch useful for model merging. 2023-06-20 17:34:11 -04:00
6f54b01954 Fix reroute node connecting different types 2023-06-20 15:25:56 -04:00
8c3d24434a Fix overlapping when converting widgets to inputs 2023-06-20 12:03:46 -04:00
186f92042b Merge branch 'improve-keyboard' of https://github.com/reaper47/ComfyUI 2023-06-20 00:54:04 -04:00
96e8307bd3 Clean keybinds extension 2023-06-19 21:32:21 +02:00
036a22077c Fix k_diffusion math being off by a tiny bit during txt2img. 2023-06-19 15:28:54 -04:00
8883cb0f67 Add a way to set patches that modify the attn2 output.
Change the transformer patches function format to be more future proof.
2023-06-18 22:58:22 -04:00
cd930d4e7f pop clip vision keys after loading them. 2023-06-18 21:21:17 -04:00
c9e4a8c9e5 Not needed anymore. 2023-06-18 13:06:59 -04:00
fb4bf7f591 This is not needed anymore and causes issues with alphas_cumprod. 2023-06-18 03:18:25 -04:00
45be2e92c1 Fix DDIM v-prediction. 2023-06-17 20:48:21 -04:00
e619278730 Merge branch 'html5-dialog' of https://github.com/reaper47/ComfyUI 2023-06-17 18:39:55 -04:00
8c9c94b5f3 Add bicubic upscale method. 2023-06-17 01:54:33 -04:00
e6e50ab2dd Fix an issue when alphas_comprod are half floats. 2023-06-16 17:16:51 -04:00
ae43f09ef7 All the unet weights should now be initialized with the right dtype. 2023-06-15 18:42:30 -04:00
cf3974c829 Update readme with command to install pytorch with ROCm5.5.
Remove mentions of python 3.10 since 3.11 works fine now.
2023-06-15 18:11:28 -04:00
f7edcfd927 Add a --gpu-only argument to keep and run everything on the GPU.
Make the CLIP model work on the GPU.
2023-06-15 15:38:52 -04:00
7bf89ba923 Initialize more unet weights as the right dtype. 2023-06-15 15:00:10 -04:00
e21d9ad445 Initialize transformer unet block weights in right dtype at the start. 2023-06-15 14:29:26 -04:00
3fbd0abc5f Add missed .comfy-table in CSS 2023-06-15 18:39:18 +02:00
34ddbfdc8a Beautify settings dialog 2023-06-15 18:36:52 +02:00
6253ec4aef Fix server crashing because of terminated websocket connection. 2023-06-15 11:01:56 -04:00
bb1f45d6e8 Properly disable weight initialization in clip models. 2023-06-14 20:13:08 -04:00
21f04fe632 Disable default weight values in unet conv2d for faster loading. 2023-06-14 19:46:08 -04:00
9d54066ebc This isn't needed for inference. 2023-06-14 13:05:08 -04:00
fa2cca056c Don't initialize CLIPVision weights to default values. 2023-06-14 12:57:02 -04:00
6b774589a5 Set model to fp16 before loading the state dict to lower ram bump. 2023-06-14 12:48:02 -04:00
0c7cad404c Don't initialize clip weights to default values. 2023-06-14 12:47:36 -04:00
6971646b8b Speed up model loading a bit.
Default pytorch Linear initializes the weights which is useless and slow.
2023-06-14 12:09:41 -04:00
84f13f828a Merge branch 'issue-752' of https://github.com/reaper47/ComfyUI 2023-06-14 00:17:25 -04:00
388567f20b sampler_cfg_function now uses a dict for the argument.
This means arguments can be added without issues.
2023-06-13 16:10:36 -04:00
d52ed407a7 Send websocket message only when prompt is actually done executing. 2023-06-13 13:38:43 -04:00
ff9b22d79e Turn on safe load for a few models. 2023-06-13 10:12:03 -04:00
735ac4cf81 Remove pytorch_lightning dependency. 2023-06-13 10:11:33 -04:00
cb180b9998 Add some missing direct dependencies that were getting pulled indirectly. 2023-06-13 02:45:26 -04:00
2b14041d4b Remove useless code. 2023-06-13 02:40:58 -04:00
aba886e9da Issue 741: Darken white background 2023-06-13 08:27:26 +02:00
274dff3257 Remove more useless files. 2023-06-13 02:22:19 -04:00
f0a2b81cd0 Cleanup: Remove a bunch of useless files. 2023-06-13 02:19:08 -04:00
74297f5f9d Merge branch 'master' of https://github.com/ssitu/ComfyUI 2023-06-13 01:41:27 -04:00
0c874e604c Fix unhandled message "execution_cached" 2023-06-12 17:16:03 -04:00
2803e78bd0 Add a note to script about which websocket library is used. 2023-06-12 17:05:28 -04:00
f5d8aadb22 Add script example that downloads the images after a prompt is executed. 2023-06-12 14:36:45 -04:00
af91df85c2 Add a /history/{prompt_id} endpoint. 2023-06-12 14:34:30 -04:00
3402ec0c0d Issue 752: Fix background 2023-06-12 15:58:05 +02:00
67833c83d8 Add ImageScaleBy node. 2023-06-12 01:14:04 -04:00
f8c5931053 Split the batch in VAEEncode if there's not enough memory. 2023-06-12 00:21:50 -04:00
c069fc0730 Auto switch to tiled VAE encode if regular one runs out of memory. 2023-06-11 23:25:39 -04:00
c64ca8c0b2 Refactor unCLIP noise augment out of samplers.py 2023-06-11 04:01:18 -04:00
7b2f09b5fa Issue 742: Extension folder should be ignored 2023-06-10 21:53:49 +02:00
656f62569d Make the sections in the others install section more clearly separate. 2023-06-10 04:19:33 -04:00
b18946c53b Merge branch 'next-task' of https://github.com/reaper47/ComfyUI 2023-06-10 03:23:25 -04:00
ba23753670 DirectML is for Windows. 2023-06-10 03:23:01 -04:00
2bcdd6c7d4 Add install instructions for Apple silicon 2023-06-09 22:25:33 +02:00
de142eaad5 Simpler base model code. 2023-06-09 12:31:16 -04:00
bfebe2d6c3 Improve ContextMenuFilter extension 2023-06-09 13:29:15 +02:00
4b0b516544 Add code to handle primitive nodes connected to reroute nodes.
Revert last commit because I noticed it broke a few things.
2023-06-09 02:49:13 -04:00
8e14c46a38 allows connect primitive node to reroute if primitive node has type (#751)
Co-authored-by: Lt.Dr.Data <lt.dr.data@gmail.com>
2023-06-09 02:21:30 -04:00
8b82f79cb2 Merge branch 'comment-syntax' of https://github.com/space-nuko/ComfyUI 2023-06-09 02:15:44 -04:00
23cf8ca7c5 Fix bug when embedding gets ignored because of mismatched size. 2023-06-08 23:48:14 -04:00
65922419e2 Add comment note in README 2023-06-08 12:12:07 -05:00
eed4f62cc5 Add comment support to dynamic prompts nodes 2023-06-08 12:08:00 -05:00
29c50954ea Add some quick instructions how to use directml. 2023-06-08 02:00:44 -04:00
631132c8c5 Merge branch 'bugfix/paste-clipspace' of https://github.com/ltdrdata/ComfyUI 2023-06-08 01:23:35 -04:00
28677342c1 robust paste for image 2023-06-08 00:06:56 +09:00
70e02b443f robust patch on pasteFromClipspace 2023-06-07 22:56:08 +09:00
5cf4079923 Give linux some love 2023-06-07 15:15:38 +02:00
ee62b4ecc2 Merge branch 'bugfix/widget_size_conflict' of https://github.com/ltdrdata/ComfyUI 2023-06-07 02:08:07 -04:00
4f1d8c3370 Merge branch 'update-gitignore' of https://github.com/reaper47/ComfyUI 2023-06-07 02:07:52 -04:00
0e425603fb Small refactor. 2023-06-06 13:23:01 -04:00
3b5b095d04 Add .idea/ to .gitignore 2023-06-06 17:40:07 +02:00
422163c2ba bugfix: Fixing the calculation issue when an image widget is added to the size calculation of the text widget. 2023-06-06 22:29:19 +09:00
a3a713b6c5 Refactor previews into one command line argument.
Clean up a few things.
2023-06-06 02:13:05 -04:00
081134f5c8 Merge branch 'taesd-preview' of https://github.com/space-nuko/ComfyUI 2023-06-05 23:53:36 -04:00
2b2ea5194e Add readme note 2023-06-05 19:16:51 -05:00
8b4a6c19c2 Fix 2023-06-05 19:00:51 -05:00
3e17971acb preview method autodetection 2023-06-05 18:59:10 -05:00
d5a28fadaa Add latent2rgb preview 2023-06-05 18:39:56 -05:00
70d72c4336 Slightly less vibrant sample 2023-06-05 15:26:56 -05:00
48f7ec750c Make previews into cli option 2023-06-05 13:19:02 -05:00
f326a0a468 Make new LATENT_PREVIEWER type for declaring KSampler preview methods 2023-06-05 09:20:20 -05:00
a9fa2d3727 Fix 2023-06-05 09:20:20 -05:00
38bc02bb40 Fix 2023-06-05 09:20:20 -05:00
1c40296d74 Fix 2023-06-05 09:20:20 -05:00
b4f434ee66 Preview sampled images with TAESD 2023-06-05 09:20:17 -05:00
2ec980bb9f Limit preview to webp and RGB jpeg. 2023-06-05 01:50:14 -04:00
9f3a19b728 improve: lightweight preview to reduce network traffic (#733)
* To reduce bandwidth traffic in a remote environment, a lossy compression-based preview mode is provided for displaying simple visualizations in node-based widgets.

* Added 'preview=[image format]' option to the '/view' API.
* Updated node to use preview for displaying images as widgets.
* Excluded preview usage in the open image, save image, mask editor where the original data is required.

* Made preview_format parameterizable for extensibility.

* default preview format changed: jpeg -> webp

* Support advanced preview_format option.
- grayscale option for visual debugging
- quality option for aggressive reducing

L?;format;quality?

ex)
jpeg => rgb, jpeg, quality 90
L;webp;80 => grayscale, webp, quality 80
L;png => grayscale, png, quality 90
webp;50 => rgb, webp, quality 50

* move comment

* * add settings for preview_format
* default value is ''(= don't reencode)

---------

Co-authored-by: Lt.Dr.Data <lt.dr.data@gmail.com>
2023-06-05 01:49:43 -04:00
fed0a4dd29 Some comments to say what the vram state options mean. 2023-06-04 17:51:04 -04:00
126b4050dc Crash fix for intermittent crashes that occur when opening MaskEditor. (#732) 2023-06-03 12:25:49 -04:00
0764bb5218 Move node properties panel from double click to menu option. 2023-06-03 11:47:20 -04:00
c092ffcc18 Latest litegraph from upstream. 2023-06-03 11:46:52 -04:00
32f282c861 Search box style fix. 2023-06-03 11:19:10 -04:00
0a5fefd621 Cleanups and fixes for model_management.py
Hopefully fix regression on MPS and CPU.
2023-06-03 11:05:37 -04:00
700491d81a Implement global average pooling for controlnet. 2023-06-03 01:49:03 -04:00
66e588d837 Ignore folder path directories that don't exist. 2023-06-02 16:48:56 -04:00
871a86593a Smarter filename list caching. 2023-06-02 16:34:47 -04:00
67892b5ac5 Refactor and improve model_management code related to free memory. 2023-06-02 15:21:33 -04:00
499641ebf1 More accurate total 2023-06-02 00:14:41 -05:00
b5dd15c67a System stats endpoint 2023-06-01 23:26:23 -05:00
1bbd3f7fe1 Send back prompt number from prompt/ endpoint 2023-06-01 22:15:06 -05:00
5c38958e49 Tweak lowvram model memory so it's closer to what it was before. 2023-06-01 04:04:35 -04:00
94680732d3 Empty cache on mps. 2023-06-01 03:52:51 -04:00
d200fa1314 Prevent callers from mutating folder lists 2023-05-31 21:07:27 -04:00
b06c5259db Merge branch 'refactor/registerNodes' of https://github.com/ltdrdata/ComfyUI 2023-05-31 13:26:28 -04:00
03da8a3426 This is useless for inference. 2023-05-31 13:03:24 -04:00
8e8d6070f2 race condition patch 2023-05-31 23:26:56 +09:00
1f34bf08f0 To support dynamic custom loading, separate the node registration
process based on the defs in the registerNodes function.
2023-05-31 22:01:25 +09:00
606446d030 Merge branch 'fix-litegraph-css' of https://github.com/space-nuko/ComfyUI 2023-05-30 23:42:00 -04:00
8ef197f028 Keep list of filenames and only refresh it when something changes. 2023-05-30 18:48:50 -04:00
468c27afea Fix litegraph dialog z-index/font 2023-05-30 16:06:17 -05:00
04f4fba013 Fix litegraph dialog CSS 2023-05-30 16:01:49 -05:00
2260802d90 Check if folder_name is valid instead of just throwing exception. 2023-05-30 16:44:09 -04:00
9af7033c5e Merge branch 'hotfix/refresh-primitive-conflict' of https://github.com/ltdrdata/ComfyUI 2023-05-30 12:38:26 -04:00
eb448dd8e1 Auto load model in lowvram if not enough memory. 2023-05-30 12:36:41 -04:00
08abd838b8 HOTFIX: Patched the conflict issue between the Combo Refresh feature and PrimitiveNodes. 2023-05-30 15:26:45 +09:00
560e9f7a43 Disable repo owner validation in update.py 2023-05-29 11:29:00 -04:00
b9818eb910 Add route to get safetensors metadata:
/view_metadata/loras?filename=lora.safetensors
2023-05-29 02:48:50 -04:00
23ffafeb5d typo fix: field name in error message 2023-05-28 23:31:40 +09:00
a532888846 Support VAEs in diffusers format. 2023-05-28 02:02:09 -04:00
0fc483dcfd Refactor diffusers model convert code to be able to reuse it. 2023-05-28 01:55:40 -04:00
f3ac938b4a Round the mask values for bitwise operations. 2023-05-28 00:42:53 -04:00
ad81fd682a Fix issue with cancelling prompt. 2023-05-28 00:32:26 -04:00
1cfb2a733f Merge branch 'error-improvements' of https://github.com/space-nuko/ComfyUI 2023-05-27 23:09:40 -04:00
00646b0813 Bitwise operations for masks 2023-05-27 21:48:49 -05:00
03f2d0a764 Rename exception message field 2023-05-27 21:06:07 -05:00
52c9590b7b Exception message 2023-05-27 21:06:07 -05:00
62bdd9d26a Catch typecast errors 2023-05-27 21:06:07 -05:00
a9e7e23724 Fix 2023-05-27 21:06:07 -05:00
e2d080b694 Return null for value format 2023-05-27 21:06:07 -05:00
6b2a8a3845 Show message in the frontend if prompt execution raises an exception 2023-05-27 21:06:07 -05:00
ffec815257 Send back more information about exceptions that happen during execution 2023-05-27 21:06:07 -05:00
0d834e3a2b Add missing input name/config 2023-05-27 21:06:07 -05:00
c33b7c5549 Improve invalid prompt error message 2023-05-27 21:06:07 -05:00
cc4d3435d3 Highlight failing nodes/inputs in frontend 2023-05-27 21:06:07 -05:00
73e85fb3f4 Improve error output for failed nodes 2023-05-27 21:06:07 -05:00
9144947244 Merge branch 'zero-lora-weights' of https://github.com/space-nuko/ComfyUI 2023-05-26 22:32:10 -04:00
679bd2845a Safetensors isn't optional anymore. 2023-05-26 21:46:11 -04:00
4d1ed829d9 Don't load some model types if weight is zero 2023-05-26 19:33:30 -05:00
eb4bd7711a Remove einops. 2023-05-25 18:42:56 -04:00
87ab25fac7 Do operations in same order as the one it replaces. 2023-05-25 18:31:27 -04:00
2b1fac9708 Merge branch 'master' of https://github.com/BlenderNeko/ComfyUI 2023-05-25 14:44:16 -04:00
e1278fa925 Support old pytorch versions that don't have weights_only. 2023-05-25 13:30:59 -04:00
8b4b0c3188 vecorized bislerp 2023-05-25 19:23:47 +02:00
9b1396e93a Fix issue importing other ui prompts. 2023-05-24 14:01:11 -04:00
7310290f17 Pull in latest upscale model code from chainner. 2023-05-23 22:26:50 -04:00
c00bb1a0b7 Add a latent upscale by node. 2023-05-23 12:53:38 -04:00
b8ccbec6d8 Various improvements to bislerp. 2023-05-23 11:40:24 -04:00
451fb4169a Fix 'git pull' not working on the standalones. 2023-05-23 11:35:32 -04:00
34887b8885 Add experimental bislerp algorithm for latent upscaling.
It's like bilinear but with slerp.
2023-05-23 03:12:56 -04:00
48fcc5b777 Parsing error crash. 2023-05-22 20:51:30 -04:00
bfb13f5eee Remove useless call to /object_info 2023-05-22 17:05:23 -04:00
db27b0405a object_info now returns if node is an output_node or not. 2023-05-22 13:25:50 -04:00
ffc56c53c9 Add a node_errors to the /prompt error json response.
"node_errors" contains a dict keyed by node ids. The contents are a message
and a list of dependent outputs.
2023-05-22 13:22:38 -04:00
6cc450579b Auto transpose images from exif data. 2023-05-22 00:22:24 -04:00
dc198650c0 sample_dpmpp_2m_sde no longer crashes when step == 1. 2023-05-21 11:34:29 -04:00
4796e615dd Revert DPI fix since it caused more issues than it solved. 2023-05-21 10:34:26 -04:00
069657fbf3 Add DPM-Solver++(2M) SDE and exponential scheduler.
exponential scheduler is the one recommended with this sampler.
2023-05-21 01:46:03 -04:00
516119ad83 Print min and max values in validation error message. 2023-05-21 00:24:28 -04:00
3c76f43057 Cleaner code. 2023-05-20 23:06:33 -04:00
b8636a44aa Make scaled_dot_product switch to sliced attention on OOM. 2023-05-20 16:01:02 -04:00
797c4e8d3b Simplify and improve some vae attention code. 2023-05-20 15:07:21 -04:00
71666f248f Fix padding in Blur. 2023-05-20 10:08:47 -04:00
36af98d755 improve sharpen and blur nodes 2023-05-20 15:23:28 +02:00
b9daf4e30f Add a /object_info/{node_class} route to get only the info of one node. 2023-05-19 22:40:28 -04:00
e6e1999f96 Render UI at a higher resolution when viewing with a higher pixel ratio 2023-05-19 20:04:36 +01:00
2998e232cb Make multiline widget work with different canvas dimensions.
It now scales the textarea positioning using the canvas height/width.
2023-05-19 19:57:15 +01:00
8bbd9815a9 Support loading fp16 latent files. 2023-05-19 02:15:32 -04:00
62a371e12b Load workflow from latent file. 2023-05-18 02:41:21 -04:00
faf899ad5a LoadLatent and SaveLatent should behave like the LoadImage and SaveImage. 2023-05-18 00:09:12 -04:00
a7375103b9 Some small changes to Load/SaveLatent. 2023-05-17 23:41:57 -04:00
e7f2816c6f feat:Latent Save/Load (#662)
* wip

* latent dir

* fix

* fix

* now working

* mark todo

* remove server.py changes to separate PRt

---------

Co-authored-by: Lt.Dr.Data <lt.dr.data@gmail.com>
2023-05-17 23:40:28 -04:00
4088e61aa6 Update litegraph from upstream. 2023-05-16 15:35:07 -04:00
6a12094345 Merge branch 'patch/touch' of https://github.com/ltdrdata/ComfyUI 2023-05-16 11:55:20 -04:00
11e7168d56 Remove print. 2023-05-16 11:55:16 -04:00
7ada9e7d85 allows touch drag 2023-05-16 22:55:00 +09:00
13d94caf49 Add control_after_generate to combo primitive. 2023-05-16 03:18:11 -04:00
5f7968f1fa Print the endpoint ip for localtunnel in the colab notebook. 2023-05-16 01:12:44 -04:00
381 changed files with 322740 additions and 53613 deletions

View File

@ -1,3 +0,0 @@
..\python_embeded\python.exe .\update.py ..\ComfyUI\
..\python_embeded\python.exe -s -m pip install --upgrade --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cu121 -r ../ComfyUI/requirements.txt pygit2
pause

View File

@ -1,6 +1,9 @@
import pygit2
from datetime import datetime
import sys
import os
import shutil
import filecmp
def pull(repo, remote_name='origin', branch='master'):
for remote in repo.remotes:
@ -41,8 +44,9 @@ def pull(repo, remote_name='origin', branch='master'):
else:
raise AssertionError('Unknown merge analysis result')
repo = pygit2.Repository(str(sys.argv[1]))
pygit2.option(pygit2.GIT_OPT_SET_OWNER_VALIDATION, 0)
repo_path = str(sys.argv[1])
repo = pygit2.Repository(repo_path)
ident = pygit2.Signature('comfyui', 'comfy@ui')
try:
print("stashing current changes")
@ -51,15 +55,92 @@ except KeyError:
print("nothing to stash")
backup_branch_name = 'backup_branch_{}'.format(datetime.today().strftime('%Y-%m-%d_%H_%M_%S'))
print("creating backup branch: {}".format(backup_branch_name))
repo.branches.local.create(backup_branch_name, repo.head.peel())
try:
repo.branches.local.create(backup_branch_name, repo.head.peel())
except:
pass
print("checking out master branch")
branch = repo.lookup_branch('master')
ref = repo.lookup_reference(branch.name)
repo.checkout(ref)
if branch is None:
ref = repo.lookup_reference('refs/remotes/origin/master')
repo.checkout(ref)
branch = repo.lookup_branch('master')
if branch is None:
repo.create_branch('master', repo.get(ref.target))
else:
ref = repo.lookup_reference(branch.name)
repo.checkout(ref)
print("pulling latest changes")
pull(repo)
if "--stable" in sys.argv:
def latest_tag(repo):
versions = []
for k in repo.references:
try:
prefix = "refs/tags/v"
if k.startswith(prefix):
version = list(map(int, k[len(prefix):].split(".")))
versions.append((version[0] * 10000000000 + version[1] * 100000 + version[2], k))
except:
pass
versions.sort()
if len(versions) > 0:
return versions[-1][1]
return None
latest_tag = latest_tag(repo)
if latest_tag is not None:
repo.checkout(latest_tag)
print("Done!")
self_update = True
if len(sys.argv) > 2:
self_update = '--skip_self_update' not in sys.argv
update_py_path = os.path.realpath(__file__)
repo_update_py_path = os.path.join(repo_path, ".ci/update_windows/update.py")
cur_path = os.path.dirname(update_py_path)
req_path = os.path.join(cur_path, "current_requirements.txt")
repo_req_path = os.path.join(repo_path, "requirements.txt")
def files_equal(file1, file2):
try:
return filecmp.cmp(file1, file2, shallow=False)
except:
return False
def file_size(f):
try:
return os.path.getsize(f)
except:
return 0
if self_update and not files_equal(update_py_path, repo_update_py_path) and file_size(repo_update_py_path) > 10:
shutil.copy(repo_update_py_path, os.path.join(cur_path, "update_new.py"))
exit()
if not os.path.exists(req_path) or not files_equal(repo_req_path, req_path):
import subprocess
try:
subprocess.check_call([sys.executable, '-s', '-m', 'pip', 'install', '-r', repo_req_path])
shutil.copy(repo_req_path, req_path)
except:
pass
stable_update_script = os.path.join(repo_path, ".ci/update_windows/update_comfyui_stable.bat")
stable_update_script_to = os.path.join(cur_path, "update_comfyui_stable.bat")
try:
if not file_size(stable_update_script_to) > 10:
shutil.copy(stable_update_script, stable_update_script_to)
except:
pass

View File

@ -1,2 +1,8 @@
@echo off
..\python_embeded\python.exe .\update.py ..\ComfyUI\
pause
if exist update_new.py (
move /y update_new.py update.py
echo Running updater again since it got updated.
..\python_embeded\python.exe .\update.py ..\ComfyUI\ --skip_self_update
)
if "%~1"=="" pause

View File

@ -1,3 +0,0 @@
..\python_embeded\python.exe .\update.py ..\ComfyUI\
..\python_embeded\python.exe -s -m pip install --upgrade torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu117 xformers -r ../ComfyUI/requirements.txt pygit2
pause

View File

@ -0,0 +1,8 @@
@echo off
..\python_embeded\python.exe .\update.py ..\ComfyUI\ --stable
if exist update_new.py (
move /y update_new.py update.py
echo Running updater again since it got updated.
..\python_embeded\python.exe .\update.py ..\ComfyUI\ --skip_self_update --stable
)
if "%~1"=="" pause

View File

@ -1,11 +0,0 @@
@echo off
..\python_embeded\python.exe .\update.py ..\ComfyUI\
echo
echo This will try to update pytorch and all python dependencies, if you get an error wait for pytorch/xformers to fix their stuff
echo You should not be running this anyways unless you really have to
echo
echo If you just want to update normally, close this and run update_comfyui.bat instead.
echo
pause
..\python_embeded\python.exe -s -m pip install --upgrade torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118 xformers -r ../ComfyUI/requirements.txt pygit2
pause

View File

@ -14,7 +14,7 @@ run_cpu.bat
IF YOU GET A RED ERROR IN THE UI MAKE SURE YOU HAVE A MODEL/CHECKPOINT IN: ComfyUI\models\checkpoints
You can download the stable diffusion 1.5 one from: https://huggingface.co/runwayml/stable-diffusion-v1-5/blob/main/v1-5-pruned-emaonly.ckpt
You can download the stable diffusion 1.5 one from: https://huggingface.co/Comfy-Org/stable-diffusion-v1-5-archive/blob/main/v1-5-pruned-emaonly-fp16.safetensors
RECOMMENDED WAY TO UPDATE:

View File

@ -1,2 +1,2 @@
.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --use-pytorch-cross-attention
.\python_embeded\python.exe -s ComfyUI\main.py --windows-standalone-build --fast
pause

2
.gitattributes vendored Normal file
View File

@ -0,0 +1,2 @@
/web/assets/** linguist-generated
/web/** linguist-vendored

48
.github/ISSUE_TEMPLATE/bug-report.yml vendored Normal file
View File

@ -0,0 +1,48 @@
name: Bug Report
description: "Something is broken inside of ComfyUI. (Do not use this if you're just having issues and need help, or if the issue relates to a custom node)"
labels: ["Potential Bug"]
body:
- type: markdown
attributes:
value: |
Before submitting a **Bug Report**, please ensure the following:
- **1:** You are running the latest version of ComfyUI.
- **2:** You have looked at the existing bug reports and made sure this isn't already reported.
- **3:** You confirmed that the bug is not caused by a custom node. You can disable all custom nodes by passing
`--disable-all-custom-nodes` command line argument.
- **4:** This is an actual bug in ComfyUI, not just a support question. A bug is when you can specify exact
steps to replicate what went wrong and others will be able to repeat your steps and see the same issue happen.
If unsure, ask on the [ComfyUI Matrix Space](https://app.element.io/#/room/%23comfyui_space%3Amatrix.org) or the [Comfy Org Discord](https://discord.gg/comfyorg) first.
- type: textarea
attributes:
label: Expected Behavior
description: "What you expected to happen."
validations:
required: true
- type: textarea
attributes:
label: Actual Behavior
description: "What actually happened. Please include a screenshot of the issue if possible."
validations:
required: true
- type: textarea
attributes:
label: Steps to Reproduce
description: "Describe how to reproduce the issue. Please be sure to attach a workflow JSON or PNG, ideally one that doesn't require custom nodes to test. If the bug open happens when certain custom nodes are used, most likely that custom node is what has the bug rather than ComfyUI, in which case it should be reported to the node's author."
validations:
required: true
- type: textarea
attributes:
label: Debug Logs
description: "Please copy the output from your terminal logs here."
render: powershell
validations:
required: true
- type: textarea
attributes:
label: Other
description: "Any other additional information you think might be helpful."
validations:
required: false

11
.github/ISSUE_TEMPLATE/config.yml vendored Normal file
View File

@ -0,0 +1,11 @@
blank_issues_enabled: true
contact_links:
- name: ComfyUI Frontend Issues
url: https://github.com/Comfy-Org/ComfyUI_frontend/issues
about: Issues related to the ComfyUI frontend (display issues, user interaction bugs), please go to the frontend repo to file the issue
- name: ComfyUI Matrix Space
url: https://app.element.io/#/room/%23comfyui_space%3Amatrix.org
about: The ComfyUI Matrix Space is available for support and general discussion related to ComfyUI (Matrix is like Discord but open source).
- name: Comfy Org Discord
url: https://discord.gg/comfyorg
about: The Comfy Org Discord is available for support and general discussion related to ComfyUI.

View File

@ -0,0 +1,32 @@
name: Feature Request
description: "You have an idea for something new you would like to see added to ComfyUI's core."
labels: [ "Feature" ]
body:
- type: markdown
attributes:
value: |
Before submitting a **Feature Request**, please ensure the following:
**1:** You are running the latest version of ComfyUI.
**2:** You have looked to make sure there is not already a feature that does what you need, and there is not already a Feature Request listed for the same idea.
**3:** This is something that makes sense to add to ComfyUI Core, and wouldn't make more sense as a custom node.
If unsure, ask on the [ComfyUI Matrix Space](https://app.element.io/#/room/%23comfyui_space%3Amatrix.org) or the [Comfy Org Discord](https://discord.gg/comfyorg) first.
- type: textarea
attributes:
label: Feature Idea
description: "Describe the feature you want to see."
validations:
required: true
- type: textarea
attributes:
label: Existing Solutions
description: "Please search through available custom nodes / extensions to see if there are existing custom solutions for this. If so, please link the options you found here as a reference."
validations:
required: false
- type: textarea
attributes:
label: Other
description: "Any other additional information you think might be helpful."
validations:
required: false

32
.github/ISSUE_TEMPLATE/user-support.yml vendored Normal file
View File

@ -0,0 +1,32 @@
name: User Support
description: "Use this if you need help with something, or you're experiencing an issue."
labels: [ "User Support" ]
body:
- type: markdown
attributes:
value: |
Before submitting a **User Report** issue, please ensure the following:
**1:** You are running the latest version of ComfyUI.
**2:** You have made an effort to find public answers to your question before asking here. In other words, you googled it first, and scrolled through recent help topics.
If unsure, ask on the [ComfyUI Matrix Space](https://app.element.io/#/room/%23comfyui_space%3Amatrix.org) or the [Comfy Org Discord](https://discord.gg/comfyorg) first.
- type: textarea
attributes:
label: Your question
description: "Post your question here. Please be as detailed as possible."
validations:
required: true
- type: textarea
attributes:
label: Logs
description: "If your question relates to an issue you're experiencing, please go to `Server` -> `Logs` -> potentially set `View Type` to `Debug` as well, then copypaste all the text into here."
render: powershell
validations:
required: false
- type: textarea
attributes:
label: Other
description: "Any other additional information you think might be helpful."
validations:
required: false

View File

@ -0,0 +1,53 @@
# This is the GitHub Workflow that drives full-GPU-enabled tests of pull requests to ComfyUI, when the 'Run-CI-Test' label is added
# Results are reported as checkmarks on the commits, as well as onto https://ci.comfy.org/
name: Pull Request CI Workflow Runs
on:
pull_request_target:
types: [labeled]
jobs:
pr-test-stable:
if: ${{ github.event.label.name == 'Run-CI-Test' }}
strategy:
fail-fast: false
matrix:
os: [macos, linux, windows]
python_version: ["3.9", "3.10", "3.11", "3.12"]
cuda_version: ["12.1"]
torch_version: ["stable"]
include:
- os: macos
runner_label: [self-hosted, macOS]
flags: "--use-pytorch-cross-attention"
- os: linux
runner_label: [self-hosted, Linux]
flags: ""
- os: windows
runner_label: [self-hosted, win]
flags: ""
runs-on: ${{ matrix.runner_label }}
steps:
- name: Test Workflows
uses: comfy-org/comfy-action@main
with:
os: ${{ matrix.os }}
python_version: ${{ matrix.python_version }}
torch_version: ${{ matrix.torch_version }}
google_credentials: ${{ secrets.GCS_SERVICE_ACCOUNT_JSON }}
comfyui_flags: ${{ matrix.flags }}
use_prior_commit: 'true'
comment:
if: ${{ github.event.label.name == 'Run-CI-Test' }}
runs-on: ubuntu-latest
permissions:
pull-requests: write
steps:
- uses: actions/github-script@v6
with:
script: |
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: '(Automated Bot Message) CI Tests are running, you can view the results at https://ci.comfy.org/?branch=${{ github.event.pull_request.number }}%2Fmerge'
})

23
.github/workflows/pylint.yml vendored Normal file
View File

@ -0,0 +1,23 @@
name: Python Linting
on: [push, pull_request]
jobs:
pylint:
name: Run Pylint
runs-on: ubuntu-latest
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v2
with:
python-version: 3.x
- name: Install Pylint
run: pip install pylint
- name: Run Pylint
run: pylint --rcfile=.pylintrc $(find . -type f -name "*.py")

104
.github/workflows/stable-release.yml vendored Normal file
View File

@ -0,0 +1,104 @@
name: "Release Stable Version"
on:
workflow_dispatch:
inputs:
git_tag:
description: 'Git tag'
required: true
type: string
cu:
description: 'CUDA version'
required: true
type: string
default: "121"
python_minor:
description: 'Python minor version'
required: true
type: string
default: "11"
python_patch:
description: 'Python patch version'
required: true
type: string
default: "9"
jobs:
package_comfy_windows:
permissions:
contents: "write"
packages: "write"
pull-requests: "read"
runs-on: windows-latest
steps:
- uses: actions/checkout@v4
with:
ref: ${{ inputs.git_tag }}
fetch-depth: 0
persist-credentials: false
- uses: actions/cache/restore@v4
id: cache
with:
path: |
cu${{ inputs.cu }}_python_deps.tar
update_comfyui_and_python_dependencies.bat
key: ${{ runner.os }}-build-cu${{ inputs.cu }}-${{ inputs.python_minor }}
- shell: bash
run: |
mv cu${{ inputs.cu }}_python_deps.tar ../
mv update_comfyui_and_python_dependencies.bat ../
cd ..
tar xf cu${{ inputs.cu }}_python_deps.tar
pwd
ls
- shell: bash
run: |
cd ..
cp -r ComfyUI ComfyUI_copy
curl https://www.python.org/ftp/python/3.${{ inputs.python_minor }}.${{ inputs.python_patch }}/python-3.${{ inputs.python_minor }}.${{ inputs.python_patch }}-embed-amd64.zip -o python_embeded.zip
unzip python_embeded.zip -d python_embeded
cd python_embeded
echo ${{ env.MINOR_VERSION }}
echo 'import site' >> ./python3${{ inputs.python_minor }}._pth
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
./python.exe get-pip.py
./python.exe -s -m pip install ../cu${{ inputs.cu }}_python_deps/*
sed -i '1i../ComfyUI' ./python3${{ inputs.python_minor }}._pth
cd ..
git clone --depth 1 https://github.com/comfyanonymous/taesd
cp taesd/*.pth ./ComfyUI_copy/models/vae_approx/
mkdir ComfyUI_windows_portable
mv python_embeded ComfyUI_windows_portable
mv ComfyUI_copy ComfyUI_windows_portable/ComfyUI
cd ComfyUI_windows_portable
mkdir update
cp -r ComfyUI/.ci/update_windows/* ./update/
cp -r ComfyUI/.ci/windows_base_files/* ./
cp ../update_comfyui_and_python_dependencies.bat ./update/
cd ..
"C:\Program Files\7-Zip\7z.exe" a -t7z -m0=lzma2 -mx=8 -mfb=64 -md=32m -ms=on -mf=BCJ2 ComfyUI_windows_portable.7z ComfyUI_windows_portable
mv ComfyUI_windows_portable.7z ComfyUI/ComfyUI_windows_portable_nvidia.7z
cd ComfyUI_windows_portable
python_embeded/python.exe -s ComfyUI/main.py --quick-test-for-ci --cpu
ls
- name: Upload binaries to release
uses: svenstaro/upload-release-action@v2
with:
repo_token: ${{ secrets.GITHUB_TOKEN }}
file: ComfyUI_windows_portable_nvidia.7z
tag: ${{ inputs.git_tag }}
overwrite: true
prerelease: true
make_latest: false

21
.github/workflows/stale-issues.yml vendored Normal file
View File

@ -0,0 +1,21 @@
name: 'Close stale issues'
on:
schedule:
# Run daily at 430 am PT
- cron: '30 11 * * *'
permissions:
issues: write
jobs:
stale:
runs-on: ubuntu-latest
steps:
- uses: actions/stale@v9
with:
stale-issue-message: "This issue is being marked stale because it has not had any activity for 30 days. Reply below within 7 days if your issue still isn't solved, and it will be left open. Otherwise, the issue will be closed automatically."
days-before-stale: 30
days-before-close: 7
stale-issue-label: 'Stale'
only-labels: 'User Support'
exempt-all-assignees: true
exempt-all-milestones: true

31
.github/workflows/test-build.yml vendored Normal file
View File

@ -0,0 +1,31 @@
name: Build package
#
# This workflow is a test of the python package build.
# Install Python dependencies across different Python versions.
#
on:
push:
paths:
- "requirements.txt"
- ".github/workflows/test-build.yml"
jobs:
build:
name: Build Test
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
python-version: ["3.8", "3.9", "3.10", "3.11"]
steps:
- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt

95
.github/workflows/test-ci.yml vendored Normal file
View File

@ -0,0 +1,95 @@
# This is the GitHub Workflow that drives automatic full-GPU-enabled tests of all new commits to the master branch of ComfyUI
# Results are reported as checkmarks on the commits, as well as onto https://ci.comfy.org/
name: Full Comfy CI Workflow Runs
on:
push:
branches:
- master
paths-ignore:
- 'app/**'
- 'input/**'
- 'output/**'
- 'notebooks/**'
- 'script_examples/**'
- '.github/**'
- 'web/**'
workflow_dispatch:
jobs:
test-stable:
strategy:
fail-fast: false
matrix:
os: [macos, linux, windows]
python_version: ["3.9", "3.10", "3.11", "3.12"]
cuda_version: ["12.1"]
torch_version: ["stable"]
include:
- os: macos
runner_label: [self-hosted, macOS]
flags: "--use-pytorch-cross-attention"
- os: linux
runner_label: [self-hosted, Linux]
flags: ""
- os: windows
runner_label: [self-hosted, win]
flags: ""
runs-on: ${{ matrix.runner_label }}
steps:
- name: Test Workflows
uses: comfy-org/comfy-action@main
with:
os: ${{ matrix.os }}
python_version: ${{ matrix.python_version }}
torch_version: ${{ matrix.torch_version }}
google_credentials: ${{ secrets.GCS_SERVICE_ACCOUNT_JSON }}
comfyui_flags: ${{ matrix.flags }}
test-win-nightly:
strategy:
fail-fast: true
matrix:
os: [windows]
python_version: ["3.9", "3.10", "3.11", "3.12"]
cuda_version: ["12.1"]
torch_version: ["nightly"]
include:
- os: windows
runner_label: [self-hosted, win]
flags: ""
runs-on: ${{ matrix.runner_label }}
steps:
- name: Test Workflows
uses: comfy-org/comfy-action@main
with:
os: ${{ matrix.os }}
python_version: ${{ matrix.python_version }}
torch_version: ${{ matrix.torch_version }}
google_credentials: ${{ secrets.GCS_SERVICE_ACCOUNT_JSON }}
comfyui_flags: ${{ matrix.flags }}
test-unix-nightly:
strategy:
fail-fast: false
matrix:
os: [macos, linux]
python_version: ["3.11"]
cuda_version: ["12.1"]
torch_version: ["nightly"]
include:
- os: macos
runner_label: [self-hosted, macOS]
flags: "--use-pytorch-cross-attention"
- os: linux
runner_label: [self-hosted, Linux]
flags: ""
runs-on: ${{ matrix.runner_label }}
steps:
- name: Test Workflows
uses: comfy-org/comfy-action@main
with:
os: ${{ matrix.os }}
python_version: ${{ matrix.python_version }}
torch_version: ${{ matrix.torch_version }}
google_credentials: ${{ secrets.GCS_SERVICE_ACCOUNT_JSON }}
comfyui_flags: ${{ matrix.flags }}

45
.github/workflows/test-launch.yml vendored Normal file
View File

@ -0,0 +1,45 @@
name: Test server launches without errors
on:
push:
branches: [ main, master ]
pull_request:
branches: [ main, master ]
jobs:
test:
runs-on: ubuntu-latest
steps:
- name: Checkout ComfyUI
uses: actions/checkout@v4
with:
repository: "comfyanonymous/ComfyUI"
path: "ComfyUI"
- uses: actions/setup-python@v4
with:
python-version: '3.8'
- name: Install requirements
run: |
python -m pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt
pip install wait-for-it
working-directory: ComfyUI
- name: Start ComfyUI server
run: |
python main.py --cpu 2>&1 | tee console_output.log &
wait-for-it --service 127.0.0.1:8188 -t 600
working-directory: ComfyUI
- name: Check for unhandled exceptions in server log
run: |
if grep -qE "Exception|Error" console_output.log; then
echo "Unhandled exception/error found in server log."
exit 1
fi
working-directory: ComfyUI
- uses: actions/upload-artifact@v4
if: always()
with:
name: console-output
path: ComfyUI/console_output.log
retention-days: 30

View File

@ -1,71 +0,0 @@
name: "Windows Release cu118 dependencies"
on:
workflow_dispatch:
# push:
# branches:
# - master
jobs:
build_dependencies:
env:
# you need at least cuda 5.0 for some of the stuff compiled here.
TORCH_CUDA_ARCH_LIST: "5.0+PTX 6.0 6.1 7.0 7.5 8.0 8.6 8.9"
FORCE_CUDA: 1
MAX_JOBS: 1 # will crash otherwise
DISTUTILS_USE_SDK: 1 # otherwise distutils will complain on windows about multiple versions of msvc
XFORMERS_BUILD_TYPE: "Release"
runs-on: windows-latest
steps:
- name: Cache Built Dependencies
uses: actions/cache@v3
id: cache-cu118_python_stuff
with:
path: cu118_python_deps.tar
key: ${{ runner.os }}-build-cu118
- if: steps.cache-cu118_python_stuff.outputs.cache-hit != 'true'
uses: actions/checkout@v3
- if: steps.cache-cu118_python_stuff.outputs.cache-hit != 'true'
uses: actions/setup-python@v4
with:
python-version: '3.10.9'
- if: steps.cache-cu118_python_stuff.outputs.cache-hit != 'true'
uses: comfyanonymous/cuda-toolkit@test
id: cuda-toolkit
with:
cuda: '11.8.0'
# copied from xformers github
- name: Setup MSVC
uses: ilammy/msvc-dev-cmd@v1
- name: Configure Pagefile
# windows runners will OOM with many CUDA architectures
# we cheat here with a page file
uses: al-cheb/configure-pagefile-action@v1.3
with:
minimum-size: 2GB
# really unfortunate: https://github.com/ilammy/msvc-dev-cmd#name-conflicts-with-shell-bash
- name: Remove link.exe
shell: bash
run: rm /usr/bin/link
- if: steps.cache-cu118_python_stuff.outputs.cache-hit != 'true'
shell: bash
run: |
python -m pip wheel --no-cache-dir torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118 -r requirements.txt pygit2 -w ./temp_wheel_dir
python -m pip install --no-cache-dir ./temp_wheel_dir/*
echo installed basic
git clone --recurse-submodules https://github.com/facebookresearch/xformers.git
cd xformers
python -m pip install --no-cache-dir wheel setuptools twine
echo building xformers
python setup.py bdist_wheel -d ../temp_wheel_dir/
cd ..
rm -rf xformers
ls -lah temp_wheel_dir
mv temp_wheel_dir cu118_python_deps
tar cf cu118_python_deps.tar cu118_python_deps

View File

@ -1,30 +0,0 @@
name: "Windows Release cu118 dependencies 2"
on:
workflow_dispatch:
# push:
# branches:
# - master
jobs:
build_dependencies:
runs-on: windows-latest
steps:
- uses: actions/checkout@v3
- uses: actions/setup-python@v4
with:
python-version: '3.10.9'
- shell: bash
run: |
python -m pip wheel --no-cache-dir torch torchvision torchaudio xformers --extra-index-url https://download.pytorch.org/whl/cu118 -r requirements.txt pygit2 -w ./temp_wheel_dir
python -m pip install --no-cache-dir ./temp_wheel_dir/*
echo installed basic
ls -lah temp_wheel_dir
mv temp_wheel_dir cu118_python_deps
tar cf cu118_python_deps.tar cu118_python_deps
- uses: actions/cache/save@v3
with:
path: cu118_python_deps.tar
key: ${{ runner.os }}-build-cu118

View File

@ -1,76 +0,0 @@
name: "Windows Release cu118 packaging"
on:
workflow_dispatch:
# push:
# branches:
# - master
jobs:
package_comfyui:
permissions:
contents: "write"
packages: "write"
pull-requests: "read"
runs-on: windows-latest
steps:
- uses: actions/cache/restore@v3
id: cache
with:
path: cu118_python_deps.tar
key: ${{ runner.os }}-build-cu118
- shell: bash
run: |
mv cu118_python_deps.tar ../
cd ..
tar xf cu118_python_deps.tar
pwd
ls
- uses: actions/checkout@v3
with:
fetch-depth: 0
- shell: bash
run: |
cd ..
cp -r ComfyUI ComfyUI_copy
curl https://www.python.org/ftp/python/3.10.9/python-3.10.9-embed-amd64.zip -o python_embeded.zip
unzip python_embeded.zip -d python_embeded
cd python_embeded
echo 'import site' >> ./python310._pth
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
./python.exe get-pip.py
./python.exe -s -m pip install ../cu118_python_deps/*
sed -i '1i../ComfyUI' ./python310._pth
cd ..
mkdir ComfyUI_windows_portable
mv python_embeded ComfyUI_windows_portable
mv ComfyUI_copy ComfyUI_windows_portable/ComfyUI
cd ComfyUI_windows_portable
mkdir update
cp -r ComfyUI/.ci/update_windows/* ./update/
cp -r ComfyUI/.ci/update_windows_cu118/* ./update/
cp -r ComfyUI/.ci/windows_base_files/* ./
cd ..
"C:\Program Files\7-Zip\7z.exe" a -t7z -m0=lzma -mx=8 -mfb=64 -md=32m -ms=on ComfyUI_windows_portable.7z ComfyUI_windows_portable
mv ComfyUI_windows_portable.7z ComfyUI/new_ComfyUI_windows_portable_nvidia_cu118_or_cpu.7z
cd ComfyUI_windows_portable
python_embeded/python.exe -s ComfyUI/main.py --quick-test-for-ci --cpu
ls
- name: Upload binaries to release
uses: svenstaro/upload-release-action@v2
with:
repo_token: ${{ secrets.GITHUB_TOKEN }}
file: new_ComfyUI_windows_portable_nvidia_cu118_or_cpu.7z
tag: "latest"
overwrite: true

View File

@ -0,0 +1,71 @@
name: "Windows Release dependencies"
on:
workflow_dispatch:
inputs:
xformers:
description: 'xformers version'
required: false
type: string
default: ""
extra_dependencies:
description: 'extra dependencies'
required: false
type: string
default: "\"numpy<2\""
cu:
description: 'cuda version'
required: true
type: string
default: "124"
python_minor:
description: 'python minor version'
required: true
type: string
default: "11"
python_patch:
description: 'python patch version'
required: true
type: string
default: "9"
# push:
# branches:
# - master
jobs:
build_dependencies:
runs-on: windows-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: 3.${{ inputs.python_minor }}.${{ inputs.python_patch }}
- shell: bash
run: |
echo "@echo off
call update_comfyui.bat nopause
echo -
echo This will try to update pytorch and all python dependencies.
echo -
echo If you just want to update normally, close this and run update_comfyui.bat instead.
echo -
pause
..\python_embeded\python.exe -s -m pip install --upgrade torch torchvision torchaudio ${{ inputs.xformers }} --extra-index-url https://download.pytorch.org/whl/cu${{ inputs.cu }} -r ../ComfyUI/requirements.txt pygit2
pause" > update_comfyui_and_python_dependencies.bat
python -m pip wheel --no-cache-dir torch torchvision torchaudio ${{ inputs.xformers }} ${{ inputs.extra_dependencies }} --extra-index-url https://download.pytorch.org/whl/cu${{ inputs.cu }} -r requirements.txt pygit2 -w ./temp_wheel_dir
python -m pip install --no-cache-dir ./temp_wheel_dir/*
echo installed basic
ls -lah temp_wheel_dir
mv temp_wheel_dir cu${{ inputs.cu }}_python_deps
tar cf cu${{ inputs.cu }}_python_deps.tar cu${{ inputs.cu }}_python_deps
- uses: actions/cache/save@v4
with:
path: |
cu${{ inputs.cu }}_python_deps.tar
update_comfyui_and_python_dependencies.bat
key: ${{ runner.os }}-build-cu${{ inputs.cu }}-${{ inputs.python_minor }}

View File

@ -2,6 +2,24 @@ name: "Windows Release Nightly pytorch"
on:
workflow_dispatch:
inputs:
cu:
description: 'cuda version'
required: true
type: string
default: "124"
python_minor:
description: 'python minor version'
required: true
type: string
default: "12"
python_patch:
description: 'python patch version'
required: true
type: string
default: "4"
# push:
# branches:
# - master
@ -14,28 +32,31 @@ jobs:
pull-requests: "read"
runs-on: windows-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
with:
fetch-depth: 0
- uses: actions/setup-python@v4
persist-credentials: false
- uses: actions/setup-python@v5
with:
python-version: '3.11.3'
python-version: 3.${{ inputs.python_minor }}.${{ inputs.python_patch }}
- shell: bash
run: |
cd ..
cp -r ComfyUI ComfyUI_copy
curl https://www.python.org/ftp/python/3.11.3/python-3.11.3-embed-amd64.zip -o python_embeded.zip
curl https://www.python.org/ftp/python/3.${{ inputs.python_minor }}.${{ inputs.python_patch }}/python-3.${{ inputs.python_minor }}.${{ inputs.python_patch }}-embed-amd64.zip -o python_embeded.zip
unzip python_embeded.zip -d python_embeded
cd python_embeded
echo 'import site' >> ./python311._pth
echo 'import site' >> ./python3${{ inputs.python_minor }}._pth
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
./python.exe get-pip.py
python -m pip wheel torch torchvision torchaudio --pre --extra-index-url https://download.pytorch.org/whl/nightly/cu121 -r ../ComfyUI/requirements.txt pygit2 -w ../temp_wheel_dir
python -m pip wheel torch torchvision torchaudio --pre --extra-index-url https://download.pytorch.org/whl/nightly/cu${{ inputs.cu }} -r ../ComfyUI/requirements.txt pygit2 -w ../temp_wheel_dir
ls ../temp_wheel_dir
./python.exe -s -m pip install --pre ../temp_wheel_dir/*
sed -i '1i../ComfyUI' ./python311._pth
sed -i '1i../ComfyUI' ./python3${{ inputs.python_minor }}._pth
cd ..
git clone --depth 1 https://github.com/comfyanonymous/taesd
cp taesd/*.pth ./ComfyUI_copy/models/vae_approx/
mkdir ComfyUI_windows_portable_nightly_pytorch
mv python_embeded ComfyUI_windows_portable_nightly_pytorch
@ -46,12 +67,14 @@ jobs:
mkdir update
cp -r ComfyUI/.ci/update_windows/* ./update/
cp -r ComfyUI/.ci/windows_base_files/* ./
cp -r ComfyUI/.ci/nightly/update_windows/* ./update/
cp -r ComfyUI/.ci/nightly/windows_base_files/* ./
cp -r ComfyUI/.ci/windows_nightly_base_files/* ./
echo "call update_comfyui.bat nopause
..\python_embeded\python.exe -s -m pip install --upgrade --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cu${{ inputs.cu }} -r ../ComfyUI/requirements.txt pygit2
pause" > ./update/update_comfyui_and_python_dependencies.bat
cd ..
"C:\Program Files\7-Zip\7z.exe" a -t7z -m0=lzma -mx=8 -mfb=64 -md=32m -ms=on ComfyUI_windows_portable_nightly_pytorch.7z ComfyUI_windows_portable_nightly_pytorch
"C:\Program Files\7-Zip\7z.exe" a -t7z -m0=lzma2 -mx=8 -mfb=64 -md=32m -ms=on -mf=BCJ2 ComfyUI_windows_portable_nightly_pytorch.7z ComfyUI_windows_portable_nightly_pytorch
mv ComfyUI_windows_portable_nightly_pytorch.7z ComfyUI/ComfyUI_windows_portable_nvidia_or_cpu_nightly_pytorch.7z
cd ComfyUI_windows_portable_nightly_pytorch

View File

@ -0,0 +1,100 @@
name: "Windows Release packaging"
on:
workflow_dispatch:
inputs:
cu:
description: 'cuda version'
required: true
type: string
default: "124"
python_minor:
description: 'python minor version'
required: true
type: string
default: "11"
python_patch:
description: 'python patch version'
required: true
type: string
default: "9"
# push:
# branches:
# - master
jobs:
package_comfyui:
permissions:
contents: "write"
packages: "write"
pull-requests: "read"
runs-on: windows-latest
steps:
- uses: actions/cache/restore@v4
id: cache
with:
path: |
cu${{ inputs.cu }}_python_deps.tar
update_comfyui_and_python_dependencies.bat
key: ${{ runner.os }}-build-cu${{ inputs.cu }}-${{ inputs.python_minor }}
- shell: bash
run: |
mv cu${{ inputs.cu }}_python_deps.tar ../
mv update_comfyui_and_python_dependencies.bat ../
cd ..
tar xf cu${{ inputs.cu }}_python_deps.tar
pwd
ls
- uses: actions/checkout@v4
with:
fetch-depth: 0
persist-credentials: false
- shell: bash
run: |
cd ..
cp -r ComfyUI ComfyUI_copy
curl https://www.python.org/ftp/python/3.${{ inputs.python_minor }}.${{ inputs.python_patch }}/python-3.${{ inputs.python_minor }}.${{ inputs.python_patch }}-embed-amd64.zip -o python_embeded.zip
unzip python_embeded.zip -d python_embeded
cd python_embeded
echo 'import site' >> ./python3${{ inputs.python_minor }}._pth
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
./python.exe get-pip.py
./python.exe -s -m pip install ../cu${{ inputs.cu }}_python_deps/*
sed -i '1i../ComfyUI' ./python3${{ inputs.python_minor }}._pth
cd ..
git clone --depth 1 https://github.com/comfyanonymous/taesd
cp taesd/*.pth ./ComfyUI_copy/models/vae_approx/
mkdir ComfyUI_windows_portable
mv python_embeded ComfyUI_windows_portable
mv ComfyUI_copy ComfyUI_windows_portable/ComfyUI
cd ComfyUI_windows_portable
mkdir update
cp -r ComfyUI/.ci/update_windows/* ./update/
cp -r ComfyUI/.ci/windows_base_files/* ./
cp ../update_comfyui_and_python_dependencies.bat ./update/
cd ..
"C:\Program Files\7-Zip\7z.exe" a -t7z -m0=lzma2 -mx=8 -mfb=64 -md=32m -ms=on -mf=BCJ2 ComfyUI_windows_portable.7z ComfyUI_windows_portable
mv ComfyUI_windows_portable.7z ComfyUI/new_ComfyUI_windows_portable_nvidia_cu${{ inputs.cu }}_or_cpu.7z
cd ComfyUI_windows_portable
python_embeded/python.exe -s ComfyUI/main.py --quick-test-for-ci --cpu
ls
- name: Upload binaries to release
uses: svenstaro/upload-release-action@v2
with:
repo_token: ${{ secrets.GITHUB_TOKEN }}
file: new_ComfyUI_windows_portable_nvidia_cu${{ inputs.cu }}_or_cpu.7z
tag: "latest"
overwrite: true

23
.gitignore vendored
View File

@ -1,11 +1,22 @@
__pycache__/
*.py[cod]
output/
input/
!input/example.png
models/
temp/
custom_nodes/
/output/
/input/
!/input/example.png
/models/
/temp/
/custom_nodes/
!custom_nodes/example_node.py.example
extra_model_paths.yaml
/.vs
.vscode/
.idea/
venv/
/web/extensions/*
!/web/extensions/logging.js.example
!/web/extensions/core/
/tests-ui/data/object_info.json
/user/
*.log
web_custom_versions/
.DS_Store

3
.pylintrc Normal file
View File

@ -0,0 +1,3 @@
[MESSAGES CONTROL]
disable=all
enable=eval-used

1
CODEOWNERS Normal file
View File

@ -0,0 +1 @@
* @comfyanonymous

41
CONTRIBUTING.md Normal file
View File

@ -0,0 +1,41 @@
# Contributing to ComfyUI
Welcome, and thank you for your interest in contributing to ComfyUI!
There are several ways in which you can contribute, beyond writing code. The goal of this document is to provide a high-level overview of how you can get involved.
## Asking Questions
Have a question? Instead of opening an issue, please ask on [Discord](https://comfy.org/discord) or [Matrix](https://app.element.io/#/room/%23comfyui_space%3Amatrix.org) channels. Our team and the community will help you.
## Providing Feedback
Your comments and feedback are welcome, and the development team is available via a handful of different channels.
See the `#bug-report`, `#feature-request` and `#feedback` channels on Discord.
## Reporting Issues
Have you identified a reproducible problem in ComfyUI? Do you have a feature request? We want to hear about it! Here's how you can report your issue as effectively as possible.
### Look For an Existing Issue
Before you create a new issue, please do a search in [open issues](https://github.com/comfyanonymous/ComfyUI/issues) to see if the issue or feature request has already been filed.
If you find your issue already exists, make relevant comments and add your [reaction](https://github.com/blog/2119-add-reactions-to-pull-requests-issues-and-comments). Use a reaction in place of a "+1" comment:
* 👍 - upvote
* 👎 - downvote
If you cannot find an existing issue that describes your bug or feature, create a new issue. We have an issue template in place to organize new issues.
### Creating Pull Requests
* Please refer to the article on [creating pull requests](https://github.com/comfyanonymous/ComfyUI/wiki/How-to-Contribute-Code) and contributing to this project.
## Thank You
Your contributions to open source, large or small, make great projects like this possible. Thank you for taking the time to contribute.

227
README.md
View File

@ -1,8 +1,35 @@
ComfyUI
=======
A powerful and modular stable diffusion GUI and backend.
-----------
<div align="center">
# ComfyUI
**The most powerful and modular diffusion model GUI and backend.**
[![Website][website-shield]][website-url]
[![Dynamic JSON Badge][discord-shield]][discord-url]
[![Matrix][matrix-shield]][matrix-url]
<br>
[![][github-release-shield]][github-release-link]
[![][github-release-date-shield]][github-release-link]
[![][github-downloads-shield]][github-downloads-link]
[![][github-downloads-latest-shield]][github-downloads-link]
[matrix-shield]: https://img.shields.io/badge/Matrix-000000?style=flat&logo=matrix&logoColor=white
[matrix-url]: https://app.element.io/#/room/%23comfyui_space%3Amatrix.org
[website-shield]: https://img.shields.io/badge/ComfyOrg-4285F4?style=flat
[website-url]: https://www.comfy.org/
<!-- Workaround to display total user from https://github.com/badges/shields/issues/4500#issuecomment-2060079995 -->
[discord-shield]: https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fdiscord.com%2Fapi%2Finvites%2Fcomfyorg%3Fwith_counts%3Dtrue&query=%24.approximate_member_count&logo=discord&logoColor=white&label=Discord&color=green&suffix=%20total
[discord-url]: https://www.comfy.org/discord
[github-release-shield]: https://img.shields.io/github/v/release/comfyanonymous/ComfyUI?style=flat&sort=semver
[github-release-link]: https://github.com/comfyanonymous/ComfyUI/releases
[github-release-date-shield]: https://img.shields.io/github/release-date/comfyanonymous/ComfyUI?style=flat
[github-downloads-shield]: https://img.shields.io/github/downloads/comfyanonymous/ComfyUI/total?style=flat
[github-downloads-latest-shield]: https://img.shields.io/github/downloads/comfyanonymous/ComfyUI/latest/total?style=flat&label=downloads%40latest
[github-downloads-link]: https://github.com/comfyanonymous/ComfyUI/releases
![ComfyUI Screenshot](comfyui_screenshot.png)
</div>
This ui will let you design and execute advanced stable diffusion pipelines using a graph/nodes/flowchart based interface. For some workflow examples and see what ComfyUI can do you can check out:
### [ComfyUI Examples](https://comfyanonymous.github.io/ComfyUI_examples/)
@ -11,16 +38,17 @@ This ui will let you design and execute advanced stable diffusion pipelines usin
## Features
- Nodes/graph/flowchart interface to experiment and create complex Stable Diffusion workflows without needing to code anything.
- Fully supports SD1.x and SD2.x
- Fully supports SD1.x, SD2.x, [SDXL](https://comfyanonymous.github.io/ComfyUI_examples/sdxl/), [Stable Video Diffusion](https://comfyanonymous.github.io/ComfyUI_examples/video/), [Stable Cascade](https://comfyanonymous.github.io/ComfyUI_examples/stable_cascade/), [SD3](https://comfyanonymous.github.io/ComfyUI_examples/sd3/) and [Stable Audio](https://comfyanonymous.github.io/ComfyUI_examples/audio/)
- [Flux](https://comfyanonymous.github.io/ComfyUI_examples/flux/)
- Asynchronous Queue system
- Many optimizations: Only re-executes the parts of the workflow that changes between executions.
- Command line option: ```--lowvram``` to make it work on GPUs with less than 3GB vram (enabled automatically on GPUs with low vram)
- Smart memory management: can automatically run models on GPUs with as low as 1GB vram.
- Works even if you don't have a GPU with: ```--cpu``` (slow)
- Can load ckpt, safetensors and diffusers models/checkpoints. Standalone VAEs and CLIP models.
- Embeddings/Textual inversion
- [Loras (regular, locon and loha)](https://comfyanonymous.github.io/ComfyUI_examples/lora/)
- [Hypernetworks](https://comfyanonymous.github.io/ComfyUI_examples/hypernetworks/)
- Loading full workflows (with seeds) from generated PNG files.
- Loading full workflows (with seeds) from generated PNG, WebP and FLAC files.
- Saving/Loading workflows as Json files.
- Nodes interface can be used to create complex workflows like one for [Hires fix](https://comfyanonymous.github.io/ComfyUI_examples/2_pass_txt2img/) or much more advanced ones.
- [Area Composition](https://comfyanonymous.github.io/ComfyUI_examples/area_composition/)
@ -29,6 +57,12 @@ This ui will let you design and execute advanced stable diffusion pipelines usin
- [Upscale Models (ESRGAN, ESRGAN variants, SwinIR, Swin2SR, etc...)](https://comfyanonymous.github.io/ComfyUI_examples/upscale_models/)
- [unCLIP Models](https://comfyanonymous.github.io/ComfyUI_examples/unclip/)
- [GLIGEN](https://comfyanonymous.github.io/ComfyUI_examples/gligen/)
- [Model Merging](https://comfyanonymous.github.io/ComfyUI_examples/model_merging/)
- [LCM models and Loras](https://comfyanonymous.github.io/ComfyUI_examples/lcm/)
- [SDXL Turbo](https://comfyanonymous.github.io/ComfyUI_examples/sdturbo/)
- [AuraFlow](https://comfyanonymous.github.io/ComfyUI_examples/aura_flow/)
- [HunyuanDiT](https://comfyanonymous.github.io/ComfyUI_examples/hunyuan_dit/)
- Latent previews with [TAESD](#how-to-show-high-quality-previews)
- Starts up very fast.
- Works fully offline: will never download anything.
- [Config file](extra_model_paths.yaml.example) to set the search paths for models.
@ -37,28 +71,37 @@ Workflow examples can be found on the [Examples page](https://comfyanonymous.git
## Shortcuts
| Keybind | Explanation |
| - | - |
| Ctrl + Enter | Queue up current graph for generation |
| Ctrl + Shift + Enter | Queue up current graph as first for generation |
| Ctrl + S | Save workflow |
| Ctrl + O | Load workflow |
| Ctrl + A | Select all nodes |
| Ctrl + M | Mute/unmute selected nodes |
| Delete/Backspace | Delete selected nodes |
| Ctrl + Delete/Backspace | Delete the current graph |
| Space | Move the canvas around when held and moving the cursor |
| Ctrl/Shift + Click | Add clicked node to selection |
| Ctrl + C/Ctrl + V | Copy and paste selected nodes (without maintaining connections to outputs of unselected nodes) |
| Ctrl + C/Ctrl + Shift + V| Copy and paste selected nodes (maintaining connections from outputs of unselected nodes to inputs of pasted nodes) |
| Shift + Drag | Move multiple selected nodes at the same time |
| Ctrl + D | Load default graph |
| Q | Toggle visibility of the queue |
| H | Toggle visibility of history |
| R | Refresh graph |
| Double-Click LMB | Open node quick search palette |
| Keybind | Explanation |
|------------------------------------|--------------------------------------------------------------------------------------------------------------------|
| Ctrl + Enter | Queue up current graph for generation |
| Ctrl + Shift + Enter | Queue up current graph as first for generation |
| Ctrl + Alt + Enter | Cancel current generation |
| Ctrl + Z/Ctrl + Y | Undo/Redo |
| Ctrl + S | Save workflow |
| Ctrl + O | Load workflow |
| Ctrl + A | Select all nodes |
| Alt + C | Collapse/uncollapse selected nodes |
| Ctrl + M | Mute/unmute selected nodes |
| Ctrl + B | Bypass selected nodes (acts like the node was removed from the graph and the wires reconnected through) |
| Delete/Backspace | Delete selected nodes |
| Ctrl + Backspace | Delete the current graph |
| Space | Move the canvas around when held and moving the cursor |
| Ctrl/Shift + Click | Add clicked node to selection |
| Ctrl + C/Ctrl + V | Copy and paste selected nodes (without maintaining connections to outputs of unselected nodes) |
| Ctrl + C/Ctrl + Shift + V | Copy and paste selected nodes (maintaining connections from outputs of unselected nodes to inputs of pasted nodes) |
| Shift + Drag | Move multiple selected nodes at the same time |
| Ctrl + D | Load default graph |
| Alt + `+` | Canvas Zoom in |
| Alt + `-` | Canvas Zoom out |
| Ctrl + Shift + LMB + Vertical drag | Canvas Zoom in/out |
| Q | Toggle visibility of the queue |
| H | Toggle visibility of history |
| R | Refresh graph |
| Double-Click LMB | Open node quick search palette |
| Shift + Drag | Move multiple wires at once |
| Ctrl + Alt + LMB | Disconnect all wires from clicked slot |
Ctrl can also be replaced with Cmd instead for MacOS users
Ctrl can also be replaced with Cmd instead for macOS users
# Installing
@ -66,17 +109,19 @@ Ctrl can also be replaced with Cmd instead for MacOS users
There is a portable standalone build for Windows that should work for running on Nvidia GPUs or for running on your CPU only on the [releases page](https://github.com/comfyanonymous/ComfyUI/releases).
### [Direct link to download](https://github.com/comfyanonymous/ComfyUI/releases/download/latest/ComfyUI_windows_portable_nvidia_cu118_or_cpu.7z)
### [Direct link to download](https://github.com/comfyanonymous/ComfyUI/releases/latest/download/ComfyUI_windows_portable_nvidia.7z)
Just download, extract and run. Make sure you put your Stable Diffusion checkpoints/models (the huge ckpt/safetensors files) in: ComfyUI\models\checkpoints
Simply download, extract with [7-Zip](https://7-zip.org) and run. Make sure you put your Stable Diffusion checkpoints/models (the huge ckpt/safetensors files) in: ComfyUI\models\checkpoints
If you have trouble extracting it, right click the file -> properties -> unblock
#### How do I share models between another UI and ComfyUI?
See the [Config file](extra_model_paths.yaml.example) to set the search paths for models. In the standalone windows build you can find this file in the ComfyUI directory. Rename this file to extra_model_paths.yaml and edit it with your favorite text editor.
## Colab Notebook
## Jupyter Notebook
To run it on colab or paperspace you can use my [Colab Notebook](notebooks/comfyui_colab.ipynb) here: [Link to open with google colab](https://colab.research.google.com/github/comfyanonymous/ComfyUI/blob/master/notebooks/comfyui_colab.ipynb)
To run it on services like paperspace, kaggle or colab you can use my [Jupyter Notebook](notebooks/comfyui_colab.ipynb)
## Manual Install (Windows, Linux)
@ -86,19 +131,25 @@ Put your SD checkpoints (the huge ckpt/safetensors files) in: models/checkpoints
Put your VAE in: models/vae
At the time of writing this pytorch has issues with python versions higher than 3.10 so make sure your python/pip versions are 3.10.
### AMD GPUs (Linux only)
AMD users can install rocm and pytorch with pip if you don't have it already installed, this is the command to install the stable version:
```pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/rocm5.4.2```
```pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1```
This is the command to install the nightly with ROCm 6.2 which might have some performance improvements:
```pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.2```
### NVIDIA
Nvidia users should install torch and xformers using this command:
Nvidia users should install stable pytorch using this command:
```pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118 xformers```
```pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu124```
This is the command to install pytorch nightly instead which might have performance improvements:
```pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124```
#### Troubleshooting
@ -118,33 +169,43 @@ After this you should have everything installed and can proceed to running Comfy
### Others:
[Intel Arc](https://github.com/comfyanonymous/ComfyUI/discussions/476)
#### Intel GPUs
Mac/MPS: There is basic support in the code but until someone makes some install instruction you are on your own.
Intel GPU support is available for all Intel GPUs supported by Intel's Extension for Pytorch (IPEX) with the support requirements listed in the [Installation](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu) page. Choose your platform and method of install and follow the instructions. The steps are as follows:
### I already have another UI for Stable Diffusion installed do I really have to install all of these dependencies?
1. Start by installing the drivers or kernel listed or newer in the Installation page of IPEX linked above for Windows and Linux if needed.
1. Follow the instructions to install [Intel's oneAPI Basekit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html) for your platform.
1. Install the packages for IPEX using the instructions provided in the Installation page for your platform.
1. Follow the [ComfyUI manual installation](#manual-install-windows-linux) instructions for Windows and Linux and run ComfyUI normally as described above after everything is installed.
You don't. If you have another UI installed and working with it's own python venv you can use that venv to run ComfyUI. You can open up your favorite terminal and activate it:
Additional discussion and help can be found [here](https://github.com/comfyanonymous/ComfyUI/discussions/476).
```source path_to_other_sd_gui/venv/bin/activate```
#### Apple Mac silicon
or on Windows:
You can install ComfyUI in Apple Mac silicon (M1 or M2) with any recent macOS version.
With Powershell: ```"path_to_other_sd_gui\venv\Scripts\Activate.ps1"```
1. Install pytorch nightly. For instructions, read the [Accelerated PyTorch training on Mac](https://developer.apple.com/metal/pytorch/) Apple Developer guide (make sure to install the latest pytorch nightly).
1. Follow the [ComfyUI manual installation](#manual-install-windows-linux) instructions for Windows and Linux.
1. Install the ComfyUI [dependencies](#dependencies). If you have another Stable Diffusion UI [you might be able to reuse the dependencies](#i-already-have-another-ui-for-stable-diffusion-installed-do-i-really-have-to-install-all-of-these-dependencies).
1. Launch ComfyUI by running `python main.py`
With cmd.exe: ```"path_to_other_sd_gui\venv\Scripts\activate.bat"```
> **Note**: Remember to add your models, VAE, LoRAs etc. to the corresponding Comfy folders, as discussed in [ComfyUI manual installation](#manual-install-windows-linux).
And then you can use that terminal to run Comfyui without installing any dependencies. Note that the venv folder might be called something else depending on the SD UI.
#### DirectML (AMD Cards on Windows)
```pip install torch-directml``` Then you can launch ComfyUI with: ```python main.py --directml```
# Running
```python main.py```
### For AMD 6700, 6600 and maybe others
### For AMD cards not officially supported by ROCm
Try running it with this command if you have issues:
```HSA_OVERRIDE_GFX_VERSION=10.3.0 python main.py```
For 6700, 6600 and maybe other RDNA2 or older: ```HSA_OVERRIDE_GFX_VERSION=10.3.0 python main.py```
For AMD 7600 and maybe other RDNA3 cards: ```HSA_OVERRIDE_GFX_VERSION=11.0.0 python main.py```
# Notes
@ -158,39 +219,77 @@ You can use () to change emphasis of a word or phrase like: (good code:1.2) or (
You can use {day|night}, for wildcard/dynamic prompts. With this syntax "{wild|card|test}" will be randomly replaced by either "wild", "card" or "test" by the frontend every time you queue the prompt. To use {} characters in your actual prompt escape them like: \\{ or \\}.
Dynamic prompts also support C-style comments, like `// comment` or `/* comment */`.
To use a textual inversion concepts/embeddings in a text prompt put them in the models/embeddings directory and use them in the CLIPTextEncode node like this (you can omit the .pt extension):
```embedding:embedding_filename.pt```
### Fedora
To get python 3.10 on fedora:
```dnf install python3.10```
## How to show high-quality previews?
Then you can:
Use ```--preview-method auto``` to enable previews.
```python3.10 -m ensurepip```
The default installation includes a fast latent preview method that's low-resolution. To enable higher-quality previews with [TAESD](https://github.com/madebyollin/taesd), download the [taesd_decoder.pth, taesdxl_decoder.pth, taesd3_decoder.pth and taef1_decoder.pth](https://github.com/madebyollin/taesd/) and place them in the `models/vae_approx` folder. Once they're installed, restart ComfyUI and launch it with `--preview-method taesd` to enable high-quality previews.
This will let you use: pip3.10 to install all the dependencies.
## How to use TLS/SSL?
Generate a self-signed certificate (not appropriate for shared/production use) and key by running the command: `openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 3650 -nodes -subj "/C=XX/ST=StateName/L=CityName/O=CompanyName/OU=CompanySectionName/CN=CommonNameOrHostname"`
## How to increase generation speed?
Use `--tls-keyfile key.pem --tls-certfile cert.pem` to enable TLS/SSL, the app will now be accessible with `https://...` instead of `http://...`.
Make sure you use the regular loaders/Load Checkpoint node to load checkpoints. It will auto pick the right settings depending on your GPU.
You can set this command line setting to disable the upcasting to fp32 in some cross attention operations which will increase your speed. Note that this will very likely give you black images on SD2.x models. If you use xformers this option does not do anything.
```--dont-upcast-attention```
> Note: Windows users can use [alexisrolland/docker-openssl](https://github.com/alexisrolland/docker-openssl) or one of the [3rd party binary distributions](https://wiki.openssl.org/index.php/Binaries) to run the command example above.
<br/><br/>If you use a container, note that the volume mount `-v` can be a relative path so `... -v ".\:/openssl-certs" ...` would create the key & cert files in the current directory of your command prompt or powershell terminal.
## Support and dev channel
[Matrix space: #comfyui_space:matrix.org](https://app.element.io/#/room/%23comfyui_space%3Amatrix.org) (it's like discord but open source).
See also: [https://www.comfy.org/](https://www.comfy.org/)
## Frontend Development
As of August 15, 2024, we have transitioned to a new frontend, which is now hosted in a separate repository: [ComfyUI Frontend](https://github.com/Comfy-Org/ComfyUI_frontend). This repository now hosts the compiled JS (from TS/Vue) under the `web/` directory.
### Reporting Issues and Requesting Features
For any bugs, issues, or feature requests related to the frontend, please use the [ComfyUI Frontend repository](https://github.com/Comfy-Org/ComfyUI_frontend). This will help us manage and address frontend-specific concerns more efficiently.
### Using the Latest Frontend
The new frontend is now the default for ComfyUI. However, please note:
1. The frontend in the main ComfyUI repository is updated weekly.
2. Daily releases are available in the separate frontend repository.
To use the most up-to-date frontend version:
1. For the latest daily release, launch ComfyUI with this command line argument:
```
--front-end-version Comfy-Org/ComfyUI_frontend@latest
```
2. For a specific version, replace `latest` with the desired version number:
```
--front-end-version Comfy-Org/ComfyUI_frontend@1.2.2
```
This approach allows you to easily switch between the stable weekly release and the cutting-edge daily updates, or even specific versions for testing purposes.
### Accessing the Legacy Frontend
If you need to use the legacy frontend for any reason, you can access it using the following command line argument:
```
--front-end-version Comfy-Org/ComfyUI_legacy_frontend@latest
```
This will use a snapshot of the legacy frontend preserved in the [ComfyUI Legacy Frontend repository](https://github.com/Comfy-Org/ComfyUI_legacy_frontend).
# QA
### Why did you make this?
### Which GPU should I buy for this?
I wanted to learn how Stable Diffusion worked in detail. I also wanted something clean and powerful that would let me experiment with SD without restrictions.
[See this page for some recommendations](https://github.com/comfyanonymous/ComfyUI/wiki/Which-GPU-should-I-buy-for-ComfyUI)
### Who is this for?
This is for anyone that wants to make complex workflows with SD or that wants to learn more how SD works. The interface follows closely how SD works and the code should be much more simple to understand than other SD UIs.

View File

@ -0,0 +1,3 @@
# ComfyUI Internal Routes
All routes under the `/internal` path are designated for **internal use by ComfyUI only**. These routes are not intended for use by external applications may change at any time without notice.

View File

@ -0,0 +1,44 @@
from aiohttp import web
from typing import Optional
from folder_paths import models_dir, user_directory, output_directory
from api_server.services.file_service import FileService
import app.logger
class InternalRoutes:
'''
The top level web router for internal routes: /internal/*
The endpoints here should NOT be depended upon. It is for ComfyUI frontend use only.
Check README.md for more information.
'''
def __init__(self):
self.routes: web.RouteTableDef = web.RouteTableDef()
self._app: Optional[web.Application] = None
self.file_service = FileService({
"models": models_dir,
"user": user_directory,
"output": output_directory
})
def setup_routes(self):
@self.routes.get('/files')
async def list_files(request):
directory_key = request.query.get('directory', '')
try:
file_list = self.file_service.list_files(directory_key)
return web.json_response({"files": file_list})
except ValueError as e:
return web.json_response({"error": str(e)}, status=400)
except Exception as e:
return web.json_response({"error": str(e)}, status=500)
@self.routes.get('/logs')
async def get_logs(request):
return web.json_response(app.logger.get_logs())
def get_app(self):
if self._app is None:
self._app = web.Application()
self.setup_routes()
self._app.add_routes(self.routes)
return self._app

View File

@ -0,0 +1,13 @@
from typing import Dict, List, Optional
from api_server.utils.file_operations import FileSystemOperations, FileSystemItem
class FileService:
def __init__(self, allowed_directories: Dict[str, str], file_system_ops: Optional[FileSystemOperations] = None):
self.allowed_directories: Dict[str, str] = allowed_directories
self.file_system_ops: FileSystemOperations = file_system_ops or FileSystemOperations()
def list_files(self, directory_key: str) -> List[FileSystemItem]:
if directory_key not in self.allowed_directories:
raise ValueError("Invalid directory key")
directory_path: str = self.allowed_directories[directory_key]
return self.file_system_ops.walk_directory(directory_path)

View File

@ -0,0 +1,42 @@
import os
from typing import List, Union, TypedDict, Literal
from typing_extensions import TypeGuard
class FileInfo(TypedDict):
name: str
path: str
type: Literal["file"]
size: int
class DirectoryInfo(TypedDict):
name: str
path: str
type: Literal["directory"]
FileSystemItem = Union[FileInfo, DirectoryInfo]
def is_file_info(item: FileSystemItem) -> TypeGuard[FileInfo]:
return item["type"] == "file"
class FileSystemOperations:
@staticmethod
def walk_directory(directory: str) -> List[FileSystemItem]:
file_list: List[FileSystemItem] = []
for root, dirs, files in os.walk(directory):
for name in files:
file_path = os.path.join(root, name)
relative_path = os.path.relpath(file_path, directory)
file_list.append({
"name": name,
"path": relative_path,
"type": "file",
"size": os.path.getsize(file_path)
})
for name in dirs:
dir_path = os.path.join(root, name)
relative_path = os.path.relpath(dir_path, directory)
file_list.append({
"name": name,
"path": relative_path,
"type": "directory"
})
return file_list

54
app/app_settings.py Normal file
View File

@ -0,0 +1,54 @@
import os
import json
from aiohttp import web
class AppSettings():
def __init__(self, user_manager):
self.user_manager = user_manager
def get_settings(self, request):
file = self.user_manager.get_request_user_filepath(
request, "comfy.settings.json")
if os.path.isfile(file):
with open(file) as f:
return json.load(f)
else:
return {}
def save_settings(self, request, settings):
file = self.user_manager.get_request_user_filepath(
request, "comfy.settings.json")
with open(file, "w") as f:
f.write(json.dumps(settings, indent=4))
def add_routes(self, routes):
@routes.get("/settings")
async def get_settings(request):
return web.json_response(self.get_settings(request))
@routes.get("/settings/{id}")
async def get_setting(request):
value = None
settings = self.get_settings(request)
setting_id = request.match_info.get("id", None)
if setting_id and setting_id in settings:
value = settings[setting_id]
return web.json_response(value)
@routes.post("/settings")
async def post_settings(request):
settings = self.get_settings(request)
new_settings = await request.json()
self.save_settings(request, {**settings, **new_settings})
return web.Response(status=200)
@routes.post("/settings/{id}")
async def post_setting(request):
setting_id = request.match_info.get("id", None)
if not setting_id:
return web.Response(status=400)
settings = self.get_settings(request)
settings[setting_id] = await request.json()
self.save_settings(request, settings)
return web.Response(status=200)

195
app/frontend_management.py Normal file
View File

@ -0,0 +1,195 @@
from __future__ import annotations
import argparse
import logging
import os
import re
import tempfile
import zipfile
from dataclasses import dataclass
from functools import cached_property
from pathlib import Path
from typing import TypedDict, Optional
import requests
from typing_extensions import NotRequired
from comfy.cli_args import DEFAULT_VERSION_STRING
REQUEST_TIMEOUT = 10 # seconds
class Asset(TypedDict):
url: str
class Release(TypedDict):
id: int
tag_name: str
name: str
prerelease: bool
created_at: str
published_at: str
body: str
assets: NotRequired[list[Asset]]
@dataclass
class FrontEndProvider:
owner: str
repo: str
@property
def folder_name(self) -> str:
return f"{self.owner}_{self.repo}"
@property
def release_url(self) -> str:
return f"https://api.github.com/repos/{self.owner}/{self.repo}/releases"
@cached_property
def all_releases(self) -> list[Release]:
releases = []
api_url = self.release_url
while api_url:
response = requests.get(api_url, timeout=REQUEST_TIMEOUT)
response.raise_for_status() # Raises an HTTPError if the response was an error
releases.extend(response.json())
# GitHub uses the Link header to provide pagination links. Check if it exists and update api_url accordingly.
if "next" in response.links:
api_url = response.links["next"]["url"]
else:
api_url = None
return releases
@cached_property
def latest_release(self) -> Release:
latest_release_url = f"{self.release_url}/latest"
response = requests.get(latest_release_url, timeout=REQUEST_TIMEOUT)
response.raise_for_status() # Raises an HTTPError if the response was an error
return response.json()
def get_release(self, version: str) -> Release:
if version == "latest":
return self.latest_release
else:
for release in self.all_releases:
if release["tag_name"] in [version, f"v{version}"]:
return release
raise ValueError(f"Version {version} not found in releases")
def download_release_asset_zip(release: Release, destination_path: str) -> None:
"""Download dist.zip from github release."""
asset_url = None
for asset in release.get("assets", []):
if asset["name"] == "dist.zip":
asset_url = asset["url"]
break
if not asset_url:
raise ValueError("dist.zip not found in the release assets")
# Use a temporary file to download the zip content
with tempfile.TemporaryFile() as tmp_file:
headers = {"Accept": "application/octet-stream"}
response = requests.get(
asset_url, headers=headers, allow_redirects=True, timeout=REQUEST_TIMEOUT
)
response.raise_for_status() # Ensure we got a successful response
# Write the content to the temporary file
tmp_file.write(response.content)
# Go back to the beginning of the temporary file
tmp_file.seek(0)
# Extract the zip file content to the destination path
with zipfile.ZipFile(tmp_file, "r") as zip_ref:
zip_ref.extractall(destination_path)
class FrontendManager:
DEFAULT_FRONTEND_PATH = str(Path(__file__).parents[1] / "web")
CUSTOM_FRONTENDS_ROOT = str(Path(__file__).parents[1] / "web_custom_versions")
@classmethod
def parse_version_string(cls, value: str) -> tuple[str, str, str]:
"""
Args:
value (str): The version string to parse.
Returns:
tuple[str, str]: A tuple containing provider name and version.
Raises:
argparse.ArgumentTypeError: If the version string is invalid.
"""
VERSION_PATTERN = r"^([a-zA-Z0-9][a-zA-Z0-9-]{0,38})/([a-zA-Z0-9_.-]+)@(v?\d+\.\d+\.\d+|latest)$"
match_result = re.match(VERSION_PATTERN, value)
if match_result is None:
raise argparse.ArgumentTypeError(f"Invalid version string: {value}")
return match_result.group(1), match_result.group(2), match_result.group(3)
@classmethod
def init_frontend_unsafe(cls, version_string: str, provider: Optional[FrontEndProvider] = None) -> str:
"""
Initializes the frontend for the specified version.
Args:
version_string (str): The version string.
provider (FrontEndProvider, optional): The provider to use. Defaults to None.
Returns:
str: The path to the initialized frontend.
Raises:
Exception: If there is an error during the initialization process.
main error source might be request timeout or invalid URL.
"""
if version_string == DEFAULT_VERSION_STRING:
return cls.DEFAULT_FRONTEND_PATH
repo_owner, repo_name, version = cls.parse_version_string(version_string)
provider = provider or FrontEndProvider(repo_owner, repo_name)
release = provider.get_release(version)
semantic_version = release["tag_name"].lstrip("v")
web_root = str(
Path(cls.CUSTOM_FRONTENDS_ROOT) / provider.folder_name / semantic_version
)
if not os.path.exists(web_root):
try:
os.makedirs(web_root, exist_ok=True)
logging.info(
"Downloading frontend(%s) version(%s) to (%s)",
provider.folder_name,
semantic_version,
web_root,
)
logging.debug(release)
download_release_asset_zip(release, destination_path=web_root)
finally:
# Clean up the directory if it is empty, i.e. the download failed
if not os.listdir(web_root):
os.rmdir(web_root)
return web_root
@classmethod
def init_frontend(cls, version_string: str) -> str:
"""
Initializes the frontend with the specified version string.
Args:
version_string (str): The version string to initialize the frontend with.
Returns:
str: The path of the initialized frontend.
"""
try:
return cls.init_frontend_unsafe(version_string)
except Exception as e:
logging.error("Failed to initialize frontend: %s", e)
logging.info("Falling back to the default frontend.")
return cls.DEFAULT_FRONTEND_PATH

31
app/logger.py Normal file
View File

@ -0,0 +1,31 @@
import logging
from logging.handlers import MemoryHandler
from collections import deque
logs = None
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
def get_logs():
return "\n".join([formatter.format(x) for x in logs])
def setup_logger(verbose: bool = False, capacity: int = 300):
global logs
if logs:
return
# Setup default global logger
logger = logging.getLogger()
logger.setLevel(logging.DEBUG if verbose else logging.INFO)
stream_handler = logging.StreamHandler()
stream_handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(stream_handler)
# Create a memory handler with a deque as its buffer
logs = deque(maxlen=capacity)
memory_handler = MemoryHandler(capacity, flushLevel=logging.INFO)
memory_handler.buffer = logs
memory_handler.setFormatter(formatter)
logger.addHandler(memory_handler)

205
app/user_manager.py Normal file
View File

@ -0,0 +1,205 @@
import json
import os
import re
import uuid
import glob
import shutil
from aiohttp import web
from comfy.cli_args import args
from folder_paths import user_directory
from .app_settings import AppSettings
default_user = "default"
users_file = os.path.join(user_directory, "users.json")
class UserManager():
def __init__(self):
global user_directory
self.settings = AppSettings(self)
if not os.path.exists(user_directory):
os.mkdir(user_directory)
if not args.multi_user:
print("****** User settings have been changed to be stored on the server instead of browser storage. ******")
print("****** For multi-user setups add the --multi-user CLI argument to enable multiple user profiles. ******")
if args.multi_user:
if os.path.isfile(users_file):
with open(users_file) as f:
self.users = json.load(f)
else:
self.users = {}
else:
self.users = {"default": "default"}
def get_request_user_id(self, request):
user = "default"
if args.multi_user and "comfy-user" in request.headers:
user = request.headers["comfy-user"]
if user not in self.users:
raise KeyError("Unknown user: " + user)
return user
def get_request_user_filepath(self, request, file, type="userdata", create_dir=True):
global user_directory
if type == "userdata":
root_dir = user_directory
else:
raise KeyError("Unknown filepath type:" + type)
user = self.get_request_user_id(request)
path = user_root = os.path.abspath(os.path.join(root_dir, user))
# prevent leaving /{type}
if os.path.commonpath((root_dir, user_root)) != root_dir:
return None
if file is not None:
# prevent leaving /{type}/{user}
path = os.path.abspath(os.path.join(user_root, file))
if os.path.commonpath((user_root, path)) != user_root:
return None
parent = os.path.split(path)[0]
if create_dir and not os.path.exists(parent):
os.makedirs(parent, exist_ok=True)
return path
def add_user(self, name):
name = name.strip()
if not name:
raise ValueError("username not provided")
user_id = re.sub("[^a-zA-Z0-9-_]+", '-', name)
user_id = user_id + "_" + str(uuid.uuid4())
self.users[user_id] = name
global users_file
with open(users_file, "w") as f:
json.dump(self.users, f)
return user_id
def add_routes(self, routes):
self.settings.add_routes(routes)
@routes.get("/users")
async def get_users(request):
if args.multi_user:
return web.json_response({"storage": "server", "users": self.users})
else:
user_dir = self.get_request_user_filepath(request, None, create_dir=False)
return web.json_response({
"storage": "server",
"migrated": os.path.exists(user_dir)
})
@routes.post("/users")
async def post_users(request):
body = await request.json()
username = body["username"]
if username in self.users.values():
return web.json_response({"error": "Duplicate username."}, status=400)
user_id = self.add_user(username)
return web.json_response(user_id)
@routes.get("/userdata")
async def listuserdata(request):
directory = request.rel_url.query.get('dir', '')
if not directory:
return web.Response(status=400)
path = self.get_request_user_filepath(request, directory)
if not path:
return web.Response(status=403)
if not os.path.exists(path):
return web.Response(status=404)
recurse = request.rel_url.query.get('recurse', '').lower() == "true"
results = glob.glob(os.path.join(
glob.escape(path), '**/*'), recursive=recurse)
results = [os.path.relpath(x, path) for x in results if os.path.isfile(x)]
split_path = request.rel_url.query.get('split', '').lower() == "true"
if split_path:
results = [[x] + x.split(os.sep) for x in results]
return web.json_response(results)
def get_user_data_path(request, check_exists = False, param = "file"):
file = request.match_info.get(param, None)
if not file:
return web.Response(status=400)
path = self.get_request_user_filepath(request, file)
if not path:
return web.Response(status=403)
if check_exists and not os.path.exists(path):
return web.Response(status=404)
return path
@routes.get("/userdata/{file}")
async def getuserdata(request):
path = get_user_data_path(request, check_exists=True)
if not isinstance(path, str):
return path
return web.FileResponse(path)
@routes.post("/userdata/{file}")
async def post_userdata(request):
path = get_user_data_path(request)
if not isinstance(path, str):
return path
overwrite = request.query["overwrite"] != "false"
if not overwrite and os.path.exists(path):
return web.Response(status=409)
body = await request.read()
with open(path, "wb") as f:
f.write(body)
resp = os.path.relpath(path, self.get_request_user_filepath(request, None))
return web.json_response(resp)
@routes.delete("/userdata/{file}")
async def delete_userdata(request):
path = get_user_data_path(request, check_exists=True)
if not isinstance(path, str):
return path
os.remove(path)
return web.Response(status=204)
@routes.post("/userdata/{file}/move/{dest}")
async def move_userdata(request):
source = get_user_data_path(request, check_exists=True)
if not isinstance(source, str):
return source
dest = get_user_data_path(request, check_exists=False, param="dest")
if not isinstance(source, str):
return dest
overwrite = request.query["overwrite"] != "false"
if not overwrite and os.path.exists(dest):
return web.Response(status=409)
print(f"moving '{source}' -> '{dest}'")
shutil.move(source, dest)
resp = os.path.relpath(dest, self.get_request_user_filepath(request, None))
return web.json_response(resp)

View File

@ -0,0 +1,13 @@
import pickle
load = pickle.load
class Empty:
pass
class Unpickler(pickle.Unpickler):
def find_class(self, module, name):
#TODO: safe unpickle
if module.startswith("pytorch_lightning"):
return Empty
return super().find_class(module, name)

View File

@ -6,17 +6,54 @@ import torch as th
import torch.nn as nn
from ..ldm.modules.diffusionmodules.util import (
conv_nd,
linear,
zero_module,
timestep_embedding,
)
from ..ldm.modules.attention import SpatialTransformer
from ..ldm.modules.diffusionmodules.openaimodel import UNetModel, TimestepEmbedSequential, ResBlock, Downsample, AttentionBlock
from ..ldm.models.diffusion.ddpm import LatentDiffusion
from ..ldm.util import log_txt_as_img, exists, instantiate_from_config
from ..ldm.modules.diffusionmodules.openaimodel import UNetModel, TimestepEmbedSequential, ResBlock, Downsample
from ..ldm.util import exists
from .control_types import UNION_CONTROLNET_TYPES
from collections import OrderedDict
import comfy.ops
from comfy.ldm.modules.attention import optimized_attention
class OptimizedAttention(nn.Module):
def __init__(self, c, nhead, dropout=0.0, dtype=None, device=None, operations=None):
super().__init__()
self.heads = nhead
self.c = c
self.in_proj = operations.Linear(c, c * 3, bias=True, dtype=dtype, device=device)
self.out_proj = operations.Linear(c, c, bias=True, dtype=dtype, device=device)
def forward(self, x):
x = self.in_proj(x)
q, k, v = x.split(self.c, dim=2)
out = optimized_attention(q, k, v, self.heads)
return self.out_proj(out)
class QuickGELU(nn.Module):
def forward(self, x: torch.Tensor):
return x * torch.sigmoid(1.702 * x)
class ResBlockUnionControlnet(nn.Module):
def __init__(self, dim, nhead, dtype=None, device=None, operations=None):
super().__init__()
self.attn = OptimizedAttention(dim, nhead, dtype=dtype, device=device, operations=operations)
self.ln_1 = operations.LayerNorm(dim, dtype=dtype, device=device)
self.mlp = nn.Sequential(
OrderedDict([("c_fc", operations.Linear(dim, dim * 4, dtype=dtype, device=device)), ("gelu", QuickGELU()),
("c_proj", operations.Linear(dim * 4, dim, dtype=dtype, device=device))]))
self.ln_2 = operations.LayerNorm(dim, dtype=dtype, device=device)
def attention(self, x: torch.Tensor):
return self.attn(x)
def forward(self, x: torch.Tensor):
x = x + self.attention(self.ln_1(x))
x = x + self.mlp(self.ln_2(x))
return x
class ControlledUnetModel(UNetModel):
#implemented in the ldm unet
@ -30,13 +67,13 @@ class ControlNet(nn.Module):
model_channels,
hint_channels,
num_res_blocks,
attention_resolutions,
dropout=0,
channel_mult=(1, 2, 4, 8),
conv_resample=True,
dims=2,
num_classes=None,
use_checkpoint=False,
use_fp16=False,
dtype=torch.float32,
num_heads=-1,
num_head_channels=-1,
num_heads_upsample=-1,
@ -52,8 +89,17 @@ class ControlNet(nn.Module):
num_attention_blocks=None,
disable_middle_self_attn=False,
use_linear_in_transformer=False,
adm_in_channels=None,
transformer_depth_middle=None,
transformer_depth_output=None,
attn_precision=None,
union_controlnet_num_control_type=None,
device=None,
operations=comfy.ops.disable_weight_init,
**kwargs,
):
super().__init__()
assert use_spatial_transformer == True, "use_spatial_transformer has to be true"
if use_spatial_transformer:
assert context_dim is not None, 'Fool!! You forgot to include the dimension of your cross-attention conditioning...'
@ -76,6 +122,7 @@ class ControlNet(nn.Module):
self.image_size = image_size
self.in_channels = in_channels
self.model_channels = model_channels
if isinstance(num_res_blocks, int):
self.num_res_blocks = len(channel_mult) * [num_res_blocks]
else:
@ -83,23 +130,22 @@ class ControlNet(nn.Module):
raise ValueError("provide num_res_blocks either as an int (globally constant) or "
"as a list/tuple (per-level) with the same length as channel_mult")
self.num_res_blocks = num_res_blocks
if disable_self_attentions is not None:
# should be a list of booleans, indicating whether to disable self-attention in TransformerBlocks or not
assert len(disable_self_attentions) == len(channel_mult)
if num_attention_blocks is not None:
assert len(num_attention_blocks) == len(self.num_res_blocks)
assert all(map(lambda i: self.num_res_blocks[i] >= num_attention_blocks[i], range(len(num_attention_blocks))))
print(f"Constructor of UNetModel received num_attention_blocks={num_attention_blocks}. "
f"This option has LESS priority than attention_resolutions {attention_resolutions}, "
f"i.e., in cases where num_attention_blocks[i] > 0 but 2**i not in attention_resolutions, "
f"attention will still not be set.")
self.attention_resolutions = attention_resolutions
transformer_depth = transformer_depth[:]
self.dropout = dropout
self.channel_mult = channel_mult
self.conv_resample = conv_resample
self.num_classes = num_classes
self.use_checkpoint = use_checkpoint
self.dtype = th.float16 if use_fp16 else th.float32
self.dtype = dtype
self.num_heads = num_heads
self.num_head_channels = num_head_channels
self.num_heads_upsample = num_heads_upsample
@ -107,36 +153,54 @@ class ControlNet(nn.Module):
time_embed_dim = model_channels * 4
self.time_embed = nn.Sequential(
linear(model_channels, time_embed_dim),
operations.Linear(model_channels, time_embed_dim, dtype=self.dtype, device=device),
nn.SiLU(),
linear(time_embed_dim, time_embed_dim),
operations.Linear(time_embed_dim, time_embed_dim, dtype=self.dtype, device=device),
)
if self.num_classes is not None:
if isinstance(self.num_classes, int):
self.label_emb = nn.Embedding(num_classes, time_embed_dim)
elif self.num_classes == "continuous":
print("setting up linear c_adm embedding layer")
self.label_emb = nn.Linear(1, time_embed_dim)
elif self.num_classes == "sequential":
assert adm_in_channels is not None
self.label_emb = nn.Sequential(
nn.Sequential(
operations.Linear(adm_in_channels, time_embed_dim, dtype=self.dtype, device=device),
nn.SiLU(),
operations.Linear(time_embed_dim, time_embed_dim, dtype=self.dtype, device=device),
)
)
else:
raise ValueError()
self.input_blocks = nn.ModuleList(
[
TimestepEmbedSequential(
conv_nd(dims, in_channels, model_channels, 3, padding=1)
operations.conv_nd(dims, in_channels, model_channels, 3, padding=1, dtype=self.dtype, device=device)
)
]
)
self.zero_convs = nn.ModuleList([self.make_zero_conv(model_channels)])
self.zero_convs = nn.ModuleList([self.make_zero_conv(model_channels, operations=operations, dtype=self.dtype, device=device)])
self.input_hint_block = TimestepEmbedSequential(
conv_nd(dims, hint_channels, 16, 3, padding=1),
operations.conv_nd(dims, hint_channels, 16, 3, padding=1, dtype=self.dtype, device=device),
nn.SiLU(),
conv_nd(dims, 16, 16, 3, padding=1),
operations.conv_nd(dims, 16, 16, 3, padding=1, dtype=self.dtype, device=device),
nn.SiLU(),
conv_nd(dims, 16, 32, 3, padding=1, stride=2),
operations.conv_nd(dims, 16, 32, 3, padding=1, stride=2, dtype=self.dtype, device=device),
nn.SiLU(),
conv_nd(dims, 32, 32, 3, padding=1),
operations.conv_nd(dims, 32, 32, 3, padding=1, dtype=self.dtype, device=device),
nn.SiLU(),
conv_nd(dims, 32, 96, 3, padding=1, stride=2),
operations.conv_nd(dims, 32, 96, 3, padding=1, stride=2, dtype=self.dtype, device=device),
nn.SiLU(),
conv_nd(dims, 96, 96, 3, padding=1),
operations.conv_nd(dims, 96, 96, 3, padding=1, dtype=self.dtype, device=device),
nn.SiLU(),
conv_nd(dims, 96, 256, 3, padding=1, stride=2),
operations.conv_nd(dims, 96, 256, 3, padding=1, stride=2, dtype=self.dtype, device=device),
nn.SiLU(),
zero_module(conv_nd(dims, 256, model_channels, 3, padding=1))
operations.conv_nd(dims, 256, model_channels, 3, padding=1, dtype=self.dtype, device=device)
)
self._feature_size = model_channels
@ -154,10 +218,14 @@ class ControlNet(nn.Module):
dims=dims,
use_checkpoint=use_checkpoint,
use_scale_shift_norm=use_scale_shift_norm,
dtype=self.dtype,
device=device,
operations=operations,
)
]
ch = mult * model_channels
if ds in attention_resolutions:
num_transformers = transformer_depth.pop(0)
if num_transformers > 0:
if num_head_channels == -1:
dim_head = ch // num_heads
else:
@ -173,20 +241,14 @@ class ControlNet(nn.Module):
if not exists(num_attention_blocks) or nr < num_attention_blocks[level]:
layers.append(
AttentionBlock(
ch,
use_checkpoint=use_checkpoint,
num_heads=num_heads,
num_head_channels=dim_head,
use_new_attention_order=use_new_attention_order,
) if not use_spatial_transformer else SpatialTransformer(
ch, num_heads, dim_head, depth=transformer_depth, context_dim=context_dim,
SpatialTransformer(
ch, num_heads, dim_head, depth=num_transformers, context_dim=context_dim,
disable_self_attn=disabled_sa, use_linear=use_linear_in_transformer,
use_checkpoint=use_checkpoint
use_checkpoint=use_checkpoint, attn_precision=attn_precision, dtype=self.dtype, device=device, operations=operations
)
)
self.input_blocks.append(TimestepEmbedSequential(*layers))
self.zero_convs.append(self.make_zero_conv(ch))
self.zero_convs.append(self.make_zero_conv(ch, operations=operations, dtype=self.dtype, device=device))
self._feature_size += ch
input_block_chans.append(ch)
if level != len(channel_mult) - 1:
@ -202,16 +264,19 @@ class ControlNet(nn.Module):
use_checkpoint=use_checkpoint,
use_scale_shift_norm=use_scale_shift_norm,
down=True,
dtype=self.dtype,
device=device,
operations=operations
)
if resblock_updown
else Downsample(
ch, conv_resample, dims=dims, out_channels=out_ch
ch, conv_resample, dims=dims, out_channels=out_ch, dtype=self.dtype, device=device, operations=operations
)
)
)
ch = out_ch
input_block_chans.append(ch)
self.zero_convs.append(self.make_zero_conv(ch))
self.zero_convs.append(self.make_zero_conv(ch, operations=operations, dtype=self.dtype, device=device))
ds *= 2
self._feature_size += ch
@ -223,7 +288,7 @@ class ControlNet(nn.Module):
if legacy:
#num_heads = 1
dim_head = ch // num_heads if use_spatial_transformer else num_head_channels
self.middle_block = TimestepEmbedSequential(
mid_block = [
ResBlock(
ch,
time_embed_dim,
@ -231,17 +296,15 @@ class ControlNet(nn.Module):
dims=dims,
use_checkpoint=use_checkpoint,
use_scale_shift_norm=use_scale_shift_norm,
),
AttentionBlock(
ch,
use_checkpoint=use_checkpoint,
num_heads=num_heads,
num_head_channels=dim_head,
use_new_attention_order=use_new_attention_order,
) if not use_spatial_transformer else SpatialTransformer( # always uses a self-attn
ch, num_heads, dim_head, depth=transformer_depth, context_dim=context_dim,
dtype=self.dtype,
device=device,
operations=operations
)]
if transformer_depth_middle >= 0:
mid_block += [SpatialTransformer( # always uses a self-attn
ch, num_heads, dim_head, depth=transformer_depth_middle, context_dim=context_dim,
disable_self_attn=disable_middle_self_attn, use_linear=use_linear_in_transformer,
use_checkpoint=use_checkpoint
use_checkpoint=use_checkpoint, attn_precision=attn_precision, dtype=self.dtype, device=device, operations=operations
),
ResBlock(
ch,
@ -250,23 +313,114 @@ class ControlNet(nn.Module):
dims=dims,
use_checkpoint=use_checkpoint,
use_scale_shift_norm=use_scale_shift_norm,
),
)
self.middle_block_out = self.make_zero_conv(ch)
dtype=self.dtype,
device=device,
operations=operations
)]
self.middle_block = TimestepEmbedSequential(*mid_block)
self.middle_block_out = self.make_zero_conv(ch, operations=operations, dtype=self.dtype, device=device)
self._feature_size += ch
def make_zero_conv(self, channels):
return TimestepEmbedSequential(zero_module(conv_nd(self.dims, channels, channels, 1, padding=0)))
if union_controlnet_num_control_type is not None:
self.num_control_type = union_controlnet_num_control_type
num_trans_channel = 320
num_trans_head = 8
num_trans_layer = 1
num_proj_channel = 320
# task_scale_factor = num_trans_channel ** 0.5
self.task_embedding = nn.Parameter(torch.empty(self.num_control_type, num_trans_channel, dtype=self.dtype, device=device))
def forward(self, x, hint, timesteps, context, **kwargs):
t_emb = timestep_embedding(timesteps, self.model_channels, repeat_only=False)
self.transformer_layes = nn.Sequential(*[ResBlockUnionControlnet(num_trans_channel, num_trans_head, dtype=self.dtype, device=device, operations=operations) for _ in range(num_trans_layer)])
self.spatial_ch_projs = operations.Linear(num_trans_channel, num_proj_channel, dtype=self.dtype, device=device)
#-----------------------------------------------------------------------------------------------------
control_add_embed_dim = 256
class ControlAddEmbedding(nn.Module):
def __init__(self, in_dim, out_dim, num_control_type, dtype=None, device=None, operations=None):
super().__init__()
self.num_control_type = num_control_type
self.in_dim = in_dim
self.linear_1 = operations.Linear(in_dim * num_control_type, out_dim, dtype=dtype, device=device)
self.linear_2 = operations.Linear(out_dim, out_dim, dtype=dtype, device=device)
def forward(self, control_type, dtype, device):
c_type = torch.zeros((self.num_control_type,), device=device)
c_type[control_type] = 1.0
c_type = timestep_embedding(c_type.flatten(), self.in_dim, repeat_only=False).to(dtype).reshape((-1, self.num_control_type * self.in_dim))
return self.linear_2(torch.nn.functional.silu(self.linear_1(c_type)))
self.control_add_embedding = ControlAddEmbedding(control_add_embed_dim, time_embed_dim, self.num_control_type, dtype=self.dtype, device=device, operations=operations)
else:
self.task_embedding = None
self.control_add_embedding = None
def union_controlnet_merge(self, hint, control_type, emb, context):
# Equivalent to: https://github.com/xinsir6/ControlNetPlus/tree/main
inputs = []
condition_list = []
for idx in range(min(1, len(control_type))):
controlnet_cond = self.input_hint_block(hint[idx], emb, context)
feat_seq = torch.mean(controlnet_cond, dim=(2, 3))
if idx < len(control_type):
feat_seq += self.task_embedding[control_type[idx]].to(dtype=feat_seq.dtype, device=feat_seq.device)
inputs.append(feat_seq.unsqueeze(1))
condition_list.append(controlnet_cond)
x = torch.cat(inputs, dim=1)
x = self.transformer_layes(x)
controlnet_cond_fuser = None
for idx in range(len(control_type)):
alpha = self.spatial_ch_projs(x[:, idx])
alpha = alpha.unsqueeze(-1).unsqueeze(-1)
o = condition_list[idx] + alpha
if controlnet_cond_fuser is None:
controlnet_cond_fuser = o
else:
controlnet_cond_fuser += o
return controlnet_cond_fuser
def make_zero_conv(self, channels, operations=None, dtype=None, device=None):
return TimestepEmbedSequential(operations.conv_nd(self.dims, channels, channels, 1, padding=0, dtype=dtype, device=device))
def forward(self, x, hint, timesteps, context, y=None, **kwargs):
t_emb = timestep_embedding(timesteps, self.model_channels, repeat_only=False).to(x.dtype)
emb = self.time_embed(t_emb)
guided_hint = self.input_hint_block(hint, emb, context)
guided_hint = None
if self.control_add_embedding is not None: #Union Controlnet
control_type = kwargs.get("control_type", [])
outs = []
if any([c >= self.num_control_type for c in control_type]):
max_type = max(control_type)
max_type_name = {
v: k for k, v in UNION_CONTROLNET_TYPES.items()
}[max_type]
raise ValueError(
f"Control type {max_type_name}({max_type}) is out of range for the number of control types" +
f"({self.num_control_type}) supported.\n" +
"Please consider using the ProMax ControlNet Union model.\n" +
"https://huggingface.co/xinsir/controlnet-union-sdxl-1.0/tree/main"
)
h = x.type(self.dtype)
emb += self.control_add_embedding(control_type, emb.dtype, emb.device)
if len(control_type) > 0:
if len(hint.shape) < 5:
hint = hint.unsqueeze(dim=0)
guided_hint = self.union_controlnet_merge(hint, control_type, emb, context)
if guided_hint is None:
guided_hint = self.input_hint_block(hint, emb, context)
out_output = []
out_middle = []
hs = []
if self.num_classes is not None:
assert y.shape[0] == x.shape[0]
emb = emb + self.label_emb(y)
h = x
for module, zero_conv in zip(self.input_blocks, self.zero_convs):
if guided_hint is not None:
h = module(h, emb, context)
@ -274,10 +428,10 @@ class ControlNet(nn.Module):
guided_hint = None
else:
h = module(h, emb, context)
outs.append(zero_conv(h, emb, context))
out_output.append(zero_conv(h, emb, context))
h = self.middle_block(h, emb, context)
outs.append(self.middle_block_out(h, emb, context))
out_middle.append(self.middle_block_out(h, emb, context))
return outs
return {"middle": out_middle, "output": out_output}

View File

@ -0,0 +1,10 @@
UNION_CONTROLNET_TYPES = {
"openpose": 0,
"depth": 1,
"hed/pidi/scribble/ted": 2,
"canny/lineart/anime_lineart/mlsd": 3,
"normal": 4,
"segment": 5,
"tile": 6,
"repaint": 7,
}

77
comfy/cldm/mmdit.py Normal file
View File

@ -0,0 +1,77 @@
import torch
from typing import Dict, Optional
import comfy.ldm.modules.diffusionmodules.mmdit
class ControlNet(comfy.ldm.modules.diffusionmodules.mmdit.MMDiT):
def __init__(
self,
num_blocks = None,
dtype = None,
device = None,
operations = None,
**kwargs,
):
super().__init__(dtype=dtype, device=device, operations=operations, final_layer=False, num_blocks=num_blocks, **kwargs)
# controlnet_blocks
self.controlnet_blocks = torch.nn.ModuleList([])
for _ in range(len(self.joint_blocks)):
self.controlnet_blocks.append(operations.Linear(self.hidden_size, self.hidden_size, device=device, dtype=dtype))
self.pos_embed_input = comfy.ldm.modules.diffusionmodules.mmdit.PatchEmbed(
None,
self.patch_size,
self.in_channels,
self.hidden_size,
bias=True,
strict_img_size=False,
dtype=dtype,
device=device,
operations=operations
)
def forward(
self,
x: torch.Tensor,
timesteps: torch.Tensor,
y: Optional[torch.Tensor] = None,
context: Optional[torch.Tensor] = None,
hint = None,
) -> torch.Tensor:
#weird sd3 controlnet specific stuff
y = torch.zeros_like(y)
if self.context_processor is not None:
context = self.context_processor(context)
hw = x.shape[-2:]
x = self.x_embedder(x) + self.cropped_pos_embed(hw, device=x.device).to(dtype=x.dtype, device=x.device)
x += self.pos_embed_input(hint)
c = self.t_embedder(timesteps, dtype=x.dtype)
if y is not None and self.y_embedder is not None:
y = self.y_embedder(y)
c = c + y
if context is not None:
context = self.context_embedder(context)
output = []
blocks = len(self.joint_blocks)
for i in range(blocks):
context, x = self.joint_blocks[i](
context,
x,
c=c,
use_checkpoint=self.use_checkpoint,
)
out = self.controlnet_blocks[i](x)
count = self.depth // blocks
if i == blocks - 1:
count -= 1
for j in range(count):
output.append(out)
return {"output": output}

View File

@ -1,36 +1,181 @@
import argparse
import enum
import os
from typing import Optional
import comfy.options
class EnumAction(argparse.Action):
"""
Argparse action for handling Enums
"""
def __init__(self, **kwargs):
# Pop off the type value
enum_type = kwargs.pop("type", None)
# Ensure an Enum subclass is provided
if enum_type is None:
raise ValueError("type must be assigned an Enum when using EnumAction")
if not issubclass(enum_type, enum.Enum):
raise TypeError("type must be an Enum when using EnumAction")
# Generate choices from the Enum
choices = tuple(e.value for e in enum_type)
kwargs.setdefault("choices", choices)
kwargs.setdefault("metavar", f"[{','.join(list(choices))}]")
super(EnumAction, self).__init__(**kwargs)
self._enum = enum_type
def __call__(self, parser, namespace, values, option_string=None):
# Convert value back into an Enum
value = self._enum(values)
setattr(namespace, self.dest, value)
parser = argparse.ArgumentParser()
parser.add_argument("--listen", type=str, default="127.0.0.1", metavar="IP", nargs="?", const="0.0.0.0", help="Specify the IP address to listen on (default: 127.0.0.1). If --listen is provided without an argument, it defaults to 0.0.0.0. (listens on all)")
parser.add_argument("--port", type=int, default=8188, help="Set the listen port.")
parser.add_argument("--tls-keyfile", type=str, help="Path to TLS (SSL) key file. Enables TLS, makes app accessible at https://... requires --tls-certfile to function")
parser.add_argument("--tls-certfile", type=str, help="Path to TLS (SSL) certificate file. Enables TLS, makes app accessible at https://... requires --tls-keyfile to function")
parser.add_argument("--enable-cors-header", type=str, default=None, metavar="ORIGIN", nargs="?", const="*", help="Enable CORS (Cross-Origin Resource Sharing) with optional origin or allow all with default '*'.")
parser.add_argument("--max-upload-size", type=float, default=100, help="Set the maximum upload size in MB.")
parser.add_argument("--extra-model-paths-config", type=str, default=None, metavar="PATH", nargs='+', action='append', help="Load one or more extra_model_paths.yaml files.")
parser.add_argument("--output-directory", type=str, default=None, help="Set the ComfyUI output directory.")
parser.add_argument("--temp-directory", type=str, default=None, help="Set the ComfyUI temp directory (default is in the ComfyUI directory).")
parser.add_argument("--input-directory", type=str, default=None, help="Set the ComfyUI input directory.")
parser.add_argument("--auto-launch", action="store_true", help="Automatically launch ComfyUI in the default browser.")
parser.add_argument("--disable-auto-launch", action="store_true", help="Disable auto launching the browser.")
parser.add_argument("--cuda-device", type=int, default=None, metavar="DEVICE_ID", help="Set the id of the cuda device this instance will use.")
parser.add_argument("--dont-upcast-attention", action="store_true", help="Disable upcasting of attention. Can boost speed but increase the chances of black images.")
parser.add_argument("--force-fp32", action="store_true", help="Force fp32 (If this makes your GPU work better please report it).")
cm_group = parser.add_mutually_exclusive_group()
cm_group.add_argument("--cuda-malloc", action="store_true", help="Enable cudaMallocAsync (enabled by default for torch 2.0 and up).")
cm_group.add_argument("--disable-cuda-malloc", action="store_true", help="Disable cudaMallocAsync.")
fp_group = parser.add_mutually_exclusive_group()
fp_group.add_argument("--force-fp32", action="store_true", help="Force fp32 (If this makes your GPU work better please report it).")
fp_group.add_argument("--force-fp16", action="store_true", help="Force fp16.")
fpunet_group = parser.add_mutually_exclusive_group()
fpunet_group.add_argument("--bf16-unet", action="store_true", help="Run the UNET in bf16. This should only be used for testing stuff.")
fpunet_group.add_argument("--fp16-unet", action="store_true", help="Store unet weights in fp16.")
fpunet_group.add_argument("--fp8_e4m3fn-unet", action="store_true", help="Store unet weights in fp8_e4m3fn.")
fpunet_group.add_argument("--fp8_e5m2-unet", action="store_true", help="Store unet weights in fp8_e5m2.")
fpvae_group = parser.add_mutually_exclusive_group()
fpvae_group.add_argument("--fp16-vae", action="store_true", help="Run the VAE in fp16, might cause black images.")
fpvae_group.add_argument("--fp32-vae", action="store_true", help="Run the VAE in full precision fp32.")
fpvae_group.add_argument("--bf16-vae", action="store_true", help="Run the VAE in bf16.")
parser.add_argument("--cpu-vae", action="store_true", help="Run the VAE on the CPU.")
fpte_group = parser.add_mutually_exclusive_group()
fpte_group.add_argument("--fp8_e4m3fn-text-enc", action="store_true", help="Store text encoder weights in fp8 (e4m3fn variant).")
fpte_group.add_argument("--fp8_e5m2-text-enc", action="store_true", help="Store text encoder weights in fp8 (e5m2 variant).")
fpte_group.add_argument("--fp16-text-enc", action="store_true", help="Store text encoder weights in fp16.")
fpte_group.add_argument("--fp32-text-enc", action="store_true", help="Store text encoder weights in fp32.")
parser.add_argument("--force-channels-last", action="store_true", help="Force channels last format when inferencing the models.")
parser.add_argument("--directml", type=int, nargs="?", metavar="DIRECTML_DEVICE", const=-1, help="Use torch-directml.")
parser.add_argument("--disable-ipex-optimize", action="store_true", help="Disables ipex.optimize when loading models with Intel GPUs.")
class LatentPreviewMethod(enum.Enum):
NoPreviews = "none"
Auto = "auto"
Latent2RGB = "latent2rgb"
TAESD = "taesd"
parser.add_argument("--preview-method", type=LatentPreviewMethod, default=LatentPreviewMethod.NoPreviews, help="Default preview method for sampler nodes.", action=EnumAction)
cache_group = parser.add_mutually_exclusive_group()
cache_group.add_argument("--cache-classic", action="store_true", help="Use the old style (aggressive) caching.")
cache_group.add_argument("--cache-lru", type=int, default=0, help="Use LRU caching with a maximum of N node results cached. May use more RAM/VRAM.")
attn_group = parser.add_mutually_exclusive_group()
attn_group.add_argument("--use-split-cross-attention", action="store_true", help="Use the split cross attention optimization instead of the sub-quadratic one. Ignored when xformers is used.")
attn_group.add_argument("--use-split-cross-attention", action="store_true", help="Use the split cross attention optimization. Ignored when xformers is used.")
attn_group.add_argument("--use-quad-cross-attention", action="store_true", help="Use the sub-quadratic cross attention optimization . Ignored when xformers is used.")
attn_group.add_argument("--use-pytorch-cross-attention", action="store_true", help="Use the new pytorch 2.0 cross attention function.")
parser.add_argument("--disable-xformers", action="store_true", help="Disable xformers.")
upcast = parser.add_mutually_exclusive_group()
upcast.add_argument("--force-upcast-attention", action="store_true", help="Force enable attention upcasting, please report if it fixes black images.")
upcast.add_argument("--dont-upcast-attention", action="store_true", help="Disable all upcasting of attention. Should be unnecessary except for debugging.")
vram_group = parser.add_mutually_exclusive_group()
vram_group.add_argument("--gpu-only", action="store_true", help="Store and run everything (text encoders/CLIP models, etc... on the GPU).")
vram_group.add_argument("--highvram", action="store_true", help="By default models will be unloaded to CPU memory after being used. This option keeps them in GPU memory.")
vram_group.add_argument("--normalvram", action="store_true", help="Used to force normal vram use if lowvram gets automatically enabled.")
vram_group.add_argument("--lowvram", action="store_true", help="Split the unet in parts to use less vram.")
vram_group.add_argument("--novram", action="store_true", help="When lowvram isn't enough.")
vram_group.add_argument("--cpu", action="store_true", help="To use the CPU for everything (slow).")
parser.add_argument("--reserve-vram", type=float, default=None, help="Set the amount of vram in GB you want to reserve for use by your OS/other software. By default some amount is reverved depending on your OS.")
parser.add_argument("--default-hashing-function", type=str, choices=['md5', 'sha1', 'sha256', 'sha512'], default='sha256', help="Allows you to choose the hash function to use for duplicate filename / contents comparison. Default is sha256.")
parser.add_argument("--disable-smart-memory", action="store_true", help="Force ComfyUI to agressively offload to regular ram instead of keeping models in vram when it can.")
parser.add_argument("--deterministic", action="store_true", help="Make pytorch use slower deterministic algorithms when it can. Note that this might not make images deterministic in all cases.")
parser.add_argument("--fast", action="store_true", help="Enable some untested and potentially quality deteriorating optimizations.")
parser.add_argument("--dont-print-server", action="store_true", help="Don't print server output.")
parser.add_argument("--quick-test-for-ci", action="store_true", help="Quick test for CI.")
parser.add_argument("--windows-standalone-build", action="store_true", help="Windows standalone build: Enable convenient things that most people using the standalone windows build will probably enjoy (like auto opening the page on startup).")
args = parser.parse_args()
parser.add_argument("--disable-metadata", action="store_true", help="Disable saving prompt metadata in files.")
parser.add_argument("--disable-all-custom-nodes", action="store_true", help="Disable loading all custom nodes.")
parser.add_argument("--multi-user", action="store_true", help="Enables per-user storage.")
parser.add_argument("--verbose", action="store_true", help="Enables more debug prints.")
# The default built-in provider hosted under web/
DEFAULT_VERSION_STRING = "comfyanonymous/ComfyUI@latest"
parser.add_argument(
"--front-end-version",
type=str,
default=DEFAULT_VERSION_STRING,
help="""
Specifies the version of the frontend to be used. This command needs internet connectivity to query and
download available frontend implementations from GitHub releases.
The version string should be in the format of:
[repoOwner]/[repoName]@[version]
where version is one of: "latest" or a valid version number (e.g. "1.0.0")
""",
)
def is_valid_directory(path: Optional[str]) -> Optional[str]:
"""Validate if the given path is a directory."""
if path is None:
return None
if not os.path.isdir(path):
raise argparse.ArgumentTypeError(f"{path} is not a valid directory.")
return path
parser.add_argument(
"--front-end-root",
type=is_valid_directory,
default=None,
help="The local filesystem path to the directory where the frontend is located. Overrides --front-end-version.",
)
if comfy.options.args_parsing:
args = parser.parse_args()
else:
args = parser.parse_args([])
if args.windows_standalone_build:
args.auto_launch = True
if args.disable_auto_launch:
args.auto_launch = False

View File

@ -0,0 +1,23 @@
{
"architectures": [
"CLIPTextModel"
],
"attention_dropout": 0.0,
"bos_token_id": 0,
"dropout": 0.0,
"eos_token_id": 49407,
"hidden_act": "gelu",
"hidden_size": 1280,
"initializer_factor": 1.0,
"initializer_range": 0.02,
"intermediate_size": 5120,
"layer_norm_eps": 1e-05,
"max_position_embeddings": 77,
"model_type": "clip_text_model",
"num_attention_heads": 20,
"num_hidden_layers": 32,
"pad_token_id": 1,
"projection_dim": 1280,
"torch_dtype": "float32",
"vocab_size": 49408
}

196
comfy/clip_model.py Normal file
View File

@ -0,0 +1,196 @@
import torch
from comfy.ldm.modules.attention import optimized_attention_for_device
import comfy.ops
class CLIPAttention(torch.nn.Module):
def __init__(self, embed_dim, heads, dtype, device, operations):
super().__init__()
self.heads = heads
self.q_proj = operations.Linear(embed_dim, embed_dim, bias=True, dtype=dtype, device=device)
self.k_proj = operations.Linear(embed_dim, embed_dim, bias=True, dtype=dtype, device=device)
self.v_proj = operations.Linear(embed_dim, embed_dim, bias=True, dtype=dtype, device=device)
self.out_proj = operations.Linear(embed_dim, embed_dim, bias=True, dtype=dtype, device=device)
def forward(self, x, mask=None, optimized_attention=None):
q = self.q_proj(x)
k = self.k_proj(x)
v = self.v_proj(x)
out = optimized_attention(q, k, v, self.heads, mask)
return self.out_proj(out)
ACTIVATIONS = {"quick_gelu": lambda a: a * torch.sigmoid(1.702 * a),
"gelu": torch.nn.functional.gelu,
}
class CLIPMLP(torch.nn.Module):
def __init__(self, embed_dim, intermediate_size, activation, dtype, device, operations):
super().__init__()
self.fc1 = operations.Linear(embed_dim, intermediate_size, bias=True, dtype=dtype, device=device)
self.activation = ACTIVATIONS[activation]
self.fc2 = operations.Linear(intermediate_size, embed_dim, bias=True, dtype=dtype, device=device)
def forward(self, x):
x = self.fc1(x)
x = self.activation(x)
x = self.fc2(x)
return x
class CLIPLayer(torch.nn.Module):
def __init__(self, embed_dim, heads, intermediate_size, intermediate_activation, dtype, device, operations):
super().__init__()
self.layer_norm1 = operations.LayerNorm(embed_dim, dtype=dtype, device=device)
self.self_attn = CLIPAttention(embed_dim, heads, dtype, device, operations)
self.layer_norm2 = operations.LayerNorm(embed_dim, dtype=dtype, device=device)
self.mlp = CLIPMLP(embed_dim, intermediate_size, intermediate_activation, dtype, device, operations)
def forward(self, x, mask=None, optimized_attention=None):
x += self.self_attn(self.layer_norm1(x), mask, optimized_attention)
x += self.mlp(self.layer_norm2(x))
return x
class CLIPEncoder(torch.nn.Module):
def __init__(self, num_layers, embed_dim, heads, intermediate_size, intermediate_activation, dtype, device, operations):
super().__init__()
self.layers = torch.nn.ModuleList([CLIPLayer(embed_dim, heads, intermediate_size, intermediate_activation, dtype, device, operations) for i in range(num_layers)])
def forward(self, x, mask=None, intermediate_output=None):
optimized_attention = optimized_attention_for_device(x.device, mask=mask is not None, small_input=True)
if intermediate_output is not None:
if intermediate_output < 0:
intermediate_output = len(self.layers) + intermediate_output
intermediate = None
for i, l in enumerate(self.layers):
x = l(x, mask, optimized_attention)
if i == intermediate_output:
intermediate = x.clone()
return x, intermediate
class CLIPEmbeddings(torch.nn.Module):
def __init__(self, embed_dim, vocab_size=49408, num_positions=77, dtype=None, device=None, operations=None):
super().__init__()
self.token_embedding = operations.Embedding(vocab_size, embed_dim, dtype=dtype, device=device)
self.position_embedding = operations.Embedding(num_positions, embed_dim, dtype=dtype, device=device)
def forward(self, input_tokens, dtype=torch.float32):
return self.token_embedding(input_tokens, out_dtype=dtype) + comfy.ops.cast_to(self.position_embedding.weight, dtype=dtype, device=input_tokens.device)
class CLIPTextModel_(torch.nn.Module):
def __init__(self, config_dict, dtype, device, operations):
num_layers = config_dict["num_hidden_layers"]
embed_dim = config_dict["hidden_size"]
heads = config_dict["num_attention_heads"]
intermediate_size = config_dict["intermediate_size"]
intermediate_activation = config_dict["hidden_act"]
num_positions = config_dict["max_position_embeddings"]
self.eos_token_id = config_dict["eos_token_id"]
super().__init__()
self.embeddings = CLIPEmbeddings(embed_dim, num_positions=num_positions, dtype=dtype, device=device, operations=operations)
self.encoder = CLIPEncoder(num_layers, embed_dim, heads, intermediate_size, intermediate_activation, dtype, device, operations)
self.final_layer_norm = operations.LayerNorm(embed_dim, dtype=dtype, device=device)
def forward(self, input_tokens, attention_mask=None, intermediate_output=None, final_layer_norm_intermediate=True, dtype=torch.float32):
x = self.embeddings(input_tokens, dtype=dtype)
mask = None
if attention_mask is not None:
mask = 1.0 - attention_mask.to(x.dtype).reshape((attention_mask.shape[0], 1, -1, attention_mask.shape[-1])).expand(attention_mask.shape[0], 1, attention_mask.shape[-1], attention_mask.shape[-1])
mask = mask.masked_fill(mask.to(torch.bool), float("-inf"))
causal_mask = torch.empty(x.shape[1], x.shape[1], dtype=x.dtype, device=x.device).fill_(float("-inf")).triu_(1)
if mask is not None:
mask += causal_mask
else:
mask = causal_mask
x, i = self.encoder(x, mask=mask, intermediate_output=intermediate_output)
x = self.final_layer_norm(x)
if i is not None and final_layer_norm_intermediate:
i = self.final_layer_norm(i)
pooled_output = x[torch.arange(x.shape[0], device=x.device), (torch.round(input_tokens).to(dtype=torch.int, device=x.device) == self.eos_token_id).int().argmax(dim=-1),]
return x, i, pooled_output
class CLIPTextModel(torch.nn.Module):
def __init__(self, config_dict, dtype, device, operations):
super().__init__()
self.num_layers = config_dict["num_hidden_layers"]
self.text_model = CLIPTextModel_(config_dict, dtype, device, operations)
embed_dim = config_dict["hidden_size"]
self.text_projection = operations.Linear(embed_dim, embed_dim, bias=False, dtype=dtype, device=device)
self.dtype = dtype
def get_input_embeddings(self):
return self.text_model.embeddings.token_embedding
def set_input_embeddings(self, embeddings):
self.text_model.embeddings.token_embedding = embeddings
def forward(self, *args, **kwargs):
x = self.text_model(*args, **kwargs)
out = self.text_projection(x[2])
return (x[0], x[1], out, x[2])
class CLIPVisionEmbeddings(torch.nn.Module):
def __init__(self, embed_dim, num_channels=3, patch_size=14, image_size=224, dtype=None, device=None, operations=None):
super().__init__()
self.class_embedding = torch.nn.Parameter(torch.empty(embed_dim, dtype=dtype, device=device))
self.patch_embedding = operations.Conv2d(
in_channels=num_channels,
out_channels=embed_dim,
kernel_size=patch_size,
stride=patch_size,
bias=False,
dtype=dtype,
device=device
)
num_patches = (image_size // patch_size) ** 2
num_positions = num_patches + 1
self.position_embedding = operations.Embedding(num_positions, embed_dim, dtype=dtype, device=device)
def forward(self, pixel_values):
embeds = self.patch_embedding(pixel_values).flatten(2).transpose(1, 2)
return torch.cat([comfy.ops.cast_to_input(self.class_embedding, embeds).expand(pixel_values.shape[0], 1, -1), embeds], dim=1) + comfy.ops.cast_to_input(self.position_embedding.weight, embeds)
class CLIPVision(torch.nn.Module):
def __init__(self, config_dict, dtype, device, operations):
super().__init__()
num_layers = config_dict["num_hidden_layers"]
embed_dim = config_dict["hidden_size"]
heads = config_dict["num_attention_heads"]
intermediate_size = config_dict["intermediate_size"]
intermediate_activation = config_dict["hidden_act"]
self.embeddings = CLIPVisionEmbeddings(embed_dim, config_dict["num_channels"], config_dict["patch_size"], config_dict["image_size"], dtype=dtype, device=device, operations=operations)
self.pre_layrnorm = operations.LayerNorm(embed_dim)
self.encoder = CLIPEncoder(num_layers, embed_dim, heads, intermediate_size, intermediate_activation, dtype, device, operations)
self.post_layernorm = operations.LayerNorm(embed_dim)
def forward(self, pixel_values, attention_mask=None, intermediate_output=None):
x = self.embeddings(pixel_values)
x = self.pre_layrnorm(x)
#TODO: attention_mask?
x, i = self.encoder(x, mask=None, intermediate_output=intermediate_output)
pooled_output = self.post_layernorm(x[:, 0, :])
return x, i, pooled_output
class CLIPVisionModelProjection(torch.nn.Module):
def __init__(self, config_dict, dtype, device, operations):
super().__init__()
self.vision_model = CLIPVision(config_dict, dtype, device, operations)
self.visual_projection = operations.Linear(config_dict["hidden_size"], config_dict["projection_dim"], bias=False)
def forward(self, *args, **kwargs):
x = self.vision_model(*args, **kwargs)
out = self.visual_projection(x[2])
return (x[0], x[1], out)

View File

@ -1,64 +1,121 @@
from transformers import CLIPVisionModelWithProjection, CLIPVisionConfig, CLIPImageProcessor
from .utils import load_torch_file, transformers_convert
from .utils import load_torch_file, transformers_convert, state_dict_prefix_replace
import os
import torch
import json
import logging
import comfy.ops
import comfy.model_patcher
import comfy.model_management
import comfy.utils
import comfy.clip_model
class Output:
def __getitem__(self, key):
return getattr(self, key)
def __setitem__(self, key, item):
setattr(self, key, item)
def clip_preprocess(image, size=224):
mean = torch.tensor([ 0.48145466,0.4578275,0.40821073], device=image.device, dtype=image.dtype)
std = torch.tensor([0.26862954,0.26130258,0.27577711], device=image.device, dtype=image.dtype)
image = image.movedim(-1, 1)
if not (image.shape[2] == size and image.shape[3] == size):
scale = (size / min(image.shape[2], image.shape[3]))
image = torch.nn.functional.interpolate(image, size=(round(scale * image.shape[2]), round(scale * image.shape[3])), mode="bicubic", antialias=True)
h = (image.shape[2] - size)//2
w = (image.shape[3] - size)//2
image = image[:,:,h:h+size,w:w+size]
image = torch.clip((255. * image), 0, 255).round() / 255.0
return (image - mean.view([3,1,1])) / std.view([3,1,1])
class ClipVisionModel():
def __init__(self, json_config):
config = CLIPVisionConfig.from_json_file(json_config)
self.model = CLIPVisionModelWithProjection(config)
self.processor = CLIPImageProcessor(crop_size=224,
do_center_crop=True,
do_convert_rgb=True,
do_normalize=True,
do_resize=True,
image_mean=[ 0.48145466,0.4578275,0.40821073],
image_std=[0.26862954,0.26130258,0.27577711],
resample=3, #bicubic
size=224)
with open(json_config) as f:
config = json.load(f)
self.image_size = config.get("image_size", 224)
self.load_device = comfy.model_management.text_encoder_device()
offload_device = comfy.model_management.text_encoder_offload_device()
self.dtype = comfy.model_management.text_encoder_dtype(self.load_device)
self.model = comfy.clip_model.CLIPVisionModelProjection(config, self.dtype, offload_device, comfy.ops.manual_cast)
self.model.eval()
self.patcher = comfy.model_patcher.ModelPatcher(self.model, load_device=self.load_device, offload_device=offload_device)
def load_sd(self, sd):
self.model.load_state_dict(sd, strict=False)
return self.model.load_state_dict(sd, strict=False)
def get_sd(self):
return self.model.state_dict()
def encode_image(self, image):
img = torch.clip((255. * image[0]), 0, 255).round().int()
inputs = self.processor(images=[img], return_tensors="pt")
outputs = self.model(**inputs)
comfy.model_management.load_model_gpu(self.patcher)
pixel_values = clip_preprocess(image.to(self.load_device), size=self.image_size).float()
out = self.model(pixel_values=pixel_values, intermediate_output=-2)
outputs = Output()
outputs["last_hidden_state"] = out[0].to(comfy.model_management.intermediate_device())
outputs["image_embeds"] = out[2].to(comfy.model_management.intermediate_device())
outputs["penultimate_hidden_states"] = out[1].to(comfy.model_management.intermediate_device())
return outputs
def convert_to_transformers(sd):
def convert_to_transformers(sd, prefix):
sd_k = sd.keys()
if "embedder.model.visual.transformer.resblocks.0.attn.in_proj_weight" in sd_k:
if "{}transformer.resblocks.0.attn.in_proj_weight".format(prefix) in sd_k:
keys_to_replace = {
"embedder.model.visual.class_embedding": "vision_model.embeddings.class_embedding",
"embedder.model.visual.conv1.weight": "vision_model.embeddings.patch_embedding.weight",
"embedder.model.visual.positional_embedding": "vision_model.embeddings.position_embedding.weight",
"embedder.model.visual.ln_post.bias": "vision_model.post_layernorm.bias",
"embedder.model.visual.ln_post.weight": "vision_model.post_layernorm.weight",
"embedder.model.visual.ln_pre.bias": "vision_model.pre_layrnorm.bias",
"embedder.model.visual.ln_pre.weight": "vision_model.pre_layrnorm.weight",
"{}class_embedding".format(prefix): "vision_model.embeddings.class_embedding",
"{}conv1.weight".format(prefix): "vision_model.embeddings.patch_embedding.weight",
"{}positional_embedding".format(prefix): "vision_model.embeddings.position_embedding.weight",
"{}ln_post.bias".format(prefix): "vision_model.post_layernorm.bias",
"{}ln_post.weight".format(prefix): "vision_model.post_layernorm.weight",
"{}ln_pre.bias".format(prefix): "vision_model.pre_layrnorm.bias",
"{}ln_pre.weight".format(prefix): "vision_model.pre_layrnorm.weight",
}
for x in keys_to_replace:
if x in sd_k:
sd[keys_to_replace[x]] = sd.pop(x)
if "embedder.model.visual.proj" in sd_k:
sd['visual_projection.weight'] = sd.pop("embedder.model.visual.proj").transpose(0, 1)
if "{}proj".format(prefix) in sd_k:
sd['visual_projection.weight'] = sd.pop("{}proj".format(prefix)).transpose(0, 1)
sd = transformers_convert(sd, "embedder.model.visual", "vision_model", 32)
sd = transformers_convert(sd, prefix, "vision_model.", 48)
else:
replace_prefix = {prefix: ""}
sd = state_dict_prefix_replace(sd, replace_prefix)
return sd
def load_clipvision_from_sd(sd):
sd = convert_to_transformers(sd)
if "vision_model.encoder.layers.30.layer_norm1.weight" in sd:
def load_clipvision_from_sd(sd, prefix="", convert_keys=False):
if convert_keys:
sd = convert_to_transformers(sd, prefix)
if "vision_model.encoder.layers.47.layer_norm1.weight" in sd:
json_config = os.path.join(os.path.dirname(os.path.realpath(__file__)), "clip_vision_config_g.json")
elif "vision_model.encoder.layers.30.layer_norm1.weight" in sd:
json_config = os.path.join(os.path.dirname(os.path.realpath(__file__)), "clip_vision_config_h.json")
elif "vision_model.encoder.layers.22.layer_norm1.weight" in sd:
if sd["vision_model.embeddings.position_embedding.weight"].shape[0] == 577:
json_config = os.path.join(os.path.dirname(os.path.realpath(__file__)), "clip_vision_config_vitl_336.json")
else:
json_config = os.path.join(os.path.dirname(os.path.realpath(__file__)), "clip_vision_config_vitl.json")
else:
json_config = os.path.join(os.path.dirname(os.path.realpath(__file__)), "clip_vision_config_vitl.json")
return None
clip = ClipVisionModel(json_config)
clip.load_sd(sd)
m, u = clip.load_sd(sd)
if len(m) > 0:
logging.warning("missing clip vision: {}".format(m))
u = set(u)
keys = list(sd.keys())
for k in keys:
if k not in u:
t = sd.pop(k)
del t
return clip
def load(ckpt_path):
sd = load_torch_file(ckpt_path)
return load_clipvision_from_sd(sd)
if "visual.transformer.resblocks.0.attn.in_proj_weight" in sd:
return load_clipvision_from_sd(sd, prefix="visual.", convert_keys=True)
else:
return load_clipvision_from_sd(sd)

View File

@ -0,0 +1,18 @@
{
"attention_dropout": 0.0,
"dropout": 0.0,
"hidden_act": "gelu",
"hidden_size": 1664,
"image_size": 224,
"initializer_factor": 1.0,
"initializer_range": 0.02,
"intermediate_size": 8192,
"layer_norm_eps": 1e-05,
"model_type": "clip_vision_model",
"num_attention_heads": 16,
"num_channels": 3,
"num_hidden_layers": 48,
"patch_size": 14,
"projection_dim": 1280,
"torch_dtype": "float32"
}

View File

@ -0,0 +1,18 @@
{
"attention_dropout": 0.0,
"dropout": 0.0,
"hidden_act": "quick_gelu",
"hidden_size": 1024,
"image_size": 336,
"initializer_factor": 1.0,
"initializer_range": 0.02,
"intermediate_size": 4096,
"layer_norm_eps": 1e-5,
"model_type": "clip_vision_model",
"num_attention_heads": 16,
"num_channels": 3,
"num_hidden_layers": 24,
"patch_size": 14,
"projection_dim": 768,
"torch_dtype": "float32"
}

83
comfy/conds.py Normal file
View File

@ -0,0 +1,83 @@
import torch
import math
import comfy.utils
def lcm(a, b): #TODO: eventually replace by math.lcm (added in python3.9)
return abs(a*b) // math.gcd(a, b)
class CONDRegular:
def __init__(self, cond):
self.cond = cond
def _copy_with(self, cond):
return self.__class__(cond)
def process_cond(self, batch_size, device, **kwargs):
return self._copy_with(comfy.utils.repeat_to_batch_size(self.cond, batch_size).to(device))
def can_concat(self, other):
if self.cond.shape != other.cond.shape:
return False
return True
def concat(self, others):
conds = [self.cond]
for x in others:
conds.append(x.cond)
return torch.cat(conds)
class CONDNoiseShape(CONDRegular):
def process_cond(self, batch_size, device, area, **kwargs):
data = self.cond
if area is not None:
dims = len(area) // 2
for i in range(dims):
data = data.narrow(i + 2, area[i + dims], area[i])
return self._copy_with(comfy.utils.repeat_to_batch_size(data, batch_size).to(device))
class CONDCrossAttn(CONDRegular):
def can_concat(self, other):
s1 = self.cond.shape
s2 = other.cond.shape
if s1 != s2:
if s1[0] != s2[0] or s1[2] != s2[2]: #these 2 cases should not happen
return False
mult_min = lcm(s1[1], s2[1])
diff = mult_min // min(s1[1], s2[1])
if diff > 4: #arbitrary limit on the padding because it's probably going to impact performance negatively if it's too much
return False
return True
def concat(self, others):
conds = [self.cond]
crossattn_max_len = self.cond.shape[1]
for x in others:
c = x.cond
crossattn_max_len = lcm(crossattn_max_len, c.shape[1])
conds.append(c)
out = []
for c in conds:
if c.shape[1] < crossattn_max_len:
c = c.repeat(1, crossattn_max_len // c.shape[1], 1) #padding with repeat doesn't change result
out.append(c)
return torch.cat(out)
class CONDConstant(CONDRegular):
def __init__(self, cond):
self.cond = cond
def process_cond(self, batch_size, device, **kwargs):
return self._copy_with(self.cond)
def can_concat(self, other):
if self.cond != other.cond:
return False
return True
def concat(self, others):
return self.cond

705
comfy/controlnet.py Normal file
View File

@ -0,0 +1,705 @@
"""
This file is part of ComfyUI.
Copyright (C) 2024 Comfy
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
import torch
from enum import Enum
import math
import os
import logging
import comfy.utils
import comfy.model_management
import comfy.model_detection
import comfy.model_patcher
import comfy.ops
import comfy.latent_formats
import comfy.cldm.cldm
import comfy.t2i_adapter.adapter
import comfy.ldm.cascade.controlnet
import comfy.cldm.mmdit
import comfy.ldm.hydit.controlnet
import comfy.ldm.flux.controlnet
def broadcast_image_to(tensor, target_batch_size, batched_number):
current_batch_size = tensor.shape[0]
#print(current_batch_size, target_batch_size)
if current_batch_size == 1:
return tensor
per_batch = target_batch_size // batched_number
tensor = tensor[:per_batch]
if per_batch > tensor.shape[0]:
tensor = torch.cat([tensor] * (per_batch // tensor.shape[0]) + [tensor[:(per_batch % tensor.shape[0])]], dim=0)
current_batch_size = tensor.shape[0]
if current_batch_size == target_batch_size:
return tensor
else:
return torch.cat([tensor] * batched_number, dim=0)
class StrengthType(Enum):
CONSTANT = 1
LINEAR_UP = 2
class ControlBase:
def __init__(self, device=None):
self.cond_hint_original = None
self.cond_hint = None
self.strength = 1.0
self.timestep_percent_range = (0.0, 1.0)
self.latent_format = None
self.vae = None
self.global_average_pooling = False
self.timestep_range = None
self.compression_ratio = 8
self.upscale_algorithm = 'nearest-exact'
self.extra_args = {}
if device is None:
device = comfy.model_management.get_torch_device()
self.device = device
self.previous_controlnet = None
self.extra_conds = []
self.strength_type = StrengthType.CONSTANT
def set_cond_hint(self, cond_hint, strength=1.0, timestep_percent_range=(0.0, 1.0), vae=None):
self.cond_hint_original = cond_hint
self.strength = strength
self.timestep_percent_range = timestep_percent_range
if self.latent_format is not None:
self.vae = vae
return self
def pre_run(self, model, percent_to_timestep_function):
self.timestep_range = (percent_to_timestep_function(self.timestep_percent_range[0]), percent_to_timestep_function(self.timestep_percent_range[1]))
if self.previous_controlnet is not None:
self.previous_controlnet.pre_run(model, percent_to_timestep_function)
def set_previous_controlnet(self, controlnet):
self.previous_controlnet = controlnet
return self
def cleanup(self):
if self.previous_controlnet is not None:
self.previous_controlnet.cleanup()
if self.cond_hint is not None:
del self.cond_hint
self.cond_hint = None
self.timestep_range = None
def get_models(self):
out = []
if self.previous_controlnet is not None:
out += self.previous_controlnet.get_models()
return out
def copy_to(self, c):
c.cond_hint_original = self.cond_hint_original
c.strength = self.strength
c.timestep_percent_range = self.timestep_percent_range
c.global_average_pooling = self.global_average_pooling
c.compression_ratio = self.compression_ratio
c.upscale_algorithm = self.upscale_algorithm
c.latent_format = self.latent_format
c.extra_args = self.extra_args.copy()
c.vae = self.vae
c.extra_conds = self.extra_conds.copy()
c.strength_type = self.strength_type
def inference_memory_requirements(self, dtype):
if self.previous_controlnet is not None:
return self.previous_controlnet.inference_memory_requirements(dtype)
return 0
def control_merge(self, control, control_prev, output_dtype):
out = {'input':[], 'middle':[], 'output': []}
for key in control:
control_output = control[key]
applied_to = set()
for i in range(len(control_output)):
x = control_output[i]
if x is not None:
if self.global_average_pooling:
x = torch.mean(x, dim=(2, 3), keepdim=True).repeat(1, 1, x.shape[2], x.shape[3])
if x not in applied_to: #memory saving strategy, allow shared tensors and only apply strength to shared tensors once
applied_to.add(x)
if self.strength_type == StrengthType.CONSTANT:
x *= self.strength
elif self.strength_type == StrengthType.LINEAR_UP:
x *= (self.strength ** float(len(control_output) - i))
if output_dtype is not None and x.dtype != output_dtype:
x = x.to(output_dtype)
out[key].append(x)
if control_prev is not None:
for x in ['input', 'middle', 'output']:
o = out[x]
for i in range(len(control_prev[x])):
prev_val = control_prev[x][i]
if i >= len(o):
o.append(prev_val)
elif prev_val is not None:
if o[i] is None:
o[i] = prev_val
else:
if o[i].shape[0] < prev_val.shape[0]:
o[i] = prev_val + o[i]
else:
o[i] = prev_val + o[i] #TODO: change back to inplace add if shared tensors stop being an issue
return out
def set_extra_arg(self, argument, value=None):
self.extra_args[argument] = value
class ControlNet(ControlBase):
def __init__(self, control_model=None, global_average_pooling=False, compression_ratio=8, latent_format=None, device=None, load_device=None, manual_cast_dtype=None, extra_conds=["y"], strength_type=StrengthType.CONSTANT):
super().__init__(device)
self.control_model = control_model
self.load_device = load_device
if control_model is not None:
self.control_model_wrapped = comfy.model_patcher.ModelPatcher(self.control_model, load_device=load_device, offload_device=comfy.model_management.unet_offload_device())
self.compression_ratio = compression_ratio
self.global_average_pooling = global_average_pooling
self.model_sampling_current = None
self.manual_cast_dtype = manual_cast_dtype
self.latent_format = latent_format
self.extra_conds += extra_conds
self.strength_type = strength_type
def get_control(self, x_noisy, t, cond, batched_number):
control_prev = None
if self.previous_controlnet is not None:
control_prev = self.previous_controlnet.get_control(x_noisy, t, cond, batched_number)
if self.timestep_range is not None:
if t[0] > self.timestep_range[0] or t[0] < self.timestep_range[1]:
if control_prev is not None:
return control_prev
else:
return None
dtype = self.control_model.dtype
if self.manual_cast_dtype is not None:
dtype = self.manual_cast_dtype
if self.cond_hint is None or x_noisy.shape[2] * self.compression_ratio != self.cond_hint.shape[2] or x_noisy.shape[3] * self.compression_ratio != self.cond_hint.shape[3]:
if self.cond_hint is not None:
del self.cond_hint
self.cond_hint = None
compression_ratio = self.compression_ratio
if self.vae is not None:
compression_ratio *= self.vae.downscale_ratio
self.cond_hint = comfy.utils.common_upscale(self.cond_hint_original, x_noisy.shape[3] * compression_ratio, x_noisy.shape[2] * compression_ratio, self.upscale_algorithm, "center")
if self.vae is not None:
loaded_models = comfy.model_management.loaded_models(only_currently_used=True)
self.cond_hint = self.vae.encode(self.cond_hint.movedim(1, -1))
comfy.model_management.load_models_gpu(loaded_models)
if self.latent_format is not None:
self.cond_hint = self.latent_format.process_in(self.cond_hint)
self.cond_hint = self.cond_hint.to(device=self.device, dtype=dtype)
if x_noisy.shape[0] != self.cond_hint.shape[0]:
self.cond_hint = broadcast_image_to(self.cond_hint, x_noisy.shape[0], batched_number)
context = cond.get('crossattn_controlnet', cond['c_crossattn'])
extra = self.extra_args.copy()
for c in self.extra_conds:
temp = cond.get(c, None)
if temp is not None:
extra[c] = temp.to(dtype)
timestep = self.model_sampling_current.timestep(t)
x_noisy = self.model_sampling_current.calculate_input(t, x_noisy)
control = self.control_model(x=x_noisy.to(dtype), hint=self.cond_hint, timesteps=timestep.to(dtype), context=context.to(dtype), **extra)
return self.control_merge(control, control_prev, output_dtype=None)
def copy(self):
c = ControlNet(None, global_average_pooling=self.global_average_pooling, load_device=self.load_device, manual_cast_dtype=self.manual_cast_dtype)
c.control_model = self.control_model
c.control_model_wrapped = self.control_model_wrapped
self.copy_to(c)
return c
def get_models(self):
out = super().get_models()
out.append(self.control_model_wrapped)
return out
def pre_run(self, model, percent_to_timestep_function):
super().pre_run(model, percent_to_timestep_function)
self.model_sampling_current = model.model_sampling
def cleanup(self):
self.model_sampling_current = None
super().cleanup()
class ControlLoraOps:
class Linear(torch.nn.Module, comfy.ops.CastWeightBiasOp):
def __init__(self, in_features: int, out_features: int, bias: bool = True,
device=None, dtype=None) -> None:
factory_kwargs = {'device': device, 'dtype': dtype}
super().__init__()
self.in_features = in_features
self.out_features = out_features
self.weight = None
self.up = None
self.down = None
self.bias = None
def forward(self, input):
weight, bias = comfy.ops.cast_bias_weight(self, input)
if self.up is not None:
return torch.nn.functional.linear(input, weight + (torch.mm(self.up.flatten(start_dim=1), self.down.flatten(start_dim=1))).reshape(self.weight.shape).type(input.dtype), bias)
else:
return torch.nn.functional.linear(input, weight, bias)
class Conv2d(torch.nn.Module, comfy.ops.CastWeightBiasOp):
def __init__(
self,
in_channels,
out_channels,
kernel_size,
stride=1,
padding=0,
dilation=1,
groups=1,
bias=True,
padding_mode='zeros',
device=None,
dtype=None
):
super().__init__()
self.in_channels = in_channels
self.out_channels = out_channels
self.kernel_size = kernel_size
self.stride = stride
self.padding = padding
self.dilation = dilation
self.transposed = False
self.output_padding = 0
self.groups = groups
self.padding_mode = padding_mode
self.weight = None
self.bias = None
self.up = None
self.down = None
def forward(self, input):
weight, bias = comfy.ops.cast_bias_weight(self, input)
if self.up is not None:
return torch.nn.functional.conv2d(input, weight + (torch.mm(self.up.flatten(start_dim=1), self.down.flatten(start_dim=1))).reshape(self.weight.shape).type(input.dtype), bias, self.stride, self.padding, self.dilation, self.groups)
else:
return torch.nn.functional.conv2d(input, weight, bias, self.stride, self.padding, self.dilation, self.groups)
class ControlLora(ControlNet):
def __init__(self, control_weights, global_average_pooling=False, device=None):
ControlBase.__init__(self, device)
self.control_weights = control_weights
self.global_average_pooling = global_average_pooling
self.extra_conds += ["y"]
def pre_run(self, model, percent_to_timestep_function):
super().pre_run(model, percent_to_timestep_function)
controlnet_config = model.model_config.unet_config.copy()
controlnet_config.pop("out_channels")
controlnet_config["hint_channels"] = self.control_weights["input_hint_block.0.weight"].shape[1]
self.manual_cast_dtype = model.manual_cast_dtype
dtype = model.get_dtype()
if self.manual_cast_dtype is None:
class control_lora_ops(ControlLoraOps, comfy.ops.disable_weight_init):
pass
else:
class control_lora_ops(ControlLoraOps, comfy.ops.manual_cast):
pass
dtype = self.manual_cast_dtype
controlnet_config["operations"] = control_lora_ops
controlnet_config["dtype"] = dtype
self.control_model = comfy.cldm.cldm.ControlNet(**controlnet_config)
self.control_model.to(comfy.model_management.get_torch_device())
diffusion_model = model.diffusion_model
sd = diffusion_model.state_dict()
cm = self.control_model.state_dict()
for k in sd:
weight = sd[k]
try:
comfy.utils.set_attr_param(self.control_model, k, weight)
except:
pass
for k in self.control_weights:
if k not in {"lora_controlnet"}:
comfy.utils.set_attr_param(self.control_model, k, self.control_weights[k].to(dtype).to(comfy.model_management.get_torch_device()))
def copy(self):
c = ControlLora(self.control_weights, global_average_pooling=self.global_average_pooling)
self.copy_to(c)
return c
def cleanup(self):
del self.control_model
self.control_model = None
super().cleanup()
def get_models(self):
out = ControlBase.get_models(self)
return out
def inference_memory_requirements(self, dtype):
return comfy.utils.calculate_parameters(self.control_weights) * comfy.model_management.dtype_size(dtype) + ControlBase.inference_memory_requirements(self, dtype)
def controlnet_config(sd):
model_config = comfy.model_detection.model_config_from_unet(sd, "", True)
supported_inference_dtypes = model_config.supported_inference_dtypes
controlnet_config = model_config.unet_config
unet_dtype = comfy.model_management.unet_dtype(supported_dtypes=supported_inference_dtypes)
load_device = comfy.model_management.get_torch_device()
manual_cast_dtype = comfy.model_management.unet_manual_cast(unet_dtype, load_device)
if manual_cast_dtype is not None:
operations = comfy.ops.manual_cast
else:
operations = comfy.ops.disable_weight_init
offload_device = comfy.model_management.unet_offload_device()
return model_config, operations, load_device, unet_dtype, manual_cast_dtype, offload_device
def controlnet_load_state_dict(control_model, sd):
missing, unexpected = control_model.load_state_dict(sd, strict=False)
if len(missing) > 0:
logging.warning("missing controlnet keys: {}".format(missing))
if len(unexpected) > 0:
logging.debug("unexpected controlnet keys: {}".format(unexpected))
return control_model
def load_controlnet_mmdit(sd):
new_sd = comfy.model_detection.convert_diffusers_mmdit(sd, "")
model_config, operations, load_device, unet_dtype, manual_cast_dtype, offload_device = controlnet_config(new_sd)
num_blocks = comfy.model_detection.count_blocks(new_sd, 'joint_blocks.{}.')
for k in sd:
new_sd[k] = sd[k]
control_model = comfy.cldm.mmdit.ControlNet(num_blocks=num_blocks, operations=operations, device=offload_device, dtype=unet_dtype, **model_config.unet_config)
control_model = controlnet_load_state_dict(control_model, new_sd)
latent_format = comfy.latent_formats.SD3()
latent_format.shift_factor = 0 #SD3 controlnet weirdness
control = ControlNet(control_model, compression_ratio=1, latent_format=latent_format, load_device=load_device, manual_cast_dtype=manual_cast_dtype)
return control
def load_controlnet_hunyuandit(controlnet_data):
model_config, operations, load_device, unet_dtype, manual_cast_dtype, offload_device = controlnet_config(controlnet_data)
control_model = comfy.ldm.hydit.controlnet.HunYuanControlNet(operations=operations, device=offload_device, dtype=unet_dtype)
control_model = controlnet_load_state_dict(control_model, controlnet_data)
latent_format = comfy.latent_formats.SDXL()
extra_conds = ['text_embedding_mask', 'encoder_hidden_states_t5', 'text_embedding_mask_t5', 'image_meta_size', 'style', 'cos_cis_img', 'sin_cis_img']
control = ControlNet(control_model, compression_ratio=1, latent_format=latent_format, load_device=load_device, manual_cast_dtype=manual_cast_dtype, extra_conds=extra_conds, strength_type=StrengthType.CONSTANT)
return control
def load_controlnet_flux_xlabs(sd):
model_config, operations, load_device, unet_dtype, manual_cast_dtype, offload_device = controlnet_config(sd)
control_model = comfy.ldm.flux.controlnet.ControlNetFlux(operations=operations, device=offload_device, dtype=unet_dtype, **model_config.unet_config)
control_model = controlnet_load_state_dict(control_model, sd)
extra_conds = ['y', 'guidance']
control = ControlNet(control_model, load_device=load_device, manual_cast_dtype=manual_cast_dtype, extra_conds=extra_conds)
return control
def load_controlnet_flux_instantx(sd):
new_sd = comfy.model_detection.convert_diffusers_mmdit(sd, "")
model_config, operations, load_device, unet_dtype, manual_cast_dtype, offload_device = controlnet_config(new_sd)
for k in sd:
new_sd[k] = sd[k]
num_union_modes = 0
union_cnet = "controlnet_mode_embedder.weight"
if union_cnet in new_sd:
num_union_modes = new_sd[union_cnet].shape[0]
control_model = comfy.ldm.flux.controlnet.ControlNetFlux(latent_input=True, num_union_modes=num_union_modes, operations=operations, device=offload_device, dtype=unet_dtype, **model_config.unet_config)
control_model = controlnet_load_state_dict(control_model, new_sd)
latent_format = comfy.latent_formats.Flux()
extra_conds = ['y', 'guidance']
control = ControlNet(control_model, compression_ratio=1, latent_format=latent_format, load_device=load_device, manual_cast_dtype=manual_cast_dtype, extra_conds=extra_conds)
return control
def load_controlnet(ckpt_path, model=None):
controlnet_data = comfy.utils.load_torch_file(ckpt_path, safe_load=True)
if 'after_proj_list.18.bias' in controlnet_data.keys(): #Hunyuan DiT
return load_controlnet_hunyuandit(controlnet_data)
if "lora_controlnet" in controlnet_data:
return ControlLora(controlnet_data)
controlnet_config = None
supported_inference_dtypes = None
if "controlnet_cond_embedding.conv_in.weight" in controlnet_data: #diffusers format
controlnet_config = comfy.model_detection.unet_config_from_diffusers_unet(controlnet_data)
diffusers_keys = comfy.utils.unet_to_diffusers(controlnet_config)
diffusers_keys["controlnet_mid_block.weight"] = "middle_block_out.0.weight"
diffusers_keys["controlnet_mid_block.bias"] = "middle_block_out.0.bias"
count = 0
loop = True
while loop:
suffix = [".weight", ".bias"]
for s in suffix:
k_in = "controlnet_down_blocks.{}{}".format(count, s)
k_out = "zero_convs.{}.0{}".format(count, s)
if k_in not in controlnet_data:
loop = False
break
diffusers_keys[k_in] = k_out
count += 1
count = 0
loop = True
while loop:
suffix = [".weight", ".bias"]
for s in suffix:
if count == 0:
k_in = "controlnet_cond_embedding.conv_in{}".format(s)
else:
k_in = "controlnet_cond_embedding.blocks.{}{}".format(count - 1, s)
k_out = "input_hint_block.{}{}".format(count * 2, s)
if k_in not in controlnet_data:
k_in = "controlnet_cond_embedding.conv_out{}".format(s)
loop = False
diffusers_keys[k_in] = k_out
count += 1
new_sd = {}
for k in diffusers_keys:
if k in controlnet_data:
new_sd[diffusers_keys[k]] = controlnet_data.pop(k)
if "control_add_embedding.linear_1.bias" in controlnet_data: #Union Controlnet
controlnet_config["union_controlnet_num_control_type"] = controlnet_data["task_embedding"].shape[0]
for k in list(controlnet_data.keys()):
new_k = k.replace('.attn.in_proj_', '.attn.in_proj.')
new_sd[new_k] = controlnet_data.pop(k)
leftover_keys = controlnet_data.keys()
if len(leftover_keys) > 0:
logging.warning("leftover keys: {}".format(leftover_keys))
controlnet_data = new_sd
elif "controlnet_blocks.0.weight" in controlnet_data: #SD3 diffusers format
if "double_blocks.0.img_attn.norm.key_norm.scale" in controlnet_data:
return load_controlnet_flux_xlabs(controlnet_data)
elif "pos_embed_input.proj.weight" in controlnet_data:
return load_controlnet_mmdit(controlnet_data)
elif "controlnet_x_embedder.weight" in controlnet_data:
return load_controlnet_flux_instantx(controlnet_data)
pth_key = 'control_model.zero_convs.0.0.weight'
pth = False
key = 'zero_convs.0.0.weight'
if pth_key in controlnet_data:
pth = True
key = pth_key
prefix = "control_model."
elif key in controlnet_data:
prefix = ""
else:
net = load_t2i_adapter(controlnet_data)
if net is None:
logging.error("error checkpoint does not contain controlnet or t2i adapter data {}".format(ckpt_path))
return net
if controlnet_config is None:
model_config = comfy.model_detection.model_config_from_unet(controlnet_data, prefix, True)
supported_inference_dtypes = model_config.supported_inference_dtypes
controlnet_config = model_config.unet_config
load_device = comfy.model_management.get_torch_device()
if supported_inference_dtypes is None:
unet_dtype = comfy.model_management.unet_dtype()
else:
unet_dtype = comfy.model_management.unet_dtype(supported_dtypes=supported_inference_dtypes)
manual_cast_dtype = comfy.model_management.unet_manual_cast(unet_dtype, load_device)
if manual_cast_dtype is not None:
controlnet_config["operations"] = comfy.ops.manual_cast
controlnet_config["dtype"] = unet_dtype
controlnet_config["device"] = comfy.model_management.unet_offload_device()
controlnet_config.pop("out_channels")
controlnet_config["hint_channels"] = controlnet_data["{}input_hint_block.0.weight".format(prefix)].shape[1]
control_model = comfy.cldm.cldm.ControlNet(**controlnet_config)
if pth:
if 'difference' in controlnet_data:
if model is not None:
comfy.model_management.load_models_gpu([model])
model_sd = model.model_state_dict()
for x in controlnet_data:
c_m = "control_model."
if x.startswith(c_m):
sd_key = "diffusion_model.{}".format(x[len(c_m):])
if sd_key in model_sd:
cd = controlnet_data[x]
cd += model_sd[sd_key].type(cd.dtype).to(cd.device)
else:
logging.warning("WARNING: Loaded a diff controlnet without a model. It will very likely not work.")
class WeightsLoader(torch.nn.Module):
pass
w = WeightsLoader()
w.control_model = control_model
missing, unexpected = w.load_state_dict(controlnet_data, strict=False)
else:
missing, unexpected = control_model.load_state_dict(controlnet_data, strict=False)
if len(missing) > 0:
logging.warning("missing controlnet keys: {}".format(missing))
if len(unexpected) > 0:
logging.debug("unexpected controlnet keys: {}".format(unexpected))
global_average_pooling = False
filename = os.path.splitext(ckpt_path)[0]
if filename.endswith("_shuffle") or filename.endswith("_shuffle_fp16"): #TODO: smarter way of enabling global_average_pooling
global_average_pooling = True
control = ControlNet(control_model, global_average_pooling=global_average_pooling, load_device=load_device, manual_cast_dtype=manual_cast_dtype)
return control
class T2IAdapter(ControlBase):
def __init__(self, t2i_model, channels_in, compression_ratio, upscale_algorithm, device=None):
super().__init__(device)
self.t2i_model = t2i_model
self.channels_in = channels_in
self.control_input = None
self.compression_ratio = compression_ratio
self.upscale_algorithm = upscale_algorithm
def scale_image_to(self, width, height):
unshuffle_amount = self.t2i_model.unshuffle_amount
width = math.ceil(width / unshuffle_amount) * unshuffle_amount
height = math.ceil(height / unshuffle_amount) * unshuffle_amount
return width, height
def get_control(self, x_noisy, t, cond, batched_number):
control_prev = None
if self.previous_controlnet is not None:
control_prev = self.previous_controlnet.get_control(x_noisy, t, cond, batched_number)
if self.timestep_range is not None:
if t[0] > self.timestep_range[0] or t[0] < self.timestep_range[1]:
if control_prev is not None:
return control_prev
else:
return None
if self.cond_hint is None or x_noisy.shape[2] * self.compression_ratio != self.cond_hint.shape[2] or x_noisy.shape[3] * self.compression_ratio != self.cond_hint.shape[3]:
if self.cond_hint is not None:
del self.cond_hint
self.control_input = None
self.cond_hint = None
width, height = self.scale_image_to(x_noisy.shape[3] * self.compression_ratio, x_noisy.shape[2] * self.compression_ratio)
self.cond_hint = comfy.utils.common_upscale(self.cond_hint_original, width, height, self.upscale_algorithm, "center").float().to(self.device)
if self.channels_in == 1 and self.cond_hint.shape[1] > 1:
self.cond_hint = torch.mean(self.cond_hint, 1, keepdim=True)
if x_noisy.shape[0] != self.cond_hint.shape[0]:
self.cond_hint = broadcast_image_to(self.cond_hint, x_noisy.shape[0], batched_number)
if self.control_input is None:
self.t2i_model.to(x_noisy.dtype)
self.t2i_model.to(self.device)
self.control_input = self.t2i_model(self.cond_hint.to(x_noisy.dtype))
self.t2i_model.cpu()
control_input = {}
for k in self.control_input:
control_input[k] = list(map(lambda a: None if a is None else a.clone(), self.control_input[k]))
return self.control_merge(control_input, control_prev, x_noisy.dtype)
def copy(self):
c = T2IAdapter(self.t2i_model, self.channels_in, self.compression_ratio, self.upscale_algorithm)
self.copy_to(c)
return c
def load_t2i_adapter(t2i_data):
compression_ratio = 8
upscale_algorithm = 'nearest-exact'
if 'adapter' in t2i_data:
t2i_data = t2i_data['adapter']
if 'adapter.body.0.resnets.0.block1.weight' in t2i_data: #diffusers format
prefix_replace = {}
for i in range(4):
for j in range(2):
prefix_replace["adapter.body.{}.resnets.{}.".format(i, j)] = "body.{}.".format(i * 2 + j)
prefix_replace["adapter.body.{}.".format(i, j)] = "body.{}.".format(i * 2)
prefix_replace["adapter."] = ""
t2i_data = comfy.utils.state_dict_prefix_replace(t2i_data, prefix_replace)
keys = t2i_data.keys()
if "body.0.in_conv.weight" in keys:
cin = t2i_data['body.0.in_conv.weight'].shape[1]
model_ad = comfy.t2i_adapter.adapter.Adapter_light(cin=cin, channels=[320, 640, 1280, 1280], nums_rb=4)
elif 'conv_in.weight' in keys:
cin = t2i_data['conv_in.weight'].shape[1]
channel = t2i_data['conv_in.weight'].shape[0]
ksize = t2i_data['body.0.block2.weight'].shape[2]
use_conv = False
down_opts = list(filter(lambda a: a.endswith("down_opt.op.weight"), keys))
if len(down_opts) > 0:
use_conv = True
xl = False
if cin == 256 or cin == 768:
xl = True
model_ad = comfy.t2i_adapter.adapter.Adapter(cin=cin, channels=[channel, channel*2, channel*4, channel*4][:4], nums_rb=2, ksize=ksize, sk=True, use_conv=use_conv, xl=xl)
elif "backbone.0.0.weight" in keys:
model_ad = comfy.ldm.cascade.controlnet.ControlNet(c_in=t2i_data['backbone.0.0.weight'].shape[1], proj_blocks=[0, 4, 8, 12, 51, 55, 59, 63])
compression_ratio = 32
upscale_algorithm = 'bilinear'
elif "backbone.10.blocks.0.weight" in keys:
model_ad = comfy.ldm.cascade.controlnet.ControlNet(c_in=t2i_data['backbone.0.weight'].shape[1], bottleneck_mode="large", proj_blocks=[0, 4, 8, 12, 51, 55, 59, 63])
compression_ratio = 1
upscale_algorithm = 'nearest-exact'
else:
return None
missing, unexpected = model_ad.load_state_dict(t2i_data)
if len(missing) > 0:
logging.warning("t2i missing {}".format(missing))
if len(unexpected) > 0:
logging.debug("t2i unexpected {}".format(unexpected))
return T2IAdapter(model_ad, model_ad.input_channels, compression_ratio, upscale_algorithm)

View File

@ -1,14 +1,6 @@
import json
import os
import yaml
import folder_paths
from comfy.ldm.util import instantiate_from_config
from comfy.sd import ModelPatcher, load_model_weights, CLIP, VAE
import os.path as osp
import re
import torch
from safetensors.torch import load_file, save_file
import logging
# conversion code from https://github.com/huggingface/diffusers/blob/main/scripts/convert_diffusers_to_original_stable_diffusion.py
@ -157,6 +149,10 @@ vae_conversion_map_attn = [
("q.", "query."),
("k.", "key."),
("v.", "value."),
("q.", "to_q."),
("k.", "to_k."),
("v.", "to_v."),
("proj_out.", "to_out.0."),
("proj_out.", "proj_attn."),
]
@ -182,7 +178,7 @@ def convert_vae_state_dict(vae_state_dict):
for k, v in new_state_dict.items():
for weight_name in weights_to_convert:
if f"mid.attn_1.{weight_name}.weight" in k:
print(f"Reshaping {k} for SD format")
logging.debug(f"Reshaping {k} for SD format")
new_state_dict[k] = reshape_weight_for_sd(v)
return new_state_dict
@ -210,12 +206,29 @@ textenc_pattern = re.compile("|".join(protected.keys()))
# Ordering is from https://github.com/pytorch/pytorch/blob/master/test/cpp/api/modules.cpp
code2idx = {"q": 0, "k": 1, "v": 2}
# This function exists because at the time of writing torch.cat can't do fp8 with cuda
def cat_tensors(tensors):
x = 0
for t in tensors:
x += t.shape[0]
def convert_text_enc_state_dict_v20(text_enc_dict):
shape = [x] + list(tensors[0].shape)[1:]
out = torch.empty(shape, device=tensors[0].device, dtype=tensors[0].dtype)
x = 0
for t in tensors:
out[x:x + t.shape[0]] = t
x += t.shape[0]
return out
def convert_text_enc_state_dict_v20(text_enc_dict, prefix=""):
new_state_dict = {}
capture_qkv_weight = {}
capture_qkv_bias = {}
for k, v in text_enc_dict.items():
if not k.startswith(prefix):
continue
if (
k.endswith(".self_attn.q_proj.weight")
or k.endswith(".self_attn.k_proj.weight")
@ -240,20 +253,24 @@ def convert_text_enc_state_dict_v20(text_enc_dict):
capture_qkv_bias[k_pre][code2idx[k_code]] = v
continue
relabelled_key = textenc_pattern.sub(lambda m: protected[re.escape(m.group(0))], k)
new_state_dict[relabelled_key] = v
text_proj = "transformer.text_projection.weight"
if k.endswith(text_proj):
new_state_dict[k.replace(text_proj, "text_projection")] = v.transpose(0, 1).contiguous()
else:
relabelled_key = textenc_pattern.sub(lambda m: protected[re.escape(m.group(0))], k)
new_state_dict[relabelled_key] = v
for k_pre, tensors in capture_qkv_weight.items():
if None in tensors:
raise Exception("CORRUPTED MODEL: one of the q-k-v values for the text encoder was missing")
relabelled_key = textenc_pattern.sub(lambda m: protected[re.escape(m.group(0))], k_pre)
new_state_dict[relabelled_key + ".in_proj_weight"] = torch.cat(tensors)
new_state_dict[relabelled_key + ".in_proj_weight"] = cat_tensors(tensors)
for k_pre, tensors in capture_qkv_bias.items():
if None in tensors:
raise Exception("CORRUPTED MODEL: one of the q-k-v values for the text encoder was missing")
relabelled_key = textenc_pattern.sub(lambda m: protected[re.escape(m.group(0))], k_pre)
new_state_dict[relabelled_key + ".in_proj_bias"] = torch.cat(tensors)
new_state_dict[relabelled_key + ".in_proj_bias"] = cat_tensors(tensors)
return new_state_dict
@ -262,101 +279,3 @@ def convert_text_enc_state_dict(text_enc_dict):
return text_enc_dict
def load_diffusers(model_path, fp16=True, output_vae=True, output_clip=True, embedding_directory=None):
diffusers_unet_conf = json.load(open(osp.join(model_path, "unet/config.json")))
diffusers_scheduler_conf = json.load(open(osp.join(model_path, "scheduler/scheduler_config.json")))
# magic
v2 = diffusers_unet_conf["sample_size"] == 96
if 'prediction_type' in diffusers_scheduler_conf:
v_pred = diffusers_scheduler_conf['prediction_type'] == 'v_prediction'
if v2:
if v_pred:
config_path = folder_paths.get_full_path("configs", 'v2-inference-v.yaml')
else:
config_path = folder_paths.get_full_path("configs", 'v2-inference.yaml')
else:
config_path = folder_paths.get_full_path("configs", 'v1-inference.yaml')
with open(config_path, 'r') as stream:
config = yaml.safe_load(stream)
model_config_params = config['model']['params']
clip_config = model_config_params['cond_stage_config']
scale_factor = model_config_params['scale_factor']
vae_config = model_config_params['first_stage_config']
vae_config['scale_factor'] = scale_factor
model_config_params["unet_config"]["params"]["use_fp16"] = fp16
unet_path = osp.join(model_path, "unet", "diffusion_pytorch_model.safetensors")
vae_path = osp.join(model_path, "vae", "diffusion_pytorch_model.safetensors")
text_enc_path = osp.join(model_path, "text_encoder", "model.safetensors")
# Load models from safetensors if it exists, if it doesn't pytorch
if osp.exists(unet_path):
unet_state_dict = load_file(unet_path, device="cpu")
else:
unet_path = osp.join(model_path, "unet", "diffusion_pytorch_model.bin")
unet_state_dict = torch.load(unet_path, map_location="cpu")
if osp.exists(vae_path):
vae_state_dict = load_file(vae_path, device="cpu")
else:
vae_path = osp.join(model_path, "vae", "diffusion_pytorch_model.bin")
vae_state_dict = torch.load(vae_path, map_location="cpu")
if osp.exists(text_enc_path):
text_enc_dict = load_file(text_enc_path, device="cpu")
else:
text_enc_path = osp.join(model_path, "text_encoder", "pytorch_model.bin")
text_enc_dict = torch.load(text_enc_path, map_location="cpu")
# Convert the UNet model
unet_state_dict = convert_unet_state_dict(unet_state_dict)
unet_state_dict = {"model.diffusion_model." + k: v for k, v in unet_state_dict.items()}
# Convert the VAE model
vae_state_dict = convert_vae_state_dict(vae_state_dict)
vae_state_dict = {"first_stage_model." + k: v for k, v in vae_state_dict.items()}
# Easiest way to identify v2.0 model seems to be that the text encoder (OpenCLIP) is deeper
is_v20_model = "text_model.encoder.layers.22.layer_norm2.bias" in text_enc_dict
if is_v20_model:
# Need to add the tag 'transformer' in advance so we can knock it out from the final layer-norm
text_enc_dict = {"transformer." + k: v for k, v in text_enc_dict.items()}
text_enc_dict = convert_text_enc_state_dict_v20(text_enc_dict)
text_enc_dict = {"cond_stage_model.model." + k: v for k, v in text_enc_dict.items()}
else:
text_enc_dict = convert_text_enc_state_dict(text_enc_dict)
text_enc_dict = {"cond_stage_model.transformer." + k: v for k, v in text_enc_dict.items()}
# Put together new checkpoint
sd = {**unet_state_dict, **vae_state_dict, **text_enc_dict}
clip = None
vae = None
class WeightsLoader(torch.nn.Module):
pass
w = WeightsLoader()
load_state_dict_to = []
if output_vae:
vae = VAE(scale_factor=scale_factor, config=vae_config)
w.first_stage_model = vae.first_stage_model
load_state_dict_to = [w]
if output_clip:
clip = CLIP(config=clip_config, embedding_directory=embedding_directory)
w.cond_stage_model = clip.cond_stage_model
load_state_dict_to = [w]
model = instantiate_from_config(config["model"])
model = load_model_weights(model, sd, verbose=False, load_state_dict_to=load_state_dict_to)
if fp16:
model = model.half()
return ModelPatcher(model), clip, vae

36
comfy/diffusers_load.py Normal file
View File

@ -0,0 +1,36 @@
import os
import comfy.sd
def first_file(path, filenames):
for f in filenames:
p = os.path.join(path, f)
if os.path.exists(p):
return p
return None
def load_diffusers(model_path, output_vae=True, output_clip=True, embedding_directory=None):
diffusion_model_names = ["diffusion_pytorch_model.fp16.safetensors", "diffusion_pytorch_model.safetensors", "diffusion_pytorch_model.fp16.bin", "diffusion_pytorch_model.bin"]
unet_path = first_file(os.path.join(model_path, "unet"), diffusion_model_names)
vae_path = first_file(os.path.join(model_path, "vae"), diffusion_model_names)
text_encoder_model_names = ["model.fp16.safetensors", "model.safetensors", "pytorch_model.fp16.bin", "pytorch_model.bin"]
text_encoder1_path = first_file(os.path.join(model_path, "text_encoder"), text_encoder_model_names)
text_encoder2_path = first_file(os.path.join(model_path, "text_encoder_2"), text_encoder_model_names)
text_encoder_paths = [text_encoder1_path]
if text_encoder2_path is not None:
text_encoder_paths.append(text_encoder2_path)
unet = comfy.sd.load_diffusion_model(unet_path)
clip = None
if output_clip:
clip = comfy.sd.load_clip(text_encoder_paths, embedding_directory=embedding_directory)
vae = None
if output_vae:
sd = comfy.utils.load_torch_file(vae_path)
vae = comfy.sd.VAE(sd=sd)
return (unet, clip, vae)

View File

@ -180,7 +180,6 @@ class NoiseScheduleVP:
def model_wrapper(
model,
sampling_function,
noise_schedule,
model_type="noise",
model_kwargs={},
@ -295,7 +294,7 @@ def model_wrapper(
if t_continuous.reshape((-1,)).shape[0] == 1:
t_continuous = t_continuous.expand((x.shape[0]))
t_input = get_model_input_time(t_continuous)
output = sampling_function(model, x, t_input, **model_kwargs)
output = model(x, t_input, **model_kwargs)
if model_type == "noise":
return output
elif model_type == "x_start":
@ -359,9 +358,6 @@ class UniPC:
thresholding=False,
max_val=1.,
variant='bh1',
noise_mask=None,
masked_image=None,
noise=None,
):
"""Construct a UniPC.
@ -373,9 +369,6 @@ class UniPC:
self.predict_x0 = predict_x0
self.thresholding = thresholding
self.max_val = max_val
self.noise_mask = noise_mask
self.masked_image = masked_image
self.noise = noise
def dynamic_thresholding_fn(self, x0, t=None):
"""
@ -392,10 +385,7 @@ class UniPC:
"""
Return the noise prediction model.
"""
if self.noise_mask is not None:
return self.model(x, t) * self.noise_mask
else:
return self.model(x, t)
return self.model(x, t)
def data_prediction_fn(self, x, t):
"""
@ -410,8 +400,6 @@ class UniPC:
s = torch.quantile(torch.abs(x0).reshape((x0.shape[0], -1)), p, dim=1)
s = expand_dims(torch.maximum(s, self.max_val * torch.ones_like(s).to(s.device)), dims)
x0 = torch.clamp(x0, -s, s) / s
if self.noise_mask is not None:
x0 = x0 * self.noise_mask + (1. - self.noise_mask) * self.masked_image
return x0
def model_fn(self, x, t):
@ -689,7 +677,7 @@ class UniPC:
x_t = x_t_ - expand_dims(alpha_t * B_h, dims) * (corr_res + rhos_c[-1] * D1_t)
else:
x_t_ = (
expand_dims(torch.exp(log_alpha_t - log_alpha_prev_0), dimss) * x
expand_dims(torch.exp(log_alpha_t - log_alpha_prev_0), dims) * x
- expand_dims(sigma_t * h_phi_1, dims) * model_prev_0
)
if x_t is None:
@ -714,8 +702,8 @@ class UniPC:
method='singlestep', lower_order_final=True, denoise_to_zero=False, solver_type='dpm_solver',
atol=0.0078, rtol=0.05, corrector=False, callback=None, disable_pbar=False
):
t_0 = 1. / self.noise_schedule.total_N if t_end is None else t_end
t_T = self.noise_schedule.T if t_start is None else t_start
# t_0 = 1. / self.noise_schedule.total_N if t_end is None else t_end
# t_T = self.noise_schedule.T if t_start is None else t_start
device = x.device
steps = len(timesteps) - 1
if method == 'multistep':
@ -724,8 +712,6 @@ class UniPC:
assert timesteps.shape[0] - 1 == steps
# with torch.no_grad():
for step_index in trange(steps, disable=disable_pbar):
if self.noise_mask is not None:
x = x * self.noise_mask + (1. - self.noise_mask) * (self.masked_image * self.noise_schedule.marginal_alpha(timesteps[step_index]) + self.noise * self.noise_schedule.marginal_std(timesteps[step_index]))
if step_index == 0:
vec_t = timesteps[0].expand((x.shape[0]))
model_prev_list = [self.model_fn(x, vec_t)]
@ -767,11 +753,11 @@ class UniPC:
model_x = self.model_fn(x, vec_t)
model_prev_list[-1] = model_x
if callback is not None:
callback(step_index, model_prev_list[-1], x, steps)
callback({'x': x, 'i': step_index, 'denoised': model_prev_list[-1]})
else:
raise NotImplementedError()
if denoise_to_zero:
x = self.denoise_to_zero_fn(x, torch.ones((x.shape[0],)).to(device) * t_0)
# if denoise_to_zero:
# x = self.denoise_to_zero_fn(x, torch.ones((x.shape[0],)).to(device) * t_0)
return x
@ -834,52 +820,56 @@ def expand_dims(v, dims):
return v[(...,) + (None,)*(dims - 1)]
class SigmaConvert:
schedule = ""
def marginal_log_mean_coeff(self, sigma):
return 0.5 * torch.log(1 / ((sigma * sigma) + 1))
def sample_unipc(model, noise, image, sigmas, sampling_function, max_denoise, extra_args=None, callback=None, disable=False, noise_mask=None, variant='bh1'):
to_zero = False
def marginal_alpha(self, t):
return torch.exp(self.marginal_log_mean_coeff(t))
def marginal_std(self, t):
return torch.sqrt(1. - torch.exp(2. * self.marginal_log_mean_coeff(t)))
def marginal_lambda(self, t):
"""
Compute lambda_t = log(alpha_t) - log(sigma_t) of a given continuous-time label t in [0, T].
"""
log_mean_coeff = self.marginal_log_mean_coeff(t)
log_std = 0.5 * torch.log(1. - torch.exp(2. * log_mean_coeff))
return log_mean_coeff - log_std
def predict_eps_sigma(model, input, sigma_in, **kwargs):
sigma = sigma_in.view(sigma_in.shape[:1] + (1,) * (input.ndim - 1))
input = input * ((sigma ** 2 + 1.0) ** 0.5)
return (input - model(input, sigma_in, **kwargs)) / sigma
def sample_unipc(model, noise, sigmas, extra_args=None, callback=None, disable=False, variant='bh1'):
timesteps = sigmas.clone()
if sigmas[-1] == 0:
timesteps = torch.nn.functional.interpolate(sigmas[None,None,:-1], size=(len(sigmas),), mode='linear')[0][0]
to_zero = True
timesteps = sigmas[:]
timesteps[-1] = 0.001
else:
timesteps = sigmas.clone()
ns = SigmaConvert()
for s in range(timesteps.shape[0]):
timesteps[s] = (model.sigma_to_t(timesteps[s]) / 1000) + (1 / len(model.sigmas))
ns = NoiseScheduleVP('discrete', alphas_cumprod=model.inner_model.alphas_cumprod)
if image is not None:
img = image * ns.marginal_alpha(timesteps[0])
if max_denoise:
noise_mult = 1.0
else:
noise_mult = ns.marginal_std(timesteps[0])
img += noise * noise_mult
else:
img = noise
if to_zero:
timesteps[-1] = (1 / len(model.sigmas))
device = noise.device
if model.parameterization == "v":
model_type = "v"
else:
model_type = "noise"
noise = noise / torch.sqrt(1.0 + timesteps[0] ** 2.0)
model_type = "noise"
model_fn = model_wrapper(
model.inner_model.inner_model.apply_model,
sampling_function,
lambda input, sigma, **kwargs: predict_eps_sigma(model, input, sigma, **kwargs),
ns,
model_type=model_type,
guidance_type="uncond",
model_kwargs=extra_args,
)
order = min(3, len(timesteps) - 1)
uni_pc = UniPC(model_fn, ns, predict_x0=True, thresholding=False, noise_mask=noise_mask, masked_image=image, noise=noise, variant=variant)
x = uni_pc.sample(img, timesteps=timesteps, skip_type="time_uniform", method="multistep", order=order, lower_order_final=True, callback=callback, disable_pbar=disable)
if not to_zero:
x /= ns.marginal_alpha(timesteps[-1])
order = min(3, len(timesteps) - 2)
uni_pc = UniPC(model_fn, ns, predict_x0=True, thresholding=False, variant=variant)
x = uni_pc.sample(noise, timesteps=timesteps, skip_type="time_uniform", method="multistep", order=order, lower_order_final=True, callback=callback, disable_pbar=disable)
x /= ns.marginal_alpha(timesteps[-1])
return x
def sample_unipc_bh2(model, noise, sigmas, extra_args=None, callback=None, disable=False):
return sample_unipc(model, noise, sigmas, extra_args, callback, disable, variant='bh2')

62
comfy/float.py Normal file
View File

@ -0,0 +1,62 @@
import torch
import math
def calc_mantissa(abs_x, exponent, normal_mask, MANTISSA_BITS, EXPONENT_BIAS, generator=None):
mantissa_scaled = torch.where(
normal_mask,
(abs_x / (2.0 ** (exponent - EXPONENT_BIAS)) - 1.0) * (2**MANTISSA_BITS),
(abs_x / (2.0 ** (-EXPONENT_BIAS + 1 - MANTISSA_BITS)))
)
mantissa_scaled += torch.rand(mantissa_scaled.size(), dtype=mantissa_scaled.dtype, layout=mantissa_scaled.layout, device=mantissa_scaled.device, generator=generator)
return mantissa_scaled.floor() / (2**MANTISSA_BITS)
#Not 100% sure about this
def manual_stochastic_round_to_float8(x, dtype, generator=None):
if dtype == torch.float8_e4m3fn:
EXPONENT_BITS, MANTISSA_BITS, EXPONENT_BIAS = 4, 3, 7
elif dtype == torch.float8_e5m2:
EXPONENT_BITS, MANTISSA_BITS, EXPONENT_BIAS = 5, 2, 15
else:
raise ValueError("Unsupported dtype")
x = x.half()
sign = torch.sign(x)
abs_x = x.abs()
sign = torch.where(abs_x == 0, 0, sign)
# Combine exponent calculation and clamping
exponent = torch.clamp(
torch.floor(torch.log2(abs_x)) + EXPONENT_BIAS,
0, 2**EXPONENT_BITS - 1
)
# Combine mantissa calculation and rounding
normal_mask = ~(exponent == 0)
abs_x[:] = calc_mantissa(abs_x, exponent, normal_mask, MANTISSA_BITS, EXPONENT_BIAS, generator=generator)
sign *= torch.where(
normal_mask,
(2.0 ** (exponent - EXPONENT_BIAS)) * (1.0 + abs_x),
(2.0 ** (-EXPONENT_BIAS + 1)) * abs_x
)
del abs_x
return sign.to(dtype=dtype)
def stochastic_rounding(value, dtype, seed=0):
if dtype == torch.float32:
return value.to(dtype=torch.float32)
if dtype == torch.float16:
return value.to(dtype=torch.float16)
if dtype == torch.bfloat16:
return value.to(dtype=torch.bfloat16)
if dtype == torch.float8_e4m3fn or dtype == torch.float8_e5m2:
generator = torch.Generator(device=value.device)
generator.manual_seed(seed)
return manual_stochastic_round_to_float8(value, dtype, generator=generator)
return value.to(dtype=dtype)

View File

@ -1,8 +1,9 @@
import torch
from torch import nn, einsum
from torch import nn
from .ldm.modules.attention import CrossAttention
from inspect import isfunction
import comfy.ops
ops = comfy.ops.manual_cast
def exists(val):
return val is not None
@ -22,7 +23,7 @@ def default(val, d):
class GEGLU(nn.Module):
def __init__(self, dim_in, dim_out):
super().__init__()
self.proj = nn.Linear(dim_in, dim_out * 2)
self.proj = ops.Linear(dim_in, dim_out * 2)
def forward(self, x):
x, gate = self.proj(x).chunk(2, dim=-1)
@ -35,14 +36,14 @@ class FeedForward(nn.Module):
inner_dim = int(dim * mult)
dim_out = default(dim_out, dim)
project_in = nn.Sequential(
nn.Linear(dim, inner_dim),
ops.Linear(dim, inner_dim),
nn.GELU()
) if not glu else GEGLU(dim, inner_dim)
self.net = nn.Sequential(
project_in,
nn.Dropout(dropout),
nn.Linear(inner_dim, dim_out)
ops.Linear(inner_dim, dim_out)
)
def forward(self, x):
@ -57,11 +58,12 @@ class GatedCrossAttentionDense(nn.Module):
query_dim=query_dim,
context_dim=context_dim,
heads=n_heads,
dim_head=d_head)
dim_head=d_head,
operations=ops)
self.ff = FeedForward(query_dim, glu=True)
self.norm1 = nn.LayerNorm(query_dim)
self.norm2 = nn.LayerNorm(query_dim)
self.norm1 = ops.LayerNorm(query_dim)
self.norm2 = ops.LayerNorm(query_dim)
self.register_parameter('alpha_attn', nn.Parameter(torch.tensor(0.)))
self.register_parameter('alpha_dense', nn.Parameter(torch.tensor(0.)))
@ -87,17 +89,18 @@ class GatedSelfAttentionDense(nn.Module):
# we need a linear projection since we need cat visual feature and obj
# feature
self.linear = nn.Linear(context_dim, query_dim)
self.linear = ops.Linear(context_dim, query_dim)
self.attn = CrossAttention(
query_dim=query_dim,
context_dim=query_dim,
heads=n_heads,
dim_head=d_head)
dim_head=d_head,
operations=ops)
self.ff = FeedForward(query_dim, glu=True)
self.norm1 = nn.LayerNorm(query_dim)
self.norm2 = nn.LayerNorm(query_dim)
self.norm1 = ops.LayerNorm(query_dim)
self.norm2 = ops.LayerNorm(query_dim)
self.register_parameter('alpha_attn', nn.Parameter(torch.tensor(0.)))
self.register_parameter('alpha_dense', nn.Parameter(torch.tensor(0.)))
@ -126,14 +129,14 @@ class GatedSelfAttentionDense2(nn.Module):
# we need a linear projection since we need cat visual feature and obj
# feature
self.linear = nn.Linear(context_dim, query_dim)
self.linear = ops.Linear(context_dim, query_dim)
self.attn = CrossAttention(
query_dim=query_dim, context_dim=query_dim, dim_head=d_head)
query_dim=query_dim, context_dim=query_dim, dim_head=d_head, operations=ops)
self.ff = FeedForward(query_dim, glu=True)
self.norm1 = nn.LayerNorm(query_dim)
self.norm2 = nn.LayerNorm(query_dim)
self.norm1 = ops.LayerNorm(query_dim)
self.norm2 = ops.LayerNorm(query_dim)
self.register_parameter('alpha_attn', nn.Parameter(torch.tensor(0.)))
self.register_parameter('alpha_dense', nn.Parameter(torch.tensor(0.)))
@ -201,11 +204,11 @@ class PositionNet(nn.Module):
self.position_dim = fourier_freqs * 2 * 4 # 2 is sin&cos, 4 is xyxy
self.linears = nn.Sequential(
nn.Linear(self.in_dim + self.position_dim, 512),
ops.Linear(self.in_dim + self.position_dim, 512),
nn.SiLU(),
nn.Linear(512, 512),
ops.Linear(512, 512),
nn.SiLU(),
nn.Linear(512, out_dim),
ops.Linear(512, out_dim),
)
self.null_positive_feature = torch.nn.Parameter(
@ -216,13 +219,14 @@ class PositionNet(nn.Module):
def forward(self, boxes, masks, positive_embeddings):
B, N, _ = boxes.shape
masks = masks.unsqueeze(-1)
positive_embeddings = positive_embeddings
# embedding position (it may includes padding as placeholder)
xyxy_embedding = self.fourier_embedder(boxes) # B*N*4 --> B*N*C
# learnable null embedding
positive_null = self.null_positive_feature.view(1, 1, -1)
xyxy_null = self.null_position_feature.view(1, 1, -1)
positive_null = self.null_positive_feature.to(device=boxes.device, dtype=boxes.dtype).view(1, 1, -1)
xyxy_null = self.null_position_feature.to(device=boxes.device, dtype=boxes.dtype).view(1, 1, -1)
# replace padding with learnable null embedding
positive_embeddings = positive_embeddings * \
@ -242,28 +246,15 @@ class Gligen(nn.Module):
self.position_net = position_net
self.key_dim = key_dim
self.max_objs = 30
self.lowvram = False
self.current_device = torch.device("cpu")
def _set_position(self, boxes, masks, positive_embeddings):
if self.lowvram == True:
self.position_net.to(boxes.device)
objs = self.position_net(boxes, masks, positive_embeddings)
if self.lowvram == True:
self.position_net.cpu()
def func_lowvram(key, x):
module = self.module_list[key]
module.to(x.device)
r = module(x, objs)
module.cpu()
return r
return func_lowvram
else:
def func(key, x):
module = self.module_list[key]
return module(x, objs)
return func
def func(x, extra_options):
key = extra_options["transformer_index"]
module = self.module_list[key]
return module(x, objs.to(device=x.device, dtype=x.dtype))
return func
def set_position(self, latent_image_shape, position_params, device):
batch, c, h, w = latent_image_shape
@ -308,14 +299,6 @@ class Gligen(nn.Module):
masks.to(device),
conds.to(device))
def set_lowvram(self, value=True):
self.lowvram = value
def cleanup(self):
self.lowvram = False
def get_models(self):
return [self]
def load_gligen(sd):
sd_k = sd.keys()

View File

@ -1,105 +0,0 @@
from functools import reduce
import math
import operator
import numpy as np
from skimage import transform
import torch
from torch import nn
def translate2d(tx, ty):
mat = [[1, 0, tx],
[0, 1, ty],
[0, 0, 1]]
return torch.tensor(mat, dtype=torch.float32)
def scale2d(sx, sy):
mat = [[sx, 0, 0],
[ 0, sy, 0],
[ 0, 0, 1]]
return torch.tensor(mat, dtype=torch.float32)
def rotate2d(theta):
mat = [[torch.cos(theta), torch.sin(-theta), 0],
[torch.sin(theta), torch.cos(theta), 0],
[ 0, 0, 1]]
return torch.tensor(mat, dtype=torch.float32)
class KarrasAugmentationPipeline:
def __init__(self, a_prob=0.12, a_scale=2**0.2, a_aniso=2**0.2, a_trans=1/8):
self.a_prob = a_prob
self.a_scale = a_scale
self.a_aniso = a_aniso
self.a_trans = a_trans
def __call__(self, image):
h, w = image.size
mats = [translate2d(h / 2 - 0.5, w / 2 - 0.5)]
# x-flip
a0 = torch.randint(2, []).float()
mats.append(scale2d(1 - 2 * a0, 1))
# y-flip
do = (torch.rand([]) < self.a_prob).float()
a1 = torch.randint(2, []).float() * do
mats.append(scale2d(1, 1 - 2 * a1))
# scaling
do = (torch.rand([]) < self.a_prob).float()
a2 = torch.randn([]) * do
mats.append(scale2d(self.a_scale ** a2, self.a_scale ** a2))
# rotation
do = (torch.rand([]) < self.a_prob).float()
a3 = (torch.rand([]) * 2 * math.pi - math.pi) * do
mats.append(rotate2d(-a3))
# anisotropy
do = (torch.rand([]) < self.a_prob).float()
a4 = (torch.rand([]) * 2 * math.pi - math.pi) * do
a5 = torch.randn([]) * do
mats.append(rotate2d(a4))
mats.append(scale2d(self.a_aniso ** a5, self.a_aniso ** -a5))
mats.append(rotate2d(-a4))
# translation
do = (torch.rand([]) < self.a_prob).float()
a6 = torch.randn([]) * do
a7 = torch.randn([]) * do
mats.append(translate2d(self.a_trans * w * a6, self.a_trans * h * a7))
# form the transformation matrix and conditioning vector
mats.append(translate2d(-h / 2 + 0.5, -w / 2 + 0.5))
mat = reduce(operator.matmul, mats)
cond = torch.stack([a0, a1, a2, a3.cos() - 1, a3.sin(), a5 * a4.cos(), a5 * a4.sin(), a6, a7])
# apply the transformation
image_orig = np.array(image, dtype=np.float32) / 255
if image_orig.ndim == 2:
image_orig = image_orig[..., None]
tf = transform.AffineTransform(mat.numpy())
image = transform.warp(image_orig, tf.inverse, order=3, mode='reflect', cval=0.5, clip=False, preserve_range=True)
image_orig = torch.as_tensor(image_orig).movedim(2, 0) * 2 - 1
image = torch.as_tensor(image).movedim(2, 0) * 2 - 1
return image, image_orig, cond
class KarrasAugmentWrapper(nn.Module):
def __init__(self, model):
super().__init__()
self.inner_model = model
def forward(self, input, sigma, aug_cond=None, mapping_cond=None, **kwargs):
if aug_cond is None:
aug_cond = input.new_zeros([input.shape[0], 9])
if mapping_cond is None:
mapping_cond = aug_cond
else:
mapping_cond = torch.cat([aug_cond, mapping_cond], dim=1)
return self.inner_model(input, sigma, mapping_cond=mapping_cond, **kwargs)
def set_skip_stages(self, skip_stages):
return self.inner_model.set_skip_stages(skip_stages)
def set_patch_size(self, patch_size):
return self.inner_model.set_patch_size(patch_size)

View File

@ -1,110 +0,0 @@
from functools import partial
import json
import math
import warnings
from jsonmerge import merge
from . import augmentation, layers, models, utils
def load_config(file):
defaults = {
'model': {
'sigma_data': 1.,
'patch_size': 1,
'dropout_rate': 0.,
'augment_wrapper': True,
'augment_prob': 0.,
'mapping_cond_dim': 0,
'unet_cond_dim': 0,
'cross_cond_dim': 0,
'cross_attn_depths': None,
'skip_stages': 0,
'has_variance': False,
},
'dataset': {
'type': 'imagefolder',
},
'optimizer': {
'type': 'adamw',
'lr': 1e-4,
'betas': [0.95, 0.999],
'eps': 1e-6,
'weight_decay': 1e-3,
},
'lr_sched': {
'type': 'inverse',
'inv_gamma': 20000.,
'power': 1.,
'warmup': 0.99,
},
'ema_sched': {
'type': 'inverse',
'power': 0.6667,
'max_value': 0.9999
},
}
config = json.load(file)
return merge(defaults, config)
def make_model(config):
config = config['model']
assert config['type'] == 'image_v1'
model = models.ImageDenoiserModelV1(
config['input_channels'],
config['mapping_out'],
config['depths'],
config['channels'],
config['self_attn_depths'],
config['cross_attn_depths'],
patch_size=config['patch_size'],
dropout_rate=config['dropout_rate'],
mapping_cond_dim=config['mapping_cond_dim'] + (9 if config['augment_wrapper'] else 0),
unet_cond_dim=config['unet_cond_dim'],
cross_cond_dim=config['cross_cond_dim'],
skip_stages=config['skip_stages'],
has_variance=config['has_variance'],
)
if config['augment_wrapper']:
model = augmentation.KarrasAugmentWrapper(model)
return model
def make_denoiser_wrapper(config):
config = config['model']
sigma_data = config.get('sigma_data', 1.)
has_variance = config.get('has_variance', False)
if not has_variance:
return partial(layers.Denoiser, sigma_data=sigma_data)
return partial(layers.DenoiserWithVariance, sigma_data=sigma_data)
def make_sample_density(config):
sd_config = config['sigma_sample_density']
sigma_data = config['sigma_data']
if sd_config['type'] == 'lognormal':
loc = sd_config['mean'] if 'mean' in sd_config else sd_config['loc']
scale = sd_config['std'] if 'std' in sd_config else sd_config['scale']
return partial(utils.rand_log_normal, loc=loc, scale=scale)
if sd_config['type'] == 'loglogistic':
loc = sd_config['loc'] if 'loc' in sd_config else math.log(sigma_data)
scale = sd_config['scale'] if 'scale' in sd_config else 0.5
min_value = sd_config['min_value'] if 'min_value' in sd_config else 0.
max_value = sd_config['max_value'] if 'max_value' in sd_config else float('inf')
return partial(utils.rand_log_logistic, loc=loc, scale=scale, min_value=min_value, max_value=max_value)
if sd_config['type'] == 'loguniform':
min_value = sd_config['min_value'] if 'min_value' in sd_config else config['sigma_min']
max_value = sd_config['max_value'] if 'max_value' in sd_config else config['sigma_max']
return partial(utils.rand_log_uniform, min_value=min_value, max_value=max_value)
if sd_config['type'] == 'v-diffusion':
min_value = sd_config['min_value'] if 'min_value' in sd_config else 0.
max_value = sd_config['max_value'] if 'max_value' in sd_config else float('inf')
return partial(utils.rand_v_diffusion, sigma_data=sigma_data, min_value=min_value, max_value=max_value)
if sd_config['type'] == 'split-lognormal':
loc = sd_config['mean'] if 'mean' in sd_config else sd_config['loc']
scale_1 = sd_config['std_1'] if 'std_1' in sd_config else sd_config['scale_1']
scale_2 = sd_config['std_2'] if 'std_2' in sd_config else sd_config['scale_2']
return partial(utils.rand_split_log_normal, loc=loc, scale_1=scale_1, scale_2=scale_2)
raise ValueError('Unknown sample density type')

121
comfy/k_diffusion/deis.py Normal file
View File

@ -0,0 +1,121 @@
#Taken from: https://github.com/zju-pi/diff-sampler/blob/main/gits-main/solver_utils.py
#under Apache 2 license
import torch
import numpy as np
# A pytorch reimplementation of DEIS (https://github.com/qsh-zh/deis).
#############################
### Utils for DEIS solver ###
#############################
#----------------------------------------------------------------------------
# Transfer from the input time (sigma) used in EDM to that (t) used in DEIS.
def edm2t(edm_steps, epsilon_s=1e-3, sigma_min=0.002, sigma_max=80):
vp_sigma = lambda beta_d, beta_min: lambda t: (np.e ** (0.5 * beta_d * (t ** 2) + beta_min * t) - 1) ** 0.5
vp_sigma_inv = lambda beta_d, beta_min: lambda sigma: ((beta_min ** 2 + 2 * beta_d * (sigma ** 2 + 1).log()).sqrt() - beta_min) / beta_d
vp_beta_d = 2 * (np.log(torch.tensor(sigma_min).cpu() ** 2 + 1) / epsilon_s - np.log(torch.tensor(sigma_max).cpu() ** 2 + 1)) / (epsilon_s - 1)
vp_beta_min = np.log(torch.tensor(sigma_max).cpu() ** 2 + 1) - 0.5 * vp_beta_d
t_steps = vp_sigma_inv(vp_beta_d.clone().detach().cpu(), vp_beta_min.clone().detach().cpu())(edm_steps.clone().detach().cpu())
return t_steps, vp_beta_min, vp_beta_d + vp_beta_min
#----------------------------------------------------------------------------
def cal_poly(prev_t, j, taus):
poly = 1
for k in range(prev_t.shape[0]):
if k == j:
continue
poly *= (taus - prev_t[k]) / (prev_t[j] - prev_t[k])
return poly
#----------------------------------------------------------------------------
# Transfer from t to alpha_t.
def t2alpha_fn(beta_0, beta_1, t):
return torch.exp(-0.5 * t ** 2 * (beta_1 - beta_0) - t * beta_0)
#----------------------------------------------------------------------------
def cal_intergrand(beta_0, beta_1, taus):
with torch.inference_mode(mode=False):
taus = taus.clone()
beta_0 = beta_0.clone()
beta_1 = beta_1.clone()
with torch.enable_grad():
taus.requires_grad_(True)
alpha = t2alpha_fn(beta_0, beta_1, taus)
log_alpha = alpha.log()
log_alpha.sum().backward()
d_log_alpha_dtau = taus.grad
integrand = -0.5 * d_log_alpha_dtau / torch.sqrt(alpha * (1 - alpha))
return integrand
#----------------------------------------------------------------------------
def get_deis_coeff_list(t_steps, max_order, N=10000, deis_mode='tab'):
"""
Get the coefficient list for DEIS sampling.
Args:
t_steps: A pytorch tensor. The time steps for sampling.
max_order: A `int`. Maximum order of the solver. 1 <= max_order <= 4
N: A `int`. Use how many points to perform the numerical integration when deis_mode=='tab'.
deis_mode: A `str`. Select between 'tab' and 'rhoab'. Type of DEIS.
Returns:
A pytorch tensor. A batch of generated samples or sampling trajectories if return_inters=True.
"""
if deis_mode == 'tab':
t_steps, beta_0, beta_1 = edm2t(t_steps)
C = []
for i, (t_cur, t_next) in enumerate(zip(t_steps[:-1], t_steps[1:])):
order = min(i+1, max_order)
if order == 1:
C.append([])
else:
taus = torch.linspace(t_cur, t_next, N) # split the interval for integral appximation
dtau = (t_next - t_cur) / N
prev_t = t_steps[[i - k for k in range(order)]]
coeff_temp = []
integrand = cal_intergrand(beta_0, beta_1, taus)
for j in range(order):
poly = cal_poly(prev_t, j, taus)
coeff_temp.append(torch.sum(integrand * poly) * dtau)
C.append(coeff_temp)
elif deis_mode == 'rhoab':
# Analytical solution, second order
def get_def_intergral_2(a, b, start, end, c):
coeff = (end**3 - start**3) / 3 - (end**2 - start**2) * (a + b) / 2 + (end - start) * a * b
return coeff / ((c - a) * (c - b))
# Analytical solution, third order
def get_def_intergral_3(a, b, c, start, end, d):
coeff = (end**4 - start**4) / 4 - (end**3 - start**3) * (a + b + c) / 3 \
+ (end**2 - start**2) * (a*b + a*c + b*c) / 2 - (end - start) * a * b * c
return coeff / ((d - a) * (d - b) * (d - c))
C = []
for i, (t_cur, t_next) in enumerate(zip(t_steps[:-1], t_steps[1:])):
order = min(i, max_order)
if order == 0:
C.append([])
else:
prev_t = t_steps[[i - k for k in range(order+1)]]
if order == 1:
coeff_cur = ((t_next - prev_t[1])**2 - (t_cur - prev_t[1])**2) / (2 * (t_cur - prev_t[1]))
coeff_prev1 = (t_next - t_cur)**2 / (2 * (prev_t[1] - t_cur))
coeff_temp = [coeff_cur, coeff_prev1]
elif order == 2:
coeff_cur = get_def_intergral_2(prev_t[1], prev_t[2], t_cur, t_next, t_cur)
coeff_prev1 = get_def_intergral_2(t_cur, prev_t[2], t_cur, t_next, prev_t[1])
coeff_prev2 = get_def_intergral_2(t_cur, prev_t[1], t_cur, t_next, prev_t[2])
coeff_temp = [coeff_cur, coeff_prev1, coeff_prev2]
elif order == 3:
coeff_cur = get_def_intergral_3(prev_t[1], prev_t[2], prev_t[3], t_cur, t_next, t_cur)
coeff_prev1 = get_def_intergral_3(t_cur, prev_t[2], prev_t[3], t_cur, t_next, prev_t[1])
coeff_prev2 = get_def_intergral_3(t_cur, prev_t[1], prev_t[3], t_cur, t_next, prev_t[2])
coeff_prev3 = get_def_intergral_3(t_cur, prev_t[1], prev_t[2], t_cur, t_next, prev_t[3])
coeff_temp = [coeff_cur, coeff_prev1, coeff_prev2, coeff_prev3]
C.append(coeff_temp)
return C

View File

@ -1,134 +0,0 @@
import math
import os
from pathlib import Path
from cleanfid.inception_torchscript import InceptionV3W
import clip
from resize_right import resize
import torch
from torch import nn
from torch.nn import functional as F
from torchvision import transforms
from tqdm.auto import trange
from . import utils
class InceptionV3FeatureExtractor(nn.Module):
def __init__(self, device='cpu'):
super().__init__()
path = Path(os.environ.get('XDG_CACHE_HOME', Path.home() / '.cache')) / 'k-diffusion'
url = 'https://nvlabs-fi-cdn.nvidia.com/stylegan2-ada-pytorch/pretrained/metrics/inception-2015-12-05.pt'
digest = 'f58cb9b6ec323ed63459aa4fb441fe750cfe39fafad6da5cb504a16f19e958f4'
utils.download_file(path / 'inception-2015-12-05.pt', url, digest)
self.model = InceptionV3W(str(path), resize_inside=False).to(device)
self.size = (299, 299)
def forward(self, x):
if x.shape[2:4] != self.size:
x = resize(x, out_shape=self.size, pad_mode='reflect')
if x.shape[1] == 1:
x = torch.cat([x] * 3, dim=1)
x = (x * 127.5 + 127.5).clamp(0, 255)
return self.model(x)
class CLIPFeatureExtractor(nn.Module):
def __init__(self, name='ViT-L/14@336px', device='cpu'):
super().__init__()
self.model = clip.load(name, device=device)[0].eval().requires_grad_(False)
self.normalize = transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
std=(0.26862954, 0.26130258, 0.27577711))
self.size = (self.model.visual.input_resolution, self.model.visual.input_resolution)
def forward(self, x):
if x.shape[2:4] != self.size:
x = resize(x.add(1).div(2), out_shape=self.size, pad_mode='reflect').clamp(0, 1)
x = self.normalize(x)
x = self.model.encode_image(x).float()
x = F.normalize(x) * x.shape[1] ** 0.5
return x
def compute_features(accelerator, sample_fn, extractor_fn, n, batch_size):
n_per_proc = math.ceil(n / accelerator.num_processes)
feats_all = []
try:
for i in trange(0, n_per_proc, batch_size, disable=not accelerator.is_main_process):
cur_batch_size = min(n - i, batch_size)
samples = sample_fn(cur_batch_size)[:cur_batch_size]
feats_all.append(accelerator.gather(extractor_fn(samples)))
except StopIteration:
pass
return torch.cat(feats_all)[:n]
def polynomial_kernel(x, y):
d = x.shape[-1]
dot = x @ y.transpose(-2, -1)
return (dot / d + 1) ** 3
def squared_mmd(x, y, kernel=polynomial_kernel):
m = x.shape[-2]
n = y.shape[-2]
kxx = kernel(x, x)
kyy = kernel(y, y)
kxy = kernel(x, y)
kxx_sum = kxx.sum([-1, -2]) - kxx.diagonal(dim1=-1, dim2=-2).sum(-1)
kyy_sum = kyy.sum([-1, -2]) - kyy.diagonal(dim1=-1, dim2=-2).sum(-1)
kxy_sum = kxy.sum([-1, -2])
term_1 = kxx_sum / m / (m - 1)
term_2 = kyy_sum / n / (n - 1)
term_3 = kxy_sum * 2 / m / n
return term_1 + term_2 - term_3
@utils.tf32_mode(matmul=False)
def kid(x, y, max_size=5000):
x_size, y_size = x.shape[0], y.shape[0]
n_partitions = math.ceil(max(x_size / max_size, y_size / max_size))
total_mmd = x.new_zeros([])
for i in range(n_partitions):
cur_x = x[round(i * x_size / n_partitions):round((i + 1) * x_size / n_partitions)]
cur_y = y[round(i * y_size / n_partitions):round((i + 1) * y_size / n_partitions)]
total_mmd = total_mmd + squared_mmd(cur_x, cur_y)
return total_mmd / n_partitions
class _MatrixSquareRootEig(torch.autograd.Function):
@staticmethod
def forward(ctx, a):
vals, vecs = torch.linalg.eigh(a)
ctx.save_for_backward(vals, vecs)
return vecs @ vals.abs().sqrt().diag_embed() @ vecs.transpose(-2, -1)
@staticmethod
def backward(ctx, grad_output):
vals, vecs = ctx.saved_tensors
d = vals.abs().sqrt().unsqueeze(-1).repeat_interleave(vals.shape[-1], -1)
vecs_t = vecs.transpose(-2, -1)
return vecs @ (vecs_t @ grad_output @ vecs / (d + d.transpose(-2, -1))) @ vecs_t
def sqrtm_eig(a):
if a.ndim < 2:
raise RuntimeError('tensor of matrices must have at least 2 dimensions')
if a.shape[-2] != a.shape[-1]:
raise RuntimeError('tensor must be batches of square matrices')
return _MatrixSquareRootEig.apply(a)
@utils.tf32_mode(matmul=False)
def fid(x, y, eps=1e-8):
x_mean = x.mean(dim=0)
y_mean = y.mean(dim=0)
mean_term = (x_mean - y_mean).pow(2).sum()
x_cov = torch.cov(x.T)
y_cov = torch.cov(y.T)
eps_eye = torch.eye(x_cov.shape[0], device=x_cov.device, dtype=x_cov.dtype) * eps
x_cov = x_cov + eps_eye
y_cov = y_cov + eps_eye
x_cov_sqrt = sqrtm_eig(x_cov)
cov_term = torch.trace(x_cov + y_cov - 2 * sqrtm_eig(x_cov_sqrt @ y_cov @ x_cov_sqrt))
return mean_term + cov_term

View File

@ -1,179 +0,0 @@
import math
import torch
from torch import nn
from . import sampling, utils
class VDenoiser(nn.Module):
"""A v-diffusion-pytorch model wrapper for k-diffusion."""
def __init__(self, inner_model):
super().__init__()
self.inner_model = inner_model
self.sigma_data = 1.
def get_scalings(self, sigma):
c_skip = self.sigma_data ** 2 / (sigma ** 2 + self.sigma_data ** 2)
c_out = -sigma * self.sigma_data / (sigma ** 2 + self.sigma_data ** 2) ** 0.5
c_in = 1 / (sigma ** 2 + self.sigma_data ** 2) ** 0.5
return c_skip, c_out, c_in
def sigma_to_t(self, sigma):
return sigma.atan() / math.pi * 2
def t_to_sigma(self, t):
return (t * math.pi / 2).tan()
def loss(self, input, noise, sigma, **kwargs):
c_skip, c_out, c_in = [utils.append_dims(x, input.ndim) for x in self.get_scalings(sigma)]
noised_input = input + noise * utils.append_dims(sigma, input.ndim)
model_output = self.inner_model(noised_input * c_in, self.sigma_to_t(sigma), **kwargs)
target = (input - c_skip * noised_input) / c_out
return (model_output - target).pow(2).flatten(1).mean(1)
def forward(self, input, sigma, **kwargs):
c_skip, c_out, c_in = [utils.append_dims(x, input.ndim) for x in self.get_scalings(sigma)]
return self.inner_model(input * c_in, self.sigma_to_t(sigma), **kwargs) * c_out + input * c_skip
class DiscreteSchedule(nn.Module):
"""A mapping between continuous noise levels (sigmas) and a list of discrete noise
levels."""
def __init__(self, sigmas, quantize):
super().__init__()
self.register_buffer('sigmas', sigmas)
self.register_buffer('log_sigmas', sigmas.log())
self.quantize = quantize
@property
def sigma_min(self):
return self.sigmas[0]
@property
def sigma_max(self):
return self.sigmas[-1]
def get_sigmas(self, n=None):
if n is None:
return sampling.append_zero(self.sigmas.flip(0))
t_max = len(self.sigmas) - 1
t = torch.linspace(t_max, 0, n, device=self.sigmas.device)
return sampling.append_zero(self.t_to_sigma(t))
def sigma_to_t(self, sigma, quantize=None):
quantize = self.quantize if quantize is None else quantize
log_sigma = sigma.log()
dists = log_sigma.to(self.log_sigmas.device) - self.log_sigmas[:, None]
if quantize:
return dists.abs().argmin(dim=0).view(sigma.shape)
low_idx = dists.ge(0).cumsum(dim=0).argmax(dim=0).clamp(max=self.log_sigmas.shape[0] - 2)
high_idx = low_idx + 1
low, high = self.log_sigmas[low_idx], self.log_sigmas[high_idx]
w = (low - log_sigma) / (low - high)
w = w.clamp(0, 1)
t = (1 - w) * low_idx + w * high_idx
return t.view(sigma.shape)
def t_to_sigma(self, t):
t = t.float()
low_idx = t.floor().long()
high_idx = t.ceil().long()
w = t-low_idx if t.device.type == 'mps' else t.frac()
log_sigma = (1 - w) * self.log_sigmas[low_idx] + w * self.log_sigmas[high_idx]
return log_sigma.exp()
class DiscreteEpsDDPMDenoiser(DiscreteSchedule):
"""A wrapper for discrete schedule DDPM models that output eps (the predicted
noise)."""
def __init__(self, model, alphas_cumprod, quantize):
super().__init__(((1 - alphas_cumprod) / alphas_cumprod) ** 0.5, quantize)
self.inner_model = model
self.sigma_data = 1.
def get_scalings(self, sigma):
c_out = -sigma
c_in = 1 / (sigma ** 2 + self.sigma_data ** 2) ** 0.5
return c_out, c_in
def get_eps(self, *args, **kwargs):
return self.inner_model(*args, **kwargs)
def loss(self, input, noise, sigma, **kwargs):
c_out, c_in = [utils.append_dims(x, input.ndim) for x in self.get_scalings(sigma)]
noised_input = input + noise * utils.append_dims(sigma, input.ndim)
eps = self.get_eps(noised_input * c_in, self.sigma_to_t(sigma), **kwargs)
return (eps - noise).pow(2).flatten(1).mean(1)
def forward(self, input, sigma, **kwargs):
c_out, c_in = [utils.append_dims(x, input.ndim) for x in self.get_scalings(sigma)]
eps = self.get_eps(input * c_in, self.sigma_to_t(sigma), **kwargs)
return input + eps * c_out
class OpenAIDenoiser(DiscreteEpsDDPMDenoiser):
"""A wrapper for OpenAI diffusion models."""
def __init__(self, model, diffusion, quantize=False, has_learned_sigmas=True, device='cpu'):
alphas_cumprod = torch.tensor(diffusion.alphas_cumprod, device=device, dtype=torch.float32)
super().__init__(model, alphas_cumprod, quantize=quantize)
self.has_learned_sigmas = has_learned_sigmas
def get_eps(self, *args, **kwargs):
model_output = self.inner_model(*args, **kwargs)
if self.has_learned_sigmas:
return model_output.chunk(2, dim=1)[0]
return model_output
class CompVisDenoiser(DiscreteEpsDDPMDenoiser):
"""A wrapper for CompVis diffusion models."""
def __init__(self, model, quantize=False, device='cpu'):
super().__init__(model, model.alphas_cumprod, quantize=quantize)
def get_eps(self, *args, **kwargs):
return self.inner_model.apply_model(*args, **kwargs)
class DiscreteVDDPMDenoiser(DiscreteSchedule):
"""A wrapper for discrete schedule DDPM models that output v."""
def __init__(self, model, alphas_cumprod, quantize):
super().__init__(((1 - alphas_cumprod) / alphas_cumprod) ** 0.5, quantize)
self.inner_model = model
self.sigma_data = 1.
def get_scalings(self, sigma):
c_skip = self.sigma_data ** 2 / (sigma ** 2 + self.sigma_data ** 2)
c_out = -sigma * self.sigma_data / (sigma ** 2 + self.sigma_data ** 2) ** 0.5
c_in = 1 / (sigma ** 2 + self.sigma_data ** 2) ** 0.5
return c_skip, c_out, c_in
def get_v(self, *args, **kwargs):
return self.inner_model(*args, **kwargs)
def loss(self, input, noise, sigma, **kwargs):
c_skip, c_out, c_in = [utils.append_dims(x, input.ndim) for x in self.get_scalings(sigma)]
noised_input = input + noise * utils.append_dims(sigma, input.ndim)
model_output = self.get_v(noised_input * c_in, self.sigma_to_t(sigma), **kwargs)
target = (input - c_skip * noised_input) / c_out
return (model_output - target).pow(2).flatten(1).mean(1)
def forward(self, input, sigma, **kwargs):
c_skip, c_out, c_in = [utils.append_dims(x, input.ndim) for x in self.get_scalings(sigma)]
return self.get_v(input * c_in, self.sigma_to_t(sigma), **kwargs) * c_out + input * c_skip
class CompVisVDenoiser(DiscreteVDDPMDenoiser):
"""A wrapper for CompVis diffusion models that output v."""
def __init__(self, model, quantize=False, device='cpu'):
super().__init__(model, model.alphas_cumprod, quantize=quantize)
def get_v(self, x, t, cond, **kwargs):
return self.inner_model.apply_model(x, t, cond)

View File

@ -1,99 +0,0 @@
import torch
from torch import nn
class DDPGradientStatsHook:
def __init__(self, ddp_module):
try:
ddp_module.register_comm_hook(self, self._hook_fn)
except AttributeError:
raise ValueError('DDPGradientStatsHook does not support non-DDP wrapped modules')
self._clear_state()
def _clear_state(self):
self.bucket_sq_norms_small_batch = []
self.bucket_sq_norms_large_batch = []
@staticmethod
def _hook_fn(self, bucket):
buf = bucket.buffer()
self.bucket_sq_norms_small_batch.append(buf.pow(2).sum())
fut = torch.distributed.all_reduce(buf, op=torch.distributed.ReduceOp.AVG, async_op=True).get_future()
def callback(fut):
buf = fut.value()[0]
self.bucket_sq_norms_large_batch.append(buf.pow(2).sum())
return buf
return fut.then(callback)
def get_stats(self):
sq_norm_small_batch = sum(self.bucket_sq_norms_small_batch)
sq_norm_large_batch = sum(self.bucket_sq_norms_large_batch)
self._clear_state()
stats = torch.stack([sq_norm_small_batch, sq_norm_large_batch])
torch.distributed.all_reduce(stats, op=torch.distributed.ReduceOp.AVG)
return stats[0].item(), stats[1].item()
class GradientNoiseScale:
"""Calculates the gradient noise scale (1 / SNR), or critical batch size,
from _An Empirical Model of Large-Batch Training_,
https://arxiv.org/abs/1812.06162).
Args:
beta (float): The decay factor for the exponential moving averages used to
calculate the gradient noise scale.
Default: 0.9998
eps (float): Added for numerical stability.
Default: 1e-8
"""
def __init__(self, beta=0.9998, eps=1e-8):
self.beta = beta
self.eps = eps
self.ema_sq_norm = 0.
self.ema_var = 0.
self.beta_cumprod = 1.
self.gradient_noise_scale = float('nan')
def state_dict(self):
"""Returns the state of the object as a :class:`dict`."""
return dict(self.__dict__.items())
def load_state_dict(self, state_dict):
"""Loads the object's state.
Args:
state_dict (dict): object state. Should be an object returned
from a call to :meth:`state_dict`.
"""
self.__dict__.update(state_dict)
def update(self, sq_norm_small_batch, sq_norm_large_batch, n_small_batch, n_large_batch):
"""Updates the state with a new batch's gradient statistics, and returns the
current gradient noise scale.
Args:
sq_norm_small_batch (float): The mean of the squared 2-norms of microbatch or
per sample gradients.
sq_norm_large_batch (float): The squared 2-norm of the mean of the microbatch or
per sample gradients.
n_small_batch (int): The batch size of the individual microbatch or per sample
gradients (1 if per sample).
n_large_batch (int): The total batch size of the mean of the microbatch or
per sample gradients.
"""
est_sq_norm = (n_large_batch * sq_norm_large_batch - n_small_batch * sq_norm_small_batch) / (n_large_batch - n_small_batch)
est_var = (sq_norm_small_batch - sq_norm_large_batch) / (1 / n_small_batch - 1 / n_large_batch)
self.ema_sq_norm = self.beta * self.ema_sq_norm + (1 - self.beta) * est_sq_norm
self.ema_var = self.beta * self.ema_var + (1 - self.beta) * est_var
self.beta_cumprod *= self.beta
self.gradient_noise_scale = max(self.ema_var, self.eps) / max(self.ema_sq_norm, self.eps)
return self.gradient_noise_scale
def get_gns(self):
"""Returns the current gradient noise scale."""
return self.gradient_noise_scale
def get_stats(self):
"""Returns the current (debiased) estimates of the squared mean gradient
and gradient variance."""
return self.ema_sq_norm / (1 - self.beta_cumprod), self.ema_var / (1 - self.beta_cumprod)

View File

@ -1,246 +0,0 @@
import math
from einops import rearrange, repeat
import torch
from torch import nn
from torch.nn import functional as F
from . import utils
# Karras et al. preconditioned denoiser
class Denoiser(nn.Module):
"""A Karras et al. preconditioner for denoising diffusion models."""
def __init__(self, inner_model, sigma_data=1.):
super().__init__()
self.inner_model = inner_model
self.sigma_data = sigma_data
def get_scalings(self, sigma):
c_skip = self.sigma_data ** 2 / (sigma ** 2 + self.sigma_data ** 2)
c_out = sigma * self.sigma_data / (sigma ** 2 + self.sigma_data ** 2) ** 0.5
c_in = 1 / (sigma ** 2 + self.sigma_data ** 2) ** 0.5
return c_skip, c_out, c_in
def loss(self, input, noise, sigma, **kwargs):
c_skip, c_out, c_in = [utils.append_dims(x, input.ndim) for x in self.get_scalings(sigma)]
noised_input = input + noise * utils.append_dims(sigma, input.ndim)
model_output = self.inner_model(noised_input * c_in, sigma, **kwargs)
target = (input - c_skip * noised_input) / c_out
return (model_output - target).pow(2).flatten(1).mean(1)
def forward(self, input, sigma, **kwargs):
c_skip, c_out, c_in = [utils.append_dims(x, input.ndim) for x in self.get_scalings(sigma)]
return self.inner_model(input * c_in, sigma, **kwargs) * c_out + input * c_skip
class DenoiserWithVariance(Denoiser):
def loss(self, input, noise, sigma, **kwargs):
c_skip, c_out, c_in = [utils.append_dims(x, input.ndim) for x in self.get_scalings(sigma)]
noised_input = input + noise * utils.append_dims(sigma, input.ndim)
model_output, logvar = self.inner_model(noised_input * c_in, sigma, return_variance=True, **kwargs)
logvar = utils.append_dims(logvar, model_output.ndim)
target = (input - c_skip * noised_input) / c_out
losses = ((model_output - target) ** 2 / logvar.exp() + logvar) / 2
return losses.flatten(1).mean(1)
# Residual blocks
class ResidualBlock(nn.Module):
def __init__(self, *main, skip=None):
super().__init__()
self.main = nn.Sequential(*main)
self.skip = skip if skip else nn.Identity()
def forward(self, input):
return self.main(input) + self.skip(input)
# Noise level (and other) conditioning
class ConditionedModule(nn.Module):
pass
class UnconditionedModule(ConditionedModule):
def __init__(self, module):
super().__init__()
self.module = module
def forward(self, input, cond=None):
return self.module(input)
class ConditionedSequential(nn.Sequential, ConditionedModule):
def forward(self, input, cond):
for module in self:
if isinstance(module, ConditionedModule):
input = module(input, cond)
else:
input = module(input)
return input
class ConditionedResidualBlock(ConditionedModule):
def __init__(self, *main, skip=None):
super().__init__()
self.main = ConditionedSequential(*main)
self.skip = skip if skip else nn.Identity()
def forward(self, input, cond):
skip = self.skip(input, cond) if isinstance(self.skip, ConditionedModule) else self.skip(input)
return self.main(input, cond) + skip
class AdaGN(ConditionedModule):
def __init__(self, feats_in, c_out, num_groups, eps=1e-5, cond_key='cond'):
super().__init__()
self.num_groups = num_groups
self.eps = eps
self.cond_key = cond_key
self.mapper = nn.Linear(feats_in, c_out * 2)
def forward(self, input, cond):
weight, bias = self.mapper(cond[self.cond_key]).chunk(2, dim=-1)
input = F.group_norm(input, self.num_groups, eps=self.eps)
return torch.addcmul(utils.append_dims(bias, input.ndim), input, utils.append_dims(weight, input.ndim) + 1)
# Attention
class SelfAttention2d(ConditionedModule):
def __init__(self, c_in, n_head, norm, dropout_rate=0.):
super().__init__()
assert c_in % n_head == 0
self.norm_in = norm(c_in)
self.n_head = n_head
self.qkv_proj = nn.Conv2d(c_in, c_in * 3, 1)
self.out_proj = nn.Conv2d(c_in, c_in, 1)
self.dropout = nn.Dropout(dropout_rate)
def forward(self, input, cond):
n, c, h, w = input.shape
qkv = self.qkv_proj(self.norm_in(input, cond))
qkv = qkv.view([n, self.n_head * 3, c // self.n_head, h * w]).transpose(2, 3)
q, k, v = qkv.chunk(3, dim=1)
scale = k.shape[3] ** -0.25
att = ((q * scale) @ (k.transpose(2, 3) * scale)).softmax(3)
att = self.dropout(att)
y = (att @ v).transpose(2, 3).contiguous().view([n, c, h, w])
return input + self.out_proj(y)
class CrossAttention2d(ConditionedModule):
def __init__(self, c_dec, c_enc, n_head, norm_dec, dropout_rate=0.,
cond_key='cross', cond_key_padding='cross_padding'):
super().__init__()
assert c_dec % n_head == 0
self.cond_key = cond_key
self.cond_key_padding = cond_key_padding
self.norm_enc = nn.LayerNorm(c_enc)
self.norm_dec = norm_dec(c_dec)
self.n_head = n_head
self.q_proj = nn.Conv2d(c_dec, c_dec, 1)
self.kv_proj = nn.Linear(c_enc, c_dec * 2)
self.out_proj = nn.Conv2d(c_dec, c_dec, 1)
self.dropout = nn.Dropout(dropout_rate)
def forward(self, input, cond):
n, c, h, w = input.shape
q = self.q_proj(self.norm_dec(input, cond))
q = q.view([n, self.n_head, c // self.n_head, h * w]).transpose(2, 3)
kv = self.kv_proj(self.norm_enc(cond[self.cond_key]))
kv = kv.view([n, -1, self.n_head * 2, c // self.n_head]).transpose(1, 2)
k, v = kv.chunk(2, dim=1)
scale = k.shape[3] ** -0.25
att = ((q * scale) @ (k.transpose(2, 3) * scale))
att = att - (cond[self.cond_key_padding][:, None, None, :]) * 10000
att = att.softmax(3)
att = self.dropout(att)
y = (att @ v).transpose(2, 3)
y = y.contiguous().view([n, c, h, w])
return input + self.out_proj(y)
# Downsampling/upsampling
_kernels = {
'linear':
[1 / 8, 3 / 8, 3 / 8, 1 / 8],
'cubic':
[-0.01171875, -0.03515625, 0.11328125, 0.43359375,
0.43359375, 0.11328125, -0.03515625, -0.01171875],
'lanczos3':
[0.003689131001010537, 0.015056144446134567, -0.03399861603975296,
-0.066637322306633, 0.13550527393817902, 0.44638532400131226,
0.44638532400131226, 0.13550527393817902, -0.066637322306633,
-0.03399861603975296, 0.015056144446134567, 0.003689131001010537]
}
_kernels['bilinear'] = _kernels['linear']
_kernels['bicubic'] = _kernels['cubic']
class Downsample2d(nn.Module):
def __init__(self, kernel='linear', pad_mode='reflect'):
super().__init__()
self.pad_mode = pad_mode
kernel_1d = torch.tensor([_kernels[kernel]])
self.pad = kernel_1d.shape[1] // 2 - 1
self.register_buffer('kernel', kernel_1d.T @ kernel_1d)
def forward(self, x):
x = F.pad(x, (self.pad,) * 4, self.pad_mode)
weight = x.new_zeros([x.shape[1], x.shape[1], self.kernel.shape[0], self.kernel.shape[1]])
indices = torch.arange(x.shape[1], device=x.device)
weight[indices, indices] = self.kernel.to(weight)
return F.conv2d(x, weight, stride=2)
class Upsample2d(nn.Module):
def __init__(self, kernel='linear', pad_mode='reflect'):
super().__init__()
self.pad_mode = pad_mode
kernel_1d = torch.tensor([_kernels[kernel]]) * 2
self.pad = kernel_1d.shape[1] // 2 - 1
self.register_buffer('kernel', kernel_1d.T @ kernel_1d)
def forward(self, x):
x = F.pad(x, ((self.pad + 1) // 2,) * 4, self.pad_mode)
weight = x.new_zeros([x.shape[1], x.shape[1], self.kernel.shape[0], self.kernel.shape[1]])
indices = torch.arange(x.shape[1], device=x.device)
weight[indices, indices] = self.kernel.to(weight)
return F.conv_transpose2d(x, weight, stride=2, padding=self.pad * 2 + 1)
# Embeddings
class FourierFeatures(nn.Module):
def __init__(self, in_features, out_features, std=1.):
super().__init__()
assert out_features % 2 == 0
self.register_buffer('weight', torch.randn([out_features // 2, in_features]) * std)
def forward(self, input):
f = 2 * math.pi * input @ self.weight.T
return torch.cat([f.cos(), f.sin()], dim=-1)
# U-Nets
class UNet(ConditionedModule):
def __init__(self, d_blocks, u_blocks, skip_stages=0):
super().__init__()
self.d_blocks = nn.ModuleList(d_blocks)
self.u_blocks = nn.ModuleList(u_blocks)
self.skip_stages = skip_stages
def forward(self, input, cond):
skips = []
for block in self.d_blocks[self.skip_stages:]:
input = block(input, cond)
skips.append(input)
for i, (block, skip) in enumerate(zip(self.u_blocks, reversed(skips))):
input = block(input, cond, skip if i > 0 else None)
return input

View File

@ -1 +0,0 @@
from .image_v1 import ImageDenoiserModelV1

View File

@ -1,156 +0,0 @@
import math
import torch
from torch import nn
from torch.nn import functional as F
from .. import layers, utils
def orthogonal_(module):
nn.init.orthogonal_(module.weight)
return module
class ResConvBlock(layers.ConditionedResidualBlock):
def __init__(self, feats_in, c_in, c_mid, c_out, group_size=32, dropout_rate=0.):
skip = None if c_in == c_out else orthogonal_(nn.Conv2d(c_in, c_out, 1, bias=False))
super().__init__(
layers.AdaGN(feats_in, c_in, max(1, c_in // group_size)),
nn.GELU(),
nn.Conv2d(c_in, c_mid, 3, padding=1),
nn.Dropout2d(dropout_rate, inplace=True),
layers.AdaGN(feats_in, c_mid, max(1, c_mid // group_size)),
nn.GELU(),
nn.Conv2d(c_mid, c_out, 3, padding=1),
nn.Dropout2d(dropout_rate, inplace=True),
skip=skip)
class DBlock(layers.ConditionedSequential):
def __init__(self, n_layers, feats_in, c_in, c_mid, c_out, group_size=32, head_size=64, dropout_rate=0., downsample=False, self_attn=False, cross_attn=False, c_enc=0):
modules = [nn.Identity()]
for i in range(n_layers):
my_c_in = c_in if i == 0 else c_mid
my_c_out = c_mid if i < n_layers - 1 else c_out
modules.append(ResConvBlock(feats_in, my_c_in, c_mid, my_c_out, group_size, dropout_rate))
if self_attn:
norm = lambda c_in: layers.AdaGN(feats_in, c_in, max(1, my_c_out // group_size))
modules.append(layers.SelfAttention2d(my_c_out, max(1, my_c_out // head_size), norm, dropout_rate))
if cross_attn:
norm = lambda c_in: layers.AdaGN(feats_in, c_in, max(1, my_c_out // group_size))
modules.append(layers.CrossAttention2d(my_c_out, c_enc, max(1, my_c_out // head_size), norm, dropout_rate))
super().__init__(*modules)
self.set_downsample(downsample)
def set_downsample(self, downsample):
self[0] = layers.Downsample2d() if downsample else nn.Identity()
return self
class UBlock(layers.ConditionedSequential):
def __init__(self, n_layers, feats_in, c_in, c_mid, c_out, group_size=32, head_size=64, dropout_rate=0., upsample=False, self_attn=False, cross_attn=False, c_enc=0):
modules = []
for i in range(n_layers):
my_c_in = c_in if i == 0 else c_mid
my_c_out = c_mid if i < n_layers - 1 else c_out
modules.append(ResConvBlock(feats_in, my_c_in, c_mid, my_c_out, group_size, dropout_rate))
if self_attn:
norm = lambda c_in: layers.AdaGN(feats_in, c_in, max(1, my_c_out // group_size))
modules.append(layers.SelfAttention2d(my_c_out, max(1, my_c_out // head_size), norm, dropout_rate))
if cross_attn:
norm = lambda c_in: layers.AdaGN(feats_in, c_in, max(1, my_c_out // group_size))
modules.append(layers.CrossAttention2d(my_c_out, c_enc, max(1, my_c_out // head_size), norm, dropout_rate))
modules.append(nn.Identity())
super().__init__(*modules)
self.set_upsample(upsample)
def forward(self, input, cond, skip=None):
if skip is not None:
input = torch.cat([input, skip], dim=1)
return super().forward(input, cond)
def set_upsample(self, upsample):
self[-1] = layers.Upsample2d() if upsample else nn.Identity()
return self
class MappingNet(nn.Sequential):
def __init__(self, feats_in, feats_out, n_layers=2):
layers = []
for i in range(n_layers):
layers.append(orthogonal_(nn.Linear(feats_in if i == 0 else feats_out, feats_out)))
layers.append(nn.GELU())
super().__init__(*layers)
class ImageDenoiserModelV1(nn.Module):
def __init__(self, c_in, feats_in, depths, channels, self_attn_depths, cross_attn_depths=None, mapping_cond_dim=0, unet_cond_dim=0, cross_cond_dim=0, dropout_rate=0., patch_size=1, skip_stages=0, has_variance=False):
super().__init__()
self.c_in = c_in
self.channels = channels
self.unet_cond_dim = unet_cond_dim
self.patch_size = patch_size
self.has_variance = has_variance
self.timestep_embed = layers.FourierFeatures(1, feats_in)
if mapping_cond_dim > 0:
self.mapping_cond = nn.Linear(mapping_cond_dim, feats_in, bias=False)
self.mapping = MappingNet(feats_in, feats_in)
self.proj_in = nn.Conv2d((c_in + unet_cond_dim) * self.patch_size ** 2, channels[max(0, skip_stages - 1)], 1)
self.proj_out = nn.Conv2d(channels[max(0, skip_stages - 1)], c_in * self.patch_size ** 2 + (1 if self.has_variance else 0), 1)
nn.init.zeros_(self.proj_out.weight)
nn.init.zeros_(self.proj_out.bias)
if cross_cond_dim == 0:
cross_attn_depths = [False] * len(self_attn_depths)
d_blocks, u_blocks = [], []
for i in range(len(depths)):
my_c_in = channels[max(0, i - 1)]
d_blocks.append(DBlock(depths[i], feats_in, my_c_in, channels[i], channels[i], downsample=i > skip_stages, self_attn=self_attn_depths[i], cross_attn=cross_attn_depths[i], c_enc=cross_cond_dim, dropout_rate=dropout_rate))
for i in range(len(depths)):
my_c_in = channels[i] * 2 if i < len(depths) - 1 else channels[i]
my_c_out = channels[max(0, i - 1)]
u_blocks.append(UBlock(depths[i], feats_in, my_c_in, channels[i], my_c_out, upsample=i > skip_stages, self_attn=self_attn_depths[i], cross_attn=cross_attn_depths[i], c_enc=cross_cond_dim, dropout_rate=dropout_rate))
self.u_net = layers.UNet(d_blocks, reversed(u_blocks), skip_stages=skip_stages)
def forward(self, input, sigma, mapping_cond=None, unet_cond=None, cross_cond=None, cross_cond_padding=None, return_variance=False):
c_noise = sigma.log() / 4
timestep_embed = self.timestep_embed(utils.append_dims(c_noise, 2))
mapping_cond_embed = torch.zeros_like(timestep_embed) if mapping_cond is None else self.mapping_cond(mapping_cond)
mapping_out = self.mapping(timestep_embed + mapping_cond_embed)
cond = {'cond': mapping_out}
if unet_cond is not None:
input = torch.cat([input, unet_cond], dim=1)
if cross_cond is not None:
cond['cross'] = cross_cond
cond['cross_padding'] = cross_cond_padding
if self.patch_size > 1:
input = F.pixel_unshuffle(input, self.patch_size)
input = self.proj_in(input)
input = self.u_net(input, cond)
input = self.proj_out(input)
if self.has_variance:
input, logvar = input[:, :-1], input[:, -1].flatten(1).mean(1)
if self.patch_size > 1:
input = F.pixel_shuffle(input, self.patch_size)
if self.has_variance and return_variance:
return input, logvar
return input
def set_skip_stages(self, skip_stages):
self.proj_in = nn.Conv2d(self.proj_in.in_channels, self.channels[max(0, skip_stages - 1)], 1)
self.proj_out = nn.Conv2d(self.channels[max(0, skip_stages - 1)], self.proj_out.out_channels, 1)
nn.init.zeros_(self.proj_out.weight)
nn.init.zeros_(self.proj_out.bias)
self.u_net.skip_stages = skip_stages
for i, block in enumerate(self.u_net.d_blocks):
block.set_downsample(i > skip_stages)
for i, block in enumerate(reversed(self.u_net.u_blocks)):
block.set_upsample(i > skip_stages)
return self
def set_patch_size(self, patch_size):
self.patch_size = patch_size
self.proj_in = nn.Conv2d((self.c_in + self.unet_cond_dim) * self.patch_size ** 2, self.channels[max(0, self.u_net.skip_stages - 1)], 1)
self.proj_out = nn.Conv2d(self.channels[max(0, self.u_net.skip_stages - 1)], self.c_in * self.patch_size ** 2 + (1 if self.has_variance else 0), 1)
nn.init.zeros_(self.proj_out.weight)
nn.init.zeros_(self.proj_out.bias)

View File

@ -3,12 +3,13 @@ import math
from scipy import integrate
import torch
from torch import nn
from torchdiffeq import odeint
import torchsde
from tqdm.auto import trange, tqdm
from . import utils
from . import deis
import comfy.model_patcher
import comfy.model_sampling
def append_zero(x):
return torch.cat([x, x.new_zeros([1])])
@ -66,6 +67,9 @@ class BatchedBrownianTree:
"""A wrapper around torchsde.BrownianTree that enables batches of entropy."""
def __init__(self, x, t0, t1, seed=None, **kwargs):
self.cpu_tree = True
if "cpu" in kwargs:
self.cpu_tree = kwargs.pop("cpu")
t0, t1, self.sign = self.sort(t0, t1)
w0 = kwargs.get('w0', torch.zeros_like(x))
if seed is None:
@ -77,7 +81,10 @@ class BatchedBrownianTree:
except TypeError:
seed = [seed]
self.batched = False
self.trees = [torchsde.BrownianTree(t0, w0, t1, entropy=s, **kwargs) for s in seed]
if self.cpu_tree:
self.trees = [torchsde.BrownianTree(t0.cpu(), w0.cpu(), t1.cpu(), entropy=s, **kwargs) for s in seed]
else:
self.trees = [torchsde.BrownianTree(t0, w0, t1, entropy=s, **kwargs) for s in seed]
@staticmethod
def sort(a, b):
@ -85,7 +92,11 @@ class BatchedBrownianTree:
def __call__(self, t0, t1):
t0, t1, sign = self.sort(t0, t1)
w = torch.stack([tree(t0, t1) for tree in self.trees]) * (self.sign * sign)
if self.cpu_tree:
w = torch.stack([tree(t0.cpu().float(), t1.cpu().float()).to(t0.dtype).to(t0.device) for tree in self.trees]) * (self.sign * sign)
else:
w = torch.stack([tree(t0, t1) for tree in self.trees]) * (self.sign * sign)
return w if self.batched else w[0]
@ -104,10 +115,10 @@ class BrownianTreeNoiseSampler:
internal timestep.
"""
def __init__(self, x, sigma_min, sigma_max, seed=None, transform=lambda x: x):
def __init__(self, x, sigma_min, sigma_max, seed=None, transform=lambda x: x, cpu=False):
self.transform = transform
t0, t1 = self.transform(torch.as_tensor(sigma_min)), self.transform(torch.as_tensor(sigma_max))
self.tree = BatchedBrownianTree(x, t0, t1, seed)
self.tree = BatchedBrownianTree(x, t0, t1, seed, cpu=cpu)
def __call__(self, sigma, sigma_next):
t0, t1 = self.transform(torch.as_tensor(sigma)), self.transform(torch.as_tensor(sigma_next))
@ -120,10 +131,15 @@ def sample_euler(model, x, sigmas, extra_args=None, callback=None, disable=None,
extra_args = {} if extra_args is None else extra_args
s_in = x.new_ones([x.shape[0]])
for i in trange(len(sigmas) - 1, disable=disable):
gamma = min(s_churn / (len(sigmas) - 1), 2 ** 0.5 - 1) if s_tmin <= sigmas[i] <= s_tmax else 0.
eps = torch.randn_like(x) * s_noise
sigma_hat = sigmas[i] * (gamma + 1)
if s_churn > 0:
gamma = min(s_churn / (len(sigmas) - 1), 2 ** 0.5 - 1) if s_tmin <= sigmas[i] <= s_tmax else 0.
sigma_hat = sigmas[i] * (gamma + 1)
else:
gamma = 0
sigma_hat = sigmas[i]
if gamma > 0:
eps = torch.randn_like(x) * s_noise
x = x + eps * (sigma_hat ** 2 - sigmas[i] ** 2) ** 0.5
denoised = model(x, sigma_hat * s_in, **extra_args)
d = to_d(x, sigma_hat, denoised)
@ -161,10 +177,16 @@ def sample_heun(model, x, sigmas, extra_args=None, callback=None, disable=None,
extra_args = {} if extra_args is None else extra_args
s_in = x.new_ones([x.shape[0]])
for i in trange(len(sigmas) - 1, disable=disable):
gamma = min(s_churn / (len(sigmas) - 1), 2 ** 0.5 - 1) if s_tmin <= sigmas[i] <= s_tmax else 0.
eps = torch.randn_like(x) * s_noise
if s_churn > 0:
gamma = min(s_churn / (len(sigmas) - 1), 2 ** 0.5 - 1) if s_tmin <= sigmas[i] <= s_tmax else 0.
sigma_hat = sigmas[i] * (gamma + 1)
else:
gamma = 0
sigma_hat = sigmas[i]
sigma_hat = sigmas[i] * (gamma + 1)
if gamma > 0:
eps = torch.randn_like(x) * s_noise
x = x + eps * (sigma_hat ** 2 - sigmas[i] ** 2) ** 0.5
denoised = model(x, sigma_hat * s_in, **extra_args)
d = to_d(x, sigma_hat, denoised)
@ -190,10 +212,15 @@ def sample_dpm_2(model, x, sigmas, extra_args=None, callback=None, disable=None,
extra_args = {} if extra_args is None else extra_args
s_in = x.new_ones([x.shape[0]])
for i in trange(len(sigmas) - 1, disable=disable):
gamma = min(s_churn / (len(sigmas) - 1), 2 ** 0.5 - 1) if s_tmin <= sigmas[i] <= s_tmax else 0.
eps = torch.randn_like(x) * s_noise
sigma_hat = sigmas[i] * (gamma + 1)
if s_churn > 0:
gamma = min(s_churn / (len(sigmas) - 1), 2 ** 0.5 - 1) if s_tmin <= sigmas[i] <= s_tmax else 0.
sigma_hat = sigmas[i] * (gamma + 1)
else:
gamma = 0
sigma_hat = sigmas[i]
if gamma > 0:
eps = torch.randn_like(x) * s_noise
x = x + eps * (sigma_hat ** 2 - sigmas[i] ** 2) ** 0.5
denoised = model(x, sigma_hat * s_in, **extra_args)
d = to_d(x, sigma_hat, denoised)
@ -277,30 +304,6 @@ def sample_lms(model, x, sigmas, extra_args=None, callback=None, disable=None, o
return x
@torch.no_grad()
def log_likelihood(model, x, sigma_min, sigma_max, extra_args=None, atol=1e-4, rtol=1e-4):
extra_args = {} if extra_args is None else extra_args
s_in = x.new_ones([x.shape[0]])
v = torch.randint_like(x, 2) * 2 - 1
fevals = 0
def ode_fn(sigma, x):
nonlocal fevals
with torch.enable_grad():
x = x[0].detach().requires_grad_()
denoised = model(x, sigma * s_in, **extra_args)
d = to_d(x, sigma, denoised)
fevals += 1
grad = torch.autograd.grad((d * v).sum(), x)[0]
d_ll = (v * grad).flatten(1).sum(1)
return d.detach(), d_ll
x_min = x, x.new_zeros([x.shape[0]])
t = x.new_tensor([sigma_min, sigma_max])
sol = odeint(ode_fn, x_min, t, atol=atol, rtol=rtol, method='dopri5')
latent, delta_ll = sol[0][-1], sol[1][-1]
ll_prior = torch.distributions.Normal(0, sigma_max).log_prob(latent).flatten(1).sum(1)
return ll_prior + delta_ll, {'fevals': fevals}
class PIDStepSizeController:
"""A PID controller for ODE adaptive step size control."""
def __init__(self, h, pcoeff, icoeff, dcoeff, order=1, accept_safety=0.81, eps=1e-8):
@ -507,6 +510,9 @@ def sample_dpm_adaptive(model, x, sigma_min, sigma_max, extra_args=None, callbac
@torch.no_grad()
def sample_dpmpp_2s_ancestral(model, x, sigmas, extra_args=None, callback=None, disable=None, eta=1., s_noise=1., noise_sampler=None):
if isinstance(model.inner_model.inner_model.model_sampling, comfy.model_sampling.CONST):
return sample_dpmpp_2s_ancestral_RF(model, x, sigmas, extra_args, callback, disable, eta, s_noise, noise_sampler)
"""Ancestral sampling with DPM-Solver++(2S) second-order steps."""
extra_args = {} if extra_args is None else extra_args
noise_sampler = default_noise_sampler(x) if noise_sampler is None else noise_sampler
@ -539,11 +545,64 @@ def sample_dpmpp_2s_ancestral(model, x, sigmas, extra_args=None, callback=None,
return x
@torch.no_grad()
def sample_dpmpp_2s_ancestral_RF(model, x, sigmas, extra_args=None, callback=None, disable=None, eta=1., s_noise=1., noise_sampler=None):
"""Ancestral sampling with DPM-Solver++(2S) second-order steps."""
extra_args = {} if extra_args is None else extra_args
noise_sampler = default_noise_sampler(x) if noise_sampler is None else noise_sampler
s_in = x.new_ones([x.shape[0]])
sigma_fn = lambda lbda: (lbda.exp() + 1) ** -1
lambda_fn = lambda sigma: ((1-sigma)/sigma).log()
# logged_x = x.unsqueeze(0)
for i in trange(len(sigmas) - 1, disable=disable):
denoised = model(x, sigmas[i] * s_in, **extra_args)
downstep_ratio = 1 + (sigmas[i+1]/sigmas[i] - 1) * eta
sigma_down = sigmas[i+1] * downstep_ratio
alpha_ip1 = 1 - sigmas[i+1]
alpha_down = 1 - sigma_down
renoise_coeff = (sigmas[i+1]**2 - sigma_down**2*alpha_ip1**2/alpha_down**2)**0.5
# sigma_down, sigma_up = get_ancestral_step(sigmas[i], sigmas[i + 1], eta=eta)
if callback is not None:
callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
if sigmas[i + 1] == 0:
# Euler method
d = to_d(x, sigmas[i], denoised)
dt = sigma_down - sigmas[i]
x = x + d * dt
else:
# DPM-Solver++(2S)
if sigmas[i] == 1.0:
sigma_s = 0.9999
else:
t_i, t_down = lambda_fn(sigmas[i]), lambda_fn(sigma_down)
r = 1 / 2
h = t_down - t_i
s = t_i + r * h
sigma_s = sigma_fn(s)
# sigma_s = sigmas[i+1]
sigma_s_i_ratio = sigma_s / sigmas[i]
u = sigma_s_i_ratio * x + (1 - sigma_s_i_ratio) * denoised
D_i = model(u, sigma_s * s_in, **extra_args)
sigma_down_i_ratio = sigma_down / sigmas[i]
x = sigma_down_i_ratio * x + (1 - sigma_down_i_ratio) * D_i
# print("sigma_i", sigmas[i], "sigma_ip1", sigmas[i+1],"sigma_down", sigma_down, "sigma_down_i_ratio", sigma_down_i_ratio, "sigma_s_i_ratio", sigma_s_i_ratio, "renoise_coeff", renoise_coeff)
# Noise addition
if sigmas[i + 1] > 0 and eta > 0:
x = (alpha_ip1/alpha_down) * x + noise_sampler(sigmas[i], sigmas[i + 1]) * s_noise * renoise_coeff
# logged_x = torch.cat((logged_x, x.unsqueeze(0)), dim=0)
return x
@torch.no_grad()
def sample_dpmpp_sde(model, x, sigmas, extra_args=None, callback=None, disable=None, eta=1., s_noise=1., noise_sampler=None, r=1 / 2):
"""DPM-Solver++ (stochastic)."""
if len(sigmas) <= 1:
return x
sigma_min, sigma_max = sigmas[sigmas > 0].min(), sigmas.max()
noise_sampler = BrownianTreeNoiseSampler(x, sigma_min, sigma_max) if noise_sampler is None else noise_sampler
seed = extra_args.get("seed", None)
noise_sampler = BrownianTreeNoiseSampler(x, sigma_min, sigma_max, seed=seed, cpu=True) if noise_sampler is None else noise_sampler
extra_args = {} if extra_args is None else extra_args
s_in = x.new_ones([x.shape[0]])
sigma_fn = lambda t: t.neg().exp()
@ -605,3 +664,440 @@ def sample_dpmpp_2m(model, x, sigmas, extra_args=None, callback=None, disable=No
x = (sigma_fn(t_next) / sigma_fn(t)) * x - (-h).expm1() * denoised_d
old_denoised = denoised
return x
@torch.no_grad()
def sample_dpmpp_2m_sde(model, x, sigmas, extra_args=None, callback=None, disable=None, eta=1., s_noise=1., noise_sampler=None, solver_type='midpoint'):
"""DPM-Solver++(2M) SDE."""
if len(sigmas) <= 1:
return x
if solver_type not in {'heun', 'midpoint'}:
raise ValueError('solver_type must be \'heun\' or \'midpoint\'')
seed = extra_args.get("seed", None)
sigma_min, sigma_max = sigmas[sigmas > 0].min(), sigmas.max()
noise_sampler = BrownianTreeNoiseSampler(x, sigma_min, sigma_max, seed=seed, cpu=True) if noise_sampler is None else noise_sampler
extra_args = {} if extra_args is None else extra_args
s_in = x.new_ones([x.shape[0]])
old_denoised = None
h_last = None
h = None
for i in trange(len(sigmas) - 1, disable=disable):
denoised = model(x, sigmas[i] * s_in, **extra_args)
if callback is not None:
callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
if sigmas[i + 1] == 0:
# Denoising step
x = denoised
else:
# DPM-Solver++(2M) SDE
t, s = -sigmas[i].log(), -sigmas[i + 1].log()
h = s - t
eta_h = eta * h
x = sigmas[i + 1] / sigmas[i] * (-eta_h).exp() * x + (-h - eta_h).expm1().neg() * denoised
if old_denoised is not None:
r = h_last / h
if solver_type == 'heun':
x = x + ((-h - eta_h).expm1().neg() / (-h - eta_h) + 1) * (1 / r) * (denoised - old_denoised)
elif solver_type == 'midpoint':
x = x + 0.5 * (-h - eta_h).expm1().neg() * (1 / r) * (denoised - old_denoised)
if eta:
x = x + noise_sampler(sigmas[i], sigmas[i + 1]) * sigmas[i + 1] * (-2 * eta_h).expm1().neg().sqrt() * s_noise
old_denoised = denoised
h_last = h
return x
@torch.no_grad()
def sample_dpmpp_3m_sde(model, x, sigmas, extra_args=None, callback=None, disable=None, eta=1., s_noise=1., noise_sampler=None):
"""DPM-Solver++(3M) SDE."""
if len(sigmas) <= 1:
return x
seed = extra_args.get("seed", None)
sigma_min, sigma_max = sigmas[sigmas > 0].min(), sigmas.max()
noise_sampler = BrownianTreeNoiseSampler(x, sigma_min, sigma_max, seed=seed, cpu=True) if noise_sampler is None else noise_sampler
extra_args = {} if extra_args is None else extra_args
s_in = x.new_ones([x.shape[0]])
denoised_1, denoised_2 = None, None
h, h_1, h_2 = None, None, None
for i in trange(len(sigmas) - 1, disable=disable):
denoised = model(x, sigmas[i] * s_in, **extra_args)
if callback is not None:
callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
if sigmas[i + 1] == 0:
# Denoising step
x = denoised
else:
t, s = -sigmas[i].log(), -sigmas[i + 1].log()
h = s - t
h_eta = h * (eta + 1)
x = torch.exp(-h_eta) * x + (-h_eta).expm1().neg() * denoised
if h_2 is not None:
r0 = h_1 / h
r1 = h_2 / h
d1_0 = (denoised - denoised_1) / r0
d1_1 = (denoised_1 - denoised_2) / r1
d1 = d1_0 + (d1_0 - d1_1) * r0 / (r0 + r1)
d2 = (d1_0 - d1_1) / (r0 + r1)
phi_2 = h_eta.neg().expm1() / h_eta + 1
phi_3 = phi_2 / h_eta - 0.5
x = x + phi_2 * d1 - phi_3 * d2
elif h_1 is not None:
r = h_1 / h
d = (denoised - denoised_1) / r
phi_2 = h_eta.neg().expm1() / h_eta + 1
x = x + phi_2 * d
if eta:
x = x + noise_sampler(sigmas[i], sigmas[i + 1]) * sigmas[i + 1] * (-2 * h * eta).expm1().neg().sqrt() * s_noise
denoised_1, denoised_2 = denoised, denoised_1
h_1, h_2 = h, h_1
return x
@torch.no_grad()
def sample_dpmpp_3m_sde_gpu(model, x, sigmas, extra_args=None, callback=None, disable=None, eta=1., s_noise=1., noise_sampler=None):
if len(sigmas) <= 1:
return x
sigma_min, sigma_max = sigmas[sigmas > 0].min(), sigmas.max()
noise_sampler = BrownianTreeNoiseSampler(x, sigma_min, sigma_max, seed=extra_args.get("seed", None), cpu=False) if noise_sampler is None else noise_sampler
return sample_dpmpp_3m_sde(model, x, sigmas, extra_args=extra_args, callback=callback, disable=disable, eta=eta, s_noise=s_noise, noise_sampler=noise_sampler)
@torch.no_grad()
def sample_dpmpp_2m_sde_gpu(model, x, sigmas, extra_args=None, callback=None, disable=None, eta=1., s_noise=1., noise_sampler=None, solver_type='midpoint'):
if len(sigmas) <= 1:
return x
sigma_min, sigma_max = sigmas[sigmas > 0].min(), sigmas.max()
noise_sampler = BrownianTreeNoiseSampler(x, sigma_min, sigma_max, seed=extra_args.get("seed", None), cpu=False) if noise_sampler is None else noise_sampler
return sample_dpmpp_2m_sde(model, x, sigmas, extra_args=extra_args, callback=callback, disable=disable, eta=eta, s_noise=s_noise, noise_sampler=noise_sampler, solver_type=solver_type)
@torch.no_grad()
def sample_dpmpp_sde_gpu(model, x, sigmas, extra_args=None, callback=None, disable=None, eta=1., s_noise=1., noise_sampler=None, r=1 / 2):
if len(sigmas) <= 1:
return x
sigma_min, sigma_max = sigmas[sigmas > 0].min(), sigmas.max()
noise_sampler = BrownianTreeNoiseSampler(x, sigma_min, sigma_max, seed=extra_args.get("seed", None), cpu=False) if noise_sampler is None else noise_sampler
return sample_dpmpp_sde(model, x, sigmas, extra_args=extra_args, callback=callback, disable=disable, eta=eta, s_noise=s_noise, noise_sampler=noise_sampler, r=r)
def DDPMSampler_step(x, sigma, sigma_prev, noise, noise_sampler):
alpha_cumprod = 1 / ((sigma * sigma) + 1)
alpha_cumprod_prev = 1 / ((sigma_prev * sigma_prev) + 1)
alpha = (alpha_cumprod / alpha_cumprod_prev)
mu = (1.0 / alpha).sqrt() * (x - (1 - alpha) * noise / (1 - alpha_cumprod).sqrt())
if sigma_prev > 0:
mu += ((1 - alpha) * (1. - alpha_cumprod_prev) / (1. - alpha_cumprod)).sqrt() * noise_sampler(sigma, sigma_prev)
return mu
def generic_step_sampler(model, x, sigmas, extra_args=None, callback=None, disable=None, noise_sampler=None, step_function=None):
extra_args = {} if extra_args is None else extra_args
noise_sampler = default_noise_sampler(x) if noise_sampler is None else noise_sampler
s_in = x.new_ones([x.shape[0]])
for i in trange(len(sigmas) - 1, disable=disable):
denoised = model(x, sigmas[i] * s_in, **extra_args)
if callback is not None:
callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
x = step_function(x / torch.sqrt(1.0 + sigmas[i] ** 2.0), sigmas[i], sigmas[i + 1], (x - denoised) / sigmas[i], noise_sampler)
if sigmas[i + 1] != 0:
x *= torch.sqrt(1.0 + sigmas[i + 1] ** 2.0)
return x
@torch.no_grad()
def sample_ddpm(model, x, sigmas, extra_args=None, callback=None, disable=None, noise_sampler=None):
return generic_step_sampler(model, x, sigmas, extra_args, callback, disable, noise_sampler, DDPMSampler_step)
@torch.no_grad()
def sample_lcm(model, x, sigmas, extra_args=None, callback=None, disable=None, noise_sampler=None):
extra_args = {} if extra_args is None else extra_args
noise_sampler = default_noise_sampler(x) if noise_sampler is None else noise_sampler
s_in = x.new_ones([x.shape[0]])
for i in trange(len(sigmas) - 1, disable=disable):
denoised = model(x, sigmas[i] * s_in, **extra_args)
if callback is not None:
callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
x = denoised
if sigmas[i + 1] > 0:
x = model.inner_model.inner_model.model_sampling.noise_scaling(sigmas[i + 1], noise_sampler(sigmas[i], sigmas[i + 1]), x)
return x
@torch.no_grad()
def sample_heunpp2(model, x, sigmas, extra_args=None, callback=None, disable=None, s_churn=0., s_tmin=0., s_tmax=float('inf'), s_noise=1.):
# From MIT licensed: https://github.com/Carzit/sd-webui-samplers-scheduler/
extra_args = {} if extra_args is None else extra_args
s_in = x.new_ones([x.shape[0]])
s_end = sigmas[-1]
for i in trange(len(sigmas) - 1, disable=disable):
gamma = min(s_churn / (len(sigmas) - 1), 2 ** 0.5 - 1) if s_tmin <= sigmas[i] <= s_tmax else 0.
eps = torch.randn_like(x) * s_noise
sigma_hat = sigmas[i] * (gamma + 1)
if gamma > 0:
x = x + eps * (sigma_hat ** 2 - sigmas[i] ** 2) ** 0.5
denoised = model(x, sigma_hat * s_in, **extra_args)
d = to_d(x, sigma_hat, denoised)
if callback is not None:
callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigma_hat, 'denoised': denoised})
dt = sigmas[i + 1] - sigma_hat
if sigmas[i + 1] == s_end:
# Euler method
x = x + d * dt
elif sigmas[i + 2] == s_end:
# Heun's method
x_2 = x + d * dt
denoised_2 = model(x_2, sigmas[i + 1] * s_in, **extra_args)
d_2 = to_d(x_2, sigmas[i + 1], denoised_2)
w = 2 * sigmas[0]
w2 = sigmas[i+1]/w
w1 = 1 - w2
d_prime = d * w1 + d_2 * w2
x = x + d_prime * dt
else:
# Heun++
x_2 = x + d * dt
denoised_2 = model(x_2, sigmas[i + 1] * s_in, **extra_args)
d_2 = to_d(x_2, sigmas[i + 1], denoised_2)
dt_2 = sigmas[i + 2] - sigmas[i + 1]
x_3 = x_2 + d_2 * dt_2
denoised_3 = model(x_3, sigmas[i + 2] * s_in, **extra_args)
d_3 = to_d(x_3, sigmas[i + 2], denoised_3)
w = 3 * sigmas[0]
w2 = sigmas[i + 1] / w
w3 = sigmas[i + 2] / w
w1 = 1 - w2 - w3
d_prime = w1 * d + w2 * d_2 + w3 * d_3
x = x + d_prime * dt
return x
#From https://github.com/zju-pi/diff-sampler/blob/main/diff-solvers-main/solvers.py
#under Apache 2 license
def sample_ipndm(model, x, sigmas, extra_args=None, callback=None, disable=None, max_order=4):
extra_args = {} if extra_args is None else extra_args
s_in = x.new_ones([x.shape[0]])
x_next = x
buffer_model = []
for i in trange(len(sigmas) - 1, disable=disable):
t_cur = sigmas[i]
t_next = sigmas[i + 1]
x_cur = x_next
denoised = model(x_cur, t_cur * s_in, **extra_args)
if callback is not None:
callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
d_cur = (x_cur - denoised) / t_cur
order = min(max_order, i+1)
if order == 1: # First Euler step.
x_next = x_cur + (t_next - t_cur) * d_cur
elif order == 2: # Use one history point.
x_next = x_cur + (t_next - t_cur) * (3 * d_cur - buffer_model[-1]) / 2
elif order == 3: # Use two history points.
x_next = x_cur + (t_next - t_cur) * (23 * d_cur - 16 * buffer_model[-1] + 5 * buffer_model[-2]) / 12
elif order == 4: # Use three history points.
x_next = x_cur + (t_next - t_cur) * (55 * d_cur - 59 * buffer_model[-1] + 37 * buffer_model[-2] - 9 * buffer_model[-3]) / 24
if len(buffer_model) == max_order - 1:
for k in range(max_order - 2):
buffer_model[k] = buffer_model[k+1]
buffer_model[-1] = d_cur
else:
buffer_model.append(d_cur)
return x_next
#From https://github.com/zju-pi/diff-sampler/blob/main/diff-solvers-main/solvers.py
#under Apache 2 license
def sample_ipndm_v(model, x, sigmas, extra_args=None, callback=None, disable=None, max_order=4):
extra_args = {} if extra_args is None else extra_args
s_in = x.new_ones([x.shape[0]])
x_next = x
t_steps = sigmas
buffer_model = []
for i in trange(len(sigmas) - 1, disable=disable):
t_cur = sigmas[i]
t_next = sigmas[i + 1]
x_cur = x_next
denoised = model(x_cur, t_cur * s_in, **extra_args)
if callback is not None:
callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
d_cur = (x_cur - denoised) / t_cur
order = min(max_order, i+1)
if order == 1: # First Euler step.
x_next = x_cur + (t_next - t_cur) * d_cur
elif order == 2: # Use one history point.
h_n = (t_next - t_cur)
h_n_1 = (t_cur - t_steps[i-1])
coeff1 = (2 + (h_n / h_n_1)) / 2
coeff2 = -(h_n / h_n_1) / 2
x_next = x_cur + (t_next - t_cur) * (coeff1 * d_cur + coeff2 * buffer_model[-1])
elif order == 3: # Use two history points.
h_n = (t_next - t_cur)
h_n_1 = (t_cur - t_steps[i-1])
h_n_2 = (t_steps[i-1] - t_steps[i-2])
temp = (1 - h_n / (3 * (h_n + h_n_1)) * (h_n * (h_n + h_n_1)) / (h_n_1 * (h_n_1 + h_n_2))) / 2
coeff1 = (2 + (h_n / h_n_1)) / 2 + temp
coeff2 = -(h_n / h_n_1) / 2 - (1 + h_n_1 / h_n_2) * temp
coeff3 = temp * h_n_1 / h_n_2
x_next = x_cur + (t_next - t_cur) * (coeff1 * d_cur + coeff2 * buffer_model[-1] + coeff3 * buffer_model[-2])
elif order == 4: # Use three history points.
h_n = (t_next - t_cur)
h_n_1 = (t_cur - t_steps[i-1])
h_n_2 = (t_steps[i-1] - t_steps[i-2])
h_n_3 = (t_steps[i-2] - t_steps[i-3])
temp1 = (1 - h_n / (3 * (h_n + h_n_1)) * (h_n * (h_n + h_n_1)) / (h_n_1 * (h_n_1 + h_n_2))) / 2
temp2 = ((1 - h_n / (3 * (h_n + h_n_1))) / 2 + (1 - h_n / (2 * (h_n + h_n_1))) * h_n / (6 * (h_n + h_n_1 + h_n_2))) \
* (h_n * (h_n + h_n_1) * (h_n + h_n_1 + h_n_2)) / (h_n_1 * (h_n_1 + h_n_2) * (h_n_1 + h_n_2 + h_n_3))
coeff1 = (2 + (h_n / h_n_1)) / 2 + temp1 + temp2
coeff2 = -(h_n / h_n_1) / 2 - (1 + h_n_1 / h_n_2) * temp1 - (1 + (h_n_1 / h_n_2) + (h_n_1 * (h_n_1 + h_n_2) / (h_n_2 * (h_n_2 + h_n_3)))) * temp2
coeff3 = temp1 * h_n_1 / h_n_2 + ((h_n_1 / h_n_2) + (h_n_1 * (h_n_1 + h_n_2) / (h_n_2 * (h_n_2 + h_n_3))) * (1 + h_n_2 / h_n_3)) * temp2
coeff4 = -temp2 * (h_n_1 * (h_n_1 + h_n_2) / (h_n_2 * (h_n_2 + h_n_3))) * h_n_1 / h_n_2
x_next = x_cur + (t_next - t_cur) * (coeff1 * d_cur + coeff2 * buffer_model[-1] + coeff3 * buffer_model[-2] + coeff4 * buffer_model[-3])
if len(buffer_model) == max_order - 1:
for k in range(max_order - 2):
buffer_model[k] = buffer_model[k+1]
buffer_model[-1] = d_cur.detach()
else:
buffer_model.append(d_cur.detach())
return x_next
#From https://github.com/zju-pi/diff-sampler/blob/main/diff-solvers-main/solvers.py
#under Apache 2 license
@torch.no_grad()
def sample_deis(model, x, sigmas, extra_args=None, callback=None, disable=None, max_order=3, deis_mode='tab'):
extra_args = {} if extra_args is None else extra_args
s_in = x.new_ones([x.shape[0]])
x_next = x
t_steps = sigmas
coeff_list = deis.get_deis_coeff_list(t_steps, max_order, deis_mode=deis_mode)
buffer_model = []
for i in trange(len(sigmas) - 1, disable=disable):
t_cur = sigmas[i]
t_next = sigmas[i + 1]
x_cur = x_next
denoised = model(x_cur, t_cur * s_in, **extra_args)
if callback is not None:
callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
d_cur = (x_cur - denoised) / t_cur
order = min(max_order, i+1)
if t_next <= 0:
order = 1
if order == 1: # First Euler step.
x_next = x_cur + (t_next - t_cur) * d_cur
elif order == 2: # Use one history point.
coeff_cur, coeff_prev1 = coeff_list[i]
x_next = x_cur + coeff_cur * d_cur + coeff_prev1 * buffer_model[-1]
elif order == 3: # Use two history points.
coeff_cur, coeff_prev1, coeff_prev2 = coeff_list[i]
x_next = x_cur + coeff_cur * d_cur + coeff_prev1 * buffer_model[-1] + coeff_prev2 * buffer_model[-2]
elif order == 4: # Use three history points.
coeff_cur, coeff_prev1, coeff_prev2, coeff_prev3 = coeff_list[i]
x_next = x_cur + coeff_cur * d_cur + coeff_prev1 * buffer_model[-1] + coeff_prev2 * buffer_model[-2] + coeff_prev3 * buffer_model[-3]
if len(buffer_model) == max_order - 1:
for k in range(max_order - 2):
buffer_model[k] = buffer_model[k+1]
buffer_model[-1] = d_cur.detach()
else:
buffer_model.append(d_cur.detach())
return x_next
@torch.no_grad()
def sample_euler_cfg_pp(model, x, sigmas, extra_args=None, callback=None, disable=None):
extra_args = {} if extra_args is None else extra_args
temp = [0]
def post_cfg_function(args):
temp[0] = args["uncond_denoised"]
return args["denoised"]
model_options = extra_args.get("model_options", {}).copy()
extra_args["model_options"] = comfy.model_patcher.set_model_options_post_cfg_function(model_options, post_cfg_function, disable_cfg1_optimization=True)
s_in = x.new_ones([x.shape[0]])
for i in trange(len(sigmas) - 1, disable=disable):
sigma_hat = sigmas[i]
denoised = model(x, sigma_hat * s_in, **extra_args)
d = to_d(x, sigma_hat, temp[0])
if callback is not None:
callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigma_hat, 'denoised': denoised})
dt = sigmas[i + 1] - sigma_hat
# Euler method
x = denoised + d * sigmas[i + 1]
return x
@torch.no_grad()
def sample_euler_ancestral_cfg_pp(model, x, sigmas, extra_args=None, callback=None, disable=None, eta=1., s_noise=1., noise_sampler=None):
"""Ancestral sampling with Euler method steps."""
extra_args = {} if extra_args is None else extra_args
noise_sampler = default_noise_sampler(x) if noise_sampler is None else noise_sampler
temp = [0]
def post_cfg_function(args):
temp[0] = args["uncond_denoised"]
return args["denoised"]
model_options = extra_args.get("model_options", {}).copy()
extra_args["model_options"] = comfy.model_patcher.set_model_options_post_cfg_function(model_options, post_cfg_function, disable_cfg1_optimization=True)
s_in = x.new_ones([x.shape[0]])
for i in trange(len(sigmas) - 1, disable=disable):
denoised = model(x, sigmas[i] * s_in, **extra_args)
sigma_down, sigma_up = get_ancestral_step(sigmas[i], sigmas[i + 1], eta=eta)
if callback is not None:
callback({'x': x, 'i': i, 'sigma': sigmas[i], 'sigma_hat': sigmas[i], 'denoised': denoised})
d = to_d(x, sigmas[i], temp[0])
# Euler method
dt = sigma_down - sigmas[i]
x = denoised + d * sigma_down
if sigmas[i + 1] > 0:
x = x + noise_sampler(sigmas[i], sigmas[i + 1]) * s_noise * sigma_up
return x

View File

@ -10,25 +10,6 @@ from PIL import Image
import torch
from torch import nn, optim
from torch.utils import data
from torchvision.transforms import functional as TF
def from_pil_image(x):
"""Converts from a PIL image to a tensor."""
x = TF.to_tensor(x)
if x.ndim == 2:
x = x[..., None]
return x * 2 - 1
def to_pil_image(x):
"""Converts from a tensor to a PIL image."""
if x.ndim == 4:
assert x.shape[0] == 1
x = x[0]
if x.shape[0] == 1:
x = x[0]
return TF.to_pil_image((x.clamp(-1, 1) + 1) / 2)
def hf_datasets_augs_helper(examples, transform, image_key, mode='RGB'):

172
comfy/latent_formats.py Normal file
View File

@ -0,0 +1,172 @@
import torch
class LatentFormat:
scale_factor = 1.0
latent_channels = 4
latent_rgb_factors = None
taesd_decoder_name = None
def process_in(self, latent):
return latent * self.scale_factor
def process_out(self, latent):
return latent / self.scale_factor
class SD15(LatentFormat):
def __init__(self, scale_factor=0.18215):
self.scale_factor = scale_factor
self.latent_rgb_factors = [
# R G B
[ 0.3512, 0.2297, 0.3227],
[ 0.3250, 0.4974, 0.2350],
[-0.2829, 0.1762, 0.2721],
[-0.2120, -0.2616, -0.7177]
]
self.taesd_decoder_name = "taesd_decoder"
class SDXL(LatentFormat):
scale_factor = 0.13025
def __init__(self):
self.latent_rgb_factors = [
# R G B
[ 0.3920, 0.4054, 0.4549],
[-0.2634, -0.0196, 0.0653],
[ 0.0568, 0.1687, -0.0755],
[-0.3112, -0.2359, -0.2076]
]
self.taesd_decoder_name = "taesdxl_decoder"
class SDXL_Playground_2_5(LatentFormat):
def __init__(self):
self.scale_factor = 0.5
self.latents_mean = torch.tensor([-1.6574, 1.886, -1.383, 2.5155]).view(1, 4, 1, 1)
self.latents_std = torch.tensor([8.4927, 5.9022, 6.5498, 5.2299]).view(1, 4, 1, 1)
self.latent_rgb_factors = [
# R G B
[ 0.3920, 0.4054, 0.4549],
[-0.2634, -0.0196, 0.0653],
[ 0.0568, 0.1687, -0.0755],
[-0.3112, -0.2359, -0.2076]
]
self.taesd_decoder_name = "taesdxl_decoder"
def process_in(self, latent):
latents_mean = self.latents_mean.to(latent.device, latent.dtype)
latents_std = self.latents_std.to(latent.device, latent.dtype)
return (latent - latents_mean) * self.scale_factor / latents_std
def process_out(self, latent):
latents_mean = self.latents_mean.to(latent.device, latent.dtype)
latents_std = self.latents_std.to(latent.device, latent.dtype)
return latent * latents_std / self.scale_factor + latents_mean
class SD_X4(LatentFormat):
def __init__(self):
self.scale_factor = 0.08333
self.latent_rgb_factors = [
[-0.2340, -0.3863, -0.3257],
[ 0.0994, 0.0885, -0.0908],
[-0.2833, -0.2349, -0.3741],
[ 0.2523, -0.0055, -0.1651]
]
class SC_Prior(LatentFormat):
latent_channels = 16
def __init__(self):
self.scale_factor = 1.0
self.latent_rgb_factors = [
[-0.0326, -0.0204, -0.0127],
[-0.1592, -0.0427, 0.0216],
[ 0.0873, 0.0638, -0.0020],
[-0.0602, 0.0442, 0.1304],
[ 0.0800, -0.0313, -0.1796],
[-0.0810, -0.0638, -0.1581],
[ 0.1791, 0.1180, 0.0967],
[ 0.0740, 0.1416, 0.0432],
[-0.1745, -0.1888, -0.1373],
[ 0.2412, 0.1577, 0.0928],
[ 0.1908, 0.0998, 0.0682],
[ 0.0209, 0.0365, -0.0092],
[ 0.0448, -0.0650, -0.1728],
[-0.1658, -0.1045, -0.1308],
[ 0.0542, 0.1545, 0.1325],
[-0.0352, -0.1672, -0.2541]
]
class SC_B(LatentFormat):
def __init__(self):
self.scale_factor = 1.0 / 0.43
self.latent_rgb_factors = [
[ 0.1121, 0.2006, 0.1023],
[-0.2093, -0.0222, -0.0195],
[-0.3087, -0.1535, 0.0366],
[ 0.0290, -0.1574, -0.4078]
]
class SD3(LatentFormat):
latent_channels = 16
def __init__(self):
self.scale_factor = 1.5305
self.shift_factor = 0.0609
self.latent_rgb_factors = [
[-0.0645, 0.0177, 0.1052],
[ 0.0028, 0.0312, 0.0650],
[ 0.1848, 0.0762, 0.0360],
[ 0.0944, 0.0360, 0.0889],
[ 0.0897, 0.0506, -0.0364],
[-0.0020, 0.1203, 0.0284],
[ 0.0855, 0.0118, 0.0283],
[-0.0539, 0.0658, 0.1047],
[-0.0057, 0.0116, 0.0700],
[-0.0412, 0.0281, -0.0039],
[ 0.1106, 0.1171, 0.1220],
[-0.0248, 0.0682, -0.0481],
[ 0.0815, 0.0846, 0.1207],
[-0.0120, -0.0055, -0.0867],
[-0.0749, -0.0634, -0.0456],
[-0.1418, -0.1457, -0.1259]
]
self.taesd_decoder_name = "taesd3_decoder"
def process_in(self, latent):
return (latent - self.shift_factor) * self.scale_factor
def process_out(self, latent):
return (latent / self.scale_factor) + self.shift_factor
class StableAudio1(LatentFormat):
latent_channels = 64
class Flux(SD3):
latent_channels = 16
def __init__(self):
self.scale_factor = 0.3611
self.shift_factor = 0.1159
self.latent_rgb_factors =[
[-0.0404, 0.0159, 0.0609],
[ 0.0043, 0.0298, 0.0850],
[ 0.0328, -0.0749, -0.0503],
[-0.0245, 0.0085, 0.0549],
[ 0.0966, 0.0894, 0.0530],
[ 0.0035, 0.0399, 0.0123],
[ 0.0583, 0.1184, 0.1262],
[-0.0191, -0.0206, -0.0306],
[-0.0324, 0.0055, 0.1001],
[ 0.0955, 0.0659, -0.0545],
[-0.0504, 0.0231, -0.0013],
[ 0.0500, -0.0008, -0.0088],
[ 0.0982, 0.0941, 0.0976],
[-0.1233, -0.0280, -0.0897],
[-0.0005, -0.0530, -0.0020],
[-0.1273, -0.0932, -0.0680]
]
self.taesd_decoder_name = "taef1_decoder"
def process_in(self, latent):
return (latent - self.shift_factor) * self.scale_factor
def process_out(self, latent):
return (latent / self.scale_factor) + self.shift_factor

View File

@ -0,0 +1,282 @@
# code adapted from: https://github.com/Stability-AI/stable-audio-tools
import torch
from torch import nn
from typing import Literal, Dict, Any
import math
import comfy.ops
ops = comfy.ops.disable_weight_init
def vae_sample(mean, scale):
stdev = nn.functional.softplus(scale) + 1e-4
var = stdev * stdev
logvar = torch.log(var)
latents = torch.randn_like(mean) * stdev + mean
kl = (mean * mean + var - logvar - 1).sum(1).mean()
return latents, kl
class VAEBottleneck(nn.Module):
def __init__(self):
super().__init__()
self.is_discrete = False
def encode(self, x, return_info=False, **kwargs):
info = {}
mean, scale = x.chunk(2, dim=1)
x, kl = vae_sample(mean, scale)
info["kl"] = kl
if return_info:
return x, info
else:
return x
def decode(self, x):
return x
def snake_beta(x, alpha, beta):
return x + (1.0 / (beta + 0.000000001)) * pow(torch.sin(x * alpha), 2)
# Adapted from https://github.com/NVIDIA/BigVGAN/blob/main/activations.py under MIT license
class SnakeBeta(nn.Module):
def __init__(self, in_features, alpha=1.0, alpha_trainable=True, alpha_logscale=True):
super(SnakeBeta, self).__init__()
self.in_features = in_features
# initialize alpha
self.alpha_logscale = alpha_logscale
if self.alpha_logscale: # log scale alphas initialized to zeros
self.alpha = nn.Parameter(torch.zeros(in_features) * alpha)
self.beta = nn.Parameter(torch.zeros(in_features) * alpha)
else: # linear scale alphas initialized to ones
self.alpha = nn.Parameter(torch.ones(in_features) * alpha)
self.beta = nn.Parameter(torch.ones(in_features) * alpha)
# self.alpha.requires_grad = alpha_trainable
# self.beta.requires_grad = alpha_trainable
self.no_div_by_zero = 0.000000001
def forward(self, x):
alpha = self.alpha.unsqueeze(0).unsqueeze(-1).to(x.device) # line up with x to [B, C, T]
beta = self.beta.unsqueeze(0).unsqueeze(-1).to(x.device)
if self.alpha_logscale:
alpha = torch.exp(alpha)
beta = torch.exp(beta)
x = snake_beta(x, alpha, beta)
return x
def WNConv1d(*args, **kwargs):
try:
return torch.nn.utils.parametrizations.weight_norm(ops.Conv1d(*args, **kwargs))
except:
return torch.nn.utils.weight_norm(ops.Conv1d(*args, **kwargs)) #support pytorch 2.1 and older
def WNConvTranspose1d(*args, **kwargs):
try:
return torch.nn.utils.parametrizations.weight_norm(ops.ConvTranspose1d(*args, **kwargs))
except:
return torch.nn.utils.weight_norm(ops.ConvTranspose1d(*args, **kwargs)) #support pytorch 2.1 and older
def get_activation(activation: Literal["elu", "snake", "none"], antialias=False, channels=None) -> nn.Module:
if activation == "elu":
act = torch.nn.ELU()
elif activation == "snake":
act = SnakeBeta(channels)
elif activation == "none":
act = torch.nn.Identity()
else:
raise ValueError(f"Unknown activation {activation}")
if antialias:
act = Activation1d(act)
return act
class ResidualUnit(nn.Module):
def __init__(self, in_channels, out_channels, dilation, use_snake=False, antialias_activation=False):
super().__init__()
self.dilation = dilation
padding = (dilation * (7-1)) // 2
self.layers = nn.Sequential(
get_activation("snake" if use_snake else "elu", antialias=antialias_activation, channels=out_channels),
WNConv1d(in_channels=in_channels, out_channels=out_channels,
kernel_size=7, dilation=dilation, padding=padding),
get_activation("snake" if use_snake else "elu", antialias=antialias_activation, channels=out_channels),
WNConv1d(in_channels=out_channels, out_channels=out_channels,
kernel_size=1)
)
def forward(self, x):
res = x
#x = checkpoint(self.layers, x)
x = self.layers(x)
return x + res
class EncoderBlock(nn.Module):
def __init__(self, in_channels, out_channels, stride, use_snake=False, antialias_activation=False):
super().__init__()
self.layers = nn.Sequential(
ResidualUnit(in_channels=in_channels,
out_channels=in_channels, dilation=1, use_snake=use_snake),
ResidualUnit(in_channels=in_channels,
out_channels=in_channels, dilation=3, use_snake=use_snake),
ResidualUnit(in_channels=in_channels,
out_channels=in_channels, dilation=9, use_snake=use_snake),
get_activation("snake" if use_snake else "elu", antialias=antialias_activation, channels=in_channels),
WNConv1d(in_channels=in_channels, out_channels=out_channels,
kernel_size=2*stride, stride=stride, padding=math.ceil(stride/2)),
)
def forward(self, x):
return self.layers(x)
class DecoderBlock(nn.Module):
def __init__(self, in_channels, out_channels, stride, use_snake=False, antialias_activation=False, use_nearest_upsample=False):
super().__init__()
if use_nearest_upsample:
upsample_layer = nn.Sequential(
nn.Upsample(scale_factor=stride, mode="nearest"),
WNConv1d(in_channels=in_channels,
out_channels=out_channels,
kernel_size=2*stride,
stride=1,
bias=False,
padding='same')
)
else:
upsample_layer = WNConvTranspose1d(in_channels=in_channels,
out_channels=out_channels,
kernel_size=2*stride, stride=stride, padding=math.ceil(stride/2))
self.layers = nn.Sequential(
get_activation("snake" if use_snake else "elu", antialias=antialias_activation, channels=in_channels),
upsample_layer,
ResidualUnit(in_channels=out_channels, out_channels=out_channels,
dilation=1, use_snake=use_snake),
ResidualUnit(in_channels=out_channels, out_channels=out_channels,
dilation=3, use_snake=use_snake),
ResidualUnit(in_channels=out_channels, out_channels=out_channels,
dilation=9, use_snake=use_snake),
)
def forward(self, x):
return self.layers(x)
class OobleckEncoder(nn.Module):
def __init__(self,
in_channels=2,
channels=128,
latent_dim=32,
c_mults = [1, 2, 4, 8],
strides = [2, 4, 8, 8],
use_snake=False,
antialias_activation=False
):
super().__init__()
c_mults = [1] + c_mults
self.depth = len(c_mults)
layers = [
WNConv1d(in_channels=in_channels, out_channels=c_mults[0] * channels, kernel_size=7, padding=3)
]
for i in range(self.depth-1):
layers += [EncoderBlock(in_channels=c_mults[i]*channels, out_channels=c_mults[i+1]*channels, stride=strides[i], use_snake=use_snake)]
layers += [
get_activation("snake" if use_snake else "elu", antialias=antialias_activation, channels=c_mults[-1] * channels),
WNConv1d(in_channels=c_mults[-1]*channels, out_channels=latent_dim, kernel_size=3, padding=1)
]
self.layers = nn.Sequential(*layers)
def forward(self, x):
return self.layers(x)
class OobleckDecoder(nn.Module):
def __init__(self,
out_channels=2,
channels=128,
latent_dim=32,
c_mults = [1, 2, 4, 8],
strides = [2, 4, 8, 8],
use_snake=False,
antialias_activation=False,
use_nearest_upsample=False,
final_tanh=True):
super().__init__()
c_mults = [1] + c_mults
self.depth = len(c_mults)
layers = [
WNConv1d(in_channels=latent_dim, out_channels=c_mults[-1]*channels, kernel_size=7, padding=3),
]
for i in range(self.depth-1, 0, -1):
layers += [DecoderBlock(
in_channels=c_mults[i]*channels,
out_channels=c_mults[i-1]*channels,
stride=strides[i-1],
use_snake=use_snake,
antialias_activation=antialias_activation,
use_nearest_upsample=use_nearest_upsample
)
]
layers += [
get_activation("snake" if use_snake else "elu", antialias=antialias_activation, channels=c_mults[0] * channels),
WNConv1d(in_channels=c_mults[0] * channels, out_channels=out_channels, kernel_size=7, padding=3, bias=False),
nn.Tanh() if final_tanh else nn.Identity()
]
self.layers = nn.Sequential(*layers)
def forward(self, x):
return self.layers(x)
class AudioOobleckVAE(nn.Module):
def __init__(self,
in_channels=2,
channels=128,
latent_dim=64,
c_mults = [1, 2, 4, 8, 16],
strides = [2, 4, 4, 8, 8],
use_snake=True,
antialias_activation=False,
use_nearest_upsample=False,
final_tanh=False):
super().__init__()
self.encoder = OobleckEncoder(in_channels, channels, latent_dim * 2, c_mults, strides, use_snake, antialias_activation)
self.decoder = OobleckDecoder(in_channels, channels, latent_dim, c_mults, strides, use_snake, antialias_activation,
use_nearest_upsample=use_nearest_upsample, final_tanh=final_tanh)
self.bottleneck = VAEBottleneck()
def encode(self, x):
return self.bottleneck.encode(self.encoder(x))
def decode(self, x):
return self.decoder(self.bottleneck.decode(x))

891
comfy/ldm/audio/dit.py Normal file
View File

@ -0,0 +1,891 @@
# code adapted from: https://github.com/Stability-AI/stable-audio-tools
from comfy.ldm.modules.attention import optimized_attention
import typing as tp
import torch
from einops import rearrange
from torch import nn
from torch.nn import functional as F
import math
import comfy.ops
class FourierFeatures(nn.Module):
def __init__(self, in_features, out_features, std=1., dtype=None, device=None):
super().__init__()
assert out_features % 2 == 0
self.weight = nn.Parameter(torch.empty(
[out_features // 2, in_features], dtype=dtype, device=device))
def forward(self, input):
f = 2 * math.pi * input @ comfy.ops.cast_to_input(self.weight.T, input)
return torch.cat([f.cos(), f.sin()], dim=-1)
# norms
class LayerNorm(nn.Module):
def __init__(self, dim, bias=False, fix_scale=False, dtype=None, device=None):
"""
bias-less layernorm has been shown to be more stable. most newer models have moved towards rmsnorm, also bias-less
"""
super().__init__()
self.gamma = nn.Parameter(torch.empty(dim, dtype=dtype, device=device))
if bias:
self.beta = nn.Parameter(torch.empty(dim, dtype=dtype, device=device))
else:
self.beta = None
def forward(self, x):
beta = self.beta
if beta is not None:
beta = comfy.ops.cast_to_input(beta, x)
return F.layer_norm(x, x.shape[-1:], weight=comfy.ops.cast_to_input(self.gamma, x), bias=beta)
class GLU(nn.Module):
def __init__(
self,
dim_in,
dim_out,
activation,
use_conv = False,
conv_kernel_size = 3,
dtype=None,
device=None,
operations=None,
):
super().__init__()
self.act = activation
self.proj = operations.Linear(dim_in, dim_out * 2, dtype=dtype, device=device) if not use_conv else operations.Conv1d(dim_in, dim_out * 2, conv_kernel_size, padding = (conv_kernel_size // 2), dtype=dtype, device=device)
self.use_conv = use_conv
def forward(self, x):
if self.use_conv:
x = rearrange(x, 'b n d -> b d n')
x = self.proj(x)
x = rearrange(x, 'b d n -> b n d')
else:
x = self.proj(x)
x, gate = x.chunk(2, dim = -1)
return x * self.act(gate)
class AbsolutePositionalEmbedding(nn.Module):
def __init__(self, dim, max_seq_len):
super().__init__()
self.scale = dim ** -0.5
self.max_seq_len = max_seq_len
self.emb = nn.Embedding(max_seq_len, dim)
def forward(self, x, pos = None, seq_start_pos = None):
seq_len, device = x.shape[1], x.device
assert seq_len <= self.max_seq_len, f'you are passing in a sequence length of {seq_len} but your absolute positional embedding has a max sequence length of {self.max_seq_len}'
if pos is None:
pos = torch.arange(seq_len, device = device)
if seq_start_pos is not None:
pos = (pos - seq_start_pos[..., None]).clamp(min = 0)
pos_emb = self.emb(pos)
pos_emb = pos_emb * self.scale
return pos_emb
class ScaledSinusoidalEmbedding(nn.Module):
def __init__(self, dim, theta = 10000):
super().__init__()
assert (dim % 2) == 0, 'dimension must be divisible by 2'
self.scale = nn.Parameter(torch.ones(1) * dim ** -0.5)
half_dim = dim // 2
freq_seq = torch.arange(half_dim).float() / half_dim
inv_freq = theta ** -freq_seq
self.register_buffer('inv_freq', inv_freq, persistent = False)
def forward(self, x, pos = None, seq_start_pos = None):
seq_len, device = x.shape[1], x.device
if pos is None:
pos = torch.arange(seq_len, device = device)
if seq_start_pos is not None:
pos = pos - seq_start_pos[..., None]
emb = torch.einsum('i, j -> i j', pos, self.inv_freq)
emb = torch.cat((emb.sin(), emb.cos()), dim = -1)
return emb * self.scale
class RotaryEmbedding(nn.Module):
def __init__(
self,
dim,
use_xpos = False,
scale_base = 512,
interpolation_factor = 1.,
base = 10000,
base_rescale_factor = 1.,
dtype=None,
device=None,
):
super().__init__()
# proposed by reddit user bloc97, to rescale rotary embeddings to longer sequence length without fine-tuning
# has some connection to NTK literature
# https://www.reddit.com/r/LocalLLaMA/comments/14lz7j5/ntkaware_scaled_rope_allows_llama_models_to_have/
base *= base_rescale_factor ** (dim / (dim - 2))
# inv_freq = 1. / (base ** (torch.arange(0, dim, 2).float() / dim))
self.register_buffer('inv_freq', torch.empty((dim // 2,), device=device, dtype=dtype))
assert interpolation_factor >= 1.
self.interpolation_factor = interpolation_factor
if not use_xpos:
self.register_buffer('scale', None)
return
scale = (torch.arange(0, dim, 2) + 0.4 * dim) / (1.4 * dim)
self.scale_base = scale_base
self.register_buffer('scale', scale)
def forward_from_seq_len(self, seq_len, device, dtype):
# device = self.inv_freq.device
t = torch.arange(seq_len, device=device, dtype=dtype)
return self.forward(t)
def forward(self, t):
# device = self.inv_freq.device
device = t.device
dtype = t.dtype
# t = t.to(torch.float32)
t = t / self.interpolation_factor
freqs = torch.einsum('i , j -> i j', t, comfy.ops.cast_to_input(self.inv_freq, t))
freqs = torch.cat((freqs, freqs), dim = -1)
if self.scale is None:
return freqs, 1.
power = (torch.arange(seq_len, device = device) - (seq_len // 2)) / self.scale_base
scale = comfy.ops.cast_to_input(self.scale, t) ** rearrange(power, 'n -> n 1')
scale = torch.cat((scale, scale), dim = -1)
return freqs, scale
def rotate_half(x):
x = rearrange(x, '... (j d) -> ... j d', j = 2)
x1, x2 = x.unbind(dim = -2)
return torch.cat((-x2, x1), dim = -1)
def apply_rotary_pos_emb(t, freqs, scale = 1):
out_dtype = t.dtype
# cast to float32 if necessary for numerical stability
dtype = t.dtype #reduce(torch.promote_types, (t.dtype, freqs.dtype, torch.float32))
rot_dim, seq_len = freqs.shape[-1], t.shape[-2]
freqs, t = freqs.to(dtype), t.to(dtype)
freqs = freqs[-seq_len:, :]
if t.ndim == 4 and freqs.ndim == 3:
freqs = rearrange(freqs, 'b n d -> b 1 n d')
# partial rotary embeddings, Wang et al. GPT-J
t, t_unrotated = t[..., :rot_dim], t[..., rot_dim:]
t = (t * freqs.cos() * scale) + (rotate_half(t) * freqs.sin() * scale)
t, t_unrotated = t.to(out_dtype), t_unrotated.to(out_dtype)
return torch.cat((t, t_unrotated), dim = -1)
class FeedForward(nn.Module):
def __init__(
self,
dim,
dim_out = None,
mult = 4,
no_bias = False,
glu = True,
use_conv = False,
conv_kernel_size = 3,
zero_init_output = True,
dtype=None,
device=None,
operations=None,
):
super().__init__()
inner_dim = int(dim * mult)
# Default to SwiGLU
activation = nn.SiLU()
dim_out = dim if dim_out is None else dim_out
if glu:
linear_in = GLU(dim, inner_dim, activation, dtype=dtype, device=device, operations=operations)
else:
linear_in = nn.Sequential(
Rearrange('b n d -> b d n') if use_conv else nn.Identity(),
operations.Linear(dim, inner_dim, bias = not no_bias, dtype=dtype, device=device) if not use_conv else operations.Conv1d(dim, inner_dim, conv_kernel_size, padding = (conv_kernel_size // 2), bias = not no_bias, dtype=dtype, device=device),
Rearrange('b n d -> b d n') if use_conv else nn.Identity(),
activation
)
linear_out = operations.Linear(inner_dim, dim_out, bias = not no_bias, dtype=dtype, device=device) if not use_conv else operations.Conv1d(inner_dim, dim_out, conv_kernel_size, padding = (conv_kernel_size // 2), bias = not no_bias, dtype=dtype, device=device)
# # init last linear layer to 0
# if zero_init_output:
# nn.init.zeros_(linear_out.weight)
# if not no_bias:
# nn.init.zeros_(linear_out.bias)
self.ff = nn.Sequential(
linear_in,
Rearrange('b d n -> b n d') if use_conv else nn.Identity(),
linear_out,
Rearrange('b n d -> b d n') if use_conv else nn.Identity(),
)
def forward(self, x):
return self.ff(x)
class Attention(nn.Module):
def __init__(
self,
dim,
dim_heads = 64,
dim_context = None,
causal = False,
zero_init_output=True,
qk_norm = False,
natten_kernel_size = None,
dtype=None,
device=None,
operations=None,
):
super().__init__()
self.dim = dim
self.dim_heads = dim_heads
self.causal = causal
dim_kv = dim_context if dim_context is not None else dim
self.num_heads = dim // dim_heads
self.kv_heads = dim_kv // dim_heads
if dim_context is not None:
self.to_q = operations.Linear(dim, dim, bias=False, dtype=dtype, device=device)
self.to_kv = operations.Linear(dim_kv, dim_kv * 2, bias=False, dtype=dtype, device=device)
else:
self.to_qkv = operations.Linear(dim, dim * 3, bias=False, dtype=dtype, device=device)
self.to_out = operations.Linear(dim, dim, bias=False, dtype=dtype, device=device)
# if zero_init_output:
# nn.init.zeros_(self.to_out.weight)
self.qk_norm = qk_norm
def forward(
self,
x,
context = None,
mask = None,
context_mask = None,
rotary_pos_emb = None,
causal = None
):
h, kv_h, has_context = self.num_heads, self.kv_heads, context is not None
kv_input = context if has_context else x
if hasattr(self, 'to_q'):
# Use separate linear projections for q and k/v
q = self.to_q(x)
q = rearrange(q, 'b n (h d) -> b h n d', h = h)
k, v = self.to_kv(kv_input).chunk(2, dim=-1)
k, v = map(lambda t: rearrange(t, 'b n (h d) -> b h n d', h = kv_h), (k, v))
else:
# Use fused linear projection
q, k, v = self.to_qkv(x).chunk(3, dim=-1)
q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b h n d', h = h), (q, k, v))
# Normalize q and k for cosine sim attention
if self.qk_norm:
q = F.normalize(q, dim=-1)
k = F.normalize(k, dim=-1)
if rotary_pos_emb is not None and not has_context:
freqs, _ = rotary_pos_emb
q_dtype = q.dtype
k_dtype = k.dtype
q = q.to(torch.float32)
k = k.to(torch.float32)
freqs = freqs.to(torch.float32)
q = apply_rotary_pos_emb(q, freqs)
k = apply_rotary_pos_emb(k, freqs)
q = q.to(q_dtype)
k = k.to(k_dtype)
input_mask = context_mask
if input_mask is None and not has_context:
input_mask = mask
# determine masking
masks = []
final_attn_mask = None # The mask that will be applied to the attention matrix, taking all masks into account
if input_mask is not None:
input_mask = rearrange(input_mask, 'b j -> b 1 1 j')
masks.append(~input_mask)
# Other masks will be added here later
if len(masks) > 0:
final_attn_mask = ~or_reduce(masks)
n, device = q.shape[-2], q.device
causal = self.causal if causal is None else causal
if n == 1 and causal:
causal = False
if h != kv_h:
# Repeat interleave kv_heads to match q_heads
heads_per_kv_head = h // kv_h
k, v = map(lambda t: t.repeat_interleave(heads_per_kv_head, dim = 1), (k, v))
out = optimized_attention(q, k, v, h, skip_reshape=True)
out = self.to_out(out)
if mask is not None:
mask = rearrange(mask, 'b n -> b n 1')
out = out.masked_fill(~mask, 0.)
return out
class ConformerModule(nn.Module):
def __init__(
self,
dim,
norm_kwargs = {},
):
super().__init__()
self.dim = dim
self.in_norm = LayerNorm(dim, **norm_kwargs)
self.pointwise_conv = nn.Conv1d(dim, dim, kernel_size=1, bias=False)
self.glu = GLU(dim, dim, nn.SiLU())
self.depthwise_conv = nn.Conv1d(dim, dim, kernel_size=17, groups=dim, padding=8, bias=False)
self.mid_norm = LayerNorm(dim, **norm_kwargs) # This is a batch norm in the original but I don't like batch norm
self.swish = nn.SiLU()
self.pointwise_conv_2 = nn.Conv1d(dim, dim, kernel_size=1, bias=False)
def forward(self, x):
x = self.in_norm(x)
x = rearrange(x, 'b n d -> b d n')
x = self.pointwise_conv(x)
x = rearrange(x, 'b d n -> b n d')
x = self.glu(x)
x = rearrange(x, 'b n d -> b d n')
x = self.depthwise_conv(x)
x = rearrange(x, 'b d n -> b n d')
x = self.mid_norm(x)
x = self.swish(x)
x = rearrange(x, 'b n d -> b d n')
x = self.pointwise_conv_2(x)
x = rearrange(x, 'b d n -> b n d')
return x
class TransformerBlock(nn.Module):
def __init__(
self,
dim,
dim_heads = 64,
cross_attend = False,
dim_context = None,
global_cond_dim = None,
causal = False,
zero_init_branch_outputs = True,
conformer = False,
layer_ix = -1,
remove_norms = False,
attn_kwargs = {},
ff_kwargs = {},
norm_kwargs = {},
dtype=None,
device=None,
operations=None,
):
super().__init__()
self.dim = dim
self.dim_heads = dim_heads
self.cross_attend = cross_attend
self.dim_context = dim_context
self.causal = causal
self.pre_norm = LayerNorm(dim, dtype=dtype, device=device, **norm_kwargs) if not remove_norms else nn.Identity()
self.self_attn = Attention(
dim,
dim_heads = dim_heads,
causal = causal,
zero_init_output=zero_init_branch_outputs,
dtype=dtype,
device=device,
operations=operations,
**attn_kwargs
)
if cross_attend:
self.cross_attend_norm = LayerNorm(dim, dtype=dtype, device=device, **norm_kwargs) if not remove_norms else nn.Identity()
self.cross_attn = Attention(
dim,
dim_heads = dim_heads,
dim_context=dim_context,
causal = causal,
zero_init_output=zero_init_branch_outputs,
dtype=dtype,
device=device,
operations=operations,
**attn_kwargs
)
self.ff_norm = LayerNorm(dim, dtype=dtype, device=device, **norm_kwargs) if not remove_norms else nn.Identity()
self.ff = FeedForward(dim, zero_init_output=zero_init_branch_outputs, dtype=dtype, device=device, operations=operations,**ff_kwargs)
self.layer_ix = layer_ix
self.conformer = ConformerModule(dim, norm_kwargs=norm_kwargs) if conformer else None
self.global_cond_dim = global_cond_dim
if global_cond_dim is not None:
self.to_scale_shift_gate = nn.Sequential(
nn.SiLU(),
nn.Linear(global_cond_dim, dim * 6, bias=False)
)
nn.init.zeros_(self.to_scale_shift_gate[1].weight)
#nn.init.zeros_(self.to_scale_shift_gate_self[1].bias)
def forward(
self,
x,
context = None,
global_cond=None,
mask = None,
context_mask = None,
rotary_pos_emb = None
):
if self.global_cond_dim is not None and self.global_cond_dim > 0 and global_cond is not None:
scale_self, shift_self, gate_self, scale_ff, shift_ff, gate_ff = self.to_scale_shift_gate(global_cond).unsqueeze(1).chunk(6, dim = -1)
# self-attention with adaLN
residual = x
x = self.pre_norm(x)
x = x * (1 + scale_self) + shift_self
x = self.self_attn(x, mask = mask, rotary_pos_emb = rotary_pos_emb)
x = x * torch.sigmoid(1 - gate_self)
x = x + residual
if context is not None:
x = x + self.cross_attn(self.cross_attend_norm(x), context = context, context_mask = context_mask)
if self.conformer is not None:
x = x + self.conformer(x)
# feedforward with adaLN
residual = x
x = self.ff_norm(x)
x = x * (1 + scale_ff) + shift_ff
x = self.ff(x)
x = x * torch.sigmoid(1 - gate_ff)
x = x + residual
else:
x = x + self.self_attn(self.pre_norm(x), mask = mask, rotary_pos_emb = rotary_pos_emb)
if context is not None:
x = x + self.cross_attn(self.cross_attend_norm(x), context = context, context_mask = context_mask)
if self.conformer is not None:
x = x + self.conformer(x)
x = x + self.ff(self.ff_norm(x))
return x
class ContinuousTransformer(nn.Module):
def __init__(
self,
dim,
depth,
*,
dim_in = None,
dim_out = None,
dim_heads = 64,
cross_attend=False,
cond_token_dim=None,
global_cond_dim=None,
causal=False,
rotary_pos_emb=True,
zero_init_branch_outputs=True,
conformer=False,
use_sinusoidal_emb=False,
use_abs_pos_emb=False,
abs_pos_emb_max_length=10000,
dtype=None,
device=None,
operations=None,
**kwargs
):
super().__init__()
self.dim = dim
self.depth = depth
self.causal = causal
self.layers = nn.ModuleList([])
self.project_in = operations.Linear(dim_in, dim, bias=False, dtype=dtype, device=device) if dim_in is not None else nn.Identity()
self.project_out = operations.Linear(dim, dim_out, bias=False, dtype=dtype, device=device) if dim_out is not None else nn.Identity()
if rotary_pos_emb:
self.rotary_pos_emb = RotaryEmbedding(max(dim_heads // 2, 32), device=device, dtype=dtype)
else:
self.rotary_pos_emb = None
self.use_sinusoidal_emb = use_sinusoidal_emb
if use_sinusoidal_emb:
self.pos_emb = ScaledSinusoidalEmbedding(dim)
self.use_abs_pos_emb = use_abs_pos_emb
if use_abs_pos_emb:
self.pos_emb = AbsolutePositionalEmbedding(dim, abs_pos_emb_max_length)
for i in range(depth):
self.layers.append(
TransformerBlock(
dim,
dim_heads = dim_heads,
cross_attend = cross_attend,
dim_context = cond_token_dim,
global_cond_dim = global_cond_dim,
causal = causal,
zero_init_branch_outputs = zero_init_branch_outputs,
conformer=conformer,
layer_ix=i,
dtype=dtype,
device=device,
operations=operations,
**kwargs
)
)
def forward(
self,
x,
mask = None,
prepend_embeds = None,
prepend_mask = None,
global_cond = None,
return_info = False,
**kwargs
):
batch, seq, device = *x.shape[:2], x.device
info = {
"hidden_states": [],
}
x = self.project_in(x)
if prepend_embeds is not None:
prepend_length, prepend_dim = prepend_embeds.shape[1:]
assert prepend_dim == x.shape[-1], 'prepend dimension must match sequence dimension'
x = torch.cat((prepend_embeds, x), dim = -2)
if prepend_mask is not None or mask is not None:
mask = mask if mask is not None else torch.ones((batch, seq), device = device, dtype = torch.bool)
prepend_mask = prepend_mask if prepend_mask is not None else torch.ones((batch, prepend_length), device = device, dtype = torch.bool)
mask = torch.cat((prepend_mask, mask), dim = -1)
# Attention layers
if self.rotary_pos_emb is not None:
rotary_pos_emb = self.rotary_pos_emb.forward_from_seq_len(x.shape[1], dtype=x.dtype, device=x.device)
else:
rotary_pos_emb = None
if self.use_sinusoidal_emb or self.use_abs_pos_emb:
x = x + self.pos_emb(x)
# Iterate over the transformer layers
for layer in self.layers:
x = layer(x, rotary_pos_emb = rotary_pos_emb, global_cond=global_cond, **kwargs)
# x = checkpoint(layer, x, rotary_pos_emb = rotary_pos_emb, global_cond=global_cond, **kwargs)
if return_info:
info["hidden_states"].append(x)
x = self.project_out(x)
if return_info:
return x, info
return x
class AudioDiffusionTransformer(nn.Module):
def __init__(self,
io_channels=64,
patch_size=1,
embed_dim=1536,
cond_token_dim=768,
project_cond_tokens=False,
global_cond_dim=1536,
project_global_cond=True,
input_concat_dim=0,
prepend_cond_dim=0,
depth=24,
num_heads=24,
transformer_type: tp.Literal["continuous_transformer"] = "continuous_transformer",
global_cond_type: tp.Literal["prepend", "adaLN"] = "prepend",
audio_model="",
dtype=None,
device=None,
operations=None,
**kwargs):
super().__init__()
self.dtype = dtype
self.cond_token_dim = cond_token_dim
# Timestep embeddings
timestep_features_dim = 256
self.timestep_features = FourierFeatures(1, timestep_features_dim, dtype=dtype, device=device)
self.to_timestep_embed = nn.Sequential(
operations.Linear(timestep_features_dim, embed_dim, bias=True, dtype=dtype, device=device),
nn.SiLU(),
operations.Linear(embed_dim, embed_dim, bias=True, dtype=dtype, device=device),
)
if cond_token_dim > 0:
# Conditioning tokens
cond_embed_dim = cond_token_dim if not project_cond_tokens else embed_dim
self.to_cond_embed = nn.Sequential(
operations.Linear(cond_token_dim, cond_embed_dim, bias=False, dtype=dtype, device=device),
nn.SiLU(),
operations.Linear(cond_embed_dim, cond_embed_dim, bias=False, dtype=dtype, device=device)
)
else:
cond_embed_dim = 0
if global_cond_dim > 0:
# Global conditioning
global_embed_dim = global_cond_dim if not project_global_cond else embed_dim
self.to_global_embed = nn.Sequential(
operations.Linear(global_cond_dim, global_embed_dim, bias=False, dtype=dtype, device=device),
nn.SiLU(),
operations.Linear(global_embed_dim, global_embed_dim, bias=False, dtype=dtype, device=device)
)
if prepend_cond_dim > 0:
# Prepend conditioning
self.to_prepend_embed = nn.Sequential(
operations.Linear(prepend_cond_dim, embed_dim, bias=False, dtype=dtype, device=device),
nn.SiLU(),
operations.Linear(embed_dim, embed_dim, bias=False, dtype=dtype, device=device)
)
self.input_concat_dim = input_concat_dim
dim_in = io_channels + self.input_concat_dim
self.patch_size = patch_size
# Transformer
self.transformer_type = transformer_type
self.global_cond_type = global_cond_type
if self.transformer_type == "continuous_transformer":
global_dim = None
if self.global_cond_type == "adaLN":
# The global conditioning is projected to the embed_dim already at this point
global_dim = embed_dim
self.transformer = ContinuousTransformer(
dim=embed_dim,
depth=depth,
dim_heads=embed_dim // num_heads,
dim_in=dim_in * patch_size,
dim_out=io_channels * patch_size,
cross_attend = cond_token_dim > 0,
cond_token_dim = cond_embed_dim,
global_cond_dim=global_dim,
dtype=dtype,
device=device,
operations=operations,
**kwargs
)
else:
raise ValueError(f"Unknown transformer type: {self.transformer_type}")
self.preprocess_conv = operations.Conv1d(dim_in, dim_in, 1, bias=False, dtype=dtype, device=device)
self.postprocess_conv = operations.Conv1d(io_channels, io_channels, 1, bias=False, dtype=dtype, device=device)
def _forward(
self,
x,
t,
mask=None,
cross_attn_cond=None,
cross_attn_cond_mask=None,
input_concat_cond=None,
global_embed=None,
prepend_cond=None,
prepend_cond_mask=None,
return_info=False,
**kwargs):
if cross_attn_cond is not None:
cross_attn_cond = self.to_cond_embed(cross_attn_cond)
if global_embed is not None:
# Project the global conditioning to the embedding dimension
global_embed = self.to_global_embed(global_embed)
prepend_inputs = None
prepend_mask = None
prepend_length = 0
if prepend_cond is not None:
# Project the prepend conditioning to the embedding dimension
prepend_cond = self.to_prepend_embed(prepend_cond)
prepend_inputs = prepend_cond
if prepend_cond_mask is not None:
prepend_mask = prepend_cond_mask
if input_concat_cond is not None:
# Interpolate input_concat_cond to the same length as x
if input_concat_cond.shape[2] != x.shape[2]:
input_concat_cond = F.interpolate(input_concat_cond, (x.shape[2], ), mode='nearest')
x = torch.cat([x, input_concat_cond], dim=1)
# Get the batch of timestep embeddings
timestep_embed = self.to_timestep_embed(self.timestep_features(t[:, None]).to(x.dtype)) # (b, embed_dim)
# Timestep embedding is considered a global embedding. Add to the global conditioning if it exists
if global_embed is not None:
global_embed = global_embed + timestep_embed
else:
global_embed = timestep_embed
# Add the global_embed to the prepend inputs if there is no global conditioning support in the transformer
if self.global_cond_type == "prepend":
if prepend_inputs is None:
# Prepend inputs are just the global embed, and the mask is all ones
prepend_inputs = global_embed.unsqueeze(1)
prepend_mask = torch.ones((x.shape[0], 1), device=x.device, dtype=torch.bool)
else:
# Prepend inputs are the prepend conditioning + the global embed
prepend_inputs = torch.cat([prepend_inputs, global_embed.unsqueeze(1)], dim=1)
prepend_mask = torch.cat([prepend_mask, torch.ones((x.shape[0], 1), device=x.device, dtype=torch.bool)], dim=1)
prepend_length = prepend_inputs.shape[1]
x = self.preprocess_conv(x) + x
x = rearrange(x, "b c t -> b t c")
extra_args = {}
if self.global_cond_type == "adaLN":
extra_args["global_cond"] = global_embed
if self.patch_size > 1:
x = rearrange(x, "b (t p) c -> b t (c p)", p=self.patch_size)
if self.transformer_type == "x-transformers":
output = self.transformer(x, prepend_embeds=prepend_inputs, context=cross_attn_cond, context_mask=cross_attn_cond_mask, mask=mask, prepend_mask=prepend_mask, **extra_args, **kwargs)
elif self.transformer_type == "continuous_transformer":
output = self.transformer(x, prepend_embeds=prepend_inputs, context=cross_attn_cond, context_mask=cross_attn_cond_mask, mask=mask, prepend_mask=prepend_mask, return_info=return_info, **extra_args, **kwargs)
if return_info:
output, info = output
elif self.transformer_type == "mm_transformer":
output = self.transformer(x, context=cross_attn_cond, mask=mask, context_mask=cross_attn_cond_mask, **extra_args, **kwargs)
output = rearrange(output, "b t c -> b c t")[:,:,prepend_length:]
if self.patch_size > 1:
output = rearrange(output, "b (c p) t -> b c (t p)", p=self.patch_size)
output = self.postprocess_conv(output) + output
if return_info:
return output, info
return output
def forward(
self,
x,
timestep,
context=None,
context_mask=None,
input_concat_cond=None,
global_embed=None,
negative_global_embed=None,
prepend_cond=None,
prepend_cond_mask=None,
mask=None,
return_info=False,
control=None,
transformer_options={},
**kwargs):
return self._forward(
x,
timestep,
cross_attn_cond=context,
cross_attn_cond_mask=context_mask,
input_concat_cond=input_concat_cond,
global_embed=global_embed,
prepend_cond=prepend_cond,
prepend_cond_mask=prepend_cond_mask,
mask=mask,
return_info=return_info,
**kwargs
)

View File

@ -0,0 +1,108 @@
# code adapted from: https://github.com/Stability-AI/stable-audio-tools
import torch
import torch.nn as nn
from torch import Tensor, einsum
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple, TypeVar, Union
from einops import rearrange
import math
import comfy.ops
class LearnedPositionalEmbedding(nn.Module):
"""Used for continuous time"""
def __init__(self, dim: int):
super().__init__()
assert (dim % 2) == 0
half_dim = dim // 2
self.weights = nn.Parameter(torch.empty(half_dim))
def forward(self, x: Tensor) -> Tensor:
x = rearrange(x, "b -> b 1")
freqs = x * rearrange(self.weights, "d -> 1 d") * 2 * math.pi
fouriered = torch.cat((freqs.sin(), freqs.cos()), dim=-1)
fouriered = torch.cat((x, fouriered), dim=-1)
return fouriered
def TimePositionalEmbedding(dim: int, out_features: int) -> nn.Module:
return nn.Sequential(
LearnedPositionalEmbedding(dim),
comfy.ops.manual_cast.Linear(in_features=dim + 1, out_features=out_features),
)
class NumberEmbedder(nn.Module):
def __init__(
self,
features: int,
dim: int = 256,
):
super().__init__()
self.features = features
self.embedding = TimePositionalEmbedding(dim=dim, out_features=features)
def forward(self, x: Union[List[float], Tensor]) -> Tensor:
if not torch.is_tensor(x):
device = next(self.embedding.parameters()).device
x = torch.tensor(x, device=device)
assert isinstance(x, Tensor)
shape = x.shape
x = rearrange(x, "... -> (...)")
embedding = self.embedding(x)
x = embedding.view(*shape, self.features)
return x # type: ignore
class Conditioner(nn.Module):
def __init__(
self,
dim: int,
output_dim: int,
project_out: bool = False
):
super().__init__()
self.dim = dim
self.output_dim = output_dim
self.proj_out = nn.Linear(dim, output_dim) if (dim != output_dim or project_out) else nn.Identity()
def forward(self, x):
raise NotImplementedError()
class NumberConditioner(Conditioner):
'''
Conditioner that takes a list of floats, normalizes them for a given range, and returns a list of embeddings
'''
def __init__(self,
output_dim: int,
min_val: float=0,
max_val: float=1
):
super().__init__(output_dim, output_dim)
self.min_val = min_val
self.max_val = max_val
self.embedder = NumberEmbedder(features=output_dim)
def forward(self, floats, device=None):
# Cast the inputs to floats
floats = [float(x) for x in floats]
if device is None:
device = next(self.embedder.parameters()).device
floats = torch.tensor(floats).to(device)
floats = floats.clamp(self.min_val, self.max_val)
normalized_floats = (floats - self.min_val) / (self.max_val - self.min_val)
# Cast floats to same type as embedder
embedder_dtype = next(self.embedder.parameters()).dtype
normalized_floats = normalized_floats.to(embedder_dtype)
float_embeds = self.embedder(normalized_floats).unsqueeze(1)
return [float_embeds, torch.ones(float_embeds.shape[0], 1).to(device)]

478
comfy/ldm/aura/mmdit.py Normal file
View File

@ -0,0 +1,478 @@
#AuraFlow MMDiT
#Originally written by the AuraFlow Authors
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
from comfy.ldm.modules.attention import optimized_attention
import comfy.ops
import comfy.ldm.common_dit
def modulate(x, shift, scale):
return x * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
def find_multiple(n: int, k: int) -> int:
if n % k == 0:
return n
return n + k - (n % k)
class MLP(nn.Module):
def __init__(self, dim, hidden_dim=None, dtype=None, device=None, operations=None) -> None:
super().__init__()
if hidden_dim is None:
hidden_dim = 4 * dim
n_hidden = int(2 * hidden_dim / 3)
n_hidden = find_multiple(n_hidden, 256)
self.c_fc1 = operations.Linear(dim, n_hidden, bias=False, dtype=dtype, device=device)
self.c_fc2 = operations.Linear(dim, n_hidden, bias=False, dtype=dtype, device=device)
self.c_proj = operations.Linear(n_hidden, dim, bias=False, dtype=dtype, device=device)
def forward(self, x: torch.Tensor) -> torch.Tensor:
x = F.silu(self.c_fc1(x)) * self.c_fc2(x)
x = self.c_proj(x)
return x
class MultiHeadLayerNorm(nn.Module):
def __init__(self, hidden_size=None, eps=1e-5, dtype=None, device=None):
# Copy pasta from https://github.com/huggingface/transformers/blob/e5f71ecaae50ea476d1e12351003790273c4b2ed/src/transformers/models/cohere/modeling_cohere.py#L78
super().__init__()
self.weight = nn.Parameter(torch.empty(hidden_size, dtype=dtype, device=device))
self.variance_epsilon = eps
def forward(self, hidden_states):
input_dtype = hidden_states.dtype
hidden_states = hidden_states.to(torch.float32)
mean = hidden_states.mean(-1, keepdim=True)
variance = (hidden_states - mean).pow(2).mean(-1, keepdim=True)
hidden_states = (hidden_states - mean) * torch.rsqrt(
variance + self.variance_epsilon
)
hidden_states = self.weight.to(torch.float32) * hidden_states
return hidden_states.to(input_dtype)
class SingleAttention(nn.Module):
def __init__(self, dim, n_heads, mh_qknorm=False, dtype=None, device=None, operations=None):
super().__init__()
self.n_heads = n_heads
self.head_dim = dim // n_heads
# this is for cond
self.w1q = operations.Linear(dim, dim, bias=False, dtype=dtype, device=device)
self.w1k = operations.Linear(dim, dim, bias=False, dtype=dtype, device=device)
self.w1v = operations.Linear(dim, dim, bias=False, dtype=dtype, device=device)
self.w1o = operations.Linear(dim, dim, bias=False, dtype=dtype, device=device)
self.q_norm1 = (
MultiHeadLayerNorm((self.n_heads, self.head_dim), dtype=dtype, device=device)
if mh_qknorm
else operations.LayerNorm(self.head_dim, elementwise_affine=False, dtype=dtype, device=device)
)
self.k_norm1 = (
MultiHeadLayerNorm((self.n_heads, self.head_dim), dtype=dtype, device=device)
if mh_qknorm
else operations.LayerNorm(self.head_dim, elementwise_affine=False, dtype=dtype, device=device)
)
#@torch.compile()
def forward(self, c):
bsz, seqlen1, _ = c.shape
q, k, v = self.w1q(c), self.w1k(c), self.w1v(c)
q = q.view(bsz, seqlen1, self.n_heads, self.head_dim)
k = k.view(bsz, seqlen1, self.n_heads, self.head_dim)
v = v.view(bsz, seqlen1, self.n_heads, self.head_dim)
q, k = self.q_norm1(q), self.k_norm1(k)
output = optimized_attention(q.permute(0, 2, 1, 3), k.permute(0, 2, 1, 3), v.permute(0, 2, 1, 3), self.n_heads, skip_reshape=True)
c = self.w1o(output)
return c
class DoubleAttention(nn.Module):
def __init__(self, dim, n_heads, mh_qknorm=False, dtype=None, device=None, operations=None):
super().__init__()
self.n_heads = n_heads
self.head_dim = dim // n_heads
# this is for cond
self.w1q = operations.Linear(dim, dim, bias=False, dtype=dtype, device=device)
self.w1k = operations.Linear(dim, dim, bias=False, dtype=dtype, device=device)
self.w1v = operations.Linear(dim, dim, bias=False, dtype=dtype, device=device)
self.w1o = operations.Linear(dim, dim, bias=False, dtype=dtype, device=device)
# this is for x
self.w2q = operations.Linear(dim, dim, bias=False, dtype=dtype, device=device)
self.w2k = operations.Linear(dim, dim, bias=False, dtype=dtype, device=device)
self.w2v = operations.Linear(dim, dim, bias=False, dtype=dtype, device=device)
self.w2o = operations.Linear(dim, dim, bias=False, dtype=dtype, device=device)
self.q_norm1 = (
MultiHeadLayerNorm((self.n_heads, self.head_dim), dtype=dtype, device=device)
if mh_qknorm
else operations.LayerNorm(self.head_dim, elementwise_affine=False, dtype=dtype, device=device)
)
self.k_norm1 = (
MultiHeadLayerNorm((self.n_heads, self.head_dim), dtype=dtype, device=device)
if mh_qknorm
else operations.LayerNorm(self.head_dim, elementwise_affine=False, dtype=dtype, device=device)
)
self.q_norm2 = (
MultiHeadLayerNorm((self.n_heads, self.head_dim), dtype=dtype, device=device)
if mh_qknorm
else operations.LayerNorm(self.head_dim, elementwise_affine=False, dtype=dtype, device=device)
)
self.k_norm2 = (
MultiHeadLayerNorm((self.n_heads, self.head_dim), dtype=dtype, device=device)
if mh_qknorm
else operations.LayerNorm(self.head_dim, elementwise_affine=False, dtype=dtype, device=device)
)
#@torch.compile()
def forward(self, c, x):
bsz, seqlen1, _ = c.shape
bsz, seqlen2, _ = x.shape
seqlen = seqlen1 + seqlen2
cq, ck, cv = self.w1q(c), self.w1k(c), self.w1v(c)
cq = cq.view(bsz, seqlen1, self.n_heads, self.head_dim)
ck = ck.view(bsz, seqlen1, self.n_heads, self.head_dim)
cv = cv.view(bsz, seqlen1, self.n_heads, self.head_dim)
cq, ck = self.q_norm1(cq), self.k_norm1(ck)
xq, xk, xv = self.w2q(x), self.w2k(x), self.w2v(x)
xq = xq.view(bsz, seqlen2, self.n_heads, self.head_dim)
xk = xk.view(bsz, seqlen2, self.n_heads, self.head_dim)
xv = xv.view(bsz, seqlen2, self.n_heads, self.head_dim)
xq, xk = self.q_norm2(xq), self.k_norm2(xk)
# concat all
q, k, v = (
torch.cat([cq, xq], dim=1),
torch.cat([ck, xk], dim=1),
torch.cat([cv, xv], dim=1),
)
output = optimized_attention(q.permute(0, 2, 1, 3), k.permute(0, 2, 1, 3), v.permute(0, 2, 1, 3), self.n_heads, skip_reshape=True)
c, x = output.split([seqlen1, seqlen2], dim=1)
c = self.w1o(c)
x = self.w2o(x)
return c, x
class MMDiTBlock(nn.Module):
def __init__(self, dim, heads=8, global_conddim=1024, is_last=False, dtype=None, device=None, operations=None):
super().__init__()
self.normC1 = operations.LayerNorm(dim, elementwise_affine=False, dtype=dtype, device=device)
self.normC2 = operations.LayerNorm(dim, elementwise_affine=False, dtype=dtype, device=device)
if not is_last:
self.mlpC = MLP(dim, hidden_dim=dim * 4, dtype=dtype, device=device, operations=operations)
self.modC = nn.Sequential(
nn.SiLU(),
operations.Linear(global_conddim, 6 * dim, bias=False, dtype=dtype, device=device),
)
else:
self.modC = nn.Sequential(
nn.SiLU(),
operations.Linear(global_conddim, 2 * dim, bias=False, dtype=dtype, device=device),
)
self.normX1 = operations.LayerNorm(dim, elementwise_affine=False, dtype=dtype, device=device)
self.normX2 = operations.LayerNorm(dim, elementwise_affine=False, dtype=dtype, device=device)
self.mlpX = MLP(dim, hidden_dim=dim * 4, dtype=dtype, device=device, operations=operations)
self.modX = nn.Sequential(
nn.SiLU(),
operations.Linear(global_conddim, 6 * dim, bias=False, dtype=dtype, device=device),
)
self.attn = DoubleAttention(dim, heads, dtype=dtype, device=device, operations=operations)
self.is_last = is_last
#@torch.compile()
def forward(self, c, x, global_cond, **kwargs):
cres, xres = c, x
cshift_msa, cscale_msa, cgate_msa, cshift_mlp, cscale_mlp, cgate_mlp = (
self.modC(global_cond).chunk(6, dim=1)
)
c = modulate(self.normC1(c), cshift_msa, cscale_msa)
# xpath
xshift_msa, xscale_msa, xgate_msa, xshift_mlp, xscale_mlp, xgate_mlp = (
self.modX(global_cond).chunk(6, dim=1)
)
x = modulate(self.normX1(x), xshift_msa, xscale_msa)
# attention
c, x = self.attn(c, x)
c = self.normC2(cres + cgate_msa.unsqueeze(1) * c)
c = cgate_mlp.unsqueeze(1) * self.mlpC(modulate(c, cshift_mlp, cscale_mlp))
c = cres + c
x = self.normX2(xres + xgate_msa.unsqueeze(1) * x)
x = xgate_mlp.unsqueeze(1) * self.mlpX(modulate(x, xshift_mlp, xscale_mlp))
x = xres + x
return c, x
class DiTBlock(nn.Module):
# like MMDiTBlock, but it only has X
def __init__(self, dim, heads=8, global_conddim=1024, dtype=None, device=None, operations=None):
super().__init__()
self.norm1 = operations.LayerNorm(dim, elementwise_affine=False, dtype=dtype, device=device)
self.norm2 = operations.LayerNorm(dim, elementwise_affine=False, dtype=dtype, device=device)
self.modCX = nn.Sequential(
nn.SiLU(),
operations.Linear(global_conddim, 6 * dim, bias=False, dtype=dtype, device=device),
)
self.attn = SingleAttention(dim, heads, dtype=dtype, device=device, operations=operations)
self.mlp = MLP(dim, hidden_dim=dim * 4, dtype=dtype, device=device, operations=operations)
#@torch.compile()
def forward(self, cx, global_cond, **kwargs):
cxres = cx
shift_msa, scale_msa, gate_msa, shift_mlp, scale_mlp, gate_mlp = self.modCX(
global_cond
).chunk(6, dim=1)
cx = modulate(self.norm1(cx), shift_msa, scale_msa)
cx = self.attn(cx)
cx = self.norm2(cxres + gate_msa.unsqueeze(1) * cx)
mlpout = self.mlp(modulate(cx, shift_mlp, scale_mlp))
cx = gate_mlp.unsqueeze(1) * mlpout
cx = cxres + cx
return cx
class TimestepEmbedder(nn.Module):
def __init__(self, hidden_size, frequency_embedding_size=256, dtype=None, device=None, operations=None):
super().__init__()
self.mlp = nn.Sequential(
operations.Linear(frequency_embedding_size, hidden_size, dtype=dtype, device=device),
nn.SiLU(),
operations.Linear(hidden_size, hidden_size, dtype=dtype, device=device),
)
self.frequency_embedding_size = frequency_embedding_size
@staticmethod
def timestep_embedding(t, dim, max_period=10000):
half = dim // 2
freqs = 1000 * torch.exp(
-math.log(max_period) * torch.arange(start=0, end=half) / half
).to(t.device)
args = t[:, None] * freqs[None]
embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
if dim % 2:
embedding = torch.cat(
[embedding, torch.zeros_like(embedding[:, :1])], dim=-1
)
return embedding
#@torch.compile()
def forward(self, t, dtype):
t_freq = self.timestep_embedding(t, self.frequency_embedding_size).to(dtype)
t_emb = self.mlp(t_freq)
return t_emb
class MMDiT(nn.Module):
def __init__(
self,
in_channels=4,
out_channels=4,
patch_size=2,
dim=3072,
n_layers=36,
n_double_layers=4,
n_heads=12,
global_conddim=3072,
cond_seq_dim=2048,
max_seq=32 * 32,
device=None,
dtype=None,
operations=None,
):
super().__init__()
self.dtype = dtype
self.t_embedder = TimestepEmbedder(global_conddim, dtype=dtype, device=device, operations=operations)
self.cond_seq_linear = operations.Linear(
cond_seq_dim, dim, bias=False, dtype=dtype, device=device
) # linear for something like text sequence.
self.init_x_linear = operations.Linear(
patch_size * patch_size * in_channels, dim, dtype=dtype, device=device
) # init linear for patchified image.
self.positional_encoding = nn.Parameter(torch.empty(1, max_seq, dim, dtype=dtype, device=device))
self.register_tokens = nn.Parameter(torch.empty(1, 8, dim, dtype=dtype, device=device))
self.double_layers = nn.ModuleList([])
self.single_layers = nn.ModuleList([])
for idx in range(n_double_layers):
self.double_layers.append(
MMDiTBlock(dim, n_heads, global_conddim, is_last=(idx == n_layers - 1), dtype=dtype, device=device, operations=operations)
)
for idx in range(n_double_layers, n_layers):
self.single_layers.append(
DiTBlock(dim, n_heads, global_conddim, dtype=dtype, device=device, operations=operations)
)
self.final_linear = operations.Linear(
dim, patch_size * patch_size * out_channels, bias=False, dtype=dtype, device=device
)
self.modF = nn.Sequential(
nn.SiLU(),
operations.Linear(global_conddim, 2 * dim, bias=False, dtype=dtype, device=device),
)
self.out_channels = out_channels
self.patch_size = patch_size
self.n_double_layers = n_double_layers
self.n_layers = n_layers
self.h_max = round(max_seq**0.5)
self.w_max = round(max_seq**0.5)
@torch.no_grad()
def extend_pe(self, init_dim=(16, 16), target_dim=(64, 64)):
# extend pe
pe_data = self.positional_encoding.data.squeeze(0)[: init_dim[0] * init_dim[1]]
pe_as_2d = pe_data.view(init_dim[0], init_dim[1], -1).permute(2, 0, 1)
# now we need to extend this to target_dim. for this we will use interpolation.
# we will use torch.nn.functional.interpolate
pe_as_2d = F.interpolate(
pe_as_2d.unsqueeze(0), size=target_dim, mode="bilinear"
)
pe_new = pe_as_2d.squeeze(0).permute(1, 2, 0).flatten(0, 1)
self.positional_encoding.data = pe_new.unsqueeze(0).contiguous()
self.h_max, self.w_max = target_dim
print("PE extended to", target_dim)
def pe_selection_index_based_on_dim(self, h, w):
h_p, w_p = h // self.patch_size, w // self.patch_size
original_pe_indexes = torch.arange(self.positional_encoding.shape[1])
original_pe_indexes = original_pe_indexes.view(self.h_max, self.w_max)
starth = self.h_max // 2 - h_p // 2
endh =starth + h_p
startw = self.w_max // 2 - w_p // 2
endw = startw + w_p
original_pe_indexes = original_pe_indexes[
starth:endh, startw:endw
]
return original_pe_indexes.flatten()
def unpatchify(self, x, h, w):
c = self.out_channels
p = self.patch_size
x = x.reshape(shape=(x.shape[0], h, w, p, p, c))
x = torch.einsum("nhwpqc->nchpwq", x)
imgs = x.reshape(shape=(x.shape[0], c, h * p, w * p))
return imgs
def patchify(self, x):
B, C, H, W = x.size()
x = comfy.ldm.common_dit.pad_to_patch_size(x, (self.patch_size, self.patch_size))
x = x.view(
B,
C,
(H + 1) // self.patch_size,
self.patch_size,
(W + 1) // self.patch_size,
self.patch_size,
)
x = x.permute(0, 2, 4, 1, 3, 5).flatten(-3).flatten(1, 2)
return x
def apply_pos_embeds(self, x, h, w):
h = (h + 1) // self.patch_size
w = (w + 1) // self.patch_size
max_dim = max(h, w)
cur_dim = self.h_max
pos_encoding = comfy.ops.cast_to_input(self.positional_encoding.reshape(1, cur_dim, cur_dim, -1), x)
if max_dim > cur_dim:
pos_encoding = F.interpolate(pos_encoding.movedim(-1, 1), (max_dim, max_dim), mode="bilinear").movedim(1, -1)
cur_dim = max_dim
from_h = (cur_dim - h) // 2
from_w = (cur_dim - w) // 2
pos_encoding = pos_encoding[:,from_h:from_h+h,from_w:from_w+w]
return x + pos_encoding.reshape(1, -1, self.positional_encoding.shape[-1])
def forward(self, x, timestep, context, **kwargs):
# patchify x, add PE
b, c, h, w = x.shape
# pe_indexes = self.pe_selection_index_based_on_dim(h, w)
# print(pe_indexes, pe_indexes.shape)
x = self.init_x_linear(self.patchify(x)) # B, T_x, D
x = self.apply_pos_embeds(x, h, w)
# x = x + self.positional_encoding[:, : x.size(1)].to(device=x.device, dtype=x.dtype)
# x = x + self.positional_encoding[:, pe_indexes].to(device=x.device, dtype=x.dtype)
# process conditions for MMDiT Blocks
c_seq = context # B, T_c, D_c
t = timestep
c = self.cond_seq_linear(c_seq) # B, T_c, D
c = torch.cat([comfy.ops.cast_to_input(self.register_tokens, c).repeat(c.size(0), 1, 1), c], dim=1)
global_cond = self.t_embedder(t, x.dtype) # B, D
if len(self.double_layers) > 0:
for layer in self.double_layers:
c, x = layer(c, x, global_cond, **kwargs)
if len(self.single_layers) > 0:
c_len = c.size(1)
cx = torch.cat([c, x], dim=1)
for layer in self.single_layers:
cx = layer(cx, global_cond, **kwargs)
x = cx[:, c_len:]
fshift, fscale = self.modF(global_cond).chunk(2, dim=1)
x = modulate(x, fshift, fscale)
x = self.final_linear(x)
x = self.unpatchify(x, (h + 1) // self.patch_size, (w + 1) // self.patch_size)[:,:,:h,:w]
return x

154
comfy/ldm/cascade/common.py Normal file
View File

@ -0,0 +1,154 @@
"""
This file is part of ComfyUI.
Copyright (C) 2024 Stability AI
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
import torch
import torch.nn as nn
from comfy.ldm.modules.attention import optimized_attention
import comfy.ops
class OptimizedAttention(nn.Module):
def __init__(self, c, nhead, dropout=0.0, dtype=None, device=None, operations=None):
super().__init__()
self.heads = nhead
self.to_q = operations.Linear(c, c, bias=True, dtype=dtype, device=device)
self.to_k = operations.Linear(c, c, bias=True, dtype=dtype, device=device)
self.to_v = operations.Linear(c, c, bias=True, dtype=dtype, device=device)
self.out_proj = operations.Linear(c, c, bias=True, dtype=dtype, device=device)
def forward(self, q, k, v):
q = self.to_q(q)
k = self.to_k(k)
v = self.to_v(v)
out = optimized_attention(q, k, v, self.heads)
return self.out_proj(out)
class Attention2D(nn.Module):
def __init__(self, c, nhead, dropout=0.0, dtype=None, device=None, operations=None):
super().__init__()
self.attn = OptimizedAttention(c, nhead, dtype=dtype, device=device, operations=operations)
# self.attn = nn.MultiheadAttention(c, nhead, dropout=dropout, bias=True, batch_first=True, dtype=dtype, device=device)
def forward(self, x, kv, self_attn=False):
orig_shape = x.shape
x = x.view(x.size(0), x.size(1), -1).permute(0, 2, 1) # Bx4xHxW -> Bx(HxW)x4
if self_attn:
kv = torch.cat([x, kv], dim=1)
# x = self.attn(x, kv, kv, need_weights=False)[0]
x = self.attn(x, kv, kv)
x = x.permute(0, 2, 1).view(*orig_shape)
return x
def LayerNorm2d_op(operations):
class LayerNorm2d(operations.LayerNorm):
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
def forward(self, x):
return super().forward(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
return LayerNorm2d
class GlobalResponseNorm(nn.Module):
"from https://github.com/facebookresearch/ConvNeXt-V2/blob/3608f67cc1dae164790c5d0aead7bf2d73d9719b/models/utils.py#L105"
def __init__(self, dim, dtype=None, device=None):
super().__init__()
self.gamma = nn.Parameter(torch.empty(1, 1, 1, dim, dtype=dtype, device=device))
self.beta = nn.Parameter(torch.empty(1, 1, 1, dim, dtype=dtype, device=device))
def forward(self, x):
Gx = torch.norm(x, p=2, dim=(1, 2), keepdim=True)
Nx = Gx / (Gx.mean(dim=-1, keepdim=True) + 1e-6)
return comfy.ops.cast_to_input(self.gamma, x) * (x * Nx) + comfy.ops.cast_to_input(self.beta, x) + x
class ResBlock(nn.Module):
def __init__(self, c, c_skip=0, kernel_size=3, dropout=0.0, dtype=None, device=None, operations=None): # , num_heads=4, expansion=2):
super().__init__()
self.depthwise = operations.Conv2d(c, c, kernel_size=kernel_size, padding=kernel_size // 2, groups=c, dtype=dtype, device=device)
# self.depthwise = SAMBlock(c, num_heads, expansion)
self.norm = LayerNorm2d_op(operations)(c, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
self.channelwise = nn.Sequential(
operations.Linear(c + c_skip, c * 4, dtype=dtype, device=device),
nn.GELU(),
GlobalResponseNorm(c * 4, dtype=dtype, device=device),
nn.Dropout(dropout),
operations.Linear(c * 4, c, dtype=dtype, device=device)
)
def forward(self, x, x_skip=None):
x_res = x
x = self.norm(self.depthwise(x))
if x_skip is not None:
x = torch.cat([x, x_skip], dim=1)
x = self.channelwise(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
return x + x_res
class AttnBlock(nn.Module):
def __init__(self, c, c_cond, nhead, self_attn=True, dropout=0.0, dtype=None, device=None, operations=None):
super().__init__()
self.self_attn = self_attn
self.norm = LayerNorm2d_op(operations)(c, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
self.attention = Attention2D(c, nhead, dropout, dtype=dtype, device=device, operations=operations)
self.kv_mapper = nn.Sequential(
nn.SiLU(),
operations.Linear(c_cond, c, dtype=dtype, device=device)
)
def forward(self, x, kv):
kv = self.kv_mapper(kv)
x = x + self.attention(self.norm(x), kv, self_attn=self.self_attn)
return x
class FeedForwardBlock(nn.Module):
def __init__(self, c, dropout=0.0, dtype=None, device=None, operations=None):
super().__init__()
self.norm = LayerNorm2d_op(operations)(c, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
self.channelwise = nn.Sequential(
operations.Linear(c, c * 4, dtype=dtype, device=device),
nn.GELU(),
GlobalResponseNorm(c * 4, dtype=dtype, device=device),
nn.Dropout(dropout),
operations.Linear(c * 4, c, dtype=dtype, device=device)
)
def forward(self, x):
x = x + self.channelwise(self.norm(x).permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
return x
class TimestepBlock(nn.Module):
def __init__(self, c, c_timestep, conds=['sca'], dtype=None, device=None, operations=None):
super().__init__()
self.mapper = operations.Linear(c_timestep, c * 2, dtype=dtype, device=device)
self.conds = conds
for cname in conds:
setattr(self, f"mapper_{cname}", operations.Linear(c_timestep, c * 2, dtype=dtype, device=device))
def forward(self, x, t):
t = t.chunk(len(self.conds) + 1, dim=1)
a, b = self.mapper(t[0])[:, :, None, None].chunk(2, dim=1)
for i, c in enumerate(self.conds):
ac, bc = getattr(self, f"mapper_{c}")(t[i + 1])[:, :, None, None].chunk(2, dim=1)
a, b = a + ac, b + bc
return x * (1 + a) + b

View File

@ -0,0 +1,93 @@
"""
This file is part of ComfyUI.
Copyright (C) 2024 Stability AI
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
import torch
import torchvision
from torch import nn
from .common import LayerNorm2d_op
class CNetResBlock(nn.Module):
def __init__(self, c, dtype=None, device=None, operations=None):
super().__init__()
self.blocks = nn.Sequential(
LayerNorm2d_op(operations)(c, dtype=dtype, device=device),
nn.GELU(),
operations.Conv2d(c, c, kernel_size=3, padding=1),
LayerNorm2d_op(operations)(c, dtype=dtype, device=device),
nn.GELU(),
operations.Conv2d(c, c, kernel_size=3, padding=1),
)
def forward(self, x):
return x + self.blocks(x)
class ControlNet(nn.Module):
def __init__(self, c_in=3, c_proj=2048, proj_blocks=None, bottleneck_mode=None, dtype=None, device=None, operations=nn):
super().__init__()
if bottleneck_mode is None:
bottleneck_mode = 'effnet'
self.proj_blocks = proj_blocks
if bottleneck_mode == 'effnet':
embd_channels = 1280
self.backbone = torchvision.models.efficientnet_v2_s().features.eval()
if c_in != 3:
in_weights = self.backbone[0][0].weight.data
self.backbone[0][0] = operations.Conv2d(c_in, 24, kernel_size=3, stride=2, bias=False, dtype=dtype, device=device)
if c_in > 3:
# nn.init.constant_(self.backbone[0][0].weight, 0)
self.backbone[0][0].weight.data[:, :3] = in_weights[:, :3].clone()
else:
self.backbone[0][0].weight.data = in_weights[:, :c_in].clone()
elif bottleneck_mode == 'simple':
embd_channels = c_in
self.backbone = nn.Sequential(
operations.Conv2d(embd_channels, embd_channels * 4, kernel_size=3, padding=1, dtype=dtype, device=device),
nn.LeakyReLU(0.2, inplace=True),
operations.Conv2d(embd_channels * 4, embd_channels, kernel_size=3, padding=1, dtype=dtype, device=device),
)
elif bottleneck_mode == 'large':
self.backbone = nn.Sequential(
operations.Conv2d(c_in, 4096 * 4, kernel_size=1, dtype=dtype, device=device),
nn.LeakyReLU(0.2, inplace=True),
operations.Conv2d(4096 * 4, 1024, kernel_size=1, dtype=dtype, device=device),
*[CNetResBlock(1024, dtype=dtype, device=device, operations=operations) for _ in range(8)],
operations.Conv2d(1024, 1280, kernel_size=1, dtype=dtype, device=device),
)
embd_channels = 1280
else:
raise ValueError(f'Unknown bottleneck mode: {bottleneck_mode}')
self.projections = nn.ModuleList()
for _ in range(len(proj_blocks)):
self.projections.append(nn.Sequential(
operations.Conv2d(embd_channels, embd_channels, kernel_size=1, bias=False, dtype=dtype, device=device),
nn.LeakyReLU(0.2, inplace=True),
operations.Conv2d(embd_channels, c_proj, kernel_size=1, bias=False, dtype=dtype, device=device),
))
# nn.init.constant_(self.projections[-1][-1].weight, 0) # zero output projection
self.xl = False
self.input_channels = c_in
self.unshuffle_amount = 8
def forward(self, x):
x = self.backbone(x)
proj_outputs = [None for _ in range(max(self.proj_blocks) + 1)]
for i, idx in enumerate(self.proj_blocks):
proj_outputs[idx] = self.projections[i](x)
return {"input": proj_outputs[::-1]}

View File

@ -0,0 +1,255 @@
"""
This file is part of ComfyUI.
Copyright (C) 2024 Stability AI
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
import torch
from torch import nn
from torch.autograd import Function
class vector_quantize(Function):
@staticmethod
def forward(ctx, x, codebook):
with torch.no_grad():
codebook_sqr = torch.sum(codebook ** 2, dim=1)
x_sqr = torch.sum(x ** 2, dim=1, keepdim=True)
dist = torch.addmm(codebook_sqr + x_sqr, x, codebook.t(), alpha=-2.0, beta=1.0)
_, indices = dist.min(dim=1)
ctx.save_for_backward(indices, codebook)
ctx.mark_non_differentiable(indices)
nn = torch.index_select(codebook, 0, indices)
return nn, indices
@staticmethod
def backward(ctx, grad_output, grad_indices):
grad_inputs, grad_codebook = None, None
if ctx.needs_input_grad[0]:
grad_inputs = grad_output.clone()
if ctx.needs_input_grad[1]:
# Gradient wrt. the codebook
indices, codebook = ctx.saved_tensors
grad_codebook = torch.zeros_like(codebook)
grad_codebook.index_add_(0, indices, grad_output)
return (grad_inputs, grad_codebook)
class VectorQuantize(nn.Module):
def __init__(self, embedding_size, k, ema_decay=0.99, ema_loss=False):
"""
Takes an input of variable size (as long as the last dimension matches the embedding size).
Returns one tensor containing the nearest neigbour embeddings to each of the inputs,
with the same size as the input, vq and commitment components for the loss as a touple
in the second output and the indices of the quantized vectors in the third:
quantized, (vq_loss, commit_loss), indices
"""
super(VectorQuantize, self).__init__()
self.codebook = nn.Embedding(k, embedding_size)
self.codebook.weight.data.uniform_(-1./k, 1./k)
self.vq = vector_quantize.apply
self.ema_decay = ema_decay
self.ema_loss = ema_loss
if ema_loss:
self.register_buffer('ema_element_count', torch.ones(k))
self.register_buffer('ema_weight_sum', torch.zeros_like(self.codebook.weight))
def _laplace_smoothing(self, x, epsilon):
n = torch.sum(x)
return ((x + epsilon) / (n + x.size(0) * epsilon) * n)
def _updateEMA(self, z_e_x, indices):
mask = nn.functional.one_hot(indices, self.ema_element_count.size(0)).float()
elem_count = mask.sum(dim=0)
weight_sum = torch.mm(mask.t(), z_e_x)
self.ema_element_count = (self.ema_decay * self.ema_element_count) + ((1-self.ema_decay) * elem_count)
self.ema_element_count = self._laplace_smoothing(self.ema_element_count, 1e-5)
self.ema_weight_sum = (self.ema_decay * self.ema_weight_sum) + ((1-self.ema_decay) * weight_sum)
self.codebook.weight.data = self.ema_weight_sum / self.ema_element_count.unsqueeze(-1)
def idx2vq(self, idx, dim=-1):
q_idx = self.codebook(idx)
if dim != -1:
q_idx = q_idx.movedim(-1, dim)
return q_idx
def forward(self, x, get_losses=True, dim=-1):
if dim != -1:
x = x.movedim(dim, -1)
z_e_x = x.contiguous().view(-1, x.size(-1)) if len(x.shape) > 2 else x
z_q_x, indices = self.vq(z_e_x, self.codebook.weight.detach())
vq_loss, commit_loss = None, None
if self.ema_loss and self.training:
self._updateEMA(z_e_x.detach(), indices.detach())
# pick the graded embeddings after updating the codebook in order to have a more accurate commitment loss
z_q_x_grd = torch.index_select(self.codebook.weight, dim=0, index=indices)
if get_losses:
vq_loss = (z_q_x_grd - z_e_x.detach()).pow(2).mean()
commit_loss = (z_e_x - z_q_x_grd.detach()).pow(2).mean()
z_q_x = z_q_x.view(x.shape)
if dim != -1:
z_q_x = z_q_x.movedim(-1, dim)
return z_q_x, (vq_loss, commit_loss), indices.view(x.shape[:-1])
class ResBlock(nn.Module):
def __init__(self, c, c_hidden):
super().__init__()
# depthwise/attention
self.norm1 = nn.LayerNorm(c, elementwise_affine=False, eps=1e-6)
self.depthwise = nn.Sequential(
nn.ReplicationPad2d(1),
nn.Conv2d(c, c, kernel_size=3, groups=c)
)
# channelwise
self.norm2 = nn.LayerNorm(c, elementwise_affine=False, eps=1e-6)
self.channelwise = nn.Sequential(
nn.Linear(c, c_hidden),
nn.GELU(),
nn.Linear(c_hidden, c),
)
self.gammas = nn.Parameter(torch.zeros(6), requires_grad=True)
# Init weights
def _basic_init(module):
if isinstance(module, nn.Linear) or isinstance(module, nn.Conv2d):
torch.nn.init.xavier_uniform_(module.weight)
if module.bias is not None:
nn.init.constant_(module.bias, 0)
self.apply(_basic_init)
def _norm(self, x, norm):
return norm(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
def forward(self, x):
mods = self.gammas
x_temp = self._norm(x, self.norm1) * (1 + mods[0]) + mods[1]
try:
x = x + self.depthwise(x_temp) * mods[2]
except: #operation not implemented for bf16
x_temp = self.depthwise[0](x_temp.float()).to(x.dtype)
x = x + self.depthwise[1](x_temp) * mods[2]
x_temp = self._norm(x, self.norm2) * (1 + mods[3]) + mods[4]
x = x + self.channelwise(x_temp.permute(0, 2, 3, 1)).permute(0, 3, 1, 2) * mods[5]
return x
class StageA(nn.Module):
def __init__(self, levels=2, bottleneck_blocks=12, c_hidden=384, c_latent=4, codebook_size=8192):
super().__init__()
self.c_latent = c_latent
c_levels = [c_hidden // (2 ** i) for i in reversed(range(levels))]
# Encoder blocks
self.in_block = nn.Sequential(
nn.PixelUnshuffle(2),
nn.Conv2d(3 * 4, c_levels[0], kernel_size=1)
)
down_blocks = []
for i in range(levels):
if i > 0:
down_blocks.append(nn.Conv2d(c_levels[i - 1], c_levels[i], kernel_size=4, stride=2, padding=1))
block = ResBlock(c_levels[i], c_levels[i] * 4)
down_blocks.append(block)
down_blocks.append(nn.Sequential(
nn.Conv2d(c_levels[-1], c_latent, kernel_size=1, bias=False),
nn.BatchNorm2d(c_latent), # then normalize them to have mean 0 and std 1
))
self.down_blocks = nn.Sequential(*down_blocks)
self.down_blocks[0]
self.codebook_size = codebook_size
self.vquantizer = VectorQuantize(c_latent, k=codebook_size)
# Decoder blocks
up_blocks = [nn.Sequential(
nn.Conv2d(c_latent, c_levels[-1], kernel_size=1)
)]
for i in range(levels):
for j in range(bottleneck_blocks if i == 0 else 1):
block = ResBlock(c_levels[levels - 1 - i], c_levels[levels - 1 - i] * 4)
up_blocks.append(block)
if i < levels - 1:
up_blocks.append(
nn.ConvTranspose2d(c_levels[levels - 1 - i], c_levels[levels - 2 - i], kernel_size=4, stride=2,
padding=1))
self.up_blocks = nn.Sequential(*up_blocks)
self.out_block = nn.Sequential(
nn.Conv2d(c_levels[0], 3 * 4, kernel_size=1),
nn.PixelShuffle(2),
)
def encode(self, x, quantize=False):
x = self.in_block(x)
x = self.down_blocks(x)
if quantize:
qe, (vq_loss, commit_loss), indices = self.vquantizer.forward(x, dim=1)
return qe, x, indices, vq_loss + commit_loss * 0.25
else:
return x
def decode(self, x):
x = self.up_blocks(x)
x = self.out_block(x)
return x
def forward(self, x, quantize=False):
qe, x, _, vq_loss = self.encode(x, quantize)
x = self.decode(qe)
return x, vq_loss
class Discriminator(nn.Module):
def __init__(self, c_in=3, c_cond=0, c_hidden=512, depth=6):
super().__init__()
d = max(depth - 3, 3)
layers = [
nn.utils.spectral_norm(nn.Conv2d(c_in, c_hidden // (2 ** d), kernel_size=3, stride=2, padding=1)),
nn.LeakyReLU(0.2),
]
for i in range(depth - 1):
c_in = c_hidden // (2 ** max((d - i), 0))
c_out = c_hidden // (2 ** max((d - 1 - i), 0))
layers.append(nn.utils.spectral_norm(nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1)))
layers.append(nn.InstanceNorm2d(c_out))
layers.append(nn.LeakyReLU(0.2))
self.encoder = nn.Sequential(*layers)
self.shuffle = nn.Conv2d((c_hidden + c_cond) if c_cond > 0 else c_hidden, 1, kernel_size=1)
self.logits = nn.Sigmoid()
def forward(self, x, cond=None):
x = self.encoder(x)
if cond is not None:
cond = cond.view(cond.size(0), cond.size(1), 1, 1, ).expand(-1, -1, x.size(-2), x.size(-1))
x = torch.cat([x, cond], dim=1)
x = self.shuffle(x)
x = self.logits(x)
return x

View File

@ -0,0 +1,256 @@
"""
This file is part of ComfyUI.
Copyright (C) 2024 Stability AI
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
import math
import torch
from torch import nn
from .common import AttnBlock, LayerNorm2d_op, ResBlock, FeedForwardBlock, TimestepBlock
class StageB(nn.Module):
def __init__(self, c_in=4, c_out=4, c_r=64, patch_size=2, c_cond=1280, c_hidden=[320, 640, 1280, 1280],
nhead=[-1, -1, 20, 20], blocks=[[2, 6, 28, 6], [6, 28, 6, 2]],
block_repeat=[[1, 1, 1, 1], [3, 3, 2, 2]], level_config=['CT', 'CT', 'CTA', 'CTA'], c_clip=1280,
c_clip_seq=4, c_effnet=16, c_pixels=3, kernel_size=3, dropout=[0, 0, 0.0, 0.0], self_attn=True,
t_conds=['sca'], stable_cascade_stage=None, dtype=None, device=None, operations=None):
super().__init__()
self.dtype = dtype
self.c_r = c_r
self.t_conds = t_conds
self.c_clip_seq = c_clip_seq
if not isinstance(dropout, list):
dropout = [dropout] * len(c_hidden)
if not isinstance(self_attn, list):
self_attn = [self_attn] * len(c_hidden)
# CONDITIONING
self.effnet_mapper = nn.Sequential(
operations.Conv2d(c_effnet, c_hidden[0] * 4, kernel_size=1, dtype=dtype, device=device),
nn.GELU(),
operations.Conv2d(c_hidden[0] * 4, c_hidden[0], kernel_size=1, dtype=dtype, device=device),
LayerNorm2d_op(operations)(c_hidden[0], elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
)
self.pixels_mapper = nn.Sequential(
operations.Conv2d(c_pixels, c_hidden[0] * 4, kernel_size=1, dtype=dtype, device=device),
nn.GELU(),
operations.Conv2d(c_hidden[0] * 4, c_hidden[0], kernel_size=1, dtype=dtype, device=device),
LayerNorm2d_op(operations)(c_hidden[0], elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
)
self.clip_mapper = operations.Linear(c_clip, c_cond * c_clip_seq, dtype=dtype, device=device)
self.clip_norm = operations.LayerNorm(c_cond, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
self.embedding = nn.Sequential(
nn.PixelUnshuffle(patch_size),
operations.Conv2d(c_in * (patch_size ** 2), c_hidden[0], kernel_size=1, dtype=dtype, device=device),
LayerNorm2d_op(operations)(c_hidden[0], elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
)
def get_block(block_type, c_hidden, nhead, c_skip=0, dropout=0, self_attn=True):
if block_type == 'C':
return ResBlock(c_hidden, c_skip, kernel_size=kernel_size, dropout=dropout, dtype=dtype, device=device, operations=operations)
elif block_type == 'A':
return AttnBlock(c_hidden, c_cond, nhead, self_attn=self_attn, dropout=dropout, dtype=dtype, device=device, operations=operations)
elif block_type == 'F':
return FeedForwardBlock(c_hidden, dropout=dropout, dtype=dtype, device=device, operations=operations)
elif block_type == 'T':
return TimestepBlock(c_hidden, c_r, conds=t_conds, dtype=dtype, device=device, operations=operations)
else:
raise Exception(f'Block type {block_type} not supported')
# BLOCKS
# -- down blocks
self.down_blocks = nn.ModuleList()
self.down_downscalers = nn.ModuleList()
self.down_repeat_mappers = nn.ModuleList()
for i in range(len(c_hidden)):
if i > 0:
self.down_downscalers.append(nn.Sequential(
LayerNorm2d_op(operations)(c_hidden[i - 1], elementwise_affine=False, eps=1e-6, dtype=dtype, device=device),
operations.Conv2d(c_hidden[i - 1], c_hidden[i], kernel_size=2, stride=2, dtype=dtype, device=device),
))
else:
self.down_downscalers.append(nn.Identity())
down_block = nn.ModuleList()
for _ in range(blocks[0][i]):
for block_type in level_config[i]:
block = get_block(block_type, c_hidden[i], nhead[i], dropout=dropout[i], self_attn=self_attn[i])
down_block.append(block)
self.down_blocks.append(down_block)
if block_repeat is not None:
block_repeat_mappers = nn.ModuleList()
for _ in range(block_repeat[0][i] - 1):
block_repeat_mappers.append(operations.Conv2d(c_hidden[i], c_hidden[i], kernel_size=1, dtype=dtype, device=device))
self.down_repeat_mappers.append(block_repeat_mappers)
# -- up blocks
self.up_blocks = nn.ModuleList()
self.up_upscalers = nn.ModuleList()
self.up_repeat_mappers = nn.ModuleList()
for i in reversed(range(len(c_hidden))):
if i > 0:
self.up_upscalers.append(nn.Sequential(
LayerNorm2d_op(operations)(c_hidden[i], elementwise_affine=False, eps=1e-6, dtype=dtype, device=device),
operations.ConvTranspose2d(c_hidden[i], c_hidden[i - 1], kernel_size=2, stride=2, dtype=dtype, device=device),
))
else:
self.up_upscalers.append(nn.Identity())
up_block = nn.ModuleList()
for j in range(blocks[1][::-1][i]):
for k, block_type in enumerate(level_config[i]):
c_skip = c_hidden[i] if i < len(c_hidden) - 1 and j == k == 0 else 0
block = get_block(block_type, c_hidden[i], nhead[i], c_skip=c_skip, dropout=dropout[i],
self_attn=self_attn[i])
up_block.append(block)
self.up_blocks.append(up_block)
if block_repeat is not None:
block_repeat_mappers = nn.ModuleList()
for _ in range(block_repeat[1][::-1][i] - 1):
block_repeat_mappers.append(operations.Conv2d(c_hidden[i], c_hidden[i], kernel_size=1, dtype=dtype, device=device))
self.up_repeat_mappers.append(block_repeat_mappers)
# OUTPUT
self.clf = nn.Sequential(
LayerNorm2d_op(operations)(c_hidden[0], elementwise_affine=False, eps=1e-6, dtype=dtype, device=device),
operations.Conv2d(c_hidden[0], c_out * (patch_size ** 2), kernel_size=1, dtype=dtype, device=device),
nn.PixelShuffle(patch_size),
)
# --- WEIGHT INIT ---
# self.apply(self._init_weights) # General init
# nn.init.normal_(self.clip_mapper.weight, std=0.02) # conditionings
# nn.init.normal_(self.effnet_mapper[0].weight, std=0.02) # conditionings
# nn.init.normal_(self.effnet_mapper[2].weight, std=0.02) # conditionings
# nn.init.normal_(self.pixels_mapper[0].weight, std=0.02) # conditionings
# nn.init.normal_(self.pixels_mapper[2].weight, std=0.02) # conditionings
# torch.nn.init.xavier_uniform_(self.embedding[1].weight, 0.02) # inputs
# nn.init.constant_(self.clf[1].weight, 0) # outputs
#
# # blocks
# for level_block in self.down_blocks + self.up_blocks:
# for block in level_block:
# if isinstance(block, ResBlock) or isinstance(block, FeedForwardBlock):
# block.channelwise[-1].weight.data *= np.sqrt(1 / sum(blocks[0]))
# elif isinstance(block, TimestepBlock):
# for layer in block.modules():
# if isinstance(layer, nn.Linear):
# nn.init.constant_(layer.weight, 0)
#
# def _init_weights(self, m):
# if isinstance(m, (nn.Conv2d, nn.Linear)):
# torch.nn.init.xavier_uniform_(m.weight)
# if m.bias is not None:
# nn.init.constant_(m.bias, 0)
def gen_r_embedding(self, r, max_positions=10000):
r = r * max_positions
half_dim = self.c_r // 2
emb = math.log(max_positions) / (half_dim - 1)
emb = torch.arange(half_dim, device=r.device).float().mul(-emb).exp()
emb = r[:, None] * emb[None, :]
emb = torch.cat([emb.sin(), emb.cos()], dim=1)
if self.c_r % 2 == 1: # zero pad
emb = nn.functional.pad(emb, (0, 1), mode='constant')
return emb
def gen_c_embeddings(self, clip):
if len(clip.shape) == 2:
clip = clip.unsqueeze(1)
clip = self.clip_mapper(clip).view(clip.size(0), clip.size(1) * self.c_clip_seq, -1)
clip = self.clip_norm(clip)
return clip
def _down_encode(self, x, r_embed, clip):
level_outputs = []
block_group = zip(self.down_blocks, self.down_downscalers, self.down_repeat_mappers)
for down_block, downscaler, repmap in block_group:
x = downscaler(x)
for i in range(len(repmap) + 1):
for block in down_block:
if isinstance(block, ResBlock) or (
hasattr(block, '_fsdp_wrapped_module') and isinstance(block._fsdp_wrapped_module,
ResBlock)):
x = block(x)
elif isinstance(block, AttnBlock) or (
hasattr(block, '_fsdp_wrapped_module') and isinstance(block._fsdp_wrapped_module,
AttnBlock)):
x = block(x, clip)
elif isinstance(block, TimestepBlock) or (
hasattr(block, '_fsdp_wrapped_module') and isinstance(block._fsdp_wrapped_module,
TimestepBlock)):
x = block(x, r_embed)
else:
x = block(x)
if i < len(repmap):
x = repmap[i](x)
level_outputs.insert(0, x)
return level_outputs
def _up_decode(self, level_outputs, r_embed, clip):
x = level_outputs[0]
block_group = zip(self.up_blocks, self.up_upscalers, self.up_repeat_mappers)
for i, (up_block, upscaler, repmap) in enumerate(block_group):
for j in range(len(repmap) + 1):
for k, block in enumerate(up_block):
if isinstance(block, ResBlock) or (
hasattr(block, '_fsdp_wrapped_module') and isinstance(block._fsdp_wrapped_module,
ResBlock)):
skip = level_outputs[i] if k == 0 and i > 0 else None
if skip is not None and (x.size(-1) != skip.size(-1) or x.size(-2) != skip.size(-2)):
x = torch.nn.functional.interpolate(x, skip.shape[-2:], mode='bilinear',
align_corners=True)
x = block(x, skip)
elif isinstance(block, AttnBlock) or (
hasattr(block, '_fsdp_wrapped_module') and isinstance(block._fsdp_wrapped_module,
AttnBlock)):
x = block(x, clip)
elif isinstance(block, TimestepBlock) or (
hasattr(block, '_fsdp_wrapped_module') and isinstance(block._fsdp_wrapped_module,
TimestepBlock)):
x = block(x, r_embed)
else:
x = block(x)
if j < len(repmap):
x = repmap[j](x)
x = upscaler(x)
return x
def forward(self, x, r, effnet, clip, pixels=None, **kwargs):
if pixels is None:
pixels = x.new_zeros(x.size(0), 3, 8, 8)
# Process the conditioning embeddings
r_embed = self.gen_r_embedding(r).to(dtype=x.dtype)
for c in self.t_conds:
t_cond = kwargs.get(c, torch.zeros_like(r))
r_embed = torch.cat([r_embed, self.gen_r_embedding(t_cond).to(dtype=x.dtype)], dim=1)
clip = self.gen_c_embeddings(clip)
# Model Blocks
x = self.embedding(x)
x = x + self.effnet_mapper(
nn.functional.interpolate(effnet, size=x.shape[-2:], mode='bilinear', align_corners=True))
x = x + nn.functional.interpolate(self.pixels_mapper(pixels), size=x.shape[-2:], mode='bilinear',
align_corners=True)
level_outputs = self._down_encode(x, r_embed, clip)
x = self._up_decode(level_outputs, r_embed, clip)
return self.clf(x)
def update_weights_ema(self, src_model, beta=0.999):
for self_params, src_params in zip(self.parameters(), src_model.parameters()):
self_params.data = self_params.data * beta + src_params.data.clone().to(self_params.device) * (1 - beta)
for self_buffers, src_buffers in zip(self.buffers(), src_model.buffers()):
self_buffers.data = self_buffers.data * beta + src_buffers.data.clone().to(self_buffers.device) * (1 - beta)

View File

@ -0,0 +1,273 @@
"""
This file is part of ComfyUI.
Copyright (C) 2024 Stability AI
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
import torch
from torch import nn
import math
from .common import AttnBlock, LayerNorm2d_op, ResBlock, FeedForwardBlock, TimestepBlock
# from .controlnet import ControlNetDeliverer
class UpDownBlock2d(nn.Module):
def __init__(self, c_in, c_out, mode, enabled=True, dtype=None, device=None, operations=None):
super().__init__()
assert mode in ['up', 'down']
interpolation = nn.Upsample(scale_factor=2 if mode == 'up' else 0.5, mode='bilinear',
align_corners=True) if enabled else nn.Identity()
mapping = operations.Conv2d(c_in, c_out, kernel_size=1, dtype=dtype, device=device)
self.blocks = nn.ModuleList([interpolation, mapping] if mode == 'up' else [mapping, interpolation])
def forward(self, x):
for block in self.blocks:
x = block(x)
return x
class StageC(nn.Module):
def __init__(self, c_in=16, c_out=16, c_r=64, patch_size=1, c_cond=2048, c_hidden=[2048, 2048], nhead=[32, 32],
blocks=[[8, 24], [24, 8]], block_repeat=[[1, 1], [1, 1]], level_config=['CTA', 'CTA'],
c_clip_text=1280, c_clip_text_pooled=1280, c_clip_img=768, c_clip_seq=4, kernel_size=3,
dropout=[0.0, 0.0], self_attn=True, t_conds=['sca', 'crp'], switch_level=[False], stable_cascade_stage=None,
dtype=None, device=None, operations=None):
super().__init__()
self.dtype = dtype
self.c_r = c_r
self.t_conds = t_conds
self.c_clip_seq = c_clip_seq
if not isinstance(dropout, list):
dropout = [dropout] * len(c_hidden)
if not isinstance(self_attn, list):
self_attn = [self_attn] * len(c_hidden)
# CONDITIONING
self.clip_txt_mapper = operations.Linear(c_clip_text, c_cond, dtype=dtype, device=device)
self.clip_txt_pooled_mapper = operations.Linear(c_clip_text_pooled, c_cond * c_clip_seq, dtype=dtype, device=device)
self.clip_img_mapper = operations.Linear(c_clip_img, c_cond * c_clip_seq, dtype=dtype, device=device)
self.clip_norm = operations.LayerNorm(c_cond, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
self.embedding = nn.Sequential(
nn.PixelUnshuffle(patch_size),
operations.Conv2d(c_in * (patch_size ** 2), c_hidden[0], kernel_size=1, dtype=dtype, device=device),
LayerNorm2d_op(operations)(c_hidden[0], elementwise_affine=False, eps=1e-6)
)
def get_block(block_type, c_hidden, nhead, c_skip=0, dropout=0, self_attn=True):
if block_type == 'C':
return ResBlock(c_hidden, c_skip, kernel_size=kernel_size, dropout=dropout, dtype=dtype, device=device, operations=operations)
elif block_type == 'A':
return AttnBlock(c_hidden, c_cond, nhead, self_attn=self_attn, dropout=dropout, dtype=dtype, device=device, operations=operations)
elif block_type == 'F':
return FeedForwardBlock(c_hidden, dropout=dropout, dtype=dtype, device=device, operations=operations)
elif block_type == 'T':
return TimestepBlock(c_hidden, c_r, conds=t_conds, dtype=dtype, device=device, operations=operations)
else:
raise Exception(f'Block type {block_type} not supported')
# BLOCKS
# -- down blocks
self.down_blocks = nn.ModuleList()
self.down_downscalers = nn.ModuleList()
self.down_repeat_mappers = nn.ModuleList()
for i in range(len(c_hidden)):
if i > 0:
self.down_downscalers.append(nn.Sequential(
LayerNorm2d_op(operations)(c_hidden[i - 1], elementwise_affine=False, eps=1e-6),
UpDownBlock2d(c_hidden[i - 1], c_hidden[i], mode='down', enabled=switch_level[i - 1], dtype=dtype, device=device, operations=operations)
))
else:
self.down_downscalers.append(nn.Identity())
down_block = nn.ModuleList()
for _ in range(blocks[0][i]):
for block_type in level_config[i]:
block = get_block(block_type, c_hidden[i], nhead[i], dropout=dropout[i], self_attn=self_attn[i])
down_block.append(block)
self.down_blocks.append(down_block)
if block_repeat is not None:
block_repeat_mappers = nn.ModuleList()
for _ in range(block_repeat[0][i] - 1):
block_repeat_mappers.append(operations.Conv2d(c_hidden[i], c_hidden[i], kernel_size=1, dtype=dtype, device=device))
self.down_repeat_mappers.append(block_repeat_mappers)
# -- up blocks
self.up_blocks = nn.ModuleList()
self.up_upscalers = nn.ModuleList()
self.up_repeat_mappers = nn.ModuleList()
for i in reversed(range(len(c_hidden))):
if i > 0:
self.up_upscalers.append(nn.Sequential(
LayerNorm2d_op(operations)(c_hidden[i], elementwise_affine=False, eps=1e-6),
UpDownBlock2d(c_hidden[i], c_hidden[i - 1], mode='up', enabled=switch_level[i - 1], dtype=dtype, device=device, operations=operations)
))
else:
self.up_upscalers.append(nn.Identity())
up_block = nn.ModuleList()
for j in range(blocks[1][::-1][i]):
for k, block_type in enumerate(level_config[i]):
c_skip = c_hidden[i] if i < len(c_hidden) - 1 and j == k == 0 else 0
block = get_block(block_type, c_hidden[i], nhead[i], c_skip=c_skip, dropout=dropout[i],
self_attn=self_attn[i])
up_block.append(block)
self.up_blocks.append(up_block)
if block_repeat is not None:
block_repeat_mappers = nn.ModuleList()
for _ in range(block_repeat[1][::-1][i] - 1):
block_repeat_mappers.append(operations.Conv2d(c_hidden[i], c_hidden[i], kernel_size=1, dtype=dtype, device=device))
self.up_repeat_mappers.append(block_repeat_mappers)
# OUTPUT
self.clf = nn.Sequential(
LayerNorm2d_op(operations)(c_hidden[0], elementwise_affine=False, eps=1e-6, dtype=dtype, device=device),
operations.Conv2d(c_hidden[0], c_out * (patch_size ** 2), kernel_size=1, dtype=dtype, device=device),
nn.PixelShuffle(patch_size),
)
# --- WEIGHT INIT ---
# self.apply(self._init_weights) # General init
# nn.init.normal_(self.clip_txt_mapper.weight, std=0.02) # conditionings
# nn.init.normal_(self.clip_txt_pooled_mapper.weight, std=0.02) # conditionings
# nn.init.normal_(self.clip_img_mapper.weight, std=0.02) # conditionings
# torch.nn.init.xavier_uniform_(self.embedding[1].weight, 0.02) # inputs
# nn.init.constant_(self.clf[1].weight, 0) # outputs
#
# # blocks
# for level_block in self.down_blocks + self.up_blocks:
# for block in level_block:
# if isinstance(block, ResBlock) or isinstance(block, FeedForwardBlock):
# block.channelwise[-1].weight.data *= np.sqrt(1 / sum(blocks[0]))
# elif isinstance(block, TimestepBlock):
# for layer in block.modules():
# if isinstance(layer, nn.Linear):
# nn.init.constant_(layer.weight, 0)
#
# def _init_weights(self, m):
# if isinstance(m, (nn.Conv2d, nn.Linear)):
# torch.nn.init.xavier_uniform_(m.weight)
# if m.bias is not None:
# nn.init.constant_(m.bias, 0)
def gen_r_embedding(self, r, max_positions=10000):
r = r * max_positions
half_dim = self.c_r // 2
emb = math.log(max_positions) / (half_dim - 1)
emb = torch.arange(half_dim, device=r.device).float().mul(-emb).exp()
emb = r[:, None] * emb[None, :]
emb = torch.cat([emb.sin(), emb.cos()], dim=1)
if self.c_r % 2 == 1: # zero pad
emb = nn.functional.pad(emb, (0, 1), mode='constant')
return emb
def gen_c_embeddings(self, clip_txt, clip_txt_pooled, clip_img):
clip_txt = self.clip_txt_mapper(clip_txt)
if len(clip_txt_pooled.shape) == 2:
clip_txt_pooled = clip_txt_pooled.unsqueeze(1)
if len(clip_img.shape) == 2:
clip_img = clip_img.unsqueeze(1)
clip_txt_pool = self.clip_txt_pooled_mapper(clip_txt_pooled).view(clip_txt_pooled.size(0), clip_txt_pooled.size(1) * self.c_clip_seq, -1)
clip_img = self.clip_img_mapper(clip_img).view(clip_img.size(0), clip_img.size(1) * self.c_clip_seq, -1)
clip = torch.cat([clip_txt, clip_txt_pool, clip_img], dim=1)
clip = self.clip_norm(clip)
return clip
def _down_encode(self, x, r_embed, clip, cnet=None):
level_outputs = []
block_group = zip(self.down_blocks, self.down_downscalers, self.down_repeat_mappers)
for down_block, downscaler, repmap in block_group:
x = downscaler(x)
for i in range(len(repmap) + 1):
for block in down_block:
if isinstance(block, ResBlock) or (
hasattr(block, '_fsdp_wrapped_module') and isinstance(block._fsdp_wrapped_module,
ResBlock)):
if cnet is not None:
next_cnet = cnet.pop()
if next_cnet is not None:
x = x + nn.functional.interpolate(next_cnet, size=x.shape[-2:], mode='bilinear',
align_corners=True).to(x.dtype)
x = block(x)
elif isinstance(block, AttnBlock) or (
hasattr(block, '_fsdp_wrapped_module') and isinstance(block._fsdp_wrapped_module,
AttnBlock)):
x = block(x, clip)
elif isinstance(block, TimestepBlock) or (
hasattr(block, '_fsdp_wrapped_module') and isinstance(block._fsdp_wrapped_module,
TimestepBlock)):
x = block(x, r_embed)
else:
x = block(x)
if i < len(repmap):
x = repmap[i](x)
level_outputs.insert(0, x)
return level_outputs
def _up_decode(self, level_outputs, r_embed, clip, cnet=None):
x = level_outputs[0]
block_group = zip(self.up_blocks, self.up_upscalers, self.up_repeat_mappers)
for i, (up_block, upscaler, repmap) in enumerate(block_group):
for j in range(len(repmap) + 1):
for k, block in enumerate(up_block):
if isinstance(block, ResBlock) or (
hasattr(block, '_fsdp_wrapped_module') and isinstance(block._fsdp_wrapped_module,
ResBlock)):
skip = level_outputs[i] if k == 0 and i > 0 else None
if skip is not None and (x.size(-1) != skip.size(-1) or x.size(-2) != skip.size(-2)):
x = torch.nn.functional.interpolate(x, skip.shape[-2:], mode='bilinear',
align_corners=True)
if cnet is not None:
next_cnet = cnet.pop()
if next_cnet is not None:
x = x + nn.functional.interpolate(next_cnet, size=x.shape[-2:], mode='bilinear',
align_corners=True).to(x.dtype)
x = block(x, skip)
elif isinstance(block, AttnBlock) or (
hasattr(block, '_fsdp_wrapped_module') and isinstance(block._fsdp_wrapped_module,
AttnBlock)):
x = block(x, clip)
elif isinstance(block, TimestepBlock) or (
hasattr(block, '_fsdp_wrapped_module') and isinstance(block._fsdp_wrapped_module,
TimestepBlock)):
x = block(x, r_embed)
else:
x = block(x)
if j < len(repmap):
x = repmap[j](x)
x = upscaler(x)
return x
def forward(self, x, r, clip_text, clip_text_pooled, clip_img, control=None, **kwargs):
# Process the conditioning embeddings
r_embed = self.gen_r_embedding(r).to(dtype=x.dtype)
for c in self.t_conds:
t_cond = kwargs.get(c, torch.zeros_like(r))
r_embed = torch.cat([r_embed, self.gen_r_embedding(t_cond).to(dtype=x.dtype)], dim=1)
clip = self.gen_c_embeddings(clip_text, clip_text_pooled, clip_img)
if control is not None:
cnet = control.get("input")
else:
cnet = None
# Model Blocks
x = self.embedding(x)
level_outputs = self._down_encode(x, r_embed, clip, cnet)
x = self._up_decode(level_outputs, r_embed, clip, cnet)
return self.clf(x)
def update_weights_ema(self, src_model, beta=0.999):
for self_params, src_params in zip(self.parameters(), src_model.parameters()):
self_params.data = self_params.data * beta + src_params.data.clone().to(self_params.device) * (1 - beta)
for self_buffers, src_buffers in zip(self.buffers(), src_model.buffers()):
self_buffers.data = self_buffers.data * beta + src_buffers.data.clone().to(self_buffers.device) * (1 - beta)

View File

@ -0,0 +1,95 @@
"""
This file is part of ComfyUI.
Copyright (C) 2024 Stability AI
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.
"""
import torch
import torchvision
from torch import nn
# EfficientNet
class EfficientNetEncoder(nn.Module):
def __init__(self, c_latent=16):
super().__init__()
self.backbone = torchvision.models.efficientnet_v2_s().features.eval()
self.mapper = nn.Sequential(
nn.Conv2d(1280, c_latent, kernel_size=1, bias=False),
nn.BatchNorm2d(c_latent, affine=False), # then normalize them to have mean 0 and std 1
)
self.mean = nn.Parameter(torch.tensor([0.485, 0.456, 0.406]))
self.std = nn.Parameter(torch.tensor([0.229, 0.224, 0.225]))
def forward(self, x):
x = x * 0.5 + 0.5
x = (x - self.mean.view([3,1,1])) / self.std.view([3,1,1])
o = self.mapper(self.backbone(x))
return o
# Fast Decoder for Stage C latents. E.g. 16 x 24 x 24 -> 3 x 192 x 192
class Previewer(nn.Module):
def __init__(self, c_in=16, c_hidden=512, c_out=3):
super().__init__()
self.blocks = nn.Sequential(
nn.Conv2d(c_in, c_hidden, kernel_size=1), # 16 channels to 512 channels
nn.GELU(),
nn.BatchNorm2d(c_hidden),
nn.Conv2d(c_hidden, c_hidden, kernel_size=3, padding=1),
nn.GELU(),
nn.BatchNorm2d(c_hidden),
nn.ConvTranspose2d(c_hidden, c_hidden // 2, kernel_size=2, stride=2), # 16 -> 32
nn.GELU(),
nn.BatchNorm2d(c_hidden // 2),
nn.Conv2d(c_hidden // 2, c_hidden // 2, kernel_size=3, padding=1),
nn.GELU(),
nn.BatchNorm2d(c_hidden // 2),
nn.ConvTranspose2d(c_hidden // 2, c_hidden // 4, kernel_size=2, stride=2), # 32 -> 64
nn.GELU(),
nn.BatchNorm2d(c_hidden // 4),
nn.Conv2d(c_hidden // 4, c_hidden // 4, kernel_size=3, padding=1),
nn.GELU(),
nn.BatchNorm2d(c_hidden // 4),
nn.ConvTranspose2d(c_hidden // 4, c_hidden // 4, kernel_size=2, stride=2), # 64 -> 128
nn.GELU(),
nn.BatchNorm2d(c_hidden // 4),
nn.Conv2d(c_hidden // 4, c_hidden // 4, kernel_size=3, padding=1),
nn.GELU(),
nn.BatchNorm2d(c_hidden // 4),
nn.Conv2d(c_hidden // 4, c_out, kernel_size=1),
)
def forward(self, x):
return (self.blocks(x) - 0.5) * 2.0
class StageC_coder(nn.Module):
def __init__(self):
super().__init__()
self.previewer = Previewer()
self.encoder = EfficientNetEncoder()
def encode(self, x):
return self.encoder(x)
def decode(self, x):
return self.previewer(x)

21
comfy/ldm/common_dit.py Normal file
View File

@ -0,0 +1,21 @@
import torch
import comfy.ops
def pad_to_patch_size(img, patch_size=(2, 2), padding_mode="circular"):
if padding_mode == "circular" and torch.jit.is_tracing() or torch.jit.is_scripting():
padding_mode = "reflect"
pad_h = (patch_size[0] - img.shape[-2] % patch_size[0]) % patch_size[0]
pad_w = (patch_size[1] - img.shape[-1] % patch_size[1]) % patch_size[1]
return torch.nn.functional.pad(img, (0, pad_w, 0, pad_h), mode=padding_mode)
try:
rms_norm_torch = torch.nn.functional.rms_norm
except:
rms_norm_torch = None
def rms_norm(x, weight, eps=1e-6):
if rms_norm_torch is not None:
return rms_norm_torch(x, weight.shape, weight=comfy.ops.cast_to(weight, dtype=x.dtype, device=x.device), eps=eps)
else:
rrms = torch.rsqrt(torch.mean(x**2, dim=-1, keepdim=True) + eps)
return (x * rrms) * comfy.ops.cast_to(weight, dtype=x.dtype, device=x.device)

View File

@ -1,24 +0,0 @@
import torch
from ldm.modules.midas.api import load_midas_transform
class AddMiDaS(object):
def __init__(self, model_type):
super().__init__()
self.transform = load_midas_transform(model_type)
def pt2np(self, x):
x = ((x + 1.0) * .5).detach().cpu().numpy()
return x
def np2pt(self, x):
x = torch.from_numpy(x) * 2 - 1.
return x
def __call__(self, sample):
# sample['jpg'] is tensor hwc in [-1, 1] at this point
x = self.pt2np(sample['jpg'])
x = self.transform({"image": x})["image"]
sample['midas_in'] = x
return sample

View File

@ -0,0 +1,151 @@
#Original code can be found on: https://github.com/XLabs-AI/x-flux/blob/main/src/flux/controlnet.py
import torch
import math
from torch import Tensor, nn
from einops import rearrange, repeat
from .layers import (DoubleStreamBlock, EmbedND, LastLayer,
MLPEmbedder, SingleStreamBlock,
timestep_embedding)
from .model import Flux
import comfy.ldm.common_dit
class ControlNetFlux(Flux):
def __init__(self, latent_input=False, num_union_modes=0, image_model=None, dtype=None, device=None, operations=None, **kwargs):
super().__init__(final_layer=False, dtype=dtype, device=device, operations=operations, **kwargs)
self.main_model_double = 19
self.main_model_single = 38
# add ControlNet blocks
self.controlnet_blocks = nn.ModuleList([])
for _ in range(self.params.depth):
controlnet_block = operations.Linear(self.hidden_size, self.hidden_size, dtype=dtype, device=device)
self.controlnet_blocks.append(controlnet_block)
self.controlnet_single_blocks = nn.ModuleList([])
for _ in range(self.params.depth_single_blocks):
self.controlnet_single_blocks.append(operations.Linear(self.hidden_size, self.hidden_size, dtype=dtype, device=device))
self.num_union_modes = num_union_modes
self.controlnet_mode_embedder = None
if self.num_union_modes > 0:
self.controlnet_mode_embedder = operations.Embedding(self.num_union_modes, self.hidden_size, dtype=dtype, device=device)
self.gradient_checkpointing = False
self.latent_input = latent_input
self.pos_embed_input = operations.Linear(self.in_channels, self.hidden_size, bias=True, dtype=dtype, device=device)
if not self.latent_input:
self.input_hint_block = nn.Sequential(
operations.Conv2d(3, 16, 3, padding=1, dtype=dtype, device=device),
nn.SiLU(),
operations.Conv2d(16, 16, 3, padding=1, dtype=dtype, device=device),
nn.SiLU(),
operations.Conv2d(16, 16, 3, padding=1, stride=2, dtype=dtype, device=device),
nn.SiLU(),
operations.Conv2d(16, 16, 3, padding=1, dtype=dtype, device=device),
nn.SiLU(),
operations.Conv2d(16, 16, 3, padding=1, stride=2, dtype=dtype, device=device),
nn.SiLU(),
operations.Conv2d(16, 16, 3, padding=1, dtype=dtype, device=device),
nn.SiLU(),
operations.Conv2d(16, 16, 3, padding=1, stride=2, dtype=dtype, device=device),
nn.SiLU(),
operations.Conv2d(16, 16, 3, padding=1, dtype=dtype, device=device)
)
def forward_orig(
self,
img: Tensor,
img_ids: Tensor,
controlnet_cond: Tensor,
txt: Tensor,
txt_ids: Tensor,
timesteps: Tensor,
y: Tensor,
guidance: Tensor = None,
control_type: Tensor = None,
) -> Tensor:
if img.ndim != 3 or txt.ndim != 3:
raise ValueError("Input img and txt tensors must have 3 dimensions.")
# running on sequences img
img = self.img_in(img)
if not self.latent_input:
controlnet_cond = self.input_hint_block(controlnet_cond)
controlnet_cond = rearrange(controlnet_cond, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=2, pw=2)
controlnet_cond = self.pos_embed_input(controlnet_cond)
img = img + controlnet_cond
vec = self.time_in(timestep_embedding(timesteps, 256))
if self.params.guidance_embed:
vec = vec + self.guidance_in(timestep_embedding(guidance, 256))
vec = vec + self.vector_in(y)
txt = self.txt_in(txt)
if self.controlnet_mode_embedder is not None and len(control_type) > 0:
control_cond = self.controlnet_mode_embedder(torch.tensor(control_type, device=img.device), out_dtype=img.dtype).unsqueeze(0).repeat((txt.shape[0], 1, 1))
txt = torch.cat([control_cond, txt], dim=1)
txt_ids = torch.cat([txt_ids[:,:1], txt_ids], dim=1)
ids = torch.cat((txt_ids, img_ids), dim=1)
pe = self.pe_embedder(ids)
controlnet_double = ()
for i in range(len(self.double_blocks)):
img, txt = self.double_blocks[i](img=img, txt=txt, vec=vec, pe=pe)
controlnet_double = controlnet_double + (self.controlnet_blocks[i](img),)
img = torch.cat((txt, img), 1)
controlnet_single = ()
for i in range(len(self.single_blocks)):
img = self.single_blocks[i](img, vec=vec, pe=pe)
controlnet_single = controlnet_single + (self.controlnet_single_blocks[i](img[:, txt.shape[1] :, ...]),)
repeat = math.ceil(self.main_model_double / len(controlnet_double))
if self.latent_input:
out_input = ()
for x in controlnet_double:
out_input += (x,) * repeat
else:
out_input = (controlnet_double * repeat)
out = {"input": out_input[:self.main_model_double]}
if len(controlnet_single) > 0:
repeat = math.ceil(self.main_model_single / len(controlnet_single))
out_output = ()
if self.latent_input:
for x in controlnet_single:
out_output += (x,) * repeat
else:
out_output = (controlnet_single * repeat)
out["output"] = out_output[:self.main_model_single]
return out
def forward(self, x, timesteps, context, y, guidance=None, hint=None, **kwargs):
patch_size = 2
if self.latent_input:
hint = comfy.ldm.common_dit.pad_to_patch_size(hint, (patch_size, patch_size))
hint = rearrange(hint, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=patch_size, pw=patch_size)
else:
hint = hint * 2.0 - 1.0
bs, c, h, w = x.shape
x = comfy.ldm.common_dit.pad_to_patch_size(x, (patch_size, patch_size))
img = rearrange(x, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=patch_size, pw=patch_size)
h_len = ((h + (patch_size // 2)) // patch_size)
w_len = ((w + (patch_size // 2)) // patch_size)
img_ids = torch.zeros((h_len, w_len, 3), device=x.device, dtype=x.dtype)
img_ids[..., 1] = img_ids[..., 1] + torch.linspace(0, h_len - 1, steps=h_len, device=x.device, dtype=x.dtype)[:, None]
img_ids[..., 2] = img_ids[..., 2] + torch.linspace(0, w_len - 1, steps=w_len, device=x.device, dtype=x.dtype)[None, :]
img_ids = repeat(img_ids, "h w c -> b (h w) c", b=bs)
txt_ids = torch.zeros((bs, context.shape[1], 3), device=x.device, dtype=x.dtype)
return self.forward_orig(img, img_ids, hint, context, txt_ids, timesteps, y, guidance, control_type=kwargs.get("control_type", []))

249
comfy/ldm/flux/layers.py Normal file
View File

@ -0,0 +1,249 @@
import math
from dataclasses import dataclass
import torch
from torch import Tensor, nn
from .math import attention, rope
import comfy.ops
import comfy.ldm.common_dit
class EmbedND(nn.Module):
def __init__(self, dim: int, theta: int, axes_dim: list):
super().__init__()
self.dim = dim
self.theta = theta
self.axes_dim = axes_dim
def forward(self, ids: Tensor) -> Tensor:
n_axes = ids.shape[-1]
emb = torch.cat(
[rope(ids[..., i], self.axes_dim[i], self.theta) for i in range(n_axes)],
dim=-3,
)
return emb.unsqueeze(1)
def timestep_embedding(t: Tensor, dim, max_period=10000, time_factor: float = 1000.0):
"""
Create sinusoidal timestep embeddings.
:param t: a 1-D Tensor of N indices, one per batch element.
These may be fractional.
:param dim: the dimension of the output.
:param max_period: controls the minimum frequency of the embeddings.
:return: an (N, D) Tensor of positional embeddings.
"""
t = time_factor * t
half = dim // 2
freqs = torch.exp(-math.log(max_period) * torch.arange(start=0, end=half, dtype=torch.float32, device=t.device) / half)
args = t[:, None].float() * freqs[None]
embedding = torch.cat([torch.cos(args), torch.sin(args)], dim=-1)
if dim % 2:
embedding = torch.cat([embedding, torch.zeros_like(embedding[:, :1])], dim=-1)
if torch.is_floating_point(t):
embedding = embedding.to(t)
return embedding
class MLPEmbedder(nn.Module):
def __init__(self, in_dim: int, hidden_dim: int, dtype=None, device=None, operations=None):
super().__init__()
self.in_layer = operations.Linear(in_dim, hidden_dim, bias=True, dtype=dtype, device=device)
self.silu = nn.SiLU()
self.out_layer = operations.Linear(hidden_dim, hidden_dim, bias=True, dtype=dtype, device=device)
def forward(self, x: Tensor) -> Tensor:
return self.out_layer(self.silu(self.in_layer(x)))
class RMSNorm(torch.nn.Module):
def __init__(self, dim: int, dtype=None, device=None, operations=None):
super().__init__()
self.scale = nn.Parameter(torch.empty((dim), dtype=dtype, device=device))
def forward(self, x: Tensor):
return comfy.ldm.common_dit.rms_norm(x, self.scale, 1e-6)
class QKNorm(torch.nn.Module):
def __init__(self, dim: int, dtype=None, device=None, operations=None):
super().__init__()
self.query_norm = RMSNorm(dim, dtype=dtype, device=device, operations=operations)
self.key_norm = RMSNorm(dim, dtype=dtype, device=device, operations=operations)
def forward(self, q: Tensor, k: Tensor, v: Tensor) -> tuple:
q = self.query_norm(q)
k = self.key_norm(k)
return q.to(v), k.to(v)
class SelfAttention(nn.Module):
def __init__(self, dim: int, num_heads: int = 8, qkv_bias: bool = False, dtype=None, device=None, operations=None):
super().__init__()
self.num_heads = num_heads
head_dim = dim // num_heads
self.qkv = operations.Linear(dim, dim * 3, bias=qkv_bias, dtype=dtype, device=device)
self.norm = QKNorm(head_dim, dtype=dtype, device=device, operations=operations)
self.proj = operations.Linear(dim, dim, dtype=dtype, device=device)
@dataclass
class ModulationOut:
shift: Tensor
scale: Tensor
gate: Tensor
class Modulation(nn.Module):
def __init__(self, dim: int, double: bool, dtype=None, device=None, operations=None):
super().__init__()
self.is_double = double
self.multiplier = 6 if double else 3
self.lin = operations.Linear(dim, self.multiplier * dim, bias=True, dtype=dtype, device=device)
def forward(self, vec: Tensor) -> tuple:
out = self.lin(nn.functional.silu(vec))[:, None, :].chunk(self.multiplier, dim=-1)
return (
ModulationOut(*out[:3]),
ModulationOut(*out[3:]) if self.is_double else None,
)
class DoubleStreamBlock(nn.Module):
def __init__(self, hidden_size: int, num_heads: int, mlp_ratio: float, qkv_bias: bool = False, dtype=None, device=None, operations=None):
super().__init__()
mlp_hidden_dim = int(hidden_size * mlp_ratio)
self.num_heads = num_heads
self.hidden_size = hidden_size
self.img_mod = Modulation(hidden_size, double=True, dtype=dtype, device=device, operations=operations)
self.img_norm1 = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
self.img_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias, dtype=dtype, device=device, operations=operations)
self.img_norm2 = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
self.img_mlp = nn.Sequential(
operations.Linear(hidden_size, mlp_hidden_dim, bias=True, dtype=dtype, device=device),
nn.GELU(approximate="tanh"),
operations.Linear(mlp_hidden_dim, hidden_size, bias=True, dtype=dtype, device=device),
)
self.txt_mod = Modulation(hidden_size, double=True, dtype=dtype, device=device, operations=operations)
self.txt_norm1 = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
self.txt_attn = SelfAttention(dim=hidden_size, num_heads=num_heads, qkv_bias=qkv_bias, dtype=dtype, device=device, operations=operations)
self.txt_norm2 = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
self.txt_mlp = nn.Sequential(
operations.Linear(hidden_size, mlp_hidden_dim, bias=True, dtype=dtype, device=device),
nn.GELU(approximate="tanh"),
operations.Linear(mlp_hidden_dim, hidden_size, bias=True, dtype=dtype, device=device),
)
def forward(self, img: Tensor, txt: Tensor, vec: Tensor, pe: Tensor):
img_mod1, img_mod2 = self.img_mod(vec)
txt_mod1, txt_mod2 = self.txt_mod(vec)
# prepare image for attention
img_modulated = self.img_norm1(img)
img_modulated = (1 + img_mod1.scale) * img_modulated + img_mod1.shift
img_qkv = self.img_attn.qkv(img_modulated)
img_q, img_k, img_v = img_qkv.view(img_qkv.shape[0], img_qkv.shape[1], 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
img_q, img_k = self.img_attn.norm(img_q, img_k, img_v)
# prepare txt for attention
txt_modulated = self.txt_norm1(txt)
txt_modulated = (1 + txt_mod1.scale) * txt_modulated + txt_mod1.shift
txt_qkv = self.txt_attn.qkv(txt_modulated)
txt_q, txt_k, txt_v = txt_qkv.view(txt_qkv.shape[0], txt_qkv.shape[1], 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
txt_q, txt_k = self.txt_attn.norm(txt_q, txt_k, txt_v)
# run actual attention
attn = attention(torch.cat((txt_q, img_q), dim=2),
torch.cat((txt_k, img_k), dim=2),
torch.cat((txt_v, img_v), dim=2), pe=pe)
txt_attn, img_attn = attn[:, : txt.shape[1]], attn[:, txt.shape[1] :]
# calculate the img bloks
img = img + img_mod1.gate * self.img_attn.proj(img_attn)
img = img + img_mod2.gate * self.img_mlp((1 + img_mod2.scale) * self.img_norm2(img) + img_mod2.shift)
# calculate the txt bloks
txt += txt_mod1.gate * self.txt_attn.proj(txt_attn)
txt += txt_mod2.gate * self.txt_mlp((1 + txt_mod2.scale) * self.txt_norm2(txt) + txt_mod2.shift)
if txt.dtype == torch.float16:
txt = torch.nan_to_num(txt, nan=0.0, posinf=65504, neginf=-65504)
return img, txt
class SingleStreamBlock(nn.Module):
"""
A DiT block with parallel linear layers as described in
https://arxiv.org/abs/2302.05442 and adapted modulation interface.
"""
def __init__(
self,
hidden_size: int,
num_heads: int,
mlp_ratio: float = 4.0,
qk_scale: float = None,
dtype=None,
device=None,
operations=None
):
super().__init__()
self.hidden_dim = hidden_size
self.num_heads = num_heads
head_dim = hidden_size // num_heads
self.scale = qk_scale or head_dim**-0.5
self.mlp_hidden_dim = int(hidden_size * mlp_ratio)
# qkv and mlp_in
self.linear1 = operations.Linear(hidden_size, hidden_size * 3 + self.mlp_hidden_dim, dtype=dtype, device=device)
# proj and mlp_out
self.linear2 = operations.Linear(hidden_size + self.mlp_hidden_dim, hidden_size, dtype=dtype, device=device)
self.norm = QKNorm(head_dim, dtype=dtype, device=device, operations=operations)
self.hidden_size = hidden_size
self.pre_norm = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
self.mlp_act = nn.GELU(approximate="tanh")
self.modulation = Modulation(hidden_size, double=False, dtype=dtype, device=device, operations=operations)
def forward(self, x: Tensor, vec: Tensor, pe: Tensor) -> Tensor:
mod, _ = self.modulation(vec)
x_mod = (1 + mod.scale) * self.pre_norm(x) + mod.shift
qkv, mlp = torch.split(self.linear1(x_mod), [3 * self.hidden_size, self.mlp_hidden_dim], dim=-1)
q, k, v = qkv.view(qkv.shape[0], qkv.shape[1], 3, self.num_heads, -1).permute(2, 0, 3, 1, 4)
q, k = self.norm(q, k, v)
# compute attention
attn = attention(q, k, v, pe=pe)
# compute activation in mlp stream, cat again and run second linear layer
output = self.linear2(torch.cat((attn, self.mlp_act(mlp)), 2))
x += mod.gate * output
if x.dtype == torch.float16:
x = torch.nan_to_num(x, nan=0.0, posinf=65504, neginf=-65504)
return x
class LastLayer(nn.Module):
def __init__(self, hidden_size: int, patch_size: int, out_channels: int, dtype=None, device=None, operations=None):
super().__init__()
self.norm_final = operations.LayerNorm(hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
self.linear = operations.Linear(hidden_size, patch_size * patch_size * out_channels, bias=True, dtype=dtype, device=device)
self.adaLN_modulation = nn.Sequential(nn.SiLU(), operations.Linear(hidden_size, 2 * hidden_size, bias=True, dtype=dtype, device=device))
def forward(self, x: Tensor, vec: Tensor) -> Tensor:
shift, scale = self.adaLN_modulation(vec).chunk(2, dim=1)
x = (1 + scale[:, None, :]) * self.norm_final(x) + shift[:, None, :]
x = self.linear(x)
return x

35
comfy/ldm/flux/math.py Normal file
View File

@ -0,0 +1,35 @@
import torch
from einops import rearrange
from torch import Tensor
from comfy.ldm.modules.attention import optimized_attention
import comfy.model_management
def attention(q: Tensor, k: Tensor, v: Tensor, pe: Tensor) -> Tensor:
q, k = apply_rope(q, k, pe)
heads = q.shape[1]
x = optimized_attention(q, k, v, heads, skip_reshape=True)
return x
def rope(pos: Tensor, dim: int, theta: int) -> Tensor:
assert dim % 2 == 0
if comfy.model_management.is_device_mps(pos.device) or comfy.model_management.is_intel_xpu():
device = torch.device("cpu")
else:
device = pos.device
scale = torch.linspace(0, (dim - 2) / dim, steps=dim//2, dtype=torch.float64, device=device)
omega = 1.0 / (theta**scale)
out = torch.einsum("...n,d->...nd", pos.to(dtype=torch.float32, device=device), omega)
out = torch.stack([torch.cos(out), -torch.sin(out), torch.sin(out), torch.cos(out)], dim=-1)
out = rearrange(out, "b n d (i j) -> b n d i j", i=2, j=2)
return out.to(dtype=torch.float32, device=pos.device)
def apply_rope(xq: Tensor, xk: Tensor, freqs_cis: Tensor):
xq_ = xq.float().reshape(*xq.shape[:-1], -1, 1, 2)
xk_ = xk.float().reshape(*xk.shape[:-1], -1, 1, 2)
xq_out = freqs_cis[..., 0] * xq_[..., 0] + freqs_cis[..., 1] * xq_[..., 1]
xk_out = freqs_cis[..., 0] * xk_[..., 0] + freqs_cis[..., 1] * xk_[..., 1]
return xq_out.reshape(*xq.shape).type_as(xq), xk_out.reshape(*xk.shape).type_as(xk)

160
comfy/ldm/flux/model.py Normal file
View File

@ -0,0 +1,160 @@
#Original code can be found on: https://github.com/black-forest-labs/flux
from dataclasses import dataclass
import torch
from torch import Tensor, nn
from .layers import (
DoubleStreamBlock,
EmbedND,
LastLayer,
MLPEmbedder,
SingleStreamBlock,
timestep_embedding,
)
from einops import rearrange, repeat
import comfy.ldm.common_dit
@dataclass
class FluxParams:
in_channels: int
vec_in_dim: int
context_in_dim: int
hidden_size: int
mlp_ratio: float
num_heads: int
depth: int
depth_single_blocks: int
axes_dim: list
theta: int
qkv_bias: bool
guidance_embed: bool
class Flux(nn.Module):
"""
Transformer model for flow matching on sequences.
"""
def __init__(self, image_model=None, final_layer=True, dtype=None, device=None, operations=None, **kwargs):
super().__init__()
self.dtype = dtype
params = FluxParams(**kwargs)
self.params = params
self.in_channels = params.in_channels * 2 * 2
self.out_channels = self.in_channels
if params.hidden_size % params.num_heads != 0:
raise ValueError(
f"Hidden size {params.hidden_size} must be divisible by num_heads {params.num_heads}"
)
pe_dim = params.hidden_size // params.num_heads
if sum(params.axes_dim) != pe_dim:
raise ValueError(f"Got {params.axes_dim} but expected positional dim {pe_dim}")
self.hidden_size = params.hidden_size
self.num_heads = params.num_heads
self.pe_embedder = EmbedND(dim=pe_dim, theta=params.theta, axes_dim=params.axes_dim)
self.img_in = operations.Linear(self.in_channels, self.hidden_size, bias=True, dtype=dtype, device=device)
self.time_in = MLPEmbedder(in_dim=256, hidden_dim=self.hidden_size, dtype=dtype, device=device, operations=operations)
self.vector_in = MLPEmbedder(params.vec_in_dim, self.hidden_size, dtype=dtype, device=device, operations=operations)
self.guidance_in = (
MLPEmbedder(in_dim=256, hidden_dim=self.hidden_size, dtype=dtype, device=device, operations=operations) if params.guidance_embed else nn.Identity()
)
self.txt_in = operations.Linear(params.context_in_dim, self.hidden_size, dtype=dtype, device=device)
self.double_blocks = nn.ModuleList(
[
DoubleStreamBlock(
self.hidden_size,
self.num_heads,
mlp_ratio=params.mlp_ratio,
qkv_bias=params.qkv_bias,
dtype=dtype, device=device, operations=operations
)
for _ in range(params.depth)
]
)
self.single_blocks = nn.ModuleList(
[
SingleStreamBlock(self.hidden_size, self.num_heads, mlp_ratio=params.mlp_ratio, dtype=dtype, device=device, operations=operations)
for _ in range(params.depth_single_blocks)
]
)
if final_layer:
self.final_layer = LastLayer(self.hidden_size, 1, self.out_channels, dtype=dtype, device=device, operations=operations)
def forward_orig(
self,
img: Tensor,
img_ids: Tensor,
txt: Tensor,
txt_ids: Tensor,
timesteps: Tensor,
y: Tensor,
guidance: Tensor = None,
control=None,
) -> Tensor:
if img.ndim != 3 or txt.ndim != 3:
raise ValueError("Input img and txt tensors must have 3 dimensions.")
# running on sequences img
img = self.img_in(img)
vec = self.time_in(timestep_embedding(timesteps, 256).to(img.dtype))
if self.params.guidance_embed:
if guidance is None:
raise ValueError("Didn't get guidance strength for guidance distilled model.")
vec = vec + self.guidance_in(timestep_embedding(guidance, 256).to(img.dtype))
vec = vec + self.vector_in(y)
txt = self.txt_in(txt)
ids = torch.cat((txt_ids, img_ids), dim=1)
pe = self.pe_embedder(ids)
for i, block in enumerate(self.double_blocks):
img, txt = block(img=img, txt=txt, vec=vec, pe=pe)
if control is not None: # Controlnet
control_i = control.get("input")
if i < len(control_i):
add = control_i[i]
if add is not None:
img += add
img = torch.cat((txt, img), 1)
for i, block in enumerate(self.single_blocks):
img = block(img, vec=vec, pe=pe)
if control is not None: # Controlnet
control_o = control.get("output")
if i < len(control_o):
add = control_o[i]
if add is not None:
img[:, txt.shape[1] :, ...] += add
img = img[:, txt.shape[1] :, ...]
img = self.final_layer(img, vec) # (N, T, patch_size ** 2 * out_channels)
return img
def forward(self, x, timestep, context, y, guidance, control=None, **kwargs):
bs, c, h, w = x.shape
patch_size = 2
x = comfy.ldm.common_dit.pad_to_patch_size(x, (patch_size, patch_size))
img = rearrange(x, "b c (h ph) (w pw) -> b (h w) (c ph pw)", ph=patch_size, pw=patch_size)
h_len = ((h + (patch_size // 2)) // patch_size)
w_len = ((w + (patch_size // 2)) // patch_size)
img_ids = torch.zeros((h_len, w_len, 3), device=x.device, dtype=x.dtype)
img_ids[..., 1] = img_ids[..., 1] + torch.linspace(0, h_len - 1, steps=h_len, device=x.device, dtype=x.dtype)[:, None]
img_ids[..., 2] = img_ids[..., 2] + torch.linspace(0, w_len - 1, steps=w_len, device=x.device, dtype=x.dtype)[None, :]
img_ids = repeat(img_ids, "h w c -> b (h w) c", b=bs)
txt_ids = torch.zeros((bs, context.shape[1], 3), device=x.device, dtype=x.dtype)
out = self.forward_orig(img, img_ids, context, txt_ids, timestep, y, guidance, control)
return rearrange(out, "b (h w) (c ph pw) -> b c (h ph) (w pw)", h=h_len, w=w_len, ph=2, pw=2)[:,:,:h,:w]

View File

@ -0,0 +1,218 @@
import torch
import torch.nn as nn
from typing import Tuple, Union, Optional
from comfy.ldm.modules.attention import optimized_attention
def reshape_for_broadcast(freqs_cis: Union[torch.Tensor, Tuple[torch.Tensor]], x: torch.Tensor, head_first=False):
"""
Reshape frequency tensor for broadcasting it with another tensor.
This function reshapes the frequency tensor to have the same shape as the target tensor 'x'
for the purpose of broadcasting the frequency tensor during element-wise operations.
Args:
freqs_cis (Union[torch.Tensor, Tuple[torch.Tensor]]): Frequency tensor to be reshaped.
x (torch.Tensor): Target tensor for broadcasting compatibility.
head_first (bool): head dimension first (except batch dim) or not.
Returns:
torch.Tensor: Reshaped frequency tensor.
Raises:
AssertionError: If the frequency tensor doesn't match the expected shape.
AssertionError: If the target tensor 'x' doesn't have the expected number of dimensions.
"""
ndim = x.ndim
assert 0 <= 1 < ndim
if isinstance(freqs_cis, tuple):
# freqs_cis: (cos, sin) in real space
if head_first:
assert freqs_cis[0].shape == (x.shape[-2], x.shape[-1]), f'freqs_cis shape {freqs_cis[0].shape} does not match x shape {x.shape}'
shape = [d if i == ndim - 2 or i == ndim - 1 else 1 for i, d in enumerate(x.shape)]
else:
assert freqs_cis[0].shape == (x.shape[1], x.shape[-1]), f'freqs_cis shape {freqs_cis[0].shape} does not match x shape {x.shape}'
shape = [d if i == 1 or i == ndim - 1 else 1 for i, d in enumerate(x.shape)]
return freqs_cis[0].view(*shape), freqs_cis[1].view(*shape)
else:
# freqs_cis: values in complex space
if head_first:
assert freqs_cis.shape == (x.shape[-2], x.shape[-1]), f'freqs_cis shape {freqs_cis.shape} does not match x shape {x.shape}'
shape = [d if i == ndim - 2 or i == ndim - 1 else 1 for i, d in enumerate(x.shape)]
else:
assert freqs_cis.shape == (x.shape[1], x.shape[-1]), f'freqs_cis shape {freqs_cis.shape} does not match x shape {x.shape}'
shape = [d if i == 1 or i == ndim - 1 else 1 for i, d in enumerate(x.shape)]
return freqs_cis.view(*shape)
def rotate_half(x):
x_real, x_imag = x.reshape(*x.shape[:-1], -1, 2).unbind(-1) # [B, S, H, D//2]
return torch.stack([-x_imag, x_real], dim=-1).flatten(3)
def apply_rotary_emb(
xq: torch.Tensor,
xk: Optional[torch.Tensor],
freqs_cis: Union[torch.Tensor, Tuple[torch.Tensor]],
head_first: bool = False,
) -> Tuple[torch.Tensor, torch.Tensor]:
"""
Apply rotary embeddings to input tensors using the given frequency tensor.
This function applies rotary embeddings to the given query 'xq' and key 'xk' tensors using the provided
frequency tensor 'freqs_cis'. The input tensors are reshaped as complex numbers, and the frequency tensor
is reshaped for broadcasting compatibility. The resulting tensors contain rotary embeddings and are
returned as real tensors.
Args:
xq (torch.Tensor): Query tensor to apply rotary embeddings. [B, S, H, D]
xk (torch.Tensor): Key tensor to apply rotary embeddings. [B, S, H, D]
freqs_cis (Union[torch.Tensor, Tuple[torch.Tensor]]): Precomputed frequency tensor for complex exponentials.
head_first (bool): head dimension first (except batch dim) or not.
Returns:
Tuple[torch.Tensor, torch.Tensor]: Tuple of modified query tensor and key tensor with rotary embeddings.
"""
xk_out = None
if isinstance(freqs_cis, tuple):
cos, sin = reshape_for_broadcast(freqs_cis, xq, head_first) # [S, D]
xq_out = (xq * cos + rotate_half(xq) * sin)
if xk is not None:
xk_out = (xk * cos + rotate_half(xk) * sin)
else:
xq_ = torch.view_as_complex(xq.float().reshape(*xq.shape[:-1], -1, 2)) # [B, S, H, D//2]
freqs_cis = reshape_for_broadcast(freqs_cis, xq_, head_first).to(xq.device) # [S, D//2] --> [1, S, 1, D//2]
xq_out = torch.view_as_real(xq_ * freqs_cis).flatten(3).type_as(xq)
if xk is not None:
xk_ = torch.view_as_complex(xk.float().reshape(*xk.shape[:-1], -1, 2)) # [B, S, H, D//2]
xk_out = torch.view_as_real(xk_ * freqs_cis).flatten(3).type_as(xk)
return xq_out, xk_out
class CrossAttention(nn.Module):
"""
Use QK Normalization.
"""
def __init__(self,
qdim,
kdim,
num_heads,
qkv_bias=True,
qk_norm=False,
attn_drop=0.0,
proj_drop=0.0,
attn_precision=None,
device=None,
dtype=None,
operations=None,
):
factory_kwargs = {'device': device, 'dtype': dtype}
super().__init__()
self.attn_precision = attn_precision
self.qdim = qdim
self.kdim = kdim
self.num_heads = num_heads
assert self.qdim % num_heads == 0, "self.qdim must be divisible by num_heads"
self.head_dim = self.qdim // num_heads
assert self.head_dim % 8 == 0 and self.head_dim <= 128, "Only support head_dim <= 128 and divisible by 8"
self.scale = self.head_dim ** -0.5
self.q_proj = operations.Linear(qdim, qdim, bias=qkv_bias, **factory_kwargs)
self.kv_proj = operations.Linear(kdim, 2 * qdim, bias=qkv_bias, **factory_kwargs)
# TODO: eps should be 1 / 65530 if using fp16
self.q_norm = operations.LayerNorm(self.head_dim, elementwise_affine=True, eps=1e-6, dtype=dtype, device=device) if qk_norm else nn.Identity()
self.k_norm = operations.LayerNorm(self.head_dim, elementwise_affine=True, eps=1e-6, dtype=dtype, device=device) if qk_norm else nn.Identity()
self.attn_drop = nn.Dropout(attn_drop)
self.out_proj = operations.Linear(qdim, qdim, bias=qkv_bias, **factory_kwargs)
self.proj_drop = nn.Dropout(proj_drop)
def forward(self, x, y, freqs_cis_img=None):
"""
Parameters
----------
x: torch.Tensor
(batch, seqlen1, hidden_dim) (where hidden_dim = num heads * head dim)
y: torch.Tensor
(batch, seqlen2, hidden_dim2)
freqs_cis_img: torch.Tensor
(batch, hidden_dim // 2), RoPE for image
"""
b, s1, c = x.shape # [b, s1, D]
_, s2, c = y.shape # [b, s2, 1024]
q = self.q_proj(x).view(b, s1, self.num_heads, self.head_dim) # [b, s1, h, d]
kv = self.kv_proj(y).view(b, s2, 2, self.num_heads, self.head_dim) # [b, s2, 2, h, d]
k, v = kv.unbind(dim=2) # [b, s, h, d]
q = self.q_norm(q)
k = self.k_norm(k)
# Apply RoPE if needed
if freqs_cis_img is not None:
qq, _ = apply_rotary_emb(q, None, freqs_cis_img)
assert qq.shape == q.shape, f'qq: {qq.shape}, q: {q.shape}'
q = qq
q = q.transpose(-2, -3).contiguous() # q -> B, L1, H, C - B, H, L1, C
k = k.transpose(-2, -3).contiguous() # k -> B, L2, H, C - B, H, C, L2
v = v.transpose(-2, -3).contiguous()
context = optimized_attention(q, k, v, self.num_heads, skip_reshape=True, attn_precision=self.attn_precision)
out = self.out_proj(context) # context.reshape - B, L1, -1
out = self.proj_drop(out)
out_tuple = (out,)
return out_tuple
class Attention(nn.Module):
"""
We rename some layer names to align with flash attention
"""
def __init__(self, dim, num_heads, qkv_bias=True, qk_norm=False, attn_drop=0., proj_drop=0., attn_precision=None, dtype=None, device=None, operations=None):
super().__init__()
self.attn_precision = attn_precision
self.dim = dim
self.num_heads = num_heads
assert self.dim % num_heads == 0, 'dim should be divisible by num_heads'
self.head_dim = self.dim // num_heads
# This assertion is aligned with flash attention
assert self.head_dim % 8 == 0 and self.head_dim <= 128, "Only support head_dim <= 128 and divisible by 8"
self.scale = self.head_dim ** -0.5
# qkv --> Wqkv
self.Wqkv = operations.Linear(dim, dim * 3, bias=qkv_bias, dtype=dtype, device=device)
# TODO: eps should be 1 / 65530 if using fp16
self.q_norm = operations.LayerNorm(self.head_dim, elementwise_affine=True, eps=1e-6, dtype=dtype, device=device) if qk_norm else nn.Identity()
self.k_norm = operations.LayerNorm(self.head_dim, elementwise_affine=True, eps=1e-6, dtype=dtype, device=device) if qk_norm else nn.Identity()
self.attn_drop = nn.Dropout(attn_drop)
self.out_proj = operations.Linear(dim, dim, dtype=dtype, device=device)
self.proj_drop = nn.Dropout(proj_drop)
def forward(self, x, freqs_cis_img=None):
B, N, C = x.shape
qkv = self.Wqkv(x).reshape(B, N, 3, self.num_heads, self.head_dim).permute(2, 0, 3, 1, 4) # [3, b, h, s, d]
q, k, v = qkv.unbind(0) # [b, h, s, d]
q = self.q_norm(q) # [b, h, s, d]
k = self.k_norm(k) # [b, h, s, d]
# Apply RoPE if needed
if freqs_cis_img is not None:
qq, kk = apply_rotary_emb(q, k, freqs_cis_img, head_first=True)
assert qq.shape == q.shape and kk.shape == k.shape, \
f'qq: {qq.shape}, q: {q.shape}, kk: {kk.shape}, k: {k.shape}'
q, k = qq, kk
x = optimized_attention(q, k, v, self.num_heads, skip_reshape=True, attn_precision=self.attn_precision)
x = self.out_proj(x)
x = self.proj_drop(x)
out_tuple = (x,)
return out_tuple

View File

@ -0,0 +1,321 @@
from typing import Any, Optional
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils import checkpoint
from comfy.ldm.modules.diffusionmodules.mmdit import (
Mlp,
TimestepEmbedder,
PatchEmbed,
RMSNorm,
)
from comfy.ldm.modules.diffusionmodules.util import timestep_embedding
from .poolers import AttentionPool
import comfy.latent_formats
from .models import HunYuanDiTBlock, calc_rope
from .posemb_layers import get_2d_rotary_pos_embed, get_fill_resize_and_crop
class HunYuanControlNet(nn.Module):
"""
HunYuanDiT: Diffusion model with a Transformer backbone.
Inherit ModelMixin and ConfigMixin to be compatible with the sampler StableDiffusionPipeline of diffusers.
Inherit PeftAdapterMixin to be compatible with the PEFT training pipeline.
Parameters
----------
args: argparse.Namespace
The arguments parsed by argparse.
input_size: tuple
The size of the input image.
patch_size: int
The size of the patch.
in_channels: int
The number of input channels.
hidden_size: int
The hidden size of the transformer backbone.
depth: int
The number of transformer blocks.
num_heads: int
The number of attention heads.
mlp_ratio: float
The ratio of the hidden size of the MLP in the transformer block.
log_fn: callable
The logging function.
"""
def __init__(
self,
input_size: tuple = 128,
patch_size: int = 2,
in_channels: int = 4,
hidden_size: int = 1408,
depth: int = 40,
num_heads: int = 16,
mlp_ratio: float = 4.3637,
text_states_dim=1024,
text_states_dim_t5=2048,
text_len=77,
text_len_t5=256,
qk_norm=True, # See http://arxiv.org/abs/2302.05442 for details.
size_cond=False,
use_style_cond=False,
learn_sigma=True,
norm="layer",
log_fn: callable = print,
attn_precision=None,
dtype=None,
device=None,
operations=None,
**kwargs,
):
super().__init__()
self.log_fn = log_fn
self.depth = depth
self.learn_sigma = learn_sigma
self.in_channels = in_channels
self.out_channels = in_channels * 2 if learn_sigma else in_channels
self.patch_size = patch_size
self.num_heads = num_heads
self.hidden_size = hidden_size
self.text_states_dim = text_states_dim
self.text_states_dim_t5 = text_states_dim_t5
self.text_len = text_len
self.text_len_t5 = text_len_t5
self.size_cond = size_cond
self.use_style_cond = use_style_cond
self.norm = norm
self.dtype = dtype
self.latent_format = comfy.latent_formats.SDXL
self.mlp_t5 = nn.Sequential(
nn.Linear(
self.text_states_dim_t5,
self.text_states_dim_t5 * 4,
bias=True,
dtype=dtype,
device=device,
),
nn.SiLU(),
nn.Linear(
self.text_states_dim_t5 * 4,
self.text_states_dim,
bias=True,
dtype=dtype,
device=device,
),
)
# learnable replace
self.text_embedding_padding = nn.Parameter(
torch.randn(
self.text_len + self.text_len_t5,
self.text_states_dim,
dtype=dtype,
device=device,
)
)
# Attention pooling
pooler_out_dim = 1024
self.pooler = AttentionPool(
self.text_len_t5,
self.text_states_dim_t5,
num_heads=8,
output_dim=pooler_out_dim,
dtype=dtype,
device=device,
operations=operations,
)
# Dimension of the extra input vectors
self.extra_in_dim = pooler_out_dim
if self.size_cond:
# Image size and crop size conditions
self.extra_in_dim += 6 * 256
if self.use_style_cond:
# Here we use a default learned embedder layer for future extension.
self.style_embedder = nn.Embedding(
1, hidden_size, dtype=dtype, device=device
)
self.extra_in_dim += hidden_size
# Text embedding for `add`
self.x_embedder = PatchEmbed(
input_size,
patch_size,
in_channels,
hidden_size,
dtype=dtype,
device=device,
operations=operations,
)
self.t_embedder = TimestepEmbedder(
hidden_size, dtype=dtype, device=device, operations=operations
)
self.extra_embedder = nn.Sequential(
operations.Linear(
self.extra_in_dim, hidden_size * 4, dtype=dtype, device=device
),
nn.SiLU(),
operations.Linear(
hidden_size * 4, hidden_size, bias=True, dtype=dtype, device=device
),
)
# Image embedding
num_patches = self.x_embedder.num_patches
# HUnYuanDiT Blocks
self.blocks = nn.ModuleList(
[
HunYuanDiTBlock(
hidden_size=hidden_size,
c_emb_size=hidden_size,
num_heads=num_heads,
mlp_ratio=mlp_ratio,
text_states_dim=self.text_states_dim,
qk_norm=qk_norm,
norm_type=self.norm,
skip=False,
attn_precision=attn_precision,
dtype=dtype,
device=device,
operations=operations,
)
for _ in range(19)
]
)
# Input zero linear for the first block
self.before_proj = operations.Linear(self.hidden_size, self.hidden_size, dtype=dtype, device=device)
# Output zero linear for the every block
self.after_proj_list = nn.ModuleList(
[
operations.Linear(
self.hidden_size, self.hidden_size, dtype=dtype, device=device
)
for _ in range(len(self.blocks))
]
)
def forward(
self,
x,
hint,
timesteps,
context,#encoder_hidden_states=None,
text_embedding_mask=None,
encoder_hidden_states_t5=None,
text_embedding_mask_t5=None,
image_meta_size=None,
style=None,
return_dict=False,
**kwarg,
):
"""
Forward pass of the encoder.
Parameters
----------
x: torch.Tensor
(B, D, H, W)
t: torch.Tensor
(B)
encoder_hidden_states: torch.Tensor
CLIP text embedding, (B, L_clip, D)
text_embedding_mask: torch.Tensor
CLIP text embedding mask, (B, L_clip)
encoder_hidden_states_t5: torch.Tensor
T5 text embedding, (B, L_t5, D)
text_embedding_mask_t5: torch.Tensor
T5 text embedding mask, (B, L_t5)
image_meta_size: torch.Tensor
(B, 6)
style: torch.Tensor
(B)
cos_cis_img: torch.Tensor
sin_cis_img: torch.Tensor
return_dict: bool
Whether to return a dictionary.
"""
condition = hint
if condition.shape[0] == 1:
condition = torch.repeat_interleave(condition, x.shape[0], dim=0)
text_states = context # 2,77,1024
text_states_t5 = encoder_hidden_states_t5 # 2,256,2048
text_states_mask = text_embedding_mask.bool() # 2,77
text_states_t5_mask = text_embedding_mask_t5.bool() # 2,256
b_t5, l_t5, c_t5 = text_states_t5.shape
text_states_t5 = self.mlp_t5(text_states_t5.view(-1, c_t5)).view(b_t5, l_t5, -1)
padding = comfy.ops.cast_to_input(self.text_embedding_padding, text_states)
text_states[:, -self.text_len :] = torch.where(
text_states_mask[:, -self.text_len :].unsqueeze(2),
text_states[:, -self.text_len :],
padding[: self.text_len],
)
text_states_t5[:, -self.text_len_t5 :] = torch.where(
text_states_t5_mask[:, -self.text_len_t5 :].unsqueeze(2),
text_states_t5[:, -self.text_len_t5 :],
padding[self.text_len :],
)
text_states = torch.cat([text_states, text_states_t5], dim=1) # 2,2051024
# _, _, oh, ow = x.shape
# th, tw = oh // self.patch_size, ow // self.patch_size
# Get image RoPE embedding according to `reso`lution.
freqs_cis_img = calc_rope(
x, self.patch_size, self.hidden_size // self.num_heads
) # (cos_cis_img, sin_cis_img)
# ========================= Build time and image embedding =========================
t = self.t_embedder(timesteps, dtype=self.dtype)
x = self.x_embedder(x)
# ========================= Concatenate all extra vectors =========================
# Build text tokens with pooling
extra_vec = self.pooler(encoder_hidden_states_t5)
# Build image meta size tokens if applicable
# if image_meta_size is not None:
# image_meta_size = timestep_embedding(image_meta_size.view(-1), 256) # [B * 6, 256]
# if image_meta_size.dtype != self.dtype:
# image_meta_size = image_meta_size.half()
# image_meta_size = image_meta_size.view(-1, 6 * 256)
# extra_vec = torch.cat([extra_vec, image_meta_size], dim=1) # [B, D + 6 * 256]
# Build style tokens
if style is not None:
style_embedding = self.style_embedder(style)
extra_vec = torch.cat([extra_vec, style_embedding], dim=1)
# Concatenate all extra vectors
c = t + self.extra_embedder(extra_vec) # [B, D]
# ========================= Deal with Condition =========================
condition = self.x_embedder(condition)
# ========================= Forward pass through HunYuanDiT blocks =========================
controls = []
x = x + self.before_proj(condition) # add condition
for layer, block in enumerate(self.blocks):
x = block(x, c, text_states, freqs_cis_img)
controls.append(self.after_proj_list[layer](x)) # zero linear for output
return {"output": controls}

410
comfy/ldm/hydit/models.py Normal file
View File

@ -0,0 +1,410 @@
from typing import Any
import torch
import torch.nn as nn
import torch.nn.functional as F
import comfy.ops
from comfy.ldm.modules.diffusionmodules.mmdit import Mlp, TimestepEmbedder, PatchEmbed, RMSNorm
from comfy.ldm.modules.diffusionmodules.util import timestep_embedding
from torch.utils import checkpoint
from .attn_layers import Attention, CrossAttention
from .poolers import AttentionPool
from .posemb_layers import get_2d_rotary_pos_embed, get_fill_resize_and_crop
def calc_rope(x, patch_size, head_size):
th = (x.shape[2] + (patch_size // 2)) // patch_size
tw = (x.shape[3] + (patch_size // 2)) // patch_size
base_size = 512 // 8 // patch_size
start, stop = get_fill_resize_and_crop((th, tw), base_size)
sub_args = [start, stop, (th, tw)]
# head_size = HUNYUAN_DIT_CONFIG['DiT-g/2']['hidden_size'] // HUNYUAN_DIT_CONFIG['DiT-g/2']['num_heads']
rope = get_2d_rotary_pos_embed(head_size, *sub_args)
rope = (rope[0].to(x), rope[1].to(x))
return rope
def modulate(x, shift, scale):
return x * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
class HunYuanDiTBlock(nn.Module):
"""
A HunYuanDiT block with `add` conditioning.
"""
def __init__(self,
hidden_size,
c_emb_size,
num_heads,
mlp_ratio=4.0,
text_states_dim=1024,
qk_norm=False,
norm_type="layer",
skip=False,
attn_precision=None,
dtype=None,
device=None,
operations=None,
):
super().__init__()
use_ele_affine = True
if norm_type == "layer":
norm_layer = operations.LayerNorm
elif norm_type == "rms":
norm_layer = RMSNorm
else:
raise ValueError(f"Unknown norm_type: {norm_type}")
# ========================= Self-Attention =========================
self.norm1 = norm_layer(hidden_size, elementwise_affine=use_ele_affine, eps=1e-6, dtype=dtype, device=device)
self.attn1 = Attention(hidden_size, num_heads=num_heads, qkv_bias=True, qk_norm=qk_norm, attn_precision=attn_precision, dtype=dtype, device=device, operations=operations)
# ========================= FFN =========================
self.norm2 = norm_layer(hidden_size, elementwise_affine=use_ele_affine, eps=1e-6, dtype=dtype, device=device)
mlp_hidden_dim = int(hidden_size * mlp_ratio)
approx_gelu = lambda: nn.GELU(approximate="tanh")
self.mlp = Mlp(in_features=hidden_size, hidden_features=mlp_hidden_dim, act_layer=approx_gelu, drop=0, dtype=dtype, device=device, operations=operations)
# ========================= Add =========================
# Simply use add like SDXL.
self.default_modulation = nn.Sequential(
nn.SiLU(),
operations.Linear(c_emb_size, hidden_size, bias=True, dtype=dtype, device=device)
)
# ========================= Cross-Attention =========================
self.attn2 = CrossAttention(hidden_size, text_states_dim, num_heads=num_heads, qkv_bias=True,
qk_norm=qk_norm, attn_precision=attn_precision, dtype=dtype, device=device, operations=operations)
self.norm3 = norm_layer(hidden_size, elementwise_affine=True, eps=1e-6, dtype=dtype, device=device)
# ========================= Skip Connection =========================
if skip:
self.skip_norm = norm_layer(2 * hidden_size, elementwise_affine=True, eps=1e-6, dtype=dtype, device=device)
self.skip_linear = operations.Linear(2 * hidden_size, hidden_size, dtype=dtype, device=device)
else:
self.skip_linear = None
self.gradient_checkpointing = False
def _forward(self, x, c=None, text_states=None, freq_cis_img=None, skip=None):
# Long Skip Connection
if self.skip_linear is not None:
cat = torch.cat([x, skip], dim=-1)
if cat.dtype != x.dtype:
cat = cat.to(x.dtype)
cat = self.skip_norm(cat)
x = self.skip_linear(cat)
# Self-Attention
shift_msa = self.default_modulation(c).unsqueeze(dim=1)
attn_inputs = (
self.norm1(x) + shift_msa, freq_cis_img,
)
x = x + self.attn1(*attn_inputs)[0]
# Cross-Attention
cross_inputs = (
self.norm3(x), text_states, freq_cis_img
)
x = x + self.attn2(*cross_inputs)[0]
# FFN Layer
mlp_inputs = self.norm2(x)
x = x + self.mlp(mlp_inputs)
return x
def forward(self, x, c=None, text_states=None, freq_cis_img=None, skip=None):
if self.gradient_checkpointing and self.training:
return checkpoint.checkpoint(self._forward, x, c, text_states, freq_cis_img, skip)
return self._forward(x, c, text_states, freq_cis_img, skip)
class FinalLayer(nn.Module):
"""
The final layer of HunYuanDiT.
"""
def __init__(self, final_hidden_size, c_emb_size, patch_size, out_channels, dtype=None, device=None, operations=None):
super().__init__()
self.norm_final = operations.LayerNorm(final_hidden_size, elementwise_affine=False, eps=1e-6, dtype=dtype, device=device)
self.linear = operations.Linear(final_hidden_size, patch_size * patch_size * out_channels, bias=True, dtype=dtype, device=device)
self.adaLN_modulation = nn.Sequential(
nn.SiLU(),
operations.Linear(c_emb_size, 2 * final_hidden_size, bias=True, dtype=dtype, device=device)
)
def forward(self, x, c):
shift, scale = self.adaLN_modulation(c).chunk(2, dim=1)
x = modulate(self.norm_final(x), shift, scale)
x = self.linear(x)
return x
class HunYuanDiT(nn.Module):
"""
HunYuanDiT: Diffusion model with a Transformer backbone.
Inherit ModelMixin and ConfigMixin to be compatible with the sampler StableDiffusionPipeline of diffusers.
Inherit PeftAdapterMixin to be compatible with the PEFT training pipeline.
Parameters
----------
args: argparse.Namespace
The arguments parsed by argparse.
input_size: tuple
The size of the input image.
patch_size: int
The size of the patch.
in_channels: int
The number of input channels.
hidden_size: int
The hidden size of the transformer backbone.
depth: int
The number of transformer blocks.
num_heads: int
The number of attention heads.
mlp_ratio: float
The ratio of the hidden size of the MLP in the transformer block.
log_fn: callable
The logging function.
"""
#@register_to_config
def __init__(self,
input_size: tuple = 32,
patch_size: int = 2,
in_channels: int = 4,
hidden_size: int = 1152,
depth: int = 28,
num_heads: int = 16,
mlp_ratio: float = 4.0,
text_states_dim = 1024,
text_states_dim_t5 = 2048,
text_len = 77,
text_len_t5 = 256,
qk_norm = True,# See http://arxiv.org/abs/2302.05442 for details.
size_cond = False,
use_style_cond = False,
learn_sigma = True,
norm = "layer",
log_fn: callable = print,
attn_precision=None,
dtype=None,
device=None,
operations=None,
**kwargs,
):
super().__init__()
self.log_fn = log_fn
self.depth = depth
self.learn_sigma = learn_sigma
self.in_channels = in_channels
self.out_channels = in_channels * 2 if learn_sigma else in_channels
self.patch_size = patch_size
self.num_heads = num_heads
self.hidden_size = hidden_size
self.text_states_dim = text_states_dim
self.text_states_dim_t5 = text_states_dim_t5
self.text_len = text_len
self.text_len_t5 = text_len_t5
self.size_cond = size_cond
self.use_style_cond = use_style_cond
self.norm = norm
self.dtype = dtype
#import pdb
#pdb.set_trace()
self.mlp_t5 = nn.Sequential(
operations.Linear(self.text_states_dim_t5, self.text_states_dim_t5 * 4, bias=True, dtype=dtype, device=device),
nn.SiLU(),
operations.Linear(self.text_states_dim_t5 * 4, self.text_states_dim, bias=True, dtype=dtype, device=device),
)
# learnable replace
self.text_embedding_padding = nn.Parameter(
torch.empty(self.text_len + self.text_len_t5, self.text_states_dim, dtype=dtype, device=device))
# Attention pooling
pooler_out_dim = 1024
self.pooler = AttentionPool(self.text_len_t5, self.text_states_dim_t5, num_heads=8, output_dim=pooler_out_dim, dtype=dtype, device=device, operations=operations)
# Dimension of the extra input vectors
self.extra_in_dim = pooler_out_dim
if self.size_cond:
# Image size and crop size conditions
self.extra_in_dim += 6 * 256
if self.use_style_cond:
# Here we use a default learned embedder layer for future extension.
self.style_embedder = operations.Embedding(1, hidden_size, dtype=dtype, device=device)
self.extra_in_dim += hidden_size
# Text embedding for `add`
self.x_embedder = PatchEmbed(input_size, patch_size, in_channels, hidden_size, dtype=dtype, device=device, operations=operations)
self.t_embedder = TimestepEmbedder(hidden_size, dtype=dtype, device=device, operations=operations)
self.extra_embedder = nn.Sequential(
operations.Linear(self.extra_in_dim, hidden_size * 4, dtype=dtype, device=device),
nn.SiLU(),
operations.Linear(hidden_size * 4, hidden_size, bias=True, dtype=dtype, device=device),
)
# Image embedding
num_patches = self.x_embedder.num_patches
# HUnYuanDiT Blocks
self.blocks = nn.ModuleList([
HunYuanDiTBlock(hidden_size=hidden_size,
c_emb_size=hidden_size,
num_heads=num_heads,
mlp_ratio=mlp_ratio,
text_states_dim=self.text_states_dim,
qk_norm=qk_norm,
norm_type=self.norm,
skip=layer > depth // 2,
attn_precision=attn_precision,
dtype=dtype,
device=device,
operations=operations,
)
for layer in range(depth)
])
self.final_layer = FinalLayer(hidden_size, hidden_size, patch_size, self.out_channels, dtype=dtype, device=device, operations=operations)
self.unpatchify_channels = self.out_channels
def forward(self,
x,
t,
context,#encoder_hidden_states=None,
text_embedding_mask=None,
encoder_hidden_states_t5=None,
text_embedding_mask_t5=None,
image_meta_size=None,
style=None,
return_dict=False,
control=None,
transformer_options=None,
):
"""
Forward pass of the encoder.
Parameters
----------
x: torch.Tensor
(B, D, H, W)
t: torch.Tensor
(B)
encoder_hidden_states: torch.Tensor
CLIP text embedding, (B, L_clip, D)
text_embedding_mask: torch.Tensor
CLIP text embedding mask, (B, L_clip)
encoder_hidden_states_t5: torch.Tensor
T5 text embedding, (B, L_t5, D)
text_embedding_mask_t5: torch.Tensor
T5 text embedding mask, (B, L_t5)
image_meta_size: torch.Tensor
(B, 6)
style: torch.Tensor
(B)
cos_cis_img: torch.Tensor
sin_cis_img: torch.Tensor
return_dict: bool
Whether to return a dictionary.
"""
#import pdb
#pdb.set_trace()
encoder_hidden_states = context
text_states = encoder_hidden_states # 2,77,1024
text_states_t5 = encoder_hidden_states_t5 # 2,256,2048
text_states_mask = text_embedding_mask.bool() # 2,77
text_states_t5_mask = text_embedding_mask_t5.bool() # 2,256
b_t5, l_t5, c_t5 = text_states_t5.shape
text_states_t5 = self.mlp_t5(text_states_t5.view(-1, c_t5)).view(b_t5, l_t5, -1)
padding = comfy.ops.cast_to_input(self.text_embedding_padding, text_states)
text_states[:,-self.text_len:] = torch.where(text_states_mask[:,-self.text_len:].unsqueeze(2), text_states[:,-self.text_len:], padding[:self.text_len])
text_states_t5[:,-self.text_len_t5:] = torch.where(text_states_t5_mask[:,-self.text_len_t5:].unsqueeze(2), text_states_t5[:,-self.text_len_t5:], padding[self.text_len:])
text_states = torch.cat([text_states, text_states_t5], dim=1) # 2,2051024
# clip_t5_mask = torch.cat([text_states_mask, text_states_t5_mask], dim=-1)
_, _, oh, ow = x.shape
th, tw = (oh + (self.patch_size // 2)) // self.patch_size, (ow + (self.patch_size // 2)) // self.patch_size
# Get image RoPE embedding according to `reso`lution.
freqs_cis_img = calc_rope(x, self.patch_size, self.hidden_size // self.num_heads) #(cos_cis_img, sin_cis_img)
# ========================= Build time and image embedding =========================
t = self.t_embedder(t, dtype=x.dtype)
x = self.x_embedder(x)
# ========================= Concatenate all extra vectors =========================
# Build text tokens with pooling
extra_vec = self.pooler(encoder_hidden_states_t5)
# Build image meta size tokens if applicable
if self.size_cond:
image_meta_size = timestep_embedding(image_meta_size.view(-1), 256).to(x.dtype) # [B * 6, 256]
image_meta_size = image_meta_size.view(-1, 6 * 256)
extra_vec = torch.cat([extra_vec, image_meta_size], dim=1) # [B, D + 6 * 256]
# Build style tokens
if self.use_style_cond:
if style is None:
style = torch.zeros((extra_vec.shape[0],), device=x.device, dtype=torch.int)
style_embedding = self.style_embedder(style, out_dtype=x.dtype)
extra_vec = torch.cat([extra_vec, style_embedding], dim=1)
# Concatenate all extra vectors
c = t + self.extra_embedder(extra_vec) # [B, D]
controls = None
if control:
controls = control.get("output", None)
# ========================= Forward pass through HunYuanDiT blocks =========================
skips = []
for layer, block in enumerate(self.blocks):
if layer > self.depth // 2:
if controls is not None:
skip = skips.pop() + controls.pop().to(dtype=x.dtype)
else:
skip = skips.pop()
x = block(x, c, text_states, freqs_cis_img, skip) # (N, L, D)
else:
x = block(x, c, text_states, freqs_cis_img) # (N, L, D)
if layer < (self.depth // 2 - 1):
skips.append(x)
if controls is not None and len(controls) != 0:
raise ValueError("The number of controls is not equal to the number of skip connections.")
# ========================= Final layer =========================
x = self.final_layer(x, c) # (N, L, patch_size ** 2 * out_channels)
x = self.unpatchify(x, th, tw) # (N, out_channels, H, W)
if return_dict:
return {'x': x}
if self.learn_sigma:
return x[:,:self.out_channels // 2,:oh,:ow]
return x[:,:,:oh,:ow]
def unpatchify(self, x, h, w):
"""
x: (N, T, patch_size**2 * C)
imgs: (N, H, W, C)
"""
c = self.unpatchify_channels
p = self.x_embedder.patch_size[0]
# h = w = int(x.shape[1] ** 0.5)
assert h * w == x.shape[1]
x = x.reshape(shape=(x.shape[0], h, w, p, p, c))
x = torch.einsum('nhwpqc->nchpwq', x)
imgs = x.reshape(shape=(x.shape[0], c, h * p, w * p))
return imgs

View File

@ -0,0 +1,37 @@
import torch
import torch.nn as nn
import torch.nn.functional as F
from comfy.ldm.modules.attention import optimized_attention
import comfy.ops
class AttentionPool(nn.Module):
def __init__(self, spacial_dim: int, embed_dim: int, num_heads: int, output_dim: int = None, dtype=None, device=None, operations=None):
super().__init__()
self.positional_embedding = nn.Parameter(torch.empty(spacial_dim + 1, embed_dim, dtype=dtype, device=device))
self.k_proj = operations.Linear(embed_dim, embed_dim, dtype=dtype, device=device)
self.q_proj = operations.Linear(embed_dim, embed_dim, dtype=dtype, device=device)
self.v_proj = operations.Linear(embed_dim, embed_dim, dtype=dtype, device=device)
self.c_proj = operations.Linear(embed_dim, output_dim or embed_dim, dtype=dtype, device=device)
self.num_heads = num_heads
self.embed_dim = embed_dim
def forward(self, x):
x = x[:,:self.positional_embedding.shape[0] - 1]
x = x.permute(1, 0, 2) # NLC -> LNC
x = torch.cat([x.mean(dim=0, keepdim=True), x], dim=0) # (L+1)NC
x = x + comfy.ops.cast_to_input(self.positional_embedding[:, None, :], x) # (L+1)NC
q = self.q_proj(x[:1])
k = self.k_proj(x)
v = self.v_proj(x)
batch_size = q.shape[1]
head_dim = self.embed_dim // self.num_heads
q = q.view(1, batch_size * self.num_heads, head_dim).transpose(0, 1).view(batch_size, self.num_heads, -1, head_dim)
k = k.view(k.shape[0], batch_size * self.num_heads, head_dim).transpose(0, 1).view(batch_size, self.num_heads, -1, head_dim)
v = v.view(v.shape[0], batch_size * self.num_heads, head_dim).transpose(0, 1).view(batch_size, self.num_heads, -1, head_dim)
attn_output = optimized_attention(q, k, v, self.num_heads, skip_reshape=True).transpose(0, 1)
attn_output = self.c_proj(attn_output)
return attn_output.squeeze(0)

View File

@ -0,0 +1,224 @@
import torch
import numpy as np
from typing import Union
def _to_tuple(x):
if isinstance(x, int):
return x, x
else:
return x
def get_fill_resize_and_crop(src, tgt):
th, tw = _to_tuple(tgt)
h, w = _to_tuple(src)
tr = th / tw # base resolution
r = h / w # target resolution
# resize
if r > tr:
resize_height = th
resize_width = int(round(th / h * w))
else:
resize_width = tw
resize_height = int(round(tw / w * h)) # resize the target resolution down based on the base resolution
crop_top = int(round((th - resize_height) / 2.0))
crop_left = int(round((tw - resize_width) / 2.0))
return (crop_top, crop_left), (crop_top + resize_height, crop_left + resize_width)
def get_meshgrid(start, *args):
if len(args) == 0:
# start is grid_size
num = _to_tuple(start)
start = (0, 0)
stop = num
elif len(args) == 1:
# start is start, args[0] is stop, step is 1
start = _to_tuple(start)
stop = _to_tuple(args[0])
num = (stop[0] - start[0], stop[1] - start[1])
elif len(args) == 2:
# start is start, args[0] is stop, args[1] is num
start = _to_tuple(start)
stop = _to_tuple(args[0])
num = _to_tuple(args[1])
else:
raise ValueError(f"len(args) should be 0, 1 or 2, but got {len(args)}")
grid_h = np.linspace(start[0], stop[0], num[0], endpoint=False, dtype=np.float32)
grid_w = np.linspace(start[1], stop[1], num[1], endpoint=False, dtype=np.float32)
grid = np.meshgrid(grid_w, grid_h) # here w goes first
grid = np.stack(grid, axis=0) # [2, W, H]
return grid
#################################################################################
# Sine/Cosine Positional Embedding Functions #
#################################################################################
# https://github.com/facebookresearch/mae/blob/main/util/pos_embed.py
def get_2d_sincos_pos_embed(embed_dim, start, *args, cls_token=False, extra_tokens=0):
"""
grid_size: int of the grid height and width
return:
pos_embed: [grid_size*grid_size, embed_dim] or [1+grid_size*grid_size, embed_dim] (w/ or w/o cls_token)
"""
grid = get_meshgrid(start, *args) # [2, H, w]
# grid_h = np.arange(grid_size, dtype=np.float32)
# grid_w = np.arange(grid_size, dtype=np.float32)
# grid = np.meshgrid(grid_w, grid_h) # here w goes first
# grid = np.stack(grid, axis=0) # [2, W, H]
grid = grid.reshape([2, 1, *grid.shape[1:]])
pos_embed = get_2d_sincos_pos_embed_from_grid(embed_dim, grid)
if cls_token and extra_tokens > 0:
pos_embed = np.concatenate([np.zeros([extra_tokens, embed_dim]), pos_embed], axis=0)
return pos_embed
def get_2d_sincos_pos_embed_from_grid(embed_dim, grid):
assert embed_dim % 2 == 0
# use half of dimensions to encode grid_h
emb_h = get_1d_sincos_pos_embed_from_grid(embed_dim // 2, grid[0]) # (H*W, D/2)
emb_w = get_1d_sincos_pos_embed_from_grid(embed_dim // 2, grid[1]) # (H*W, D/2)
emb = np.concatenate([emb_h, emb_w], axis=1) # (H*W, D)
return emb
def get_1d_sincos_pos_embed_from_grid(embed_dim, pos):
"""
embed_dim: output dimension for each position
pos: a list of positions to be encoded: size (W,H)
out: (M, D)
"""
assert embed_dim % 2 == 0
omega = np.arange(embed_dim // 2, dtype=np.float64)
omega /= embed_dim / 2.
omega = 1. / 10000**omega # (D/2,)
pos = pos.reshape(-1) # (M,)
out = np.einsum('m,d->md', pos, omega) # (M, D/2), outer product
emb_sin = np.sin(out) # (M, D/2)
emb_cos = np.cos(out) # (M, D/2)
emb = np.concatenate([emb_sin, emb_cos], axis=1) # (M, D)
return emb
#################################################################################
# Rotary Positional Embedding Functions #
#################################################################################
# https://github.com/facebookresearch/llama/blob/main/llama/model.py#L443
def get_2d_rotary_pos_embed(embed_dim, start, *args, use_real=True):
"""
This is a 2d version of precompute_freqs_cis, which is a RoPE for image tokens with 2d structure.
Parameters
----------
embed_dim: int
embedding dimension size
start: int or tuple of int
If len(args) == 0, start is num; If len(args) == 1, start is start, args[0] is stop, step is 1;
If len(args) == 2, start is start, args[0] is stop, args[1] is num.
use_real: bool
If True, return real part and imaginary part separately. Otherwise, return complex numbers.
Returns
-------
pos_embed: torch.Tensor
[HW, D/2]
"""
grid = get_meshgrid(start, *args) # [2, H, w]
grid = grid.reshape([2, 1, *grid.shape[1:]]) # Returns a sampling matrix with the same resolution as the target resolution
pos_embed = get_2d_rotary_pos_embed_from_grid(embed_dim, grid, use_real=use_real)
return pos_embed
def get_2d_rotary_pos_embed_from_grid(embed_dim, grid, use_real=False):
assert embed_dim % 4 == 0
# use half of dimensions to encode grid_h
emb_h = get_1d_rotary_pos_embed(embed_dim // 2, grid[0].reshape(-1), use_real=use_real) # (H*W, D/4)
emb_w = get_1d_rotary_pos_embed(embed_dim // 2, grid[1].reshape(-1), use_real=use_real) # (H*W, D/4)
if use_real:
cos = torch.cat([emb_h[0], emb_w[0]], dim=1) # (H*W, D/2)
sin = torch.cat([emb_h[1], emb_w[1]], dim=1) # (H*W, D/2)
return cos, sin
else:
emb = torch.cat([emb_h, emb_w], dim=1) # (H*W, D/2)
return emb
def get_1d_rotary_pos_embed(dim: int, pos: Union[np.ndarray, int], theta: float = 10000.0, use_real=False):
"""
Precompute the frequency tensor for complex exponentials (cis) with given dimensions.
This function calculates a frequency tensor with complex exponentials using the given dimension 'dim'
and the end index 'end'. The 'theta' parameter scales the frequencies.
The returned tensor contains complex values in complex64 data type.
Args:
dim (int): Dimension of the frequency tensor.
pos (np.ndarray, int): Position indices for the frequency tensor. [S] or scalar
theta (float, optional): Scaling factor for frequency computation. Defaults to 10000.0.
use_real (bool, optional): If True, return real part and imaginary part separately.
Otherwise, return complex numbers.
Returns:
torch.Tensor: Precomputed frequency tensor with complex exponentials. [S, D/2]
"""
if isinstance(pos, int):
pos = np.arange(pos)
freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: (dim // 2)].float() / dim)) # [D/2]
t = torch.from_numpy(pos).to(freqs.device) # type: ignore # [S]
freqs = torch.outer(t, freqs).float() # type: ignore # [S, D/2]
if use_real:
freqs_cos = freqs.cos().repeat_interleave(2, dim=1) # [S, D]
freqs_sin = freqs.sin().repeat_interleave(2, dim=1) # [S, D]
return freqs_cos, freqs_sin
else:
freqs_cis = torch.polar(torch.ones_like(freqs), freqs) # complex64 # [S, D/2]
return freqs_cis
def calc_sizes(rope_img, patch_size, th, tw):
if rope_img == 'extend':
# Expansion mode
sub_args = [(th, tw)]
elif rope_img.startswith('base'):
# Based on the specified dimensions, other dimensions are obtained through interpolation.
base_size = int(rope_img[4:]) // 8 // patch_size
start, stop = get_fill_resize_and_crop((th, tw), base_size)
sub_args = [start, stop, (th, tw)]
else:
raise ValueError(f"Unknown rope_img: {rope_img}")
return sub_args
def init_image_posemb(rope_img,
resolutions,
patch_size,
hidden_size,
num_heads,
log_fn,
rope_real=True,
):
freqs_cis_img = {}
for reso in resolutions:
th, tw = reso.height // 8 // patch_size, reso.width // 8 // patch_size
sub_args = calc_sizes(rope_img, patch_size, th, tw)
freqs_cis_img[str(reso)] = get_2d_rotary_pos_embed(hidden_size // num_heads, *sub_args, use_real=rope_real)
log_fn(f" Using image RoPE ({rope_img}) ({'real' if rope_real else 'complex'}): {sub_args} | ({reso}) "
f"{freqs_cis_img[str(reso)][0].shape if rope_real else freqs_cis_img[str(reso)].shape}")
return freqs_cis_img

View File

@ -1,68 +1,66 @@
import torch
# import pytorch_lightning as pl
import torch.nn.functional as F
from contextlib import contextmanager
from typing import Any, Dict, List, Optional, Tuple, Union
from comfy.ldm.modules.diffusionmodules.model import Encoder, Decoder
from comfy.ldm.modules.distributions.distributions import DiagonalGaussianDistribution
from comfy.ldm.util import instantiate_from_config
from comfy.ldm.modules.ema import LitEma
import comfy.ops
# class AutoencoderKL(pl.LightningModule):
class AutoencoderKL(torch.nn.Module):
def __init__(self,
ddconfig,
lossconfig,
embed_dim,
ckpt_path=None,
ignore_keys=[],
image_key="image",
colorize_nlabels=None,
monitor=None,
ema_decay=None,
learn_logvar=False
):
class DiagonalGaussianRegularizer(torch.nn.Module):
def __init__(self, sample: bool = True):
super().__init__()
self.learn_logvar = learn_logvar
self.image_key = image_key
self.encoder = Encoder(**ddconfig)
self.decoder = Decoder(**ddconfig)
self.loss = instantiate_from_config(lossconfig)
assert ddconfig["double_z"]
self.quant_conv = torch.nn.Conv2d(2*ddconfig["z_channels"], 2*embed_dim, 1)
self.post_quant_conv = torch.nn.Conv2d(embed_dim, ddconfig["z_channels"], 1)
self.embed_dim = embed_dim
if colorize_nlabels is not None:
assert type(colorize_nlabels)==int
self.register_buffer("colorize", torch.randn(3, colorize_nlabels, 1, 1))
self.sample = sample
def get_trainable_parameters(self) -> Any:
yield from ()
def forward(self, z: torch.Tensor) -> Tuple[torch.Tensor, dict]:
log = dict()
posterior = DiagonalGaussianDistribution(z)
if self.sample:
z = posterior.sample()
else:
z = posterior.mode()
kl_loss = posterior.kl()
kl_loss = torch.sum(kl_loss) / kl_loss.shape[0]
log["kl_loss"] = kl_loss
return z, log
class AbstractAutoencoder(torch.nn.Module):
"""
This is the base class for all autoencoders, including image autoencoders, image autoencoders with discriminators,
unCLIP models, etc. Hence, it is fairly general, and specific features
(e.g. discriminator training, encoding, decoding) must be implemented in subclasses.
"""
def __init__(
self,
ema_decay: Union[None, float] = None,
monitor: Union[None, str] = None,
input_key: str = "jpg",
**kwargs,
):
super().__init__()
self.input_key = input_key
self.use_ema = ema_decay is not None
if monitor is not None:
self.monitor = monitor
self.use_ema = ema_decay is not None
if self.use_ema:
self.ema_decay = ema_decay
assert 0. < ema_decay < 1.
self.model_ema = LitEma(self, decay=ema_decay)
print(f"Keeping EMAs of {len(list(self.model_ema.buffers()))}.")
logpy.info(f"Keeping EMAs of {len(list(self.model_ema.buffers()))}.")
if ckpt_path is not None:
self.init_from_ckpt(ckpt_path, ignore_keys=ignore_keys)
def get_input(self, batch) -> Any:
raise NotImplementedError()
def init_from_ckpt(self, path, ignore_keys=list()):
if path.lower().endswith(".safetensors"):
import safetensors.torch
sd = safetensors.torch.load_file(path, device="cpu")
else:
sd = torch.load(path, map_location="cpu")["state_dict"]
keys = list(sd.keys())
for k in keys:
for ik in ignore_keys:
if k.startswith(ik):
print("Deleting key {} from state_dict.".format(k))
del sd[k]
self.load_state_dict(sd, strict=False)
print(f"Restored from {path}")
def on_train_batch_end(self, *args, **kwargs):
# for EMA computation
if self.use_ema:
self.model_ema(self)
@contextmanager
def ema_scope(self, context=None):
@ -70,154 +68,159 @@ class AutoencoderKL(torch.nn.Module):
self.model_ema.store(self.parameters())
self.model_ema.copy_to(self)
if context is not None:
print(f"{context}: Switched to EMA weights")
logpy.info(f"{context}: Switched to EMA weights")
try:
yield None
finally:
if self.use_ema:
self.model_ema.restore(self.parameters())
if context is not None:
print(f"{context}: Restored training weights")
logpy.info(f"{context}: Restored training weights")
def on_train_batch_end(self, *args, **kwargs):
if self.use_ema:
self.model_ema(self)
def encode(self, *args, **kwargs) -> torch.Tensor:
raise NotImplementedError("encode()-method of abstract base class called")
def encode(self, x):
h = self.encoder(x)
moments = self.quant_conv(h)
posterior = DiagonalGaussianDistribution(moments)
return posterior
def decode(self, *args, **kwargs) -> torch.Tensor:
raise NotImplementedError("decode()-method of abstract base class called")
def decode(self, z):
z = self.post_quant_conv(z)
dec = self.decoder(z)
return dec
def instantiate_optimizer_from_config(self, params, lr, cfg):
logpy.info(f"loading >>> {cfg['target']} <<< optimizer from config")
return get_obj_from_str(cfg["target"])(
params, lr=lr, **cfg.get("params", dict())
)
def forward(self, input, sample_posterior=True):
posterior = self.encode(input)
if sample_posterior:
z = posterior.sample()
else:
z = posterior.mode()
dec = self.decode(z)
return dec, posterior
def configure_optimizers(self) -> Any:
raise NotImplementedError()
def get_input(self, batch, k):
x = batch[k]
if len(x.shape) == 3:
x = x[..., None]
x = x.permute(0, 3, 1, 2).to(memory_format=torch.contiguous_format).float()
return x
def training_step(self, batch, batch_idx, optimizer_idx):
inputs = self.get_input(batch, self.image_key)
reconstructions, posterior = self(inputs)
class AutoencodingEngine(AbstractAutoencoder):
"""
Base class for all image autoencoders that we train, like VQGAN or AutoencoderKL
(we also restore them explicitly as special cases for legacy reasons).
Regularizations such as KL or VQ are moved to the regularizer class.
"""
if optimizer_idx == 0:
# train encoder+decoder+logvar
aeloss, log_dict_ae = self.loss(inputs, reconstructions, posterior, optimizer_idx, self.global_step,
last_layer=self.get_last_layer(), split="train")
self.log("aeloss", aeloss, prog_bar=True, logger=True, on_step=True, on_epoch=True)
self.log_dict(log_dict_ae, prog_bar=False, logger=True, on_step=True, on_epoch=False)
return aeloss
def __init__(
self,
*args,
encoder_config: Dict,
decoder_config: Dict,
regularizer_config: Dict,
**kwargs,
):
super().__init__(*args, **kwargs)
if optimizer_idx == 1:
# train the discriminator
discloss, log_dict_disc = self.loss(inputs, reconstructions, posterior, optimizer_idx, self.global_step,
last_layer=self.get_last_layer(), split="train")
self.log("discloss", discloss, prog_bar=True, logger=True, on_step=True, on_epoch=True)
self.log_dict(log_dict_disc, prog_bar=False, logger=True, on_step=True, on_epoch=False)
return discloss
def validation_step(self, batch, batch_idx):
log_dict = self._validation_step(batch, batch_idx)
with self.ema_scope():
log_dict_ema = self._validation_step(batch, batch_idx, postfix="_ema")
return log_dict
def _validation_step(self, batch, batch_idx, postfix=""):
inputs = self.get_input(batch, self.image_key)
reconstructions, posterior = self(inputs)
aeloss, log_dict_ae = self.loss(inputs, reconstructions, posterior, 0, self.global_step,
last_layer=self.get_last_layer(), split="val"+postfix)
discloss, log_dict_disc = self.loss(inputs, reconstructions, posterior, 1, self.global_step,
last_layer=self.get_last_layer(), split="val"+postfix)
self.log(f"val{postfix}/rec_loss", log_dict_ae[f"val{postfix}/rec_loss"])
self.log_dict(log_dict_ae)
self.log_dict(log_dict_disc)
return self.log_dict
def configure_optimizers(self):
lr = self.learning_rate
ae_params_list = list(self.encoder.parameters()) + list(self.decoder.parameters()) + list(
self.quant_conv.parameters()) + list(self.post_quant_conv.parameters())
if self.learn_logvar:
print(f"{self.__class__.__name__}: Learning logvar")
ae_params_list.append(self.loss.logvar)
opt_ae = torch.optim.Adam(ae_params_list,
lr=lr, betas=(0.5, 0.9))
opt_disc = torch.optim.Adam(self.loss.discriminator.parameters(),
lr=lr, betas=(0.5, 0.9))
return [opt_ae, opt_disc], []
self.encoder: torch.nn.Module = instantiate_from_config(encoder_config)
self.decoder: torch.nn.Module = instantiate_from_config(decoder_config)
self.regularization: AbstractRegularizer = instantiate_from_config(
regularizer_config
)
def get_last_layer(self):
return self.decoder.conv_out.weight
return self.decoder.get_last_layer()
@torch.no_grad()
def log_images(self, batch, only_inputs=False, log_ema=False, **kwargs):
log = dict()
x = self.get_input(batch, self.image_key)
x = x.to(self.device)
if not only_inputs:
xrec, posterior = self(x)
if x.shape[1] > 3:
# colorize with random projection
assert xrec.shape[1] > 3
x = self.to_rgb(x)
xrec = self.to_rgb(xrec)
log["samples"] = self.decode(torch.randn_like(posterior.sample()))
log["reconstructions"] = xrec
if log_ema or self.use_ema:
with self.ema_scope():
xrec_ema, posterior_ema = self(x)
if x.shape[1] > 3:
# colorize with random projection
assert xrec_ema.shape[1] > 3
xrec_ema = self.to_rgb(xrec_ema)
log["samples_ema"] = self.decode(torch.randn_like(posterior_ema.sample()))
log["reconstructions_ema"] = xrec_ema
log["inputs"] = x
return log
def encode(
self,
x: torch.Tensor,
return_reg_log: bool = False,
unregularized: bool = False,
) -> Union[torch.Tensor, Tuple[torch.Tensor, dict]]:
z = self.encoder(x)
if unregularized:
return z, dict()
z, reg_log = self.regularization(z)
if return_reg_log:
return z, reg_log
return z
def to_rgb(self, x):
assert self.image_key == "segmentation"
if not hasattr(self, "colorize"):
self.register_buffer("colorize", torch.randn(3, x.shape[1], 1, 1).to(x))
x = F.conv2d(x, weight=self.colorize)
x = 2.*(x-x.min())/(x.max()-x.min()) - 1.
def decode(self, z: torch.Tensor, **kwargs) -> torch.Tensor:
x = self.decoder(z, **kwargs)
return x
def forward(
self, x: torch.Tensor, **additional_decode_kwargs
) -> Tuple[torch.Tensor, torch.Tensor, dict]:
z, reg_log = self.encode(x, return_reg_log=True)
dec = self.decode(z, **additional_decode_kwargs)
return z, dec, reg_log
class IdentityFirstStage(torch.nn.Module):
def __init__(self, *args, vq_interface=False, **kwargs):
self.vq_interface = vq_interface
super().__init__()
def encode(self, x, *args, **kwargs):
return x
class AutoencodingEngineLegacy(AutoencodingEngine):
def __init__(self, embed_dim: int, **kwargs):
self.max_batch_size = kwargs.pop("max_batch_size", None)
ddconfig = kwargs.pop("ddconfig")
super().__init__(
encoder_config={
"target": "comfy.ldm.modules.diffusionmodules.model.Encoder",
"params": ddconfig,
},
decoder_config={
"target": "comfy.ldm.modules.diffusionmodules.model.Decoder",
"params": ddconfig,
},
**kwargs,
)
self.quant_conv = comfy.ops.disable_weight_init.Conv2d(
(1 + ddconfig["double_z"]) * ddconfig["z_channels"],
(1 + ddconfig["double_z"]) * embed_dim,
1,
)
self.post_quant_conv = comfy.ops.disable_weight_init.Conv2d(embed_dim, ddconfig["z_channels"], 1)
self.embed_dim = embed_dim
def decode(self, x, *args, **kwargs):
return x
def get_autoencoder_params(self) -> list:
params = super().get_autoencoder_params()
return params
def quantize(self, x, *args, **kwargs):
if self.vq_interface:
return x, None, [None, None, None]
return x
def encode(
self, x: torch.Tensor, return_reg_log: bool = False
) -> Union[torch.Tensor, Tuple[torch.Tensor, dict]]:
if self.max_batch_size is None:
z = self.encoder(x)
z = self.quant_conv(z)
else:
N = x.shape[0]
bs = self.max_batch_size
n_batches = int(math.ceil(N / bs))
z = list()
for i_batch in range(n_batches):
z_batch = self.encoder(x[i_batch * bs : (i_batch + 1) * bs])
z_batch = self.quant_conv(z_batch)
z.append(z_batch)
z = torch.cat(z, 0)
def forward(self, x, *args, **kwargs):
return x
z, reg_log = self.regularization(z)
if return_reg_log:
return z, reg_log
return z
def decode(self, z: torch.Tensor, **decoder_kwargs) -> torch.Tensor:
if self.max_batch_size is None:
dec = self.post_quant_conv(z)
dec = self.decoder(dec, **decoder_kwargs)
else:
N = z.shape[0]
bs = self.max_batch_size
n_batches = int(math.ceil(N / bs))
dec = list()
for i_batch in range(n_batches):
dec_batch = self.post_quant_conv(z[i_batch * bs : (i_batch + 1) * bs])
dec_batch = self.decoder(dec_batch, **decoder_kwargs)
dec.append(dec_batch)
dec = torch.cat(dec, 0)
return dec
class AutoencoderKL(AutoencodingEngineLegacy):
def __init__(self, **kwargs):
if "lossconfig" in kwargs:
kwargs["loss_config"] = kwargs.pop("lossconfig")
super().__init__(
regularizer_config={
"target": (
"comfy.ldm.models.autoencoder.DiagonalGaussianRegularizer"
)
},
**kwargs,
)

View File

@ -1,412 +0,0 @@
"""SAMPLING ONLY."""
import torch
import numpy as np
from tqdm import tqdm
from comfy.ldm.modules.diffusionmodules.util import make_ddim_sampling_parameters, make_ddim_timesteps, noise_like, extract_into_tensor
class DDIMSampler(object):
def __init__(self, model, schedule="linear", device=torch.device("cuda"), **kwargs):
super().__init__()
self.model = model
self.ddpm_num_timesteps = model.num_timesteps
self.schedule = schedule
self.device = device
def register_buffer(self, name, attr):
if type(attr) == torch.Tensor:
if attr.device != self.device:
attr = attr.float().to(self.device)
setattr(self, name, attr)
def make_schedule(self, ddim_num_steps, ddim_discretize="uniform", ddim_eta=0., verbose=True):
ddim_timesteps = make_ddim_timesteps(ddim_discr_method=ddim_discretize, num_ddim_timesteps=ddim_num_steps,
num_ddpm_timesteps=self.ddpm_num_timesteps,verbose=verbose)
self.make_schedule_timesteps(ddim_timesteps, ddim_eta=ddim_eta, verbose=verbose)
def make_schedule_timesteps(self, ddim_timesteps, ddim_eta=0., verbose=True):
self.ddim_timesteps = torch.tensor(ddim_timesteps)
alphas_cumprod = self.model.alphas_cumprod
assert alphas_cumprod.shape[0] == self.ddpm_num_timesteps, 'alphas have to be defined for each timestep'
to_torch = lambda x: x.clone().detach().to(torch.float32).to(self.device)
self.register_buffer('betas', to_torch(self.model.betas))
self.register_buffer('alphas_cumprod', to_torch(alphas_cumprod))
self.register_buffer('alphas_cumprod_prev', to_torch(self.model.alphas_cumprod_prev))
# calculations for diffusion q(x_t | x_{t-1}) and others
self.register_buffer('sqrt_alphas_cumprod', to_torch(np.sqrt(alphas_cumprod.cpu())))
self.register_buffer('sqrt_one_minus_alphas_cumprod', to_torch(np.sqrt(1. - alphas_cumprod.cpu())))
self.register_buffer('log_one_minus_alphas_cumprod', to_torch(np.log(1. - alphas_cumprod.cpu())))
self.register_buffer('sqrt_recip_alphas_cumprod', to_torch(np.sqrt(1. / alphas_cumprod.cpu())))
self.register_buffer('sqrt_recipm1_alphas_cumprod', to_torch(np.sqrt(1. / alphas_cumprod.cpu() - 1)))
# ddim sampling parameters
ddim_sigmas, ddim_alphas, ddim_alphas_prev = make_ddim_sampling_parameters(alphacums=alphas_cumprod.cpu(),
ddim_timesteps=self.ddim_timesteps,
eta=ddim_eta,verbose=verbose)
self.register_buffer('ddim_sigmas', ddim_sigmas)
self.register_buffer('ddim_alphas', ddim_alphas)
self.register_buffer('ddim_alphas_prev', ddim_alphas_prev)
self.register_buffer('ddim_sqrt_one_minus_alphas', np.sqrt(1. - ddim_alphas))
sigmas_for_original_sampling_steps = ddim_eta * torch.sqrt(
(1 - self.alphas_cumprod_prev) / (1 - self.alphas_cumprod) * (
1 - self.alphas_cumprod / self.alphas_cumprod_prev))
self.register_buffer('ddim_sigmas_for_original_num_steps', sigmas_for_original_sampling_steps)
@torch.no_grad()
def sample_custom(self,
ddim_timesteps,
conditioning,
callback=None,
img_callback=None,
quantize_x0=False,
eta=0.,
mask=None,
x0=None,
temperature=1.,
noise_dropout=0.,
score_corrector=None,
corrector_kwargs=None,
verbose=True,
x_T=None,
log_every_t=100,
unconditional_guidance_scale=1.,
unconditional_conditioning=None, # this has to come in the same format as the conditioning, # e.g. as encoded tokens, ...
dynamic_threshold=None,
ucg_schedule=None,
denoise_function=None,
extra_args=None,
to_zero=True,
end_step=None,
disable_pbar=False,
**kwargs
):
self.make_schedule_timesteps(ddim_timesteps=ddim_timesteps, ddim_eta=eta, verbose=verbose)
samples, intermediates = self.ddim_sampling(conditioning, x_T.shape,
callback=callback,
img_callback=img_callback,
quantize_denoised=quantize_x0,
mask=mask, x0=x0,
ddim_use_original_steps=False,
noise_dropout=noise_dropout,
temperature=temperature,
score_corrector=score_corrector,
corrector_kwargs=corrector_kwargs,
x_T=x_T,
log_every_t=log_every_t,
unconditional_guidance_scale=unconditional_guidance_scale,
unconditional_conditioning=unconditional_conditioning,
dynamic_threshold=dynamic_threshold,
ucg_schedule=ucg_schedule,
denoise_function=denoise_function,
extra_args=extra_args,
to_zero=to_zero,
end_step=end_step,
disable_pbar=disable_pbar
)
return samples, intermediates
@torch.no_grad()
def sample(self,
S,
batch_size,
shape,
conditioning=None,
callback=None,
normals_sequence=None,
img_callback=None,
quantize_x0=False,
eta=0.,
mask=None,
x0=None,
temperature=1.,
noise_dropout=0.,
score_corrector=None,
corrector_kwargs=None,
verbose=True,
x_T=None,
log_every_t=100,
unconditional_guidance_scale=1.,
unconditional_conditioning=None, # this has to come in the same format as the conditioning, # e.g. as encoded tokens, ...
dynamic_threshold=None,
ucg_schedule=None,
**kwargs
):
if conditioning is not None:
if isinstance(conditioning, dict):
ctmp = conditioning[list(conditioning.keys())[0]]
while isinstance(ctmp, list): ctmp = ctmp[0]
cbs = ctmp.shape[0]
if cbs != batch_size:
print(f"Warning: Got {cbs} conditionings but batch-size is {batch_size}")
elif isinstance(conditioning, list):
for ctmp in conditioning:
if ctmp.shape[0] != batch_size:
print(f"Warning: Got {cbs} conditionings but batch-size is {batch_size}")
else:
if conditioning.shape[0] != batch_size:
print(f"Warning: Got {conditioning.shape[0]} conditionings but batch-size is {batch_size}")
self.make_schedule(ddim_num_steps=S, ddim_eta=eta, verbose=verbose)
# sampling
C, H, W = shape
size = (batch_size, C, H, W)
print(f'Data shape for DDIM sampling is {size}, eta {eta}')
samples, intermediates = self.ddim_sampling(conditioning, size,
callback=callback,
img_callback=img_callback,
quantize_denoised=quantize_x0,
mask=mask, x0=x0,
ddim_use_original_steps=False,
noise_dropout=noise_dropout,
temperature=temperature,
score_corrector=score_corrector,
corrector_kwargs=corrector_kwargs,
x_T=x_T,
log_every_t=log_every_t,
unconditional_guidance_scale=unconditional_guidance_scale,
unconditional_conditioning=unconditional_conditioning,
dynamic_threshold=dynamic_threshold,
ucg_schedule=ucg_schedule,
denoise_function=None,
extra_args=None
)
return samples, intermediates
@torch.no_grad()
def ddim_sampling(self, cond, shape,
x_T=None, ddim_use_original_steps=False,
callback=None, timesteps=None, quantize_denoised=False,
mask=None, x0=None, img_callback=None, log_every_t=100,
temperature=1., noise_dropout=0., score_corrector=None, corrector_kwargs=None,
unconditional_guidance_scale=1., unconditional_conditioning=None, dynamic_threshold=None,
ucg_schedule=None, denoise_function=None, extra_args=None, to_zero=True, end_step=None, disable_pbar=False):
device = self.model.betas.device
b = shape[0]
if x_T is None:
img = torch.randn(shape, device=device)
else:
img = x_T
if timesteps is None:
timesteps = self.ddpm_num_timesteps if ddim_use_original_steps else self.ddim_timesteps
elif timesteps is not None and not ddim_use_original_steps:
subset_end = int(min(timesteps / self.ddim_timesteps.shape[0], 1) * self.ddim_timesteps.shape[0]) - 1
timesteps = self.ddim_timesteps[:subset_end]
intermediates = {'x_inter': [img], 'pred_x0': [img]}
time_range = reversed(range(0,timesteps)) if ddim_use_original_steps else timesteps.flip(0)
total_steps = timesteps if ddim_use_original_steps else timesteps.shape[0]
# print(f"Running DDIM Sampling with {total_steps} timesteps")
iterator = tqdm(time_range[:end_step], desc='DDIM Sampler', total=end_step, disable=disable_pbar)
for i, step in enumerate(iterator):
index = total_steps - i - 1
ts = torch.full((b,), step, device=device, dtype=torch.long)
if mask is not None:
assert x0 is not None
img_orig = self.model.q_sample(x0, ts) # TODO: deterministic forward pass?
img = img_orig * mask + (1. - mask) * img
if ucg_schedule is not None:
assert len(ucg_schedule) == len(time_range)
unconditional_guidance_scale = ucg_schedule[i]
outs = self.p_sample_ddim(img, cond, ts, index=index, use_original_steps=ddim_use_original_steps,
quantize_denoised=quantize_denoised, temperature=temperature,
noise_dropout=noise_dropout, score_corrector=score_corrector,
corrector_kwargs=corrector_kwargs,
unconditional_guidance_scale=unconditional_guidance_scale,
unconditional_conditioning=unconditional_conditioning,
dynamic_threshold=dynamic_threshold, denoise_function=denoise_function, extra_args=extra_args)
img, pred_x0 = outs
if callback: callback(i)
if img_callback: img_callback(pred_x0, i)
if index % log_every_t == 0 or index == total_steps - 1:
intermediates['x_inter'].append(img)
intermediates['pred_x0'].append(pred_x0)
if to_zero:
img = pred_x0
else:
if ddim_use_original_steps:
sqrt_alphas_cumprod = self.sqrt_alphas_cumprod
else:
sqrt_alphas_cumprod = torch.sqrt(self.ddim_alphas)
img /= sqrt_alphas_cumprod[index - 1]
return img, intermediates
@torch.no_grad()
def p_sample_ddim(self, x, c, t, index, repeat_noise=False, use_original_steps=False, quantize_denoised=False,
temperature=1., noise_dropout=0., score_corrector=None, corrector_kwargs=None,
unconditional_guidance_scale=1., unconditional_conditioning=None,
dynamic_threshold=None, denoise_function=None, extra_args=None):
b, *_, device = *x.shape, x.device
if denoise_function is not None:
model_output = denoise_function(self.model.apply_model, x, t, **extra_args)
elif unconditional_conditioning is None or unconditional_guidance_scale == 1.:
model_output = self.model.apply_model(x, t, c)
else:
x_in = torch.cat([x] * 2)
t_in = torch.cat([t] * 2)
if isinstance(c, dict):
assert isinstance(unconditional_conditioning, dict)
c_in = dict()
for k in c:
if isinstance(c[k], list):
c_in[k] = [torch.cat([
unconditional_conditioning[k][i],
c[k][i]]) for i in range(len(c[k]))]
else:
c_in[k] = torch.cat([
unconditional_conditioning[k],
c[k]])
elif isinstance(c, list):
c_in = list()
assert isinstance(unconditional_conditioning, list)
for i in range(len(c)):
c_in.append(torch.cat([unconditional_conditioning[i], c[i]]))
else:
c_in = torch.cat([unconditional_conditioning, c])
model_uncond, model_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
model_output = model_uncond + unconditional_guidance_scale * (model_t - model_uncond)
if self.model.parameterization == "v":
e_t = self.model.predict_eps_from_z_and_v(x, t, model_output)
else:
e_t = model_output
if score_corrector is not None:
assert self.model.parameterization == "eps", 'not implemented'
e_t = score_corrector.modify_score(self.model, e_t, x, t, c, **corrector_kwargs)
alphas = self.model.alphas_cumprod if use_original_steps else self.ddim_alphas
alphas_prev = self.model.alphas_cumprod_prev if use_original_steps else self.ddim_alphas_prev
sqrt_one_minus_alphas = self.model.sqrt_one_minus_alphas_cumprod if use_original_steps else self.ddim_sqrt_one_minus_alphas
sigmas = self.model.ddim_sigmas_for_original_num_steps if use_original_steps else self.ddim_sigmas
# select parameters corresponding to the currently considered timestep
a_t = torch.full((b, 1, 1, 1), alphas[index], device=device)
a_prev = torch.full((b, 1, 1, 1), alphas_prev[index], device=device)
sigma_t = torch.full((b, 1, 1, 1), sigmas[index], device=device)
sqrt_one_minus_at = torch.full((b, 1, 1, 1), sqrt_one_minus_alphas[index],device=device)
# current prediction for x_0
if self.model.parameterization != "v":
pred_x0 = (x - sqrt_one_minus_at * e_t) / a_t.sqrt()
else:
pred_x0 = self.model.predict_start_from_z_and_v(x, t, model_output)
if quantize_denoised:
pred_x0, _, *_ = self.model.first_stage_model.quantize(pred_x0)
if dynamic_threshold is not None:
raise NotImplementedError()
# direction pointing to x_t
dir_xt = (1. - a_prev - sigma_t**2).sqrt() * e_t
noise = sigma_t * noise_like(x.shape, device, repeat_noise) * temperature
if noise_dropout > 0.:
noise = torch.nn.functional.dropout(noise, p=noise_dropout)
x_prev = a_prev.sqrt() * pred_x0 + dir_xt + noise
return x_prev, pred_x0
@torch.no_grad()
def encode(self, x0, c, t_enc, use_original_steps=False, return_intermediates=None,
unconditional_guidance_scale=1.0, unconditional_conditioning=None, callback=None):
num_reference_steps = self.ddpm_num_timesteps if use_original_steps else self.ddim_timesteps.shape[0]
assert t_enc <= num_reference_steps
num_steps = t_enc
if use_original_steps:
alphas_next = self.alphas_cumprod[:num_steps]
alphas = self.alphas_cumprod_prev[:num_steps]
else:
alphas_next = self.ddim_alphas[:num_steps]
alphas = torch.tensor(self.ddim_alphas_prev[:num_steps])
x_next = x0
intermediates = []
inter_steps = []
for i in tqdm(range(num_steps), desc='Encoding Image'):
t = torch.full((x0.shape[0],), i, device=self.model.device, dtype=torch.long)
if unconditional_guidance_scale == 1.:
noise_pred = self.model.apply_model(x_next, t, c)
else:
assert unconditional_conditioning is not None
e_t_uncond, noise_pred = torch.chunk(
self.model.apply_model(torch.cat((x_next, x_next)), torch.cat((t, t)),
torch.cat((unconditional_conditioning, c))), 2)
noise_pred = e_t_uncond + unconditional_guidance_scale * (noise_pred - e_t_uncond)
xt_weighted = (alphas_next[i] / alphas[i]).sqrt() * x_next
weighted_noise_pred = alphas_next[i].sqrt() * (
(1 / alphas_next[i] - 1).sqrt() - (1 / alphas[i] - 1).sqrt()) * noise_pred
x_next = xt_weighted + weighted_noise_pred
if return_intermediates and i % (
num_steps // return_intermediates) == 0 and i < num_steps - 1:
intermediates.append(x_next)
inter_steps.append(i)
elif return_intermediates and i >= num_steps - 2:
intermediates.append(x_next)
inter_steps.append(i)
if callback: callback(i)
out = {'x_encoded': x_next, 'intermediate_steps': inter_steps}
if return_intermediates:
out.update({'intermediates': intermediates})
return x_next, out
@torch.no_grad()
def stochastic_encode(self, x0, t, use_original_steps=False, noise=None, max_denoise=False):
# fast, but does not allow for exact reconstruction
# t serves as an index to gather the correct alphas
if use_original_steps:
sqrt_alphas_cumprod = self.sqrt_alphas_cumprod
sqrt_one_minus_alphas_cumprod = self.sqrt_one_minus_alphas_cumprod
else:
sqrt_alphas_cumprod = torch.sqrt(self.ddim_alphas)
sqrt_one_minus_alphas_cumprod = self.ddim_sqrt_one_minus_alphas
if noise is None:
noise = torch.randn_like(x0)
if max_denoise:
noise_multiplier = 1.0
else:
noise_multiplier = extract_into_tensor(sqrt_one_minus_alphas_cumprod, t, x0.shape)
return (extract_into_tensor(sqrt_alphas_cumprod, t, x0.shape) * x0 + noise_multiplier * noise)
@torch.no_grad()
def decode(self, x_latent, cond, t_start, unconditional_guidance_scale=1.0, unconditional_conditioning=None,
use_original_steps=False, callback=None):
timesteps = np.arange(self.ddpm_num_timesteps) if use_original_steps else self.ddim_timesteps
timesteps = timesteps[:t_start]
time_range = np.flip(timesteps)
total_steps = timesteps.shape[0]
print(f"Running DDIM Sampling with {total_steps} timesteps")
iterator = tqdm(time_range, desc='Decoding image', total=total_steps)
x_dec = x_latent
for i, step in enumerate(iterator):
index = total_steps - i - 1
ts = torch.full((x_latent.shape[0],), step, device=x_latent.device, dtype=torch.long)
x_dec, _ = self.p_sample_ddim(x_dec, cond, ts, index=index, use_original_steps=use_original_steps,
unconditional_guidance_scale=unconditional_guidance_scale,
unconditional_conditioning=unconditional_conditioning)
if callback: callback(i)
return x_dec

File diff suppressed because it is too large Load Diff

View File

@ -1 +0,0 @@
from .sampler import DPMSolverSampler

File diff suppressed because it is too large Load Diff

View File

@ -1,96 +0,0 @@
"""SAMPLING ONLY."""
import torch
from .dpm_solver import NoiseScheduleVP, model_wrapper, DPM_Solver
MODEL_TYPES = {
"eps": "noise",
"v": "v"
}
class DPMSolverSampler(object):
def __init__(self, model, device=torch.device("cuda"), **kwargs):
super().__init__()
self.model = model
self.device = device
to_torch = lambda x: x.clone().detach().to(torch.float32).to(model.device)
self.register_buffer('alphas_cumprod', to_torch(model.alphas_cumprod))
def register_buffer(self, name, attr):
if type(attr) == torch.Tensor:
if attr.device != self.device:
attr = attr.to(self.device)
setattr(self, name, attr)
@torch.no_grad()
def sample(self,
S,
batch_size,
shape,
conditioning=None,
callback=None,
normals_sequence=None,
img_callback=None,
quantize_x0=False,
eta=0.,
mask=None,
x0=None,
temperature=1.,
noise_dropout=0.,
score_corrector=None,
corrector_kwargs=None,
verbose=True,
x_T=None,
log_every_t=100,
unconditional_guidance_scale=1.,
unconditional_conditioning=None,
# this has to come in the same format as the conditioning, # e.g. as encoded tokens, ...
**kwargs
):
if conditioning is not None:
if isinstance(conditioning, dict):
ctmp = conditioning[list(conditioning.keys())[0]]
while isinstance(ctmp, list): ctmp = ctmp[0]
if isinstance(ctmp, torch.Tensor):
cbs = ctmp.shape[0]
if cbs != batch_size:
print(f"Warning: Got {cbs} conditionings but batch-size is {batch_size}")
elif isinstance(conditioning, list):
for ctmp in conditioning:
if ctmp.shape[0] != batch_size:
print(f"Warning: Got {ctmp.shape[0]} conditionings but batch-size is {batch_size}")
else:
if isinstance(conditioning, torch.Tensor):
if conditioning.shape[0] != batch_size:
print(f"Warning: Got {conditioning.shape[0]} conditionings but batch-size is {batch_size}")
# sampling
C, H, W = shape
size = (batch_size, C, H, W)
print(f'Data shape for DPM-Solver sampling is {size}, sampling steps {S}')
device = self.model.betas.device
if x_T is None:
img = torch.randn(size, device=device)
else:
img = x_T
ns = NoiseScheduleVP('discrete', alphas_cumprod=self.alphas_cumprod)
model_fn = model_wrapper(
lambda x, t, c: self.model.apply_model(x, t, c),
ns,
model_type=MODEL_TYPES[self.model.parameterization],
guidance_type="classifier-free",
condition=conditioning,
unconditional_condition=unconditional_conditioning,
guidance_scale=unconditional_guidance_scale,
)
dpm_solver = DPM_Solver(model_fn, ns, predict_x0=True, thresholding=False)
x = dpm_solver.sample(img, steps=S, skip_type="time_uniform", method="multistep", order=2,
lower_order_final=True)
return x.to(device), None

Some files were not shown because too many files have changed in this diff Show More