Commit Graph

333 Commits

943b3b615d HunyuanVideo 1.5 (#10819)
* init

* update

* Update model.py

* Update model.py

* remove print

* Fix text encoding

* Prevent empty negative prompt

Really doesn't work otherwise

* fp16 works

* I2V

* Update model_base.py

* Update nodes_hunyuan.py

* Better latent rgb factors

* Use the correct sigclip output...

* Support HunyuanVideo1.5 SR model

* whitespaces...

* Proper latent channel count

* SR model fixes

This still needs timestep scheduling based on the noise scale; it can already be used with two samplers.

* vae_refiner: roll the convolution through temporal

Work in progress.

Roll the convolution through time using 2-latent-frame chunks and a
FIFO queue for the convolution seams (see the sketch after this commit entry).

* Support HunyuanVideo15 latent resampler

* fix

* Some cleanup

Co-Authored-By: comfyanonymous <121283862+comfyanonymous@users.noreply.github.com>

* Proper hyvid15 I2V channels

Co-Authored-By: comfyanonymous <121283862+comfyanonymous@users.noreply.github.com>

* Fix TokenRefiner for fp16

Otherwise x.sum has infs. Just in case, only cast when the input is fp16; I don't know if it is necessary otherwise.

* Bugfix for the HunyuanVideo15 SR model

* vae_refiner: roll the convolution through temporal II

Roll the convolution through time using 2-latent-frame chunks and a
FIFO queue for the convolution seams.

Added support for the encoder, lowered to 1 latent frame to save more
VRAM, and made it work for Hunyuan Image 3.0 (the code is shared).

Fixed names, cleaned up code.

* Allow any number of input frames in VAE.

* Better VAE encode mem estimation.

* Lowvram fix.

* Fix hunyuan image 2.1 refiner.

* Fix mistake.

* Name changes.

* Rename.

* Whitespace.

* Fix.

* Fix.

---------

Co-authored-by: kijai <40791699+kijai@users.noreply.github.com>
Co-authored-by: Rattus <rattus128@gmail.com>
2025-11-20 22:44:43 -05:00
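A minimal sketch of the convolution-rolling idea described in this entry, assuming a causal temporal Conv3d with stride 1, no temporal padding, and a temporal kernel of at least 2; the names are illustrative, not the actual vae_refiner code:

import torch
import torch.nn as nn

def rolled_temporal_conv(x: torch.Tensor, conv: nn.Conv3d, chunk: int = 2) -> torch.Tensor:
    # x: (B, C, T, H, W). The conv is applied chunk-by-chunk along T;
    # a FIFO of the last k-1 input frames bridges the seams so the
    # result matches running the conv over the full sequence at once.
    k = conv.kernel_size[0]            # temporal kernel size, assumed >= 2
    seam = None                        # FIFO: left context for the next chunk
    outs = []
    for t in range(0, x.shape[2], chunk):
        piece = x[:, :, t:t + chunk]
        if seam is None:
            # Replicate-pad the very first chunk to preserve output length.
            pad = piece[:, :, :1].repeat(1, 1, k - 1, 1, 1)
        else:
            pad = seam
        piece = torch.cat([pad, piece], dim=2)
        outs.append(conv(piece))       # yields exactly chunk output frames
        seam = piece[:, :, -(k - 1):]  # queue trailing frames as the next seam
    return torch.cat(outs, dim=2)

Only the current chunk plus the k-1 seam frames need to be resident at once, which is where the VRAM saving comes from.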
513b0c46fb Add RAM Pressure cache mode (#10454)
* execution: Roll the UI cache into the outputs

Currently the UI cache is parallel to the output cache, with the
expectation that it is a content superset of the output cache.
At the same time, the UI and output caches are maintained completely
separately, making it awkward to free output cache content without
changing the behaviour of the UI cache.

There are two actual users (getters) of the UI cache. The first is
the case of a direct content hit on the output cache when executing a
node. This case is very naturally handled by merging the UI and outputs
cache.

The second case is the history JSON generation at the end of the prompt.
This currently works by asking the cache for all_node_ids and then
pulling the cache contents for those nodes. all_node_ids is the set of
node ids in the dynamic prompt.

So fold the UI cache into the output cache. The current UI cache setter
now writes to a prompt-scope dict. When the output cache is set, just
get this value from the dict and tuple it up with the outputs.

When generating the history, simply iterate the prompt-scope dict.

This prepares support for more complex caching strategies (like RAM
pressure caching) where less than one full workflow will be cached and
it is desirable to keep the UI cache and output cache in sync.

* sd: Implement RAM getter for VAE

* model_patcher: Implement RAM getter for ModelPatcher

* sd: Implement RAM getter for CLIP

* Implement RAM Pressure cache

Implement a cache sensitive to RAM pressure. When RAM headroom drops
below a certain threshold, evict RAM-expensive nodes from the cache
(a sketch of the policy follows this entry).

Models and tensors are measured directly for RAM usage. An OOM score
is then computed based on the RAM usage of the node.

Note that, due to indirection through shared objects (like a model
patcher), multiple nodes can account the same RAM as their individual
usage. The intent is that this will free chains of nodes, particularly
model loaders and associated loras, since they all score similarly and
sort close to each other.

This has a bias towards unloading model nodes mid-flow while keeping
results like text encodings and VAE outputs.

* execution: Convert the cache entry to NamedTuple

As commented in review.

Convert this to a named tuple and abstract away the tuple type
completely from graph.py.
2025-10-30 17:39:02 -04:00
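A hedged sketch of the policy this PR describes, with the cache entry as a NamedTuple of outputs plus UI data (keeping the two in sync) and eviction driven by measured RAM cost; all class and method names here are illustrative, not ComfyUI's actual ones:

from typing import Any, NamedTuple
import psutil

class CacheEntry(NamedTuple):
    outputs: Any   # node outputs (may reference models/tensors)
    ui: Any        # UI payload, kept in sync with the outputs

class RAMPressureCache:
    def __init__(self, headroom_bytes: int = 4 << 30):
        self.headroom = headroom_bytes
        self.entries: dict[str, CacheEntry] = {}
        self.ram_cost: dict[str, int] = {}   # measured RAM usage per node

    def set(self, node_id: str, outputs: Any, ui: Any, ram_bytes: int) -> None:
        self.entries[node_id] = CacheEntry(outputs, ui)
        self.ram_cost[node_id] = ram_bytes

    def relieve_pressure(self) -> None:
        # Evict the RAM-heaviest entries until headroom is restored.
        # Nodes sharing one object (e.g. a model patcher) report similar
        # costs, so loader/lora chains tend to be evicted together.
        for node_id in sorted(self.entries, key=self.ram_cost.get, reverse=True):
            if psutil.virtual_memory().available >= self.headroom:
                break
            del self.entries[node_id]
            del self.ram_cost[node_id]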
e525673f72 Fix issue. (#10527) 2025-10-29 00:37:00 -04:00
8817f8fc14 Mixed Precision Quantization System (#10498)
* Implement mixed precision operations with a registry design and metadata for the quant spec in the checkpoint.

* Updated design using Tensor Subclasses

* Fix FP8 MM

* An actually functional POC

* Remove CK reference and ensure correct compute dtype

* Update unit tests

* ruff lint

* Fix missing keys

* Rename quant dtype parameter

* Fix unittests for CPU build
2025-10-28 16:20:53 -04:00
dfff7e5332 Better memory estimation for the SD/Flux VAE on AMD. (#10334) 2025-10-13 22:37:19 -04:00
84e9ce32c6 Implement the mmaudio VAE. (#10300) 2025-10-11 22:57:23 -04:00
8aea746212 Implement gemma 3 as a text encoder. (#10241)
Not useful yet.
2025-10-06 22:08:08 -04:00
911331c06c sd: fix VAE tiled fallback VRAM leak (#10139)
When the VAE catches this VRAM OOM, it launches the fallback logic
straight from the exception context.

Python, however, keeps references to the entire call stack that raised
the exception, including any local variables, for the sake of exception
reporting and debugging. When those locals are tensors, this can hold
references to GBs of VRAM and prevent the allocator from freeing them.

So drop the except context completely before going back to the VAE via
the tiler, by leaving the except block with nothing but a flag (sketched
after this entry).

This greatly increases the reliability of the tiler fallback, especially
on low-VRAM cards: with the bug, if the leak happened to pin more than
the headroom needed for a single tile, the tiler fallback itself would
OOM and fail the flow.
2025-10-01 18:40:28 -04:00
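A sketch of the pattern this fix describes: Python keeps the traceback, and every local tensor referenced by its frames, alive while the except block runs, so the handler records only a flag and the fallback runs after the block has been left. The decode method names are hypothetical:

import torch

def decode_with_fallback(vae, latent):
    oom = False
    try:
        return vae.decode_full(latent)   # hypothetical direct decode
    except torch.cuda.OutOfMemoryError:
        oom = True                       # keep nothing else from this context
    # Leaving the except block clears the exception and its traceback,
    # releasing the VRAM its frames were pinning.
    if oom:
        return vae.decode_tiled(latent)  # hypothetical tiled fallback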
a6f83a4a1a Support the new hunyuan vae. (#10150) 2025-10-01 17:19:13 -04:00
80b7c9455b Changes to the previous radiance commit. (#9851) 2025-09-13 18:03:34 -04:00
c1297f4eb3 Add support for Chroma Radiance (#9682)
* Initial Chroma Radiance support

* Minor Chroma Radiance cleanups

* Update Radiance nodes to ensure latents/images are on the intermediate device

* Fix Chroma Radiance memory estimation.

* Increase Chroma Radiance memory usage factor

* Increase Chroma Radiance memory usage factor once again

* Ensure images are multiples of 16 for Chroma Radiance
Add batch dimension and fix channels when necessary in ChromaRadianceImageToLatent node

* Tile Chroma Radiance NeRF to reduce memory consumption, update memory usage factor

* Update Radiance to support conv nerf final head type.

* Allow setting NeRF embedder dtype for Radiance
Bump Radiance nerf tile size to 32
Support EasyCache/LazyCache on Radiance (maybe)

* Add ChromaRadianceStubVAE node

* Crop Radiance image inputs to multiples of 16 instead of erroring to be in line with existing VAE behavior

* Convert Chroma Radiance nodes to V3 schema.

* Add ChromaRadianceOptions node and backend support.
Cleanups/refactoring to reduce code duplication with Chroma.

* Fix overriding the NeRF embedder dtype for Chroma Radiance

* Minor Chroma Radiance cleanups

* Move Chroma Radiance to its own directory in ldm
Minor code cleanups and tooltip improvements

* Fix Chroma Radiance embedder dtype overriding

* Remove Radiance dynamic nerf_embedder dtype override feature

* Unbork Radiance NeRF embedder init

* Remove Chroma Radiance image conversion and stub VAE nodes
Add a chroma_radiance option to the VAELoader builtin node which uses comfy.sd.PixelspaceConversionVAE
Add a PixelspaceConversionVAE to comfy.sd for converting BHWC 0..1 <-> BCHW -1..1 (see the conversion sketch after this entry)
2025-09-13 17:58:43 -04:00
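A minimal sketch of the pixel-space conversion the last bullet mentions, with images as BHWC in [0, 1] on one side and model tensors as BCHW in [-1, 1] on the other; function names are illustrative:

import torch

def image_to_latent(img: torch.Tensor) -> torch.Tensor:
    # BHWC in [0, 1] -> BCHW in [-1, 1]
    return img.movedim(-1, 1) * 2.0 - 1.0

def latent_to_image(lat: torch.Tensor) -> torch.Tensor:
    # BCHW in [-1, 1] -> BHWC in [0, 1]
    return ((lat + 1.0) / 2.0).clamp(0.0, 1.0).movedim(1, -1)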
a3b04de700 Hunyuan refiner vae now works with tiled. (#9836) 2025-09-12 19:46:46 -04:00
33bd9ed9cb Implement hunyuan image refiner model. (#9817) 2025-09-12 00:43:20 -04:00
85e34643f8 Support hunyuan image 2.1 regular model. (#9792) 2025-09-10 02:05:07 -04:00
c9ebe70072 Some changes to the previous hunyuan PR. (#9725) 2025-09-04 20:39:02 -04:00
261421e218 Add Hunyuan 3D 2.1 Support (#8714) 2025-09-04 20:36:20 -04:00
c012400240 Initial support for qwen image model. (#9179) 2025-08-04 22:53:25 -04:00
a88788dce6 Wan 2.2 support. (#9080) 2025-07-28 08:00:23 -04:00
b40143984c Add model detection error hint for lora. (#8880) 2025-07-12 03:49:26 -04:00
ec70ed6aea Omnigen2 model implementation. (#8669) 2025-06-25 19:35:57 -04:00
7a13f74220 unet -> diffusion model (#8659) 2025-06-25 04:52:34 -04:00
520eb77b72 LoRA Trainer: LoRA training node in weight adapter scheme (#8446) 2025-06-13 19:25:59 -04:00
577de83ca9 ACE VAE works in fp16. (#8055) 2025-05-11 04:58:00 -04:00
a692c3cca4 Make ACE VAE tiling work. (#8004) 2025-05-08 07:25:45 -04:00
5d3cc85e13 Make japanese hiragana and katakana characters work with ACE. (#7997) 2025-05-08 03:32:36 -04:00
c7c025b8d1 Adjust memory estimation code for ACE VAE. (#7990) 2025-05-08 01:22:23 -04:00
16417b40d9 Initial ACE-Step model implementation. (#7972) 2025-05-07 08:33:34 -04:00
08ff5fa08a Cleanup chroma PR. 2025-04-30 20:57:30 -04:00
4ca3d84277 Support for Chroma - Flux1 Schnell distilled with CFG (#7355)
* Upload files for Chroma Implementation

* Remove trailing whitespace

* trim more trailing whitespace..oops

* remove unused imports

* Add supported_inference_dtypes

* Set min_length to 0 and remove attention_mask=True

* Set min_length to 1

* get_modulations added from blepping and minor changes

* Add lora conversion if statement in lora.py

* Update supported_models.py

* update model_base.py

* add upstream commits

* set ModelType.FLOW, which makes the beta scheduler work properly

* Adjust memory usage factor and remove unnecessary code

* fix mistake

* reduce code duplication

* remove unused imports

* refactor for upstream sync

* sync chroma-support with upstream via syncbranch patch

* Update sd.py

* Add Chroma as option for the OptimalStepsScheduler node
2025-04-30 20:57:00 -04:00
23e39f2ba7 Add a T5TokenizerOptions node to set options for the T5 tokenizer. (#7803) 2025-04-25 19:36:00 -04:00
f43e1d7f41 Hidream: Allow loading hidream text encoders in CLIPLoader and DualCLIPLoader (#7676)
* Hidream: Allow partial loading text encoders

* reformat code for ruff check.
2025-04-19 19:47:30 -04:00
9ad792f927 Basic support for hidream i1 model. 2025-04-15 17:35:05 -04:00
3a100b9a55 Disable partial offloading of audio VAE. 2025-04-04 21:24:56 -04:00
3872b43d4b A few fixes for the hunyuan3d models. 2025-03-20 04:52:31 -04:00
11f1b41bab Initial Hunyuan3Dv2 implementation.
Supports the multiview, mini, turbo models and VAEs.
2025-03-19 16:52:58 -04:00
55a1b09ddc Allow loading diffusion model files with the "Load Checkpoint" node. 2025-03-15 08:27:49 -04:00
3c3988df45 Show a better error message if the VAE is invalid. 2025-03-15 08:26:36 -04:00
93fedd92fe Support LTXV 0.9.5.
Credits: Lightricks team.
2025-03-05 00:13:49 -05:00
65042f7d39 Make it easier to set a custom template for hunyuan video. 2025-03-04 09:26:05 -05:00
1804397952 Use fp16 if checkpoint weights are fp16 and the model supports it. 2025-02-27 16:39:57 -05:00
63023011b9 WIP support for Wan t2v model. 2025-02-25 17:20:35 -05:00
e5ea112a90 Support Lumina 2 model. 2025-02-04 04:16:30 -05:00
9d8b6c1f46 More accurate memory estimation for cosmos and hunyuan video. 2025-01-16 03:48:40 -05:00
3aaabb12d4 Implement Cosmos Image/Video to World (Video) diffusion models.
Use CosmosImageToVideoLatent to set the input image/video.
2025-01-14 05:14:10 -05:00
2ff3104f70 WIP support for Nvidia Cosmos 7B and 14B text to world (video) models. 2025-01-10 09:14:16 -05:00
2307ff6746 Improve some logging messages. 2025-01-08 19:05:22 -05:00
ee9547ba31 Improve temporal VAE Encode (Tiled) math. 2024-12-26 07:18:49 -05:00
0229228f3f Clean up the VAE dtypes code. 2024-12-25 04:50:34 -05:00
73e04987f7 Prevent black images in VAE Decode (Tiled) node.
Overlap must be at least 1 when the temporal tile size is 2 for tiled
temporal VAE decoding.
2024-12-24 07:36:30 -05:00
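A sketch of the constraint above, assuming decode windows are slices of latent frames: with a temporal tile of 2 and overlap 0, adjacent windows share no frames to blend, which is what produced the black output. Names are illustrative:

def temporal_windows(num_frames: int, tile: int = 2, overlap: int = 0):
    # With tile size 2 the windows must overlap by at least 1 frame.
    if tile == 2:
        overlap = max(1, overlap)
    step = tile - overlap
    windows, t = [], 0
    while t < num_frames:
        windows.append((t, min(t + tile, num_frames)))
        if t + tile >= num_frames:
            break
        t += step
    return windows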
5388df784a Add temporal tiling to VAE Encode (Tiled) node. 2024-12-24 07:10:09 -05:00