Commit Graph

2059 Commits

Author SHA1 Message Date
95e1059661 fix(ace15): handle missing lm_metadata in memory estimation during checkpoint export #12669 (#12686) 2026-02-28 01:18:40 -05:00
ac4412d0fa Native LongCat-Image implementation (#12597) 2026-02-27 23:04:34 -05:00
e721e24136 ops: implement lora requanting for non QuantizedTensor fp8 (#12668)
Allow a non-QuantizedTensor layer to set want_requant so that the post-lora
calculation is stochastically cast down to the original input dtype.

This is then used by the legacy fp8 Linear implementation to set the
compute_dtype to the preferred lora dtype but then want_requant it back
down to fp8.

This fixes the issue when --fast fp8_matrix_mult is combined with
--fast dynamic_vram while doing a lora on an fp8_ non QT model.
2026-02-27 19:05:51 -05:00
25ec3d96a3 Class WanVAE, def encode, feat_map is using self.decoder instead of self.encoder (#12682) 2026-02-27 19:03:45 -05:00
35e9fce775 Enable Pytorch Attention for gfx950 (#12641) 2026-02-26 20:16:12 -05:00
c7f7d52b68 feat: Support SDPose-OOD (#12661) 2026-02-26 19:59:05 -05:00
b233dbe0bc feat(ace-step): add ACE-Step 1.5 lycoris key alias mapping for LoKR #12638 (#12665) 2026-02-26 18:19:19 -05:00
8a4d85c708 Cleanups to the last PR. (#12646) 2026-02-26 01:30:31 -05:00
a4522017c5 feat: per-guide attention strength control in self-attention (#12518)
Implements per-guide attention attenuation via log-space additive bias
in self-attention. Each guide reference tracks its own strength and
optional spatial mask in conditioning metadata (guide_attention_entries).
2026-02-26 01:25:23 -05:00
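A minimal sketch of the log-space additive bias idea described in the commit above, not the code from the PR. It assumes a single query, plain Python lists, and strictly positive strengths; `softmax` and `attention_weights` are hypothetical names. Adding log(strength) to a key's logit before the softmax scales that key's unnormalized weight by the strength.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(logits, strengths):
    """Add log(strength) to each key's logit before the softmax.

    A strength of 1.0 leaves the key untouched; values in (0, 1) attenuate it.
    Strengths must be > 0 in this sketch (log(0) is undefined).
    """
    biased = [l + math.log(s) for l, s in zip(logits, strengths)]
    return softmax(biased)

logits = [0.3, 1.2, -0.5]
base = attention_weights(logits, [1.0, 1.0, 1.0])
damped = attention_weights(logits, [1.0, 0.5, 1.0])  # attenuate the second key
```

After normalization, the weight of the attenuated key relative to any untouched key is half of what it was, and the weights still sum to 1.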
907e5dcbbf initial FlowRVS support (#12637) 2026-02-25 23:38:46 -05:00
7253531670 Fix ltxav te mem estimation. (#12643) 2026-02-25 23:13:47 -05:00
e14b04478c Fix LTXAV text enc min length. (#12640)
Should have been 1024 instead of 512
2026-02-25 22:36:02 -05:00
4f5b7dbf1f Fix Aimdo fallback on probe to not use zero-copy sft (#12634)
* utils: dont use comfy sft loader in aimdo fallback

This was going to the raw command line switch and should respect main.py
probe of whether aimdo actually loaded successfully.

* ops: dont use deferred linear load in Aimdo fallback

Avoid changes of behaviour on --fast dynamic_vram when aimdo doesn't work.
2026-02-25 16:49:48 -05:00
3ebe1ac22e Disable dynamic_vram when using torch compiler (#12612)
* mp: attach re-construction arguments to model patcher

When making a model-patcher from a unet or ckpt, attach a callable
function that can be called to replay the model construction. This
can be used to deep clone model patcher WRT the actual model.

Originally written by Kosinkadink
f4b99bc623

* mp: Add disable_dynamic clone argument

Add a clone argument that lets a caller clone a ModelPatcher but disable
dynamic to demote the clone to regular MP. This is useful for legacy
features where dynamic_vram support is missing or TBD.

* torch_compile: disable dynamic_vram

This is a bigger feature. Disable for the interim to preserve
functionality.
2026-02-24 19:13:46 -05:00
599f9c5010 Don't crash right away if op is uninitialized. (#12615) 2026-02-24 12:28:25 -05:00
84aba95e03 Temporarily unbreak some LTXAV workflows to give people time to migrate. (#12605) 2026-02-24 00:50:03 -05:00
caa43d2395 Fix issue loading fp8 ltxav checkpoints. (#12582) 2026-02-22 16:00:02 -05:00
07ca6852e8 Fix dtype issue in embeddings connector. (#12570) 2026-02-22 03:18:20 -05:00
f266b8d352 Move LTXAV av embedding connectors to diffusion model. (#12569) 2026-02-21 22:29:58 -05:00
0bfb936ab4 comfy-aimdo 0.2 - Improved pytorch allocator integration (#12557)
Integrate comfy-aimdo 0.2 which takes a different approach to
installing the memory allocator hook. Instead of using the complicated
and buggy pytorch MemPool+CUDAPluggableAllocator, cuda is directly hooked
making the process much more transparent to both comfy and pytorch. As
far as pytorch knows, aimdo doesn't exist anymore, and just operates
behind the scenes.

Remove all the mempool setup stuff for dynamic_vram and bump the
comfy-aimdo version. Remove the allocator object from memory_management
and demote its use as an enablement check to a boolean flag.

Comfy-aimdo 0.2 also supports the pytorch cuda async allocator, so
remove the dynamic_vram based force disablement of cuda_malloc and
just go back to the old settings of allocators based on command line
input.
2026-02-21 10:52:57 -08:00
f394af8d0f feat: add gradient-slider display mode for FLOAT inputs (#12536)
* feat: add gradient-slider display mode for FLOAT inputs

* fix: use precise type annotation list[list[float]] for gradient_stops

Amp-Thread-ID: https://ampcode.com/threads/T-019c7eea-be2b-72ce-a51f-838376f9b7a7

---------

Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
Co-authored-by: bymyself <cbyrne@comfy.org>
2026-02-20 22:52:32 -08:00
5f2117528a Force min length 1 when tokenizing for text generation. (#12538) 2026-02-19 22:57:44 -05:00
0301ccf745 Small cleanup and try to get qwen 3 work with the text gen. (#12537) 2026-02-19 22:42:28 -05:00
6d11cc7354 feat: Add basic text generation support with native models, initially supporting Gemma3 (#12392) 2026-02-18 20:49:43 -05:00
58dcc97dcf ops: limit return of requants (#12506)
This check was far too broad and the dtype is not a reliable indicator
of wanting the requant (as QT returns the compute dtype as the dtype).
So explicitly plumb whether fp8mm wants the requant or not.
2026-02-17 15:32:27 -05:00
44f8598521 Fix anima LLM adapter forward when manual cast (#12504) 2026-02-17 07:56:44 -08:00
c39653163d Fix anima preprocess text embeds not using right inference dtype. (#12501) 2026-02-17 00:29:20 -05:00
18927538a1 Implement NAG on all the models based on the Flux code. (#12500)
Use the Normalized Attention Guidance node.

Flux, Flux2, Klein, Chroma, Chroma radiance, Hunyuan Video, etc.
2026-02-16 23:30:34 -05:00
4454fab7f0 Remove code to support RMSNorm on old pytorch. (#12499) 2026-02-16 20:09:24 -05:00
88e6370527 Remove workaround for old pytorch. (#12480) 2026-02-15 20:43:53 -05:00
c0370044cd MPDynamic: force load flux img_in weight (Fixes flux1 canny+depth lora crash) (#12446)
* lora: add weight shape calculations.

This lets the loader know if a lora will change the shape of a weight
so it can take appropriate action.

* MPDynamic: force load flux img_in weight

This weight is a bit special, in that the lora changes its geometry.
This is rather unique, not handled by existing estimate and doesn't
work for either offloading or dynamic_vram.

Fix for dynamic_vram as a special case. Ideally we can fully precalculate
these lora geometry changes at load time, but just get these models
working first.
2026-02-15 20:30:09 -05:00
e1ede29d82 Remove unsafe pickle loading code that was used on pytorch older than 2.4 (#12473)
ComfyUI hasn't started on pytorch 2.4 since last month.
2026-02-14 22:53:52 -05:00
dc9822b7df Add working Qwen 2512 ControlNet (Fun ControlNet) support (#12359) 2026-02-13 22:23:52 -05:00
712efb466b Add left padding to LTXAV text encoder. (#12456) 2026-02-13 21:56:54 -05:00
726af73867 Fix some custom nodes. (#12455) 2026-02-13 20:21:10 -05:00
831351a29e Support generating attention masks for left padded text encoders. (#12454) 2026-02-13 20:15:23 -05:00
e1add563f9 Use torch RMSNorm for flux models and refactor hunyuan video code. (#12432) 2026-02-13 15:35:13 -05:00
8902907d7a dynamic_vram: Training fixes (#12442) 2026-02-13 15:29:37 -05:00
ae79e33345 llama: use a more efficient rope implementation (#12434)
Get rid of the cat and unary negation and inplace add-cmul the two
halves of the rope. Precompute -sin once at the start of the model
rather than every transformer block.

This is slightly faster on both GPU and CPU bound setups.
2026-02-12 19:56:42 -05:00
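A rough illustration of the arithmetic behind the rope change above, assuming the common rotate-half formulation. It uses plain Python lists, so it does not reproduce the in-place tensor ops; the function names are hypothetical. The reference version builds the rotated vector with a concatenation and a negation, while the split version works on the two halves directly with a precomputed -sin.

```python
import math

def rope_reference(x, cos, sin):
    """Rotate-half RoPE: x * cos + rotate_half(x) * sin."""
    half = len(x) // 2
    rotated = [-v for v in x[half:]] + x[:half]  # the cat + unary negation
    return [a * c + r * s for a, c, r, s in zip(x, cos, rotated, sin)]

def rope_split(x, cos, sin, neg_sin):
    """Same result, computed per half with -sin supplied by the caller."""
    half = len(x) // 2
    x1, x2 = x[:half], x[half:]
    out1 = [a * c + b * ns
            for a, b, c, ns in zip(x1, x2, cos[:half], neg_sin[:half])]
    out2 = [b * c + a * s
            for a, b, c, s in zip(x1, x2, cos[half:], sin[half:])]
    return out1 + out2

angles = [0.1, 0.7]
cos = [math.cos(a) for a in angles] * 2
sin = [math.sin(a) for a in angles] * 2
neg_sin = [-s for s in sin]  # computed once, reused for every block

x = [1.0, 2.0, 3.0, 4.0]
ref = rope_reference(x, cos, sin)
fast = rope_split(x, cos, sin, neg_sin)
```

Both functions return the same values; the second avoids building the rotated copy of `x` and reuses `neg_sin` across calls.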
117e214354 ModelPatcherDynamic: force load non leaf weights (#12433)
The current behaviour of the default ModelPatcher is to .to a model
only if it's fully loaded, which is how random non-leaf weights get
loaded in non-LowVRAM conditions.

This however means they never get loaded in dynamic_vram. In the
dynamic_vram case, force load them to the GPU.
2026-02-12 19:51:50 -05:00
e5ae670a40 Update ace15.py to allow min_p sampling (#12373) 2026-02-11 20:28:48 -05:00
3fe61cedda model_patcher: guard against none model_dtype (#12410)
Handle the case where the _model_dtype exists but is none with the
intended fallback.
2026-02-11 14:54:02 -05:00
2a4328d639 ace15: Use dynamic_vram friendly trange (#12409)
Factor out the ksampler trange and use it in ACE LLM to prevent the
silent stall at 0 and rate distortion due to first-step model load.
2026-02-11 14:53:42 -05:00
d297a749a2 dynamic_vram: Fix windows Aimdo crash + Fix LLM performance (#12408)
* model_management: lazy-cache aimdo_tensor

These tensors constructed from aimdo-allocations are CPU expensive to
make on the pytorch side. Add a cache version that will be valid with
signature match to fast path past whatever torch is doing.

* dynamic_vram: Minimize fast path CPU work

Move as much as possible inside the not resident if block and cache
the formed weight and bias rather than the flat intermediates. In
extreme layer weight rates this adds up.
2026-02-11 14:50:16 -05:00
76a7fa96db Make built in lora training work on anima. (#12402) 2026-02-10 22:04:32 -05:00
cdcf4119b3 [Trainer] training with proper offloading (#12189)
* Fix bypass dtype/device moving

* Force offloading mode for training

* training context var

* offloading implementation in training node

* fix wrong input type

* Support bypass load lora model, correct adapter/offloading handling
2026-02-10 21:45:19 -05:00
123a7874a9 ops: Fix vanilla-fp8 loaded lora quality (#12390)
This was missing the stochastic rounding required for fp8 downcast
to be consistent with model_patcher.patch_weight_to_device.

Missed in testing as I spend too much time with quantized tensors
and overlooked the simpler ones.
2026-02-10 13:38:28 -05:00
f719f9c062 sd: delay VAE dtype archive until after override (#12388)
VAEs have host specific dtype logic that should override the dynamic
_model_dtype. Defer the archiving of model dtypes until after.
2026-02-10 13:37:46 -05:00
fe053ba5eb mp: dont deep-clone objects from model_options (#12382)
If there are non-trivial python objects nested in the model_options, this
causes all sorts of issues. Traverse lists and dicts so clones can safely
override settings and BYO objects but stop there on the deepclone.
2026-02-10 13:37:17 -05:00
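A small sketch of the cloning strategy the commit above describes, under the assumption that only dicts and lists are traversed. `clone_options`, `Hook`, and the option keys are hypothetical and not taken from the repository. Containers are copied so a clone can change settings independently, while any other object is shared by reference instead of being deep-copied.

```python
def clone_options(value):
    """Copy dicts and lists recursively; share every other object by reference."""
    if isinstance(value, dict):
        return {k: clone_options(v) for k, v in value.items()}
    if isinstance(value, list):
        return [clone_options(v) for v in value]
    return value

class Hook:
    """Stand-in for a non-trivial object that should not be deep-copied."""

hook = Hook()
original = {"options": {"callbacks": [hook], "scale": 1.0}}

cloned = clone_options(original)
cloned["options"]["scale"] = 2.0                # override a setting on the clone
cloned["options"]["callbacks"].append("extra")  # extend the clone's own list
```

The original keeps its scale and its one-element callback list, and both copies still point at the same `Hook` instance.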
a4be04c5d7 Ace step prompts match now. (#12376) 2026-02-09 19:45:56 -05:00