ComfyUI

mirror of https://github.com/comfyanonymous/ComfyUI.git synced 2026-04-21 03:07:58 +08:00

Author	SHA1	Message	Date
Jukka Seppänen	404d7b9978	feat: Support Qwen3.5 text generation models (#12771 )	2026-03-25 22:48:28 -04:00
Kohaku-Blueleaf	5ebb0c2e0b	FP8 bwd training (#13121 )	2026-03-24 20:39:04 -04:00
Jukka Seppänen	e87858e974	feat: LTX2: Support reference audio (ID-LoRA) (#13111 )	2026-03-23 18:22:24 -04:00
Talmaj	d49420b3c7	LongCat-Image edit (#13003 )	2026-03-21 23:51:05 -04:00
rattus	25b6d1d629	wan: vae: Fix light/color change (#13101 ) There was an issue where the resample split was too early and dropped one of the rolling convolutions a frame early. This is most noticeable as a lighting/color change between pixel frames 5->6 (latent 2->3), or as a lighting change between the first and last frame in an FLF wan flow.	2026-03-21 18:44:35 -04:00
comfyanonymous	11c15d8832	Fix fp16 intermediates giving different results. (#13100 )	2026-03-21 17:53:25 -04:00
comfyanonymous	b5d32e6ad2	Fix sampling issue with fp16 intermediates. (#13099 )	2026-03-21 17:47:42 -04:00
Jedrzej Kosinski	87cda1fc25	Move inline comfy.context_windows imports to top-level in model_base.py (#13083 ) The recent PR that added resize_cond_for_context_window methods to model classes used inline 'import comfy.context_windows' in each method body. This moves that import to the top-level import section, replacing 4 duplicate inline imports with a single top-level one.	2026-03-20 20:03:42 -04:00
drozbay	589228e671	Add slice_cond and per-model context window cond resizing (#12645 ) * Add slice_cond and per-model context window cond resizing * Fix cond_value.size() call in context window cond resizing * Expose additional advanced inputs for ContextWindowsManualNode Necessary for WanAnimate context windows workflow, which needs cond_retain_index_list = 0 to work properly with its reference input.	2026-03-19 20:42:42 -07:00
rattus	f49856af57	ltx: vae: Fix missing init variable (#13074 ) Forgot to push this amendment. Previous test results apply to this.	2026-03-19 22:34:58 -04:00
rattus	82b868a45a	Fix VRAM leak in tiler fallback in video VAEs (#13073 ) * sd: soft_empty_cache on tiler fallback This doesn't cost a lot and creates the expected VRAM reduction in resource monitors when you fall back to the tiler. * wan: vae: Don't recursion in local fns (move run_up) Moved Decoder3d’s recursive run_up out of forward into a class method to avoid nested closure self-reference cycles. This avoids cyclic garbage that delays garbage collection of tensors, which in turn delays VRAM release before tiled fallback. * ltx: vae: Don't recursion in local fns (move run_up) Move the recursive run_up out of forward into a class method to avoid nested closure self-reference cycles. This avoids cyclic garbage that delays garbage collection of tensors, which in turn delays VRAM release before tiled fallback.	2026-03-19 22:30:27 -04:00
comfyanonymous	8458ae2686	Revert "fix: run text encoders on MPS GPU instead of CPU for Apple Silicon (#…" (#13070 ) This reverts commit `b941913f1d`.	2026-03-19 15:27:55 -04:00
Jukka Seppänen	fd0261d2bc	Reduce tiled decode peak memory (#13050 )	2026-03-19 13:29:34 -04:00
rattus	ab14541ef7	memory: Add more exclusion criteria to pinned read (#13067 )	2026-03-19 10:03:20 -07:00
rattus	fabed694a2	ltx: vae: implement chunked encoder + CPU IO chunking (Big VRAM reductions) (#13062 ) * ltx: vae: add cache state to downsample block * ltx: vae: Add time stride awareness to causal_conv_3d * ltx: vae: Automate truncation for encoder Other VAEs just truncate without error. Do the same. * sd/ltx: Make chunked_io a flag in its own right Taking this bi-directional, so make it a for-purpose named flag. * ltx: vae: implement chunked encoder + CPU IO chunking People are doing things with big frame counts in LTX including V2V flows. Implement the time-chunked encoder to keep the VRAM down, with the converse of the new CPU pre-allocation technique, where the chunks are brought from the CPU JIT. * ltx: vae-encode: round chunk sizes more strictly Only powers of 2 and multiples of 8 are valid due to cache slicing.	2026-03-19 09:58:47 -07:00
comfyanonymous	f6b869d7d3	fp16 intermediates don't work for some text enc models. (#13056 )	2026-03-18 19:42:28 -04:00
comfyanonymous	56ff88f951	Fix regression. (#13053 )	2026-03-18 18:35:25 -04:00
Jukka Seppänen	9fff091f35	Further Reduce LTX VAE decode peak RAM usage (#13052 )	2026-03-18 18:32:26 -04:00
comfyanonymous	dcd659590f	Make more intermediate values follow the intermediate dtype. (#13051 )	2026-03-18 18:14:18 -04:00
Anton Bukov	b941913f1d	fix: run text encoders on MPS GPU instead of CPU for Apple Silicon (#12809 ) On Apple Silicon, `vram_state` is set to `VRAMState.SHARED` because CPU and GPU share unified memory. However, `text_encoder_device()` only checked for `HIGH_VRAM` and `NORMAL_VRAM`, causing all text encoders to fall back to CPU on MPS devices. Adding `VRAMState.SHARED` to the condition allows non-quantized text encoders (e.g. bf16 Gemma 3 12B) to run on the MPS GPU, providing significant speedup for text encoding and prompt generation. Note: quantized models (fp4/fp8) that use float8_e4m3fn internally will still fall back to CPU via the `supports_cast()` check in `CLIP.__init__()`, since MPS does not support fp8 dtypes.	2026-03-17 21:21:32 -04:00
rattus	cad24ce262	cascade: remove dead weight init code (#13026 ) This weight init process is fully shadowed by the weight load and doesn't work in dynamic_vram, where the weight allocation is deferred.	2026-03-17 20:59:10 -04:00
comfyanonymous	68d542cc06	Fix case where pixel space VAE could cause issues. (#13030 )	2026-03-17 20:46:22 -04:00
Jukka Seppänen	735a0465e5	Inplace VAE output processing to reduce peak RAM consumption. (#13028 )	2026-03-17 20:20:49 -04:00
rattus	035414ede4	Reduce WAN VAE VRAM, Save use cases for OOM/Tiler (#13014 ) * wan: vae: encoder: Add feature cache layer that corks singles If a downsample only gives you a single frame, save it to the feature cache and return nothing to the top level. This increases the efficiency of cacheability, but also prepares support for going two by two rather than four by four on the frames. * wan: remove all concatenation with the feature cache The loopers are now responsible for ensuring that non-final frames are processed at least two-by-two, eliminating the need for this cat case. * wan: vae: recurse and chunk for 2+2 frames on decode Avoid having to clone off slices of 4 frame chunks and reduce the size of the big 6 frame convolutions down to 4. Save the VRAMs. * wan: encode frames 2x2. Reduce VRAM usage greatly by encoding frames 2 at a time rather than 4. * wan: vae: remove cloning The loopers now control the chunking such that there is never more than 2 frames, so just cache these slices directly and avoid the clone allocations completely. * wan: vae: free consumer caller tensors on recursion * wan: vae: restyle a little to match LTX	2026-03-17 17:34:39 -04:00
rattus	1a157e1f97	Reduce LTX VAE VRAM usage and save use cases from OOMs/Tiler (#13013 ) * ltx: vae: scale the chunk size with the user's VRAM Scale this linearly down for users with low VRAM. * ltx: vae: free non-chunking recursive intermediates * ltx: vae: cleanup some intermediates The conv layer can be the VRAM peak and it does a torch.cat. So clean up the pieces of the cat. Also clear out the cache ASAP as each layer detects its end, as this VAE surges in VRAM at the end due to the ended padding increasing the size of the final frame convolutions off-the-books to the chunker. So if all the earlier layers free up their cache it can offset that surge. It's a fragmentation nightmare, and the chance of it having to recache the pyt allocator is very high, but you won't OOM.	2026-03-17 17:32:43 -04:00
Paulo Muggler Moreira	8cc746a864	fix: disable SageAttention for Hunyuan3D v2.1 DiT (#12772 )	2026-03-16 22:27:27 -04:00
comfyanonymous	ca17fc8355	Fix potential issue. (#13009 )	2026-03-16 21:38:40 -04:00
Kohaku-Blueleaf	20561aa919	[Trainer] FP4, 8, 16 training by native dtype support and quant linear autograd function (#12681 )	2026-03-16 21:31:50 -04:00
comfyanonymous	7a16e8aa4e	Add --enable-dynamic-vram options to force enable it. (#13002 )	2026-03-16 16:50:13 -04:00
blepping	b202f842af	Skip running model finalizers at exit (#12994 )	2026-03-16 16:00:42 -04:00
lostdisc	3814bf4454	Enable Pytorch Attention for gfx1150 (#12973 )	2026-03-15 12:45:30 -07:00
rattus	e84a200a3c	ops: opt out of deferred weight init if subclassed (#12967 ) If a subclass brings its own _load_from_state_dict and doesn't call super(), the needed default init of these weights is missed, which can lead to problems with uninitialized weights.	2026-03-15 11:49:49 -07:00
Jukka Seppänen	0904cc3fe5	LTXV: Accumulate VAE decode results on intermediate_device (#12955 )	2026-03-14 18:09:09 -07:00
comfyanonymous	c711b8f437	Add --fp16-intermediates to use fp16 for intermediate values between nodes (#12953 ) This is an experimental WIP option that might not work in your workflow but should lower memory usage if it does. Currently only the VAE and the load image node will output in fp16 when this option is turned on.	2026-03-14 19:18:19 -04:00
Jukka Seppänen	1c5db7397d	feat: Support mxfp8 (#12907 )	2026-03-14 18:36:29 -04:00
rattus	7810f49702	comfy aimdo 0.2.11 + Improved RAM Pressure release strategies - Windows speedups (#12925 ) * Implement seek and read for pins Sourcing pins from an mmap is bad because it's a CPU->CPU copy that attempts to fully buffer the same data twice. Instead, use seek and read, which avoids the mmap buffering while usually being a faster read in the first place (avoiding mmap faulting etc). * pinned_memory: Use Aimdo pinner The aimdo pinner bypasses PyTorch's CPU allocator, which can leak Windows commit charge. * ops: bypass init() of weight for embedding layer This similarly consumes large commit charge, especially for TEs. It can cause a permanent leaked commit charge, which can destabilize systems close to the commit ceiling and generally confuses the RAM stats. * model_patcher: implement pinned memory counter Implement a pinned memory counter for better accounting of what volume of memory pins have. * implement touch accounting Implement accounting of touching mmapped tensors. * mm+mp: add residency mmap getter * utils: use the aimdo mmap to load sft files * model_management: Implement tighter RAM pressure semantics Implement a pressure release on entire MMAPs as Windows does perform faster when mmaps are unloaded and model loads free ramp into fully unallocated RAM. Make the concept of freeing for pins a completely separate concept. Now that pins are loadable directly from the original file and don't touch the mmap, tighten the freeing budget to just the current loaded model - what you have left over. This still over-frees pins, but it's a lot better than before. So after the pins are freed with that algorithm, bounce entire MMAPs to free RAM based on what the model needs, deducting off any known resident-in-mmap tensors to the free quota to keep it as tight as possible. * comfy-aimdo 0.2.11 Comfy aimdo 0.2.11 * mm: Implement file_slice path for QT * ruff * ops: put meta-tensors in place to allow custom nodes to check geo	2026-03-13 22:18:08 -04:00
Terry Jia	3fa8c5686d	fix: use frontend-compatible format for Float gradient_stops (#12789 ) Co-authored-by: guill <jacob.e.segal@gmail.com> Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>	2026-03-12 10:14:28 -07:00
comfyanonymous	44f1246c89	Support flux 2 klein kv cache model: Use the FluxKVCache node. (#12905 )	2026-03-12 11:30:50 -04:00
comfyanonymous	f6274c06b4	Fix issue with batch_size > 1 on some models. (#12892 )	2026-03-11 16:37:31 -04:00
Adi Borochov	4f4f8659c2	fix: guard torch.AcceleratorError for compatibility with torch < 2.8.0 (#12874 ) * fix: guard torch.AcceleratorError for compatibility with torch < 2.8.0 torch.AcceleratorError was introduced in PyTorch 2.8.0. Accessing it directly raises AttributeError on older versions. Use a try/except fallback at module load time, consistent with the existing pattern used for OOM_EXCEPTION. * fix: address review feedback for AcceleratorError compat - Fall back to RuntimeError instead of type(None) for ACCELERATOR_ERROR, consistent with OOM_EXCEPTION fallback pattern and valid for except clauses - Add "out of memory" message introspection for RuntimeError fallback case - Use RuntimeError directly in discard_cuda_async_error except clause	2026-03-11 10:04:13 -07:00
comfyanonymous	9642e4407b	Add pre attention and post input patches to qwen image model. (#12879 )	2026-03-11 00:09:35 -04:00
comfyanonymous	3ad36d6be6	Allow model patches to have a cleanup function. (#12878 ) The function gets called after sampling is finished.	2026-03-10 20:09:12 -04:00
rattus	535c16ce6e	Widen OOM_EXCEPTION to AcceleratorError form (#12835 ) PyTorch only filters for OOMs in its own allocators; however, there are paths that can OOM on allocators made outside the PyTorch allocators. These manifest as an AcceleratorError, as PyTorch does not have universal error translation to its OOM type on exception. Handle it. A log I have for this also shows a double report of the error async, so call the async discarder to clean up and make these OOMs look like OOMs.	2026-03-10 00:41:02 -04:00
rattus	a912809c25	model_detection: deep clone pre edited weights (#12862 ) Deep clone these weights as needed to avoid segfaulting when it tries to touch the original mmap.	2026-03-09 23:50:10 -04:00
comfyanonymous	c4fb0271cd	Add a way for nodes to add pre attn patches to flux model. (#12861 )	2026-03-09 23:37:58 -04:00
Jukka Seppänen	06f85e2c79	Fix text encoder lora loading for wrapped models (#12852 )	2026-03-09 16:08:51 -04:00
Luke Mino-Altherr	29b24cb517	refactor(assets): modular architecture + async two-phase scanner & background seeder (#12621 )	2026-03-07 20:37:25 -05:00
rattus	bcf1a1fab1	mm: reset_cast_buffers: sync compute stream before free (#12822 ) Sync the compute stream before freeing the cast buffers. Otherwise, use-after-free issues can occur when the cast stream frees the buffer while the compute stream is far enough behind to still need a cast weight.	2026-03-07 09:38:08 -08:00
comfyanonymous	d69d30819b	Don't run TE on cpu when dynamic vram enabled. (#12815 )	2026-03-06 19:11:16 -05:00
rattus	f466b06601	Fix fp16 audio encoder models (#12811 ) * mp: respect model_defined_dtypes in default caster This is needed for parametrizations when the dtype changes between sd and model. * audio_encoders: archive model dtypes Archive model dtypes to stop the state dict load from overriding the dtypes defined by the core for compute etc.	2026-03-06 18:20:07 -05:00
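
Commit 4f4f8659c2 above describes guarding torch.AcceleratorError with a try/except fallback at module load time, falling back to RuntimeError and inspecting the message for "out of memory". A minimal sketch of that pattern follows. It is not the code from the commit: resolve_exception, looks_like_oom, classify and the two stand-in namespaces are hypothetical names chosen so the example runs without PyTorch installed.

```python
import types


def resolve_exception(module, name, fallback=RuntimeError):
    """Return module.<name> if it exists, else a fallback that is valid in `except`."""
    try:
        return getattr(module, name)
    except AttributeError:
        return fallback


def looks_like_oom(exc):
    """Hypothetical check for the fallback case, where only the message is available."""
    return "out of memory" in str(exc).lower()


# Stand-ins for the torch module (assumption: used only so the sketch is runnable).
class FakeAcceleratorError(RuntimeError):
    pass


old_torch = types.SimpleNamespace()  # mimics a torch without AcceleratorError
new_torch = types.SimpleNamespace(AcceleratorError=FakeAcceleratorError)

# Resolved once, as a module-level constant would be.
ACCELERATOR_ERROR_OLD = resolve_exception(old_torch, "AcceleratorError")
ACCELERATOR_ERROR_NEW = resolve_exception(new_torch, "AcceleratorError")


def classify(exc_type, error):
    """Catch `error` through the resolved type and report whether it reads as an OOM."""
    try:
        raise error
    except exc_type as caught:
        return looks_like_oom(caught)
```

Falling back to an exception class rather than something like type(None) keeps the resolved name usable directly in an except clause, which is the reason the commit message gives for choosing RuntimeError.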

2132 Commits
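
Commit 82b868a45a in the log describes moving a recursive run_up helper out of forward into a class method to avoid nested closure self-reference cycles. The sketch below illustrates that effect in plain CPython and is not taken from the repository: Payload stands in for a large tensor, and forward_nested and Decoder are hypothetical names. It relies on CPython's reference counting, so other interpreters may behave differently.

```python
import gc
import weakref


class Payload:
    """Stand-in for a large tensor held during decoding."""


def forward_nested():
    payload = Payload()
    ref = weakref.ref(payload)

    def run_up(depth):
        # run_up refers to itself through its own closure cell, forming a cycle
        # that also keeps `payload` reachable after forward_nested returns.
        _ = payload
        if depth > 0:
            run_up(depth - 1)

    run_up(2)
    return ref


class Decoder:
    def run_up(self, payload, depth):
        # Recursion goes through the instance, so no closure cycle is created.
        if depth > 0:
            self.run_up(payload, depth - 1)

    def forward(self):
        payload = Payload()
        ref = weakref.ref(payload)
        self.run_up(payload, 2)
        return ref


gc.disable()  # make the timing deterministic for the demonstration
try:
    nested_ref = forward_nested()
    nested_alive_before_gc = nested_ref() is not None  # still held by the cycle
    gc.collect()
    nested_alive_after_gc = nested_ref() is not None   # freed by the cycle collector
    method_ref = Decoder().forward()
    method_alive = method_ref() is not None            # freed by refcount on return
finally:
    gc.enable()
```

In the nested version the payload survives until the cyclic garbage collector runs, while the method version releases it as soon as forward returns. This matches the delay in VRAM release that the commit message attributes to cyclic garbage.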