ComfyUI

mirror of https://github.com/comfyanonymous/ComfyUI.git synced 2026-06-01 05:27:54 +08:00

Author	SHA1	Message	Date
comfyanonymous	85a403d1ea	Disable sage attention in stable audio dit and VAE. (#14148 )	2026-05-27 20:35:03 -04:00
Jukka Seppänen	987a937658	Support context window for PiD and fix lq_latent rounding (#14136 )	2026-05-27 12:08:06 -07:00
comfyanonymous	e75a92c1b6	Add memory usage factor for lens model. (#14124 )	2026-05-26 18:06:51 -07:00
comfyanonymous	d8d860a588	Closer memory usage factors for PID (#14123 )	2026-05-26 18:04:55 -07:00
Jukka Seppänen	28f4ef277c	feat: Support NVIDIA PixelDiT and PiD (CORE-201) (#14103 )	2026-05-26 17:50:14 -07:00
Jukka Seppänen	f9f54cae42	Lens: some cleanup (#14112 ) * Lens: remove redundant memory optimization	2026-05-26 10:32:53 +03:00
Jukka Seppänen	41812fa0ac	feat: Microsoft Lens support (CORE-248) (#14077 )	2026-05-25 23:01:51 -07:00
Ivan Zorin	57414dadfe	fix: cross-attention AdaLN scale, shift, sigma parameters calculation (#14097 )	2026-05-25 20:07:09 -07:00
comfyanonymous	da49b7d0b6	Remove useless annotations imports. (#14105 )	2026-05-25 19:23:29 -07:00
Jedrzej Kosinski	0a2dd86e78	MultiGPU Work Units For Accelerated Sampling (CORE-184) (#7063 )	2026-05-25 18:26:40 -07:00
rattus	b30e980a20	cache-ram: lower thresholds (#14089 ) Use the RAM right up to the wire as the community is a bit accustomed to. This trades off headroom for the case where large chunky intermediates arrive and potentially hits pagefile/swap, but a lot of people have "it just fits" workflows out there, so strike a compromise with 75->90%. Disable the inactive cache for all but the very high RAM users.	2026-05-24 15:26:50 -07:00
rattus	39f963b4b0	mark loads to pins as cold immediately (#14088 ) This does the posix_fadvise to kick pins out of the disk cache (to avoid a double copy in RAM).	2026-05-24 15:25:59 -07:00
comfyanonymous	08d809d128	Fix --use-flash-attention ignored when xformers installed. (#14083 )	2026-05-23 17:44:28 -07:00
comfyanonymous	d80fcafee7	Remove dead code. (#14072 )	2026-05-22 19:56:36 -07:00
rattus	03e511862e	Fix reshaping lora application (#14031 ) * ModelPatcherDynamic: purge stale vbar allocs on force cast * ModelPatcherDynamic: restore backups before load If doing a clean reload, mutative changes (lora application) could be applied on top of the already loaded weight. Restore from backup unconditionally so that the new load is clean.	2026-05-21 09:47:16 -07:00
Edoardo Carmignani	aab41a9ddb	fix(lanczos): correct dimension transposition for single-channel tensors (#12679 )	2026-05-21 23:47:20 +08:00
rattus	5aa5ccc9e0	Multi-threaded load of models from disk (big load time speedups & Offload to disk) (CORE-43,CORE-152,CORE-164,CORE-165,CORE-117) (#13802 ) * model_management: disable non-dynamic smart memory Disable smart memory outright for non dynamic models. This is a minor step towards deprecation of --disable-dynamic-vram and the legacy ModelPatcher. This is needed for estimate-free model development, where new models can opt-out of supplying a memory estimate and not have to worry about hard VRAM allocations due to legacy non-dynamic model patchers This is also a general stability increase for a lot of stray use cases where estimates may still be off and going forward we are not going to accurately maintain such estimates. * pinned_memory: implement with aimdo growable buffer Use a single growable buffer so we can do threaded pre-warming on pinned memory. * mm: use aimdo to do transfer from disk to pin Aimdo implements a faster threaded loader. * Add stream host pin buffer for AIMDO casts Introduce per-offload-stream HostBuffer reuse for pinned staging, include it in cast buffer reset synchronization. Defer actual casts that go via this pin path to a separate pass such that the buffer can be allocated monolithically (to avoid cudaHostRegister thrash). * remove old pin path * Implement JIT pinned memory pressure Replace the predictive pin pressure mechanism with JIT PIN memory pressure. * LowVRAMPatch: change to two-phase visit * lora: re-implement as inplace swiss-army-knife operation * prepare for multiple pin sets * implement pinned loras * requirements: comfy-aimdo 0.4.0 * ops: remove unused arg This was defeatured in aimdo iteration * ops: sync the CPU with only the offload stream activity This was syncing with the offload stream which itself is synced with the compute stream, so this was syncing CPU with compute transitively. Define the event to sync it more gently. 
* pins: implement freeing intermediate for pinned memory Pinning is more important than inactive intermediates and the stream pin buffer is more important than even active intermediates. * execution: implement pin eviction on RAM pressure Add back proper pin freeing on RAM pressure * implement pin registration swaps Uncap the Windows pins from 50% by extending the pool and have a pressure mechanism to move the pin reservations on demand. This unfortunately implies a GPU sync to do the freeing so significant hysteresis needs to be added to consolidate these pressure events. * cli_args/execution: Implement lower background cache-ram threshold Limit the amount of RAM background intermediates can use, so that switching workflows doesn't degrade performance too much. * make default * bump aimdo * model-patcher: force-cast tiny weights Flux 2 gets crazy stalls due to a mix of tiny and giant weights creating lopsided stream buffer rotations which creates stalls. * ops: refactor in prep for chunking * mm: delegate pin-on-the-way to aimdo Aimdo is able to chunk and slice this on the way for better CPU->GPU overlap. The main advantage is the ability to shorten the bus contention window between previous weight transfer and the next weights vbar fault. * bump aimdo * pinning updates * specify hostbuf max allocation size There are signs of virtual memory exhaustion on some Linux systems when throwing 128GB for every little piece. Pass the actual to save aimdo from over-estimates * tests: update execution tests for caching The default caching changed to ram-cache so update these tests accordingly. Remove the LRU 0 test as this also falls through to RAM cache.	2026-05-20 17:03:58 -07:00
comfyanonymous	f9c84c94b4	Support Stable Audio 3 model. (#14010 )	2026-05-20 11:34:22 -04:00
Cezarijus Kivylius	78b5dec6b6	fix: Hunyuan3D 2.1 batch size crashes in attention and forward pass (#13699 )	2026-05-20 19:58:49 +08:00
yy	626b082838	Fix typo in ops.py (#11925 )	2026-05-20 05:45:04 +08:00
comfyanonymous	a4382e056e	Use temporal downscale to make empty audio latent nodes more reusable. (#13975 )	2026-05-19 00:14:30 -04:00
comfyanonymous	990a7ae7f2	Initial work to make downscale_ratio_temporal work. (#13972 )	2026-05-18 23:01:43 -04:00
Yousef R. Gamaleldin	187e5237e1	Fix BiRefNet issue (#13966 )	2026-05-19 05:03:22 +08:00
rattus	16f862f02a	implement dynamic clip saving (#13959 ) Fix clip saving by doing the same patching process as diffusion models.	2026-05-18 11:46:40 -07:00
Jukka Seppänen	971c9e3518	HiDream-O1: support area conditioning (#13944 )	2026-05-18 01:17:05 -04:00
Jukka Seppänen	b39af210d0	Fix Qwen3.5 text generation with multiple input images (#13943 )	2026-05-18 01:16:42 -04:00
comfyanonymous	f48d2a017e	Log which quant ops are enabled/emulated. (#13946 )	2026-05-17 16:30:54 -04:00
drozbay	d3607a8e6d	feat: Add downscaled IC-LoRA support to LTXVAddGuide (CORE-102) (#13896 )	2026-05-16 15:02:57 +08:00
comfyanonymous	5d5a4554e1	Remove useless option and clarify what lowvram does. (#13922 )	2026-05-15 17:59:02 -07:00
Jukka Seppänen	33ce449c8b	Reduce LTX2.3 peak VRAM when guide_mask is in use (CORE-166) (#13735 ) - Reduce peak VRAM by handling self_attn_mask more efficiently - Fallback to SDPA when self_attention_mask is used	2026-05-16 00:02:27 +03:00
Jukka Seppänen	77e2ed5e01	feat: Support MoGe (CORE-168) (#13878 )	2026-05-15 10:34:56 +08:00
Talmaj	74c17a25e5	Fix void failing with RuntimeError: start (0) + length (464) exceeds dimension size (461). (#13873 )	2026-05-13 12:37:30 -07:00
comfyanonymous	2bd65f2091	Better Hidream O1 mem usage factor for non dynamic vram. (#13864 )	2026-05-12 20:55:38 -07:00
comfyanonymous	0155ddcbe3	Fix dtype issue with hidream o1 (#13849 )	2026-05-11 20:53:13 -07:00
Jukka Seppänen	8e53f001a4	feat: Support HiDream-O1-Image (CORE-187) (#13817 ) * Initial HiDream01-image support * Cleanup nodes * Cleaner handling of empty placeholder models * Remove snap_to_predefined, prefer tooltip for the trained resolutions * Add model and block wrappers * Fix shift tooltip * Add node to work around the patch tile issue Experimental, runs multiple passes with the patch grid offset and blends with various different methods. * Qwen35 vision rotary_pos_emb cast fix * Fix embedding layout type * Some small optimizations * Cleanup, don't need this fallback * Prefix KV cache, cleanup Bit of speed, reduce redundant code * Get rid of redundant custom sampler, refactor noise scaling Our existing lcm sampler is mathematically the same, just added the missing options to it instead and a node to control them. Refactored the noise scaling and fix it for the stochastic samplers, add a generic node to control the initial noise scale. * Update nodes_hidream_o1.py * Fix some cache validation cases * Keep existing sampling params * Remove redundant video vision path * Replace some numpy ops with torch * Fix RoPE index for batch size > 1 * Prefer torch preprocessing * Rename block_type to be compatible with existing patch nodes * Fixes and tweaks	2026-05-11 20:35:53 -07:00
comfyanonymous	0a7d2ffd68	Support anima TE lora kohya format. (#13847 )	2026-05-11 20:01:52 -07:00
rattus	20e439419c	model_patcher: Fix safetensors saving of fp8 (#13835 ) This was missing proper weight scale casting in the saving path.	2026-05-11 12:48:10 -07:00
box4wangjing	f505cb4070	chore: remove extra word in comment (#13826 )	2026-05-11 11:05:09 +08:00
Jukka Seppänen	3200f28e3a	Support Wan-Dancer (#13813 ) * initial WanDancer support * nodes_wandancer: Add list form of chunker. Create an alternate list form of the node so the chunk gens can be trivially looped by the comfy executor. * Closer match to original soxr resampling * Remove librosa node * Cleanup --------- Co-authored-by: Rattus <rattus128@gmail.com>	2026-05-09 14:02:56 -07:00
comfyanonymous	66669b2ded	I don't think there was any because nobody complained. (#13807 )	2026-05-08 17:32:14 -07:00
Alexis Rolland	c5ecd231a2	fix: Fix bug when mask not on same device (CORE-181) (#13801 )	2026-05-08 23:06:29 +08:00
Yousef R. Gamaleldin	d3c18c1636	Add support for BiRefNet background remove model (CORE-46) (#12747 )	2026-05-08 17:59:24 +08:00
omahs	bac6fc35fb	Fix typos (#10986 )	2026-05-08 17:14:45 +08:00
Talmaj	ef8f25601a	Add I2V for causal forcing model. (#13719 )	2026-05-07 18:38:36 -07:00
Jukka Seppänen	8dc3f3f209	Improve SAM3 large input handling (#13767 )	2026-05-07 17:18:28 -07:00
Jukka Seppänen	cd8c7a2306	Throttle dynamic VRAM prepare logging (#13704 )	2026-05-07 10:41:13 +08:00
Talmaj	78b3096bf3	Void model - pass 1 & 2 (CORE-38) (#13403 )	2026-05-05 19:59:04 -07:00
drozbay	e5369c0eec	feat: Context windows - add causal_window_fix to improve blending of context windows (CORE-100) (#13563 ) * Context windows: add causal_window_fix toggle * Fix slice_cond to correctly handle causal anchor index for temporal offsets	2026-05-05 16:40:53 -07:00
drozbay	1655f8089a	Add temporal_downscale_ratio to LatentFormat (#13702 ) Co-authored-by: ozbayb <17261091+ozbayb@users.noreply.github.com> Co-authored-by: Alexis Rolland <alexisrolland@hotmail.com> Co-authored-by: Jukka Seppänen <40791699+kijai@users.noreply.github.com> Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>	2026-05-05 16:30:00 -07:00
Talmaj	fed8d5efa6	feat: Auto-regressive video generation (CORE-25) (#13082 )	2026-05-04 21:01:22 -07:00

2227 Commits