Commit Graph

1938 Commits

Author SHA1 Message Date
3b832231bb Flux2 Klein support. (#11890) 2026-01-15 10:33:15 -05:00
be518db5a7 Remove extraneous clip missing warnings when loading LTX2 embeddings_connector weights (#11874) 2026-01-14 17:54:04 -05:00
80441eb15e utils: fix lanczos grayscale upscaling (#11873) 2026-01-14 17:53:16 -05:00
6165c38cb5 Optimize nvfp4 lora applying. (#11866)
This changes results a bit but it also speeds up things a lot.
2026-01-14 00:49:38 -05:00
712cca36a1 feat: throttle ProgressBar updates to reduce WebSocket flooding (#11504) 2026-01-13 22:41:44 -05:00
eff2b9d412 Optimize nvfp4 lora applying. (#11856) 2026-01-13 19:37:19 -05:00
15b312de7a Optimize nvfp4 lora applying. (#11854) 2026-01-13 19:23:58 -05:00
1dcbd9efaf Bump ltxav mem estimation a bit. (#11842) 2026-01-13 01:42:07 -05:00
117e7a5853 Refactor to try to lower mem usage. (#11840) 2026-01-12 21:01:52 -08:00
b3c0e4de57 Make loras work on nvfp4 models. (#11837)
The initial applying is a bit slow but will probably be sped up in the
future.
2026-01-12 22:33:54 -05:00
fd5c0755af Reduce LTX2 VRAM use by more efficient timestep embed handling (#11829) 2026-01-12 17:28:59 -05:00
c881a1d689 Support the siglip 2 naflex model as a clip vision model. (#11831)
Not useful yet.
2026-01-12 17:05:54 -05:00
a3b5d4996a Support ModelScope-Trainer DiffSynth lora for Z Image. (#11805) 2026-01-12 15:38:46 -05:00
2f642d5d9b Fix chroma fp8 te being treated as fp16. (#11795) 2026-01-10 14:40:42 -08:00
cd912963f1 Fix issue with t5 text encoder in fp4. (#11794) 2026-01-10 17:31:31 -05:00
6e4b1f9d00 pythorch_attn_by_def_on_gfx1200 (#11793) 2026-01-10 16:51:05 -05:00
dc202a2e51 Properly save mixed ops. (#11772) 2026-01-10 02:03:57 -05:00
bd0e6825e8 Be less strict when loading mixed ops weights. (#11769) 2026-01-09 14:21:06 -05:00
1dc3da6314 Add most basic Asset support for models (#11315)
* Brought over minimal elements from PR 10045 to reproduce seed_assets and register_assets_system without adding anything to the DB or server routes yet, for now making everything sync (can introduce async once everything is cleaned up and brought over)

* Added db script to insert assets stuff, cleaned up some code; assets (models) now get added/rescanned

* Added support for 5 http endpoints for assets

* Replaced Optional with | None in schemas_in.py and schemas_out.py

* Remove two routes that will not be relevant yet in this PR: HEAD /api/assets/hash/<hash> and PUT /api/assets/<id>/preview

* Remove some functions the two deleted endpoints were using

* Don't show assets scan message upon calling /object_info endpoint

* removed unsued import to satisfy ruff

* Simplified hashing function tpye hint and _hash_file_obj

* Satisfied ruff
2026-01-08 22:21:51 -05:00
1a20656448 Fix import issue. (#11746) 2026-01-08 17:23:59 -05:00
0f11869d55 Better detection if AMD torch compiled with efficient attention. (#11745) 2026-01-08 17:16:58 -05:00
50d6e1caf4 Tweak ltxv vae mem estimation. (#11722) 2026-01-07 23:07:05 -05:00
21e8425087 Add warning for old pytorch. (#11718) 2026-01-07 21:07:26 -05:00
b6c79a648a ops: Fix offloading with FP8MM performance (#11697)
This logic was checking comfy_cast_weights, and going straight to
to the forward_comfy_cast_weights implementation without
attempting to downscale input to fp8 in the event comfy_cast_weights
is set.

The main reason comfy_cast_weights would be set would be for async
offload, which is not a good reason to nix FP8MM.

So instead, and together the underlying exclusions for FP8MM which
are:

* having a weight_function (usually LowVramPatch)
* force_cast_weights (compute dtype override)
* the weight is not Quantized
* the input is already quantized
* the model or layer has MM explictily disabled.

If you get past all of those exclusions, quantize the input tensor.
Then hand the new input, quantized or not off to
forward_comfy_cast_weights to handle it. If the weight is offloaded
but input is quantized you will get an offloaded MM8.
2026-01-07 21:01:16 -05:00
25bc1b5b57 Add memory estimation function to ltxav text encoder. (#11716) 2026-01-07 20:11:22 -05:00
3cd19e99c1 Increase ltxav mem estimation by a bit. (#11715) 2026-01-07 20:04:56 -05:00
34751fe9f9 Lower ltxv text encoder vram use. (#11713) 2026-01-07 19:12:15 -05:00
48e5ea1dfd model_patcher: Remove confusing load stat (#11710)
If the loader passes 1e32 as the usable memory size, it means force
the full load. This happens with CPU loads and a few other misc cases.
Removing the confusing number and just leave the other details.
2026-01-07 18:39:20 -05:00
3cd7b32f1b Support gemma 12B with quant weights. (#11696) 2026-01-07 05:15:14 -05:00
b7d7cc1d49 Fix fp8 fast issue. (#11688) 2026-01-07 01:39:06 -05:00
edee33f55e Disable comfy kitchen cuda if pytorch cuda less than 13 (#11681) 2026-01-06 22:13:43 -05:00
2c03884f5f Skip fp4 matrix mult on devices that don't support it. (#11677) 2026-01-06 18:07:26 -05:00
6e9ee55cdd Disable ltxav previews. (#11676) 2026-01-06 17:41:27 -05:00
023cf13721 Fix lowvram issue with ltxv2 text encoder. (#11675) 2026-01-06 17:33:03 -05:00
c3c3e93c5b Use rope functions from comfy kitchen. (#11674) 2026-01-06 16:57:50 -05:00
1618002411 Revert "Use rope functions from comfy kitchen. (#11647)" (#11648)
This reverts commit 6ef85c4915.
2026-01-05 23:07:39 -05:00
6ef85c4915 Use rope functions from comfy kitchen. (#11647) 2026-01-05 22:50:35 -05:00
6da00dd899 Initial ops changes to use comfy_kitchen: Initial nvfp4 checkpoint support. (#11635)
---------

Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
2026-01-05 21:48:58 -05:00
d157c3299d Refactor module_size function. (#11637) 2026-01-05 03:48:31 -05:00
f2b002372b Support the LTXV 2 model. (#11632) 2026-01-05 01:58:59 -05:00
53e762a3af Print memory summary on OOM to help with debugging. (#11613) 2026-01-03 22:28:38 -05:00
9a552df898 Remove leftover scaled_fp8 key. (#11603) 2026-01-02 17:28:10 -08:00
65cfcf5b1b New Year ruff cleanup. (#11595) 2026-01-01 22:06:14 -05:00
1bdc9a947f Remove duplicate import of model_management (#11587) 2025-12-31 19:29:55 -05:00
d622a61874 Refactor: move clip_preprocess to comfy.clip_model (#11586) 2025-12-31 17:38:36 -05:00
0357ed7ec4 Add support for sage attention 3 in comfyui, enable via new cli arg (#11026)
* Add support for sage attention 3 in comfyui, enable via new cli arg
--use-sage-attiention3

* Fix some bugs found in PR review. The N dimension at which Sage
Attention 3 takes effect is reduced to 1024 (although the improvement is
not significant at this scale).

* Remove the Sage Attention3 switch, but retain the attention function
registration.

* Fix a ruff check issue in attention.py
2025-12-30 22:53:52 -05:00
178bdc5e14 Add handling for vace_context in context windows (#11386)
Co-authored-by: ozbayb <17261091+ozbayb@users.noreply.github.com>
2025-12-30 14:40:42 -08:00
0e6221cc79 Add some warnings for pin and unpin errors. (#11561) 2025-12-29 18:26:42 -05:00
9ca7e143af mm: discard async errors from pinning failures (#10738)
Pretty much every error cudaHostRegister can throw also queues the same
error on the async GPU queue. This was fixed for repinning error case,
but there is the bad mmap and just enomem cases that are harder to
detect.

Do some dummy GPU work to clean the error state.
2025-12-29 18:19:34 -05:00
8fd07170f1 Comment out unused norm_final in lumina/z image model. (#11545) 2025-12-28 22:07:25 -05:00