* initial gemma4 support
* parity with reference implementation
Outputs can 100% match transformers with the same SDPA flags; checkpoint this and then optimize.
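A minimal sketch (not the project's code) of how the SDPA backend can be pinned so both stacks compute attention identically; the tensor shapes here are arbitrary:

```python
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel

q, k, v = (torch.randn(1, 8, 16, 64) for _ in range(3))

# Restrict SDPA to the math backend; fused flash/mem-efficient kernels
# can differ in accumulation order, so pinning the backend on both
# sides is what makes a bit-exact comparison against transformers fair.
with sdpa_kernel([SDPBackend.MATH]):
    out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
```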
* Cleanup, video fixes
* cleanup, enable fused rms norm by default
* update comment
* Cleanup
* Update sd.py
* Various fixes
* Add fp8 scaled embedding support
* small fixes
* Translate think tokens
* Fix image encoder attention mask type
So it works with basic attention
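As an illustration (the helper name is hypothetical): basic attention adds the mask to the attention logits, so a boolean keep-mask has to become an additive float mask:

```python
import torch

def to_additive_mask(keep: torch.Tensor, dtype: torch.dtype) -> torch.Tensor:
    # keep: bool tensor, True where attention is allowed. Basic (manual)
    # attention adds the mask to the logits, so booleans must become
    # 0.0 / -inf values in the logits' dtype.
    mask = torch.zeros(keep.shape, dtype=dtype, device=keep.device)
    return mask.masked_fill(~keep, float("-inf"))
```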
* Handle thinking tokens differently only for Gemma4
* Code cleanup
* Update nodes_textgen.py
* Use embed scale class instead of buffer
Slight difference to HF, but technically more accurate and simpler code
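As a minimal sketch of the idea (class name hypothetical), assuming the Gemma-style sqrt(hidden_size) embedding multiplier:

```python
import math
import torch
import torch.nn as nn

class ScaledEmbedding(nn.Embedding):
    # Apply the scale at forward time in full precision instead of
    # keeping it around as a registered buffer (likely the source of
    # the slight HF difference noted above).
    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return super().forward(input) * math.sqrt(self.embedding_dim)
```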
* Default to fused rms_norm
* Update gemma4.py
* feat(api-nodes): add Topaz Astra 2 model
Signed-off-by: bigcat88 <bigcat88@icloud.com>
* feat(api-nodes): make Astra 2 the default Topaz upscaler model
Reorder UPSCALER_MODELS_MAP and the upscaler_model dynamic combo so
"Astra 2" appears first, surfacing it as the default selection.
---------
Signed-off-by: bigcat88 <bigcat88@icloud.com>
Co-authored-by: Marwan Mostafa <marawan206@gmail.com>
* mm: Use Aimdo raw allocator for cast buffers
pytorch manages allocation of growing buffers on streams poorly. PyTorch
has no Windows support for the expandable segments allocator (which is
the right tool for this job), while also segmenting the memory by
stream such that it can't be generally re-used. So kick the problem over
to aimdo, which can just grow a virtual region that's freed per stream.
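This is not aimdo's actual API; a rough pure-PyTorch illustration of the policy described (one grow-only region per stream, freed as a whole):

```python
import math
import torch

class StreamCastPool:
    """Illustrative per-stream, grow-only pool for cast buffers."""

    def __init__(self, device):
        self.device = device
        self.buffers = {}  # stream -> flat uint8 tensor, only ever grows

    def get(self, stream, shape, dtype):
        itemsize = torch.empty((), dtype=dtype).element_size()
        nbytes = itemsize * math.prod(shape)
        buf = self.buffers.get(stream)
        if buf is None or buf.numel() < nbytes:
            buf = torch.empty(nbytes, dtype=torch.uint8, device=self.device)
            self.buffers[stream] = buf
        # Carve a typed view out of the region instead of allocating.
        return buf[:nbytes].view(dtype).view(shape)

    def free_stream(self, stream):
        # Release the whole region once the stream's work is finished.
        self.buffers.pop(stream, None)
```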
* plan
* ops: move cpu handler up to the caller
* ops: split up prefetch from weight prep
Split up the casting and weight formatting/lora stuff in prep for
arbitrary prefetch support.
* ops: implement block prefetching API
allow a model to construct a prefetch list and operate it for increased
async offload.
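Not the actual prefetch API; a generic sketch of the technique, assuming block weights sit in pinned CPU memory so the copies are truly asynchronous:

```python
import torch

def run_blocks_with_prefetch(blocks, x, device):
    # blocks: CPU-resident modules. While block i computes on the
    # current stream, block i+1 is copied host-to-device on a side
    # stream. Real code would also offload finished blocks.
    copy_stream = torch.cuda.Stream(device)
    compute_stream = torch.cuda.current_stream(device)

    def prefetch(block):
        with torch.cuda.stream(copy_stream):
            return block.to(device, non_blocking=True)

    nxt = prefetch(blocks[0])
    for i in range(len(blocks)):
        compute_stream.wait_stream(copy_stream)  # block i has arrived
        cur, nxt = nxt, prefetch(blocks[i + 1]) if i + 1 < len(blocks) else None
        x = cur(x)
    return x
```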
* ltxv2: Implement block prefetching
* Implement lora async offload
Implement async offload of loras.
* pinned_memory: remove JIT RAM pressure release
This doesn't work, as freeing intermediates for pins needs to be
higher-priority than freeing pins-for-pins if and when you do that.
So this is too late: pins-for-pins happens at model load time, and
we don't have JIT pins-for-pins.
* caching: Add a filter to only free intermediates from inactive workflows
This is to get the priorities among pins straight.
* mm: free inactive-ram from RAM cache first
Stuff from inactive workflows should be freed before anything else.
* caching: purge old ModelPatchers first
Don't try to score them; just dump them at the first sign of trouble
if they aren't part of the workflow.
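A hypothetical sketch of that first pass (the cache layout and workflow_id field are assumptions, not the real structures):

```python
def purge_stale_patchers(cache: dict, active_workflow_ids: set) -> None:
    # Unconditionally drop ModelPatcher entries that no active workflow
    # references; only survivors go through the usual scored eviction.
    for key in list(cache):
        if cache[key].workflow_id not in active_workflow_ids:
            del cache[key]
```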
On failure (e.g. animated webp files), fall back to the old Pillow code.
This should fix the extra precision of high bit depth images (like 16-bit PNG) being discarded when loaded by Pillow, and it potentially adds support for more image formats.
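A minimal sketch of the fallback pattern (not the actual loader): keep 16-bit data when numpy can see it, otherwise fall back to the plain Pillow conversion.

```python
import numpy as np
import torch
from PIL import Image

def load_image(path):
    img = Image.open(path)
    try:
        arr = np.asarray(img)
        if arr.dtype == np.uint16:
            # Preserve the extra precision of high bit depth images.
            return torch.from_numpy(arr.astype(np.float32) / 65535.0)
        raise ValueError("no high-bit-depth data to preserve")
    except Exception:
        # Old Pillow path, e.g. for animated webp files.
        arr = np.asarray(img.convert("RGB"), dtype=np.float32) / 255.0
        return torch.from_numpy(arr)
```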
Comfy-aimdo 0.3.0 contains several major new features:
- multi-GPU support
- ARM support
- AMD support
Refactorings include:
- Linkless architecture: linkage is now performed purely at runtime,
to stop host library lookups completely and only interact with the
torch-loaded Nvidia stack.
- Elimination of cudart integration on Linux; it's not consistent
with Windows.
Misc bugfixes and minor features.
SolidMask had a hardcoded device="cpu" while other nodes (e.g.
EmptyImage) follow intermediate_device(). This causes a RuntimeError
when MaskComposite combines masks from different device sources
under --gpu-only.
- SolidMask: use intermediate_device() instead of hardcoded "cpu"
- MaskComposite: align source device to destination before operating
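A sketch of the two changes (bodies simplified, not the exact node code); intermediate_device() is the same helper EmptyImage uses:

```python
import torch
import comfy.model_management

def solid_mask(value, width, height):
    # Allocate where other intermediates live instead of hardcoding CPU.
    device = comfy.model_management.intermediate_device()
    return torch.full((1, height, width), value,
                      dtype=torch.float32, device=device)

def mask_composite_add(destination, source):
    # Align devices up front so --gpu-only graphs can't mismatch.
    source = source.to(destination.device)
    return torch.clamp(destination + source, 0.0, 1.0)
```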
Co-authored-by: Alexis Rolland <alexisrolland@hotmail.com>
Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
* Change save 3d model's filename prefix to 3d/ComfyUI
As this node has already been renamed from `Save GLB` to `Save 3D Model`, the filename prefix `3d` fits better than `mesh`.
* use lowercase
---------
Fires on v* tag push (earlier than release.published, which can lag)
and triggers a repository_dispatch on Comfy-Org/cloud with event_type
comfyui_tag_pushed. Legacy desktop dispatch in release-webhook.yml
is left untouched.
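Roughly the REST call behind that dispatch (token handling and payload shape here are assumptions):

```python
import os
import requests

resp = requests.post(
    "https://api.github.com/repos/Comfy-Org/cloud/dispatches",
    headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    },
    # client_payload contents are illustrative.
    json={"event_type": "comfyui_tag_pushed",
          "client_payload": {"tag": "v0.0.0"}},
    timeout=30,
)
resp.raise_for_status()
```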