* mm: Use Aimdo raw allocator for cast buffers
PyTorch manages allocation of growing buffers on streams poorly. PyTorch
has no Windows support for the expandable segments allocator (which is
the right tool for this job), while also segmenting the memory by
stream such that it cannot be generally re-used. So kick the problem to
aimdo, which can just grow a virtual region that's freed per stream.
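For illustration, a toy model of the per-stream growing-buffer idea (the `StreamCastBuffers` class and its methods are hypothetical; the real comfy-aimdo allocator reserves virtual address space and grows it in place, avoiding the reallocation this toy performs):

```python
import torch

class StreamCastBuffers:
    # Toy sketch only: one scratch region per CUDA stream, grown on demand
    # and released all at once when the stream is retired.
    def __init__(self):
        self._buffers = {}  # torch.cuda.Stream -> flat uint8 scratch tensor

    def get(self, stream: torch.cuda.Stream, nbytes: int) -> torch.Tensor:
        buf = self._buffers.get(stream)
        if buf is None or buf.numel() < nbytes:
            # Grow by reallocating; a virtual-memory backed allocator can
            # extend the same region without this copy cost.
            buf = torch.empty(nbytes, dtype=torch.uint8, device="cuda")
            self._buffers[stream] = buf
        return buf[:nbytes]

    def free_stream(self, stream: torch.cuda.Stream) -> None:
        # Freed per stream, not per tensor.
        self._buffers.pop(stream, None)
```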
* plan
* ops: move cpu handler up to the caller
* ops: split prefetch from weight prep for the block prefetching API
Split up the casting and weight formatting/LoRA handling in preparation
for arbitrary prefetch support.
* ops: implement block prefetching API
Allow a model to construct a prefetch list and operate it for increased
async offload.
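As a rough sketch of what driving a prefetch list looks like, the loop below keeps `depth` blocks in flight on a copy stream so host-to-device weight copies overlap compute (names and structure are illustrative, not the actual ComfyUI API):

```python
import torch

def run_with_prefetch(blocks, x, depth=2):
    # Hypothetical sketch: overlap H2D weight copies with compute.
    copy_stream = torch.cuda.Stream()
    events = [torch.cuda.Event() for _ in blocks]

    def prefetch(i):
        if i < len(blocks):
            with torch.cuda.stream(copy_stream):
                blocks[i].to("cuda", non_blocking=True)
            events[i].record(copy_stream)

    for i in range(min(depth, len(blocks))):
        prefetch(i)

    for i, block in enumerate(blocks):
        torch.cuda.current_stream().wait_event(events[i])  # weights ready
        x = block(x)
        block.to("cpu", non_blocking=True)  # offload behind the compute
        prefetch(i + depth)                 # keep the pipeline full
    return x
```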
* ltxv2: Implement block prefetching
* Implement LoRA async offload
Implement async offload of LoRAs.
* pinned_memory: remove JIT RAM pressure release
This doesn't work: freeing intermediates for pins needs to be
higher priority than freeing pins-for-pins, if and when you are going
to do that. So this is too late, as pins-for-pins happens at model load
time and we don't have JIT pins-for-pins.
* caching: Add a filter to only free intermediates from inactive workflows
This is to get the priorities among pins straight.
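A minimal sketch of the filter idea, assuming hypothetical `kind` and `workflow_id` fields on cache entries:

```python
def intermediates_from_inactive(cache_entries, active_workflow_ids):
    # Only intermediates owned by workflows that are not currently running
    # are eviction candidates, so pinned allocations keep their priority.
    return [entry for entry in cache_entries
            if entry.kind == "intermediate"
            and entry.workflow_id not in active_workflow_ids]
```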
* mm: free inactive RAM from the RAM cache first
Memory from inactive workflows should be freed before anything else.
* caching: purge old ModelPatchers first
Don't try to score them; just dump them at the first sign of trouble
if they aren't part of the workflow.
On failure (e.g. animated WebP files), fall back to the old Pillow code.
This should fix the extra precision in high-bit-depth images (like 16-bit PNG) being discarded when loaded by Pillow, and potentially adds support for more image formats.
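Conceptually, the precision-preserving path looks something like this (a sketch using numpy + Pillow; the actual loader and its fallback differ):

```python
import numpy as np
from PIL import Image

def load_image_keep_precision(path):
    img = Image.open(path)
    arr = np.asarray(img)
    if arr.dtype == np.uint16:   # 16-bit channels, e.g. "I;16" PNGs
        return arr.astype(np.float32) / 65535.0
    if arr.dtype == np.int32:    # Pillow mode "I"; 16-bit PNGs land here too
        return arr.astype(np.float32) / 65535.0
    # Old 8-bit path, kept as the fallback for everything else.
    return np.asarray(img.convert("RGB")).astype(np.float32) / 255.0
```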
Comfy-aimdo 0.3.0 contains several major new features:
- multi-GPU support
- ARM support
- AMD support
Refactorings include:
- Linkless architecture: linkage is now performed purely at runtime
to stop host library lookups completely and interact only with the
torch-loaded Nvidia stack.
- Elimination of cudart integration on Linux; it's now consistent with
Windows.
Misc bugfixes and minor features.
SolidMask had a hardcoded device="cpu" while other nodes (e.g.
EmptyImage) follow intermediate_device(). This causes a RuntimeError
when MaskComposite combines masks that live on different devices
under --gpu-only.
- SolidMask: use intermediate_device() instead of hardcoded "cpu"
- MaskComposite: align source device to destination before operating
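Roughly, the fix as described (a sketch based on the bullet points above; `value`, `width`, and `height` mirror SolidMask's inputs):

```python
import torch
import comfy.model_management

def solid_mask(value, width, height):
    # Allocate where other nodes allocate instead of hardcoding "cpu".
    device = comfy.model_management.intermediate_device()
    return torch.full((1, height, width), value,
                      dtype=torch.float32, device=device)

def mask_composite(destination, source, x, y):
    # Align the source to the destination's device before compositing so a
    # CPU/GPU mix under --gpu-only no longer raises a RuntimeError.
    source = source.to(destination.device)
    destination[..., y:y + source.shape[-2], x:x + source.shape[-1]] = source
    return destination
```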
Co-authored-by: Alexis Rolland <alexisrolland@hotmail.com>
Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
* Change save 3d model's filename prefix to 3d/ComfyUI
As this node has already been renamed from `Save GLB` to `Save 3D Model`, using the filename prefix `3d` is better than `mesh`
* use lowercase
---------
Fires on v* tag push (earlier than release.published, which can lag)
and triggers a repository_dispatch on Comfy-Org/cloud with event_type
comfyui_tag_pushed. Legacy desktop dispatch in release-webhook.yml
is left untouched.
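In Python terms, the dispatch the workflow sends amounts to this call against the GitHub REST API (the token variable and `client_payload` shape are assumptions; the real workflow is Actions YAML):

```python
import os
import requests

def dispatch_tag_push(tag: str) -> None:
    # repository_dispatch to Comfy-Org/cloud; needs a PAT with access to
    # that repo, not the workflow's default GITHUB_TOKEN.
    resp = requests.post(
        "https://api.github.com/repos/Comfy-Org/cloud/dispatches",
        headers={
            "Authorization": f"Bearer {os.environ['DISPATCH_TOKEN']}",
            "Accept": "application/vnd.github+json",
        },
        json={"event_type": "comfyui_tag_pushed",
              "client_payload": {"tag": tag}},
        timeout=30,
    )
    resp.raise_for_status()
```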
* Add OpenAPI 3.1 specification for ComfyUI API
Adds a comprehensive OpenAPI 3.1 spec documenting all HTTP endpoints
exposed by ComfyUI's server, including prompt execution, queue management,
file uploads, userdata, settings, system stats, object info, assets,
and internal routes.
The spec was validated against the source code with adversarial review
from multiple models, and passes Spectral linting with zero errors.
Also removes openapi.yaml from .gitignore so the spec is tracked.
* Mark /api/history endpoints as deprecated
Address Jacob's review feedback on PR #13397 by explicitly marking the
three /api/history operations as deprecated in the OpenAPI spec:
* GET /api/history -> superseded by GET /api/jobs
* POST /api/history -> superseded by /api/jobs management
* GET /api/history/{prompt_id} -> superseded by GET /api/jobs/{job_id}
Each operation gains deprecated: true plus a description that names the
replacement. A formal sunset timeline (RFC 8594 Deprecation and RFC 8553
Sunset headers, minimum-runway policy) is being defined separately and
will be applied as a follow-up.
* Address Spectral lint findings in openapi.yaml
- Add operation descriptions to 52 endpoints (prompt, queue, upload,
view, models, userdata, settings, assets, internal, etc.)
- Add schema descriptions to 22 component schemas
- Add parameter descriptions to 8 path parameters that were missing them
- Remove 6 unused component schemas: TaskOutput, EmbeddingsResponse,
ExtensionsResponse, LogRawResponse, UserInfo, UserDataFullInfo
No wire/shape changes. Reduces Spectral findings from 92 to 4. The
remaining 4 are real issues (WebSocket 101 on /ws, loose error schema,
and two snake_case warnings on real wire field names) and are worth
addressing separately.
* fix(openapi): address jtreminio oneOf review on /api/userdata
Restructure the UserData response schemas to address the review feedback
on the `oneOf` without a discriminator, and fix two accuracy bugs found
while doing it.
Changes
- GET /api/userdata response: extract the inline `oneOf` to a named
schema (`ListUserdataResponse`) and add the missing third variant
returned when `split=true` and `full_info=false` (array of
`[relative_path, ...path_components]`). Previously only two of the
three actual server response shapes were described.
- UserDataResponse (POST endpoints): correct the description — this
schema is a single item, not a list — and point at the canonical
`GetUserDataResponseFullFile` schema instead of the duplicate
`UserDataResponseFull`. Also removes the malformed blank line in
`UserDataResponseShort`.
- Delete the now-unused `UserDataResponseFull` and
`UserDataResponseShort` schemas (replaced by reuse of
`GetUserDataResponseFullFile` and an inline string variant).
- Add an `x-variant-selector` vendor extension to both `oneOf` sites
documenting which query-parameter combination selects which branch,
since a true OpenAPI `discriminator` is not applicable (the variants
are type-disjoint and the selector lives in the request, not the
response body); the selection rule is sketched after this list.
This keeps the shapes the server actually emits (no wire-breaking
change) while making the selection rule explicit for SDK generators
and readers.
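A sketch of that selection rule from a client's perspective (the schema names come from the spec; the function itself is illustrative):

```python
def listing_variant(split: bool, full_info: bool) -> str:
    # The oneOf branch is decided by the request's query parameters, never
    # by sniffing the response body, hence x-variant-selector instead of a
    # discriminator.
    if full_info:
        return "array of GetUserDataResponseFullFile"
    if split:
        return "array of [relative_path, ...path_components]"
    return "array of relative path strings"
```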
---------
Co-authored-by: guill <jacob.e.segal@gmail.com>
Currently, if the graph contains a cycle, validation just recurses
infinitely, hits a catch-all, then throws a generic error against the
output node that seeded the validation. Instead, fail the offending
cycling node chain and handle it as an error in its own right.
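One common shape for this (a sketch, not ComfyUI's actual validation code): an iterative three-color DFS that returns the offending chain so it can be reported as its own error:

```python
def find_cycle(graph):
    # graph: node_id -> list of upstream node_ids.
    # Returns the node ids forming a cycle, or None.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}
    parent = {}
    for start in graph:
        if color[start] != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbors = stack[-1]
            advanced = False
            for nxt in neighbors:
                if color.get(nxt, WHITE) == WHITE:
                    color[nxt] = GRAY
                    parent[nxt] = node
                    stack.append((nxt, iter(graph.get(nxt, []))))
                    advanced = True
                    break
                if color.get(nxt) == GRAY:  # back edge: reconstruct chain
                    chain, cur = [nxt], node
                    while cur != nxt:
                        chain.append(cur)
                        cur = parent[cur]
                    return chain[::-1]
            if not advanced:
                color[node] = BLACK
                stack.pop()
    return None
```

Validation can then attach the returned chain to an error on those nodes instead of blaming the output node that seeded the walk.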
Co-authored-by: guill <jacob.e.segal@gmail.com>
This was over-estimating the VRAM used by the async allocator when lots
of small tensors were in play.
Also change the versioning scheme to == so we can roll aimdo forward
without worrying about regressing stable downstream ComfyUI core.