Benchmarked hybrid (main thread + pool) vs all-pool on 2x RTX 4090
with SD1.5 and NetaYume models. No meaningful performance difference
(within noise). All-pool is simpler: eliminates the main_device
special case, main_batch_tuple deferred execution, and the 3-way
branch in the dispatch loop.
Replace per-step thread create/destroy in _calc_cond_batch_multigpu with a
persistent MultiGPUThreadPool. Each worker thread calls torch.cuda.set_device()
once at startup, preserving compiled kernel caches across diffusion steps.
- Add MultiGPUThreadPool class in comfy/multigpu.py
- Create pool in CFGGuider.outer_sample(), shut down in finally block
- Main thread handles its own device batch directly for zero overhead
- Falls back to sequential execution if no pool is available
When a multigpu clone ModelPatcher is garbage collected, LoadedModel._switch_parent
switches the weakref to point at the parent (main) ModelPatcher. However, it was not
updating LoadedModel.device, leaving it with the old clone's device (e.g., cuda:1).
On subsequent runs, this stale device was passed to ModelPatcherDynamic.load(), causing
an assertion failure (device_to != self.load_device).
Amp-Thread-ID: https://ampcode.com/threads/T-019d3f5c-28c5-72c9-abed-34681f1b54ba
Co-authored-by: Amp <amp@ampcode.com>
Rename all 11 nodes in the utils/string category to include a "Text"
prefix for better discoverability and natural sorting. Regex nodes get
user-friendly names without "Regex" in the display name.
Renames:
- Concatenate → Text Concatenate
- Substring → Text Substring
- Length → Text Length
- Case Converter → Text Case Converter
- Trim → Text Trim
- Replace → Text Replace
- Contains → Text Contains
- Compare → Text Compare
- Regex Match → Text Match
- Regex Extract → Text Extract Substring
- Regex Replace → Text Replace (Regex)
All renamed nodes include their old display name as a search alias so
users can still find them by searching the original name. Regex nodes
also include "regex" as a search alias.
* mm: Lower windows pin threshold
Some workflows have more extranous use of shared GPU memory than is
accounted for in the 5% pin headroom. Lower this for safety.
* mm: Remove pin count clearing threshold.
TOTAL_PINNED_MEMORY is shared between the legacy and aimdo pinning
systems, however this catch-all assumes only the legacy system exists.
Remove the catch-all as the PINNED_MEMORY buffer is coherent already.
The /view endpoint returns text/plain for .svg files on some platforms
because Python's mimetypes module does not always include SVG by default.
Explicitly register image/svg+xml so <img> tags can render SVGs correctly.
Amp-Thread-ID: https://ampcode.com/threads/T-019d2da7-6a64-726a-af91-bd9c44e7f43c