Compare commits

..

3 Commits

3 changed files with 104 additions and 54 deletions

101
AGENTS.md
View File

@ -11,7 +11,8 @@
- Delete obsolete code aggressively when newer infrastructure makes it useless.
Remove dead fallbacks, migration paths, unused options, debug prints, and
compatibility branches that are no longer needed. Do not leave dead branches,
unreachable code, or functions that are never called.
unreachable code, or functions that are never called. If code is not
necessary for the current behavior, remove it.
- Revert or disable problematic behavior quickly when it breaks users. It is
better to remove a broken feature path than keep a complicated partial fix.
- Preserve existing APIs, node names, model-loading behavior, file layout, and
@ -85,6 +86,14 @@
not change a shared method to return extra values, alternate shapes, or
sentinel wrappers for one implementation unless the shared interface is
explicitly updated.
- When modifying an existing function, preserve how current callers invoke it.
Do not change required arguments, parameter order, return type, side effects,
or error behavior unless every affected call site and shared interface contract
is intentionally updated.
- Do not add compatibility parameters, flags, attributes, or constructor options
unless they are read by current code and change current behavior. Remove
pass-through or stored-but-unused values instead of preserving upstream or
deprecated API baggage.
- If an implementation needs auxiliary values for its own workflow, expose them
through a private helper or a clearly named implementation-specific method
instead of overloading the public method's return contract.
@ -111,6 +120,11 @@
- Do not add unnecessary `try`/`except` blocks. Use them for optional dependency,
platform, or backend capability detection only when the program has a useful
fallback. Prefer specific exception types when changing new code.
- Remove any workarounds for PyTorch versions that ComfyUI no longer officially
supports. Deprecated workarounds include catching an exception and rerunning
the same op with the input cast to float. If a workaround does not have a
comment naming the exact PyTorch version or versions that still need it,
remove it.
- Let unsupported model formats, invalid quantization metadata, and bad states
fail with clear errors instead of silently producing lower quality output.
- Match the existing local style in the file you edit. This codebase tolerates
@ -129,8 +143,87 @@
adding parallel code paths. Use `comfy.quant_ops`, `comfy.model_management`,
`comfy.memory_management`, `comfy.pinned_memory`, `comfy_aimdo`, and
`comfy-kitchen` helpers where they already solve the problem.
- Use optimized comfy-kitchen ops in places where they improve performance
without changing the expected dtype, device, memory, or interface behavior.
- All models should use the optimized attention function selected by ComfyUI.
Treat optimized backend functions, dispatch helpers, and capability-selected
callables as opaque. Higher-level code must not inspect function identity,
names, modules, or implementation details to decide behavior.
- Apply the same opacity rule to similar patterns beyond attention: callers
should depend on the documented interface and result contract, not on which
backend implementation was selected underneath.
- Do not use custom inference ops that only duplicate an existing op while
upcasting to float32, such as custom RMSNorm variants. Use the generic ComfyUI
ops and/or native torch ops instead.
- If a model class `__init__` has an `operations` parameter, assume
`operations` is never `None`. Do not add fallback branches or default torch
ops for a missing `operations` object.
- Do not add unnecessary parameters to model, model block, or model ops related
classes. Constructor and forward signatures should carry only values that are
actually needed by that object for inference.
- Reuse existing model classes, blocks, ops, and helper modules when appropriate.
Before implementing a new version of a model component, search the existing
model code for a class or helper that already provides the behavior.
- Avoid adding `einops` usage in core inference code. Use native torch tensor
ops such as `reshape`, `view`, `permute`, `transpose`, `flatten`, `unflatten`,
`unsqueeze`, and `squeeze` instead.
- Do not use tensors as general-purpose Python data structures. Keep metadata,
bookkeeping, counters, flags, shape math, padding math, index planning, memory
estimates, and control-flow decisions in plain Python values unless the data
must participate directly in tensor computation. Avoid creating temporary
tensors just to use tensor methods for scalar or structural calculations.
- Avoid unnecessary casts and transfers. Preserve the intended compute dtype,
storage dtype, bias dtype, and original tensor shape metadata.
- Assume inputs to the main model forward are already in the compute dtype by
default, except integer inputs such as some model timestep tensors. Do not add
defensive or convenience casts in model code; it is better for invalid dtype
plumbing to error clearly than to hide it with unnecessary casts.
- Raw model parameters that are not owned by an op and may be initialized in a
dtype different from the compute dtype should be cast at use in forward or
inference code with `comfy.ops.cast_to_input` or
`comfy.model_management.cast_to` to avoid dtype mismatches.
- Model code should not care what dtype it is initialized in, and model
`__init__` methods should not contain workarounds for specific dtypes. Dtype
workaround code, such as making a model work with fp16 compute, belongs in the
execution or model-management layer that owns compute policy.
- Model code should not perform unnecessary device-to-CPU or CPU-to-device
transfers. New allocations must be created on the correct device and dtype;
never allocate on CPU and then move to GPU, or allocate in one dtype and then
convert to another.
- Model code itself should not perform memory management. Loading, unloading,
offloading, device movement, VRAM policy, cache lifetime, and cleanup belong
in the relevant model-management and execution layers, not inside model
implementations.
- Do not add global, module-level, class-level, singleton, or model-owned stores
for tensors or other large memory that persist across executions. Temporary
caches must be scoped to a single execution or forward/encode/decode call:
allocate them in the owning top-level call, pass them explicitly through the
call stack, and let them be discarded when that call returns.
- Follow the Wan VAE temporal cache pattern for temporary caches: create a local
cache such as `feat_map` for the encode/decode operation, pass it into the
blocks that need it, and do not retain it on the model or in global state.
- In model init code, prefer `torch.empty` for parameter/buffer placeholders
that are populated from the model state dict instead of zero-initializing with
`torch.zeros` or similar. If an allocation is not loaded from the state dict
and is useless for inference, do not include it.
- `nn.Parameter` tensors that are stored in and populated from the model state
dict should be initialized with `torch.empty`, not with zero, random, or
otherwise meaningful initialization.
- Model initialization should describe module structure, not fabricate
checkpoint-owned tensor contents. Parameters and buffers that are loaded from
the state dict must not be manually initialized, reassigned, or filled with
fallback values unless that value is actually used when no checkpoint key
exists.
- When slicing large tensors, copy the slice if the sliced tensor's lifetime
exceeds the current function scope. Do not keep a long-lived view into a large
backing tensor when a smaller copy would release memory sooner.
- Use fused or compound torch operations such as `addcmul` when they naturally
match the math. Reducing Python and torch dispatch overhead is a valid
optimization when it does not obscure the code or change dtype/device
behavior.
- Avoid caches that persist across different executions as much as possible.
Persistent caches are acceptable only when they use a very minimal amount of
memory and have a clear ownership and invalidation story.
- When optimizing, favor small measurable changes: fewer allocations, fewer
device transfers, less peak memory, better batching, or use of a faster
existing backend op.
@ -141,6 +234,12 @@
`CATEGORY`, and registration through the local mapping used by that file.
- Keep node changes backward compatible by default. Add inputs with sensible
defaults and avoid changing output types unless the request requires it.
- Model implementations should add the minimal number of ComfyUI nodes required
to run the model. Reuse existing nodes as much as possible; adapting the model
to work with existing nodes is strongly preferred over creating new nodes.
- Node-level code must not patch model code directly. Any node behavior that
modifies, wraps, hooks, or changes model behavior must go through the model
patcher class instead of reaching into model internals.
- The official mascot of ComfyUI is a very cute anime girl with massive fennec
ears, a big fluffy tail, long blonde wavy hair, and blue eyes. Feel free to
use her in ComfyUI materials, UI text, examples, tests, generated assets, or

View File

@ -167,7 +167,7 @@ class Qwen3VLTokenizer(sd1_clip.SD1Tokenizer):
embed_count = 0
for r in tokens[key_name]:
for i in range(len(r)):
if r[i][0] == 151655: # <|image_pad|>
if isinstance(r[i][0], (int, float)) and r[i][0] == 151655: # <|image_pad|>
if len(images) > embed_count:
r[i] = ({"type": "image", "data": images[embed_count], "original_type": "image"},) + r[i][1:]
embed_count += 1

View File

@ -166,32 +166,6 @@ def boxes_to_regions(boxes, width: int, height: int) -> list:
return regions
def normalize_incoming_boxes(bboxes) -> list:
if isinstance(bboxes, dict):
frame = [bboxes]
elif not isinstance(bboxes, list) or not bboxes:
frame = []
elif isinstance(bboxes[0], dict):
frame = bboxes
else:
frame = bboxes[0] if isinstance(bboxes[0], list) else []
boxes = []
for box in frame:
if not isinstance(box, dict):
continue
norm = {
"x": box.get("x", 0),
"y": box.get("y", 0),
"width": box.get("width", 0),
"height": box.get("height", 0),
}
meta = box.get("metadata")
if isinstance(meta, dict):
norm["metadata"] = meta
boxes.append(norm)
return boxes
def _norm_bbox(region: dict) -> list[int]:
def grid(value: float) -> int:
return max(0, min(1000, round(value * 1000)))
@ -225,8 +199,6 @@ def build_elements(regions: list) -> list:
class CreateBoundingBoxes(io.ComfyNode):
_last_incoming: dict = {}
@classmethod
def define_schema(cls):
editor_state = io.BoundingBoxes.Input(
@ -245,12 +217,6 @@ class CreateBoundingBoxes(io.ComfyNode):
optional=True,
tooltip="Optional image used as background in the canvas and preview.",
),
io.BoundingBox.Input(
"bboxes",
force_input=True,
optional=True,
tooltip="Bounding boxes from an upstream node. A new upstream value seeds the canvas; edits you make on the canvas take priority and are kept until the upstream value changes again.",
),
io.Int.Input("width", default=1024, min=64, max=16384, step=16,
tooltip="Width of the canvas and the pixel grid for the bounding boxes."),
io.Int.Input("height", default=1024, min=64, max=16384, step=16,
@ -262,33 +228,18 @@ class CreateBoundingBoxes(io.ComfyNode):
io.BoundingBox.Output(display_name="bboxes"),
io.Array.Output(display_name="elements"),
],
hidden=[io.Hidden.unique_id],
is_output_node=True,
is_experimental=True,
)
@classmethod
def execute(cls, width, height, editor_state=None, background=None, bboxes=None) -> io.NodeOutput:
incoming = normalize_incoming_boxes(bboxes)
node_id = cls.hidden.unique_id
if incoming:
changed = cls._last_incoming.get(node_id) != incoming
if changed:
cls._last_incoming[node_id] = incoming
else:
changed = False
cls._last_incoming.pop(node_id, None)
source = incoming if changed else (editor_state or incoming)
regions = boxes_to_regions(source, width, height)
def execute(cls, width, height, editor_state=None, background=None) -> io.NodeOutput:
regions = boxes_to_regions(editor_state, width, height)
preview = render_preview(regions, width, height, _bg_from_image(background))
ui = {"dims": [width, height]}
if incoming:
ui["input_bboxes"] = incoming
return io.NodeOutput(
preview,
fractions_to_bbox_frame(regions, width, height),
build_elements(regions),
ui=ui,
ui={"dims": [width, height]},
)