Commit Graph

23 Commits

SHA1 Message Date
f4b99bc623 Made multigpu deepclone load the model from disk to avoid needing to deepclone the actual model object, fixed issues with merge, and turned off the cuda backend since it causes a device mismatch issue with rope (and potentially other ops); will investigate 2026-02-17 04:55:00 -08:00
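A minimal sketch of the idea behind this change, assuming a generic PyTorch model and checkpoint file; clone_model_for_device, model_factory, and checkpoint_path are hypothetical names, not ComfyUI APIs. Rebuilding the per-device copy from disk avoids ever holding two full in-memory copies of the weights the way copy.deepcopy(model) would.

```python
import torch

def clone_model_for_device(model_factory, checkpoint_path, device):
    # Hypothetical sketch: instead of copy.deepcopy(existing_model), which briefly
    # keeps two full copies of the weights resident, construct a fresh model and
    # stream the weights back in from disk, then move it to the target GPU.
    model = model_factory()
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(state_dict)
    return model.to(device)
```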
6165c38cb5 Optimize nvfp4 lora applying. (#11866)
This changes results a bit, but it also speeds things up a lot.
2026-01-14 00:49:38 -05:00
b3c0e4de57 Make loras work on nvfp4 models. (#11837)
The initial apply is a bit slow but will probably be sped up in the future.
2026-01-12 22:33:54 -05:00
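As a rough illustration of why the initial apply is slow, here is a minimal sketch of patching a LoRA onto a quantized weight. The dequantize and quantize arguments stand in for whatever nvfp4 (de)quantization routines ComfyUI actually uses; they and apply_lora_to_quantized are hypothetical names, not real APIs.

```python
import torch

def apply_lora_to_quantized(weight_q, dequantize, quantize, lora_down, lora_up, alpha):
    # Hypothetical flow: unpack to full precision, add the LoRA delta, repack.
    # The dequantize/requantize round trip is what makes the first apply costly.
    w = dequantize(weight_q).float()
    delta = (lora_up.float() @ lora_down.float()) * alpha
    return quantize(w + delta)
```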
21e8425087 Add warning for old pytorch. (#11718) 2026-01-07 21:07:26 -05:00
edee33f55e Disable comfy kitchen cuda if pytorch cuda less than 13 (#11681) 2026-01-06 22:13:43 -05:00
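A minimal sketch of the kind of version gating these two commits describe; the exact PyTorch threshold and the warning text are assumptions, and enable_kitchen_cuda is a hypothetical flag rather than a real ComfyUI setting.

```python
import torch

# Assumed cutoff for the "old pytorch" warning; the real threshold may differ.
major, minor = (int(x) for x in torch.__version__.split("+")[0].split(".")[:2])
if (major, minor) < (2, 4):
    print("WARNING: you are running an old PyTorch version, please update it.")

# torch.version.cuda is a string like "12.8", or None on CPU/ROCm builds.
cuda_ver = torch.version.cuda
enable_kitchen_cuda = cuda_ver is not None and int(cuda_ver.split(".")[0]) >= 13
```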
6da00dd899 Initial ops changes to use comfy_kitchen: Initial nvfp4 checkpoint support. (#11635)

Co-authored-by: Jedrzej Kosinski <kosinkadink1@gmail.com>
2026-01-05 21:48:58 -05:00
791e30ff50 Fix nan issue when quantizing fp16 tensor. (#11213) 2025-12-09 17:03:21 -05:00
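The commit does not spell out the fix, but a common way to avoid NaNs when casting fp16 values into an fp8 format is to sanitize and clamp to the target range first; a minimal sketch, with quantize_to_fp8 as a hypothetical helper name:

```python
import torch

def quantize_to_fp8(t: torch.Tensor, fp8_dtype=torch.float8_e4m3fn) -> torch.Tensor:
    # fp16 values can exceed the fp8 range, and out-of-range or non-finite values
    # turn into inf/nan on the cast, so widen, sanitize, and clamp before converting.
    finfo = torch.finfo(fp8_dtype)
    t = torch.nan_to_num(t.float(), nan=0.0, posinf=finfo.max, neginf=finfo.min)
    return t.clamp(finfo.min, finfo.max).to(fp8_dtype)
```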
43071e3de3 Make old scaled fp8 format use the new mixed quant ops system. (#11000) 2025-12-05 14:35:42 -05:00
6484ac89dc fix QuantizedTensor.is_contiguous (#10956) (#10959) 2025-11-28 16:33:07 -05:00
3f382a4f98 quant ops: Dequantize weight in-place (#10935)
In flux2 these weights are huge (200MB). As plain_tensor is a throw-away deep copy, do this multiplication in-place to save VRAM.
2025-11-27 08:06:30 -08:00
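A minimal sketch of the in-place variant described above; plain_tensor and scale follow the commit's wording, while the function name is hypothetical.

```python
import torch

def dequantize_inplace(plain_tensor: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # plain_tensor is already a throw-away copy of the weight, so scale it in place.
    # The out-of-place form `plain_tensor * scale` would allocate a second
    # full-size buffer (~200MB for the large flux2 weights) just to hold the result.
    return plain_tensor.mul_(scale)
```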
bdb10a583f Fix loras not working on mixed fp8. (#10899) 2025-11-26 00:07:58 -05:00
015a0599d0 I found a case where this is needed (#10875) 2025-11-25 03:23:19 -05:00
b6805429b9 Allow pinning quantized tensors. (#10873) 2025-11-25 02:48:20 -05:00
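For context on pinning, a minimal sketch of how page-locked host memory enables asynchronous host-to-device copies; the uint8 buffer is just a stand-in for packed quantized data, not ComfyUI's actual storage layout.

```python
import torch

if torch.cuda.is_available():
    cpu_weight = torch.empty(4096, 4096, dtype=torch.uint8)  # stand-in for packed quant data
    pinned = cpu_weight.pin_memory()                          # page-locked host memory
    gpu_weight = pinned.to("cuda", non_blocking=True)         # async copy becomes possible
```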
25022e0b09 Cleanup and fix issues with text encoder quants. (#10872) 2025-11-25 01:48:53 -05:00
3b3ef9a77a Quantized Ops fixes (#10715)
* offload support, bug fixes, remove mixins

* add readme
2025-11-12 18:26:52 -05:00
af4b7b5edb More fp8 torch.compile regressions fixed. (#10625) 2025-11-03 22:14:20 -05:00
6b88478f9f Bring back fp8 torch compile performance to what it should be. (#10622) 2025-11-03 19:22:10 -05:00
e199c8cc67 Fixes (#10621) 2025-11-03 17:58:24 -05:00
958a17199a People should update their pytorch versions. (#10618) 2025-11-03 17:08:30 -05:00
c58c13b2ba Fix torch compile regression on fp8 ops. (#10580) 2025-11-01 00:25:17 -04:00
906c089957 Fix small performance regression with fp8 fast and scaled fp8. (#10537) 2025-10-29 19:29:01 -04:00
1a58087ac2 Reduce memory usage for fp8 scaled op. (#10531) 2025-10-29 15:43:51 -04:00
8817f8fc14 Mixed Precision Quantization System (#10498)
* Implement mixed precision operations with a registry design and metadata for quant spec in checkpoint.

* Updated design using Tensor Subclasses

* Fix FP8 MM

* An actually functional POC

* Remove CK reference and ensure correct compute dtype

* Update unit tests

* ruff lint

* Fix missing keys

* Rename quant dtype parameter

* Fix unittests for CPU build
2025-10-28 16:20:53 -04:00
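A minimal sketch of the registry-plus-tensor-subclass design this commit describes, under heavy assumptions: QUANT_REGISTRY, register_quant, QuantizedTensor, and Fp8ScaledTensor are illustrative names rather than ComfyUI's actual classes, and a real implementation would also define __torch_dispatch__ so torch ops work directly on the subclass.

```python
import torch

QUANT_REGISTRY = {}  # maps a quant-format name from checkpoint metadata to a handler class

def register_quant(name):
    def decorator(cls):
        QUANT_REGISTRY[name] = cls
        return cls
    return decorator

class QuantizedTensor(torch.Tensor):
    # Wrapper-style subclass: advertises the original shape/dtype while carrying
    # the packed payload plus its scale, and dequantizes on demand.
    @staticmethod
    def __new__(cls, packed, scale, orig_dtype):
        t = torch.Tensor._make_wrapper_subclass(
            cls, packed.shape, dtype=orig_dtype, device=packed.device
        )
        t.packed, t.scale = packed, scale
        return t

    def dequantize(self):
        return self.packed.to(self.dtype) * self.scale

@register_quant("fp8_scaled")
class Fp8ScaledTensor(QuantizedTensor):
    pass

# Usage: the registry picks the handler named in the checkpoint metadata.
packed = torch.randn(8, 8).to(torch.float8_e4m3fn)
w = QUANT_REGISTRY["fp8_scaled"](packed, scale=torch.tensor(2.0), orig_dtype=torch.float32)
print(w.dequantize().dtype)  # torch.float32
```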