cutlass/FUNCTIONALITY.md
2025-07-21 22:03:55 -04:00

Changelog for CuTe DSL API changes

4.1.0 (2025-07-16)

  • for loop
    • Python built-in range now always generates IR and executes at runtime
    • cutlass.range is an advanced range with IR-level unrolling and pipelining control
    • Deprecated cutlass.range_dynamic; replace it with range or cutlass.range
    • Experimental: added pipelining control for compiler-generated software pipeline code
  • while/if
    • while/if now generate IR and execute at runtime by default, unless cutlass.const_expr is specified for the predicate
    • Deprecated cutlass.dynamic_expr; remove it from existing code
  • Rename mbarrier functions to reduce ambiguity
  • Modify SyncObject API (MbarrierArray, NamedBarrier, TmaStoreFence) to match std::barrier
  • Change pipeline create function to take only keyword arguments, and make barrier_storage optional.
  • Introduce the cutlass.cute.arch.get_dyn_smem_size API to get the dynamic shared memory size at runtime.
  • Various API support for SM100 BlockScaled Gemm
    • Introduce BlockScaled MmaOps in tcgen05/mma.py, and provide a make_blockscaled_trivial_tiled_mma function in blackwell_helpers.py to help construct a BlockScaled TiledMma.
    • Introduce S2T CopyOps in tcgen05/copy.py.
    • Introduce BlockScaled layout utilities in blockscaled_layout.py for creating the required scale factor layouts in global memory, shared memory and tensor memory.
  • cutlass.cute.compile now supports compilation options. Refer to JIT compilation options for more details.
  • cutlass.cute.testing.assert_ now works in device JIT functions. Specify --enable-device-assertions as a compilation option to enable it.
  • cutlass.cute.make_tiled_copy is now deprecated. Please use cutlass.cute.make_tiled_copy_tv instead.
  • Shared memory capacity query
    • Introduce cutlass.utils.get_smem_capacity_in_bytes for querying the shared memory capacity.
    • <arch>_utils.SMEM_CAPACITY["<arch_str>"] is now deprecated.
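
The deprecations listed above can be checked for mechanically when migrating existing code. The following is a minimal sketch in plain Python, not part of CUTLASS: the helper name find_deprecated and the DEPRECATED table are hypothetical, and the replacement hints only restate the entries in this changelog. It matches on the last component of each name so that aliased imports are also caught, which means unrelated identifiers with the same name would be reported as well.

```python
import re

# Hypothetical migration table: deprecated identifier -> hint.
# The hints restate the 4.1.0 changelog entries; nothing here calls CUTLASS.
DEPRECATED = {
    "range_dynamic": "replace with range or cutlass.range",
    "dynamic_expr": "remove it",
    "make_tiled_copy": "use cutlass.cute.make_tiled_copy_tv",
    "SMEM_CAPACITY": "see cutlass.utils.get_smem_capacity_in_bytes",
}


def find_deprecated(source: str):
    """Return a list of (line_number, name, hint) for each deprecated name found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, hint in DEPRECATED.items():
            # \b on both sides keeps make_tiled_copy from matching make_tiled_copy_tv
            if re.search(r"\b" + re.escape(name) + r"\b", line):
                hits.append((lineno, name, hint))
    return hits


# Illustrative input only; the scanner treats it as text and never executes it.
sample = """\
for i in cutlass.range_dynamic(n):
    if cutlass.dynamic_expr(i < k):
        new_copy = cute.make_tiled_copy_tv(...)
        old_copy = cute.make_tiled_copy(...)
"""

for lineno, name, hint in find_deprecated(sample):
    print(f"line {lineno}: {name} is deprecated ({hint})")
```

A text scan like this only locates candidates. Each hit still needs a manual look, since comments and string literals are reported too.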

4.0.0 (2025-06-03)

  • Fixed API mismatch in class cute.runtime.Pointer: changed element_type to dtype to match typing.Pointer