v4.1 release

@@ -10,109 +10,130 @@ Control Flow

Overview
--------
|DSL| walks Python's AST and converts each control-flow construct it finds into
structured |IR|. You can therefore write ordinary Python loops and branches
while the compiler decides, statement by statement, whether to

* **evaluate at compile time** if the construct is native Python control flow (a ``for`` over
  ``cutlass.range_constexpr`` or a condition wrapped in ``cutlass.const_expr``), or
* **emit intermediate representation (IR)** when the control flow is dynamic, which is the
  default for unannotated ``if`` and ``while`` statements and for ``for`` loops over
  ``range`` or ``cutlass.range``.

Passing |IR| values to native Python control flow results in an error.

For a high-level discussion of the overall pipeline, see
:doc:`the code-generation overview <dsl_code_generation>`.
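
As a rough illustration of these rules, the sketch below uses Python's standard ``ast`` module to label each ``for``, ``if``, and ``while`` statement in a snippet as compile-time or dynamic: ``cutlass.range_constexpr`` loops and ``cutlass.const_expr`` conditions count as compile-time, everything else as dynamic. ``classify_control_flow`` is a hypothetical helper written for this page; it is not the DSL's AST rewriter, which also transforms the code and tracks the values involved.

```python
import ast

SOURCE = '''
def kernel(bound, flag):
    total = 0
    for i in cutlass.range_constexpr(4):
        total += i
    for j in range(bound):
        total += j
    if cutlass.const_expr(flag):
        total += 1
    while total > 0:
        total -= 1
'''


def _is_cutlass_call(node, name):
    """True if node is a call of the form cutlass.<name>(...)."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == name
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id == "cutlass"
    )


def classify_control_flow(source):
    """Return (line, kind, mode) for every for/if/while, sorted by line."""
    results = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.For):
            compile_time = _is_cutlass_call(node.iter, "range_constexpr")
            results.append((node.lineno, "for", "compile-time" if compile_time else "dynamic"))
        elif isinstance(node, (ast.If, ast.While)):
            compile_time = _is_cutlass_call(node.test, "const_expr")
            kind = "if" if isinstance(node, ast.If) else "while"
            results.append((node.lineno, kind, "compile-time" if compile_time else "dynamic"))
    return sorted(results)


for line, kind, mode in classify_control_flow(SOURCE):
    print(line, kind, mode)
```

Running it prints one line per construct; the ``range(bound)`` loop and the unannotated ``while`` come out as dynamic.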


For Loops
---------
|DSL| recognises three kinds of ranges for ``for`` loops:

* ``range`` – the Python built-in, always lowered to |IR|
* ``cutlass.range`` – same as the Python built-in ``range``, but supports advanced unrolling and pipelining control
* ``cutlass.range_constexpr`` – unrolled at compile time


range(...)/cutlass.range(...)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Use when you *always* want a loop in the generated |IR|, even if the loop bounds
are plain Python values.
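
For intuition about the ``unroll`` option of ``cutlass.range`` (used as ``cutlass.range(bound, unroll=2)`` in the example further down), here is what unrolling by a factor of 2 means in plain Python: the body is replicated twice per trip and a remainder iteration handles odd bounds, so the sequence of indices is unchanged. This is a conceptual sketch only; ``rolled`` and ``unrolled_by_2`` are hypothetical names, and the code says nothing about how the DSL actually lowers or pipelines the loop.

```python
def rolled(bound, body):
    """Reference loop: one body call per trip."""
    for i in range(bound):
        body(i)


def unrolled_by_2(bound, body):
    """Same iterations, with the body replicated twice per trip."""
    i = 0
    while i + 1 < bound:  # main loop: two iterations per trip
        body(i)
        body(i + 1)
        i += 2
    if i < bound:         # remainder iteration when bound is odd
        body(i)


seen_rolled, seen_unrolled = [], []
rolled(7, seen_rolled.append)
unrolled_by_2(7, seen_unrolled.append)
print(seen_rolled == seen_unrolled)  # True
```

The trade-off is the usual one for unrolling: fewer loop-control steps per executed body, at the cost of a larger loop body and extra remainder handling.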


cutlass.range_constexpr(...)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Runs in the Python interpreter and is fully unrolled before code generation.
All loop bounds (and therefore the loop indices) must be |Constexpr|.
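
One way to picture the difference between ``cutlass.range_constexpr`` and a dynamic loop is a toy tracer. In the hypothetical sketch below, ``trace_constexpr_loop`` runs the loop in the interpreter, so the body is recorded once per iteration with a concrete index, while ``trace_dynamic_loop`` records a single loop operation whose body is traced once with a symbolic index. The pseudo-IR strings are invented for illustration and are not what the DSL emits.

```python
def trace_constexpr_loop(n, body):
    """Unroll while tracing: n must be a plain Python int."""
    if not isinstance(n, int):
        raise TypeError("a range_constexpr-style loop needs a compile-time int bound")
    ops = []
    for i in range(n):            # runs in the Python interpreter
        ops.append(body(str(i)))  # the concrete index is baked into each op
    return ops


def trace_dynamic_loop(bound, body):
    """Emit one loop op; the body is traced once with a symbolic index."""
    return [f"for %i in 0..{bound} {{ {body('%i')} }}"]


def body(idx):
    return f"printf({idx})"


print(trace_constexpr_loop(3, body))
# ['printf(0)', 'printf(1)', 'printf(2)']
print(trace_dynamic_loop("%bound", body))
# ['for %i in 0..%bound { printf(%i) }']
```

Passing a symbolic bound such as ``"%bound"`` to ``trace_constexpr_loop`` raises ``TypeError``, mirroring the rule that a dynamic value cannot drive a compile-time loop.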

**Example:**

.. code-block:: python

    @cute.jit
    def control_flow_examples(bound: cutlass.Int32):
        n = 10

        # ✅ This loop is a Python loop, evaluated at compile time.
        for i in cutlass.range_constexpr(n):
            cute.printf("%d\\n", i)

        # ✅ This loop is dynamic, even though its bound is a Python value.
        for i in range(n):
            cute.printf("%d\\n", i)

        # ❌ This loop bound is a dynamic value, which is not allowed in a Python loop.
        # Use `range` instead.
        for i in cutlass.range_constexpr(bound):
            cute.printf("%d\\n", i)

        # ✅ This loop is dynamic and emitted as an IR loop.
        for i in range(bound):
            cute.printf("%d\\n", i)

        # ✅ This loop is dynamic and emitted as an IR loop, unrolled by a factor of 2.
        for i in cutlass.range(bound, unroll=2):
            cute.printf("%d\\n", i)

If-Else Statements
------------------

Standard Python ``if``/``elif``/``else`` is supported.

* **Predicate without annotation** → lowered to |IR|.
* **Predicate annotated with** ``cutlass.const_expr`` → evaluated at compile time.

**Example:**

.. code-block:: python

    @cute.jit
    def main(const_var: cutlass.Constexpr, dynamic_var: cutlass.Int32):
        # ✅ This branch is a Python branch, evaluated at compile time.
        if cutlass.const_expr(const_var):
            cute.printf("Const branch\\n")
        else:
            cute.printf("Const else\\n")

        # ✅ This branch is dynamic and emitted as an IR branch.
        if dynamic_var == 10:
            cute.printf("Dynamic True\\n")
        else:
            cute.printf("Dynamic False\\n")

        # ❌ Using a dynamic value with `cutlass.const_expr` is not allowed.
        if cutlass.const_expr(dynamic_var == 10):
            cute.printf("Bound is 10\\n")
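
A toy model of the two branch kinds may help. In this hypothetical sketch, a compile-time predicate is decided by Python while tracing, so only the taken branch is recorded; a dynamic predicate records both branches inside one conditional operation, and the choice is made at run time. The function names and pseudo-IR strings are invented for illustration.

```python
def trace_if_constexpr(pred, then_body, else_body):
    """const_expr-style branch: Python picks one side while tracing."""
    if not isinstance(pred, bool):
        raise TypeError("a compile-time predicate must be a plain Python bool")
    return [then_body() if pred else else_body()]


def trace_if_dynamic(pred_name, then_body, else_body):
    """Unannotated branch: both sides are traced into one conditional op."""
    return [f"if {pred_name} {{ {then_body()} }} else {{ {else_body()} }}"]


def then_body():
    return 'printf("True")'


def else_body():
    return 'printf("False")'


print(trace_if_constexpr(True, then_body, else_body))
# ['printf("True")']
print(trace_if_dynamic("%cond", then_body, else_body))
# ['if %cond { printf("True") } else { printf("False") }']
```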


While Loops
-----------

Standard Python ``while`` is supported.

* **Condition without annotation** → lowered to |IR|, because the condition may become
  dynamic after the first iteration.
* **Condition annotated with** ``cutlass.const_expr`` → evaluated at compile time.

**Example:**

.. code-block:: python

    @cute.jit
    def main(dynamic_var: cutlass.Int32):
        n = 0

        # ✅ This is a Python while loop, evaluated at compile time.
        while cutlass.const_expr(n < 10):
            cute.printf("Const branch\\n")
            n += 1

        # ✅ This is a dynamic while loop, emitted as an IR while loop.
        while dynamic_var == 10:
            cute.printf("Dynamic True\\n")
            n += 1

        # ❌ Using a dynamic value with `cutlass.const_expr` is not allowed.
        while cutlass.const_expr(n < dynamic_var):
            n += 1
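
The compile-time ``while`` behaves like an ordinary Python loop that runs during tracing. The hypothetical sketch below models that with plain functions: ``trace_while_constexpr`` re-evaluates the condition in the interpreter and records the body once per iteration, whereas ``trace_while_dynamic`` records a single loop operation with a symbolic condition. Names and pseudo-IR strings are illustrative only.

```python
def trace_while_constexpr(cond, step, body, state):
    """Run the loop in Python during tracing; cond/step/body act on plain values."""
    ops = []
    while cond(state):
        ops.append(body(state))  # one recorded op per iteration
        state = step(state)
    return ops, state


def trace_while_dynamic(cond_text, body_text):
    """Record one loop op; the condition is evaluated at run time."""
    return [f"while {cond_text} {{ {body_text} }}"]


ops, n = trace_while_constexpr(
    cond=lambda n: n < 3,
    step=lambda n: n + 1,
    body=lambda n: f"printf({n})",
    state=0,
)
print(ops, n)  # ['printf(0)', 'printf(1)', 'printf(2)'] 3
print(trace_while_dynamic("%dynamic_var == 10", "printf(...)"))
```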


Compile-Time Metaprogramming
----------------------------

@@ -127,7 +148,7 @@ an optional **ReLU** epilogue:

    def gemm(..., do_relu: cutlass.Constexpr):
        # main GEMM work
        ...
        if cutlass.const_expr(do_relu):  # compile-time guard
            # ReLU code is emitted only when do_relu is True
            ...

@@ -135,3 +156,45 @@ an optional **ReLU** epilogue:

    gemm(..., False)  # ReLU is omitted from the generated IR
    gemm(..., True)   # ReLU is included
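
The effect of the compile-time guard can be mimicked in plain Python. In the hypothetical sketch below, ``build_gemm_ops`` stands in for tracing ``gemm``: ``do_relu`` is consumed while the operation list is being built, so each value produces its own specialization, and the ``False`` variant contains neither a ReLU step nor a branch. The operation names are invented.

```python
def build_gemm_ops(do_relu: bool):
    """Stand-in for tracing gemm(...); do_relu plays the role of a Constexpr."""
    ops = ["mainloop"]  # main GEMM work
    if do_relu:         # resolved while building, never emitted as a branch
        ops.append("relu")
    ops.append("store")
    return ops


print(build_gemm_ops(False))  # ['mainloop', 'store']
print(build_gemm_ops(True))   # ['mainloop', 'relu', 'store']
```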


Limitations of Dynamic Control Flow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* Early-exit statements (``break``, ``continue``, ``return``), ``pass``, and raising
  exceptions inside a dynamic control flow body are not yet supported.
* Operations in a control flow body are traced only when tracing is active in
  that region.
* Values originating in a control flow body are not available outside the control
  flow.
* Changing the type of a variable in a control flow body is not allowed.
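
The last two rules can be read as a statement about which variables a dynamic branch hands back to the surrounding code. The sketch below is a hypothetical model, not the DSL's implementation: it tracks only variable types, carries out exactly the variables that existed before the branch, drops names first assigned inside it, and rejects a branch that changes a variable's type.

```python
def visible_after_dynamic_if(before, then_env, else_env):
    """Toy model of the variables available after a dynamic `if`.

    Each argument maps a variable name to its Python type.
    """
    result = {}
    for name, ty in before.items():
        for branch in (then_env, else_env):
            new_ty = branch.get(name, ty)  # unassigned in this branch -> unchanged
            if new_ty is not ty:
                raise TypeError(f"{name!r} changes type from {ty.__name__} to {new_ty.__name__}")
        result[name] = ty
    # Names first assigned inside a branch (such as `val`) are not carried out.
    return result


print(visible_after_dynamic_if({"n": int}, {"n": int, "val": int}, {}))
# {'n': <class 'int'>}
```

In the example that follows, ``val`` would be dropped and ``n = 10.0`` would raise the type error.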

**Example:**

.. code-block:: python

    @cute.jit
    def control_flow_negative_examples(predicate: cutlass.Boolean):
        n = 10

        # ❌ This loop is dynamic, so early exit isn't allowed.
        for i in range(n):
            if i == 5:
                break  # Early exit

        if predicate:
            val = 10
            # ❌ Returning from a control flow body is not allowed.
            return
            # ❌ Raising an exception from a control flow body is not allowed.
            raise ValueError("This is not allowed")
            # ❌ Using pass in a control flow body is not allowed.
            pass

        # ❌ val is not available outside the dynamic if.
        cute.printf("%d\\n", val)

        if predicate:
            # ❌ Changing the type of a variable in a control flow body is not allowed.
            n = 10.0
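
Some of these rules can be checked before tracing. The following sketch is a hypothetical linter built on Python's standard ``ast`` module, not a tool shipped with the DSL. It reports ``break``, ``continue``, ``return``, ``raise``, and ``pass`` statements nested inside an unannotated ``for``, ``if``, or ``while``, and ignores bodies controlled by ``cutlass.range_constexpr`` or ``cutlass.const_expr``. It is deliberately conservative (anything nested under a dynamic construct is flagged) and does not check the value-visibility or type rules.

```python
import ast

DISALLOWED = (ast.Break, ast.Continue, ast.Return, ast.Raise, ast.Pass)


def _is_cutlass_call(node, name):
    """True if node is a call of the form cutlass.<name>(...)."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == name
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id == "cutlass"
    )


def _is_dynamic(node):
    """Apply this page's rules to a single statement."""
    if isinstance(node, ast.For):
        return not _is_cutlass_call(node.iter, "range_constexpr")
    if isinstance(node, (ast.If, ast.While)):
        return not _is_cutlass_call(node.test, "const_expr")
    return False


class DynamicExitChecker(ast.NodeVisitor):
    def __init__(self):
        self.depth = 0      # number of dynamic constructs enclosing the current node
        self.problems = []  # (line number, statement kind)

    def generic_visit(self, node):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            saved, self.depth = self.depth, 0  # a nested function starts fresh
            super().generic_visit(node)
            self.depth = saved
            return
        if self.depth and isinstance(node, DISALLOWED):
            self.problems.append((node.lineno, type(node).__name__))
        dynamic = _is_dynamic(node)
        self.depth += dynamic
        super().generic_visit(node)
        self.depth -= dynamic


def find_dynamic_exits(source):
    checker = DynamicExitChecker()
    checker.visit(ast.parse(source))
    return sorted(checker.problems)


SAMPLE = """
def kernel(predicate, n):
    for i in range(n):
        if i == 5:
            break
    for i in cutlass.range_constexpr(n):
        if cutlass.const_expr(i == 5):
            break
    if predicate:
        raise ValueError("not allowed")
"""

print(find_dynamic_exits(SAMPLE))  # [(5, 'Break'), (10, 'Raise')]
```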


@@ -39,7 +39,7 @@ General
the GitHub code only exists as a way for users to file issues and pull requests against.
While it can be used with the pip wheel, we do not recommend that most users do so unless they are
hacking on the DSL itself. For all other users, we recommend they
simply ``pip install nvidia-cutlass-dsl`` and use the pip wheel as the single source
of truth for the dialect compiler and DSL implementation. The CUTLASS GitHub repository will
contain a ``requirements.txt`` file pinning the version of the wheel consistent with the state
of the OSS repository (please see :doc:`quick_start`). This means getting started with

@@ -18,7 +18,6 @@ Notable unsupported features
----------------------------

- GeForce RTX 50 Series support
- RS WGMMA (the input matrix A comes from registers and the input matrix B comes from shared memory)
- Programmatic Dependent Launch (PDL)
- narrow-precision data type support, including related tensor core instructions
- convolutions

@@ -31,6 +30,10 @@ Notable unsupported features
Programming Model
---------------------

**CuTe Layout Algebra Only Supports 32-bit**
    Today, we only support 32-bit shapes/strides in CuTe layouts. 64-bit or arbitrary
    width support is planned for future releases.
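
A quick way to see what the 32-bit restriction means for a concrete layout: for a flat layout with non-negative strides, the largest offset it can produce is the sum of (extent - 1) * stride over its modes, and that value, along with each extent and stride, has to fit in a 32-bit integer. The helpers below are hypothetical plain-Python checks, not part of CuTe DSL, and they assume signed 32-bit (``Int32``) values.

```python
INT32_MAX = 2**31 - 1  # assumption: shapes/strides are signed 32-bit integers


def max_offset(shape, stride):
    """Largest linear offset a flat layout with non-negative strides can produce."""
    return sum((extent - 1) * step for extent, step in zip(shape, stride))


def fits_int32(shape, stride):
    """True if every extent, stride, and the maximum offset fit in a signed int32."""
    values = list(shape) + list(stride) + [max_offset(shape, stride)]
    return all(0 <= v <= INT32_MAX for v in values)


print(max_offset((4096, 4096), (1, 4096)), fits_int32((4096, 4096), (1, 4096)))
# 16777215 True
print(max_offset((65536, 65536), (1, 65536)), fits_int32((65536, 65536), (1, 65536)))
# 4294967295 False
```

Every extent and stride of the second layout fits comfortably in 32 bits, yet its largest offset does not, which is the kind of layout the current restriction affects.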

**Python Native Data Types**
    CuTe DSL supports Python data structures when used for "meta-programming,"
    but these structures cannot be treated as dynamic values modifiable at runtime.