* v4.3 update. * Update the cute_dsl_api changelog's doc link * Update version to 4.3.0 * Update the example link * Update doc to encourage user to install DSL from requirements.txt --------- Co-authored-by: Larry Wu <larwu@nvidia.com>
26 lines
1.0 KiB
ReStructuredText
26 lines
1.0 KiB
ReStructuredText
.. _cute_arch:
|
|
|
|
arch
|
|
====
|
|
|
|
The ``cute.arch`` module provides lightweight wrappers for NVVM Operation builders which implement CUDA built-in
|
|
device functions such as ``thread_idx``. It integrates seamlessly with CuTe DSL types.
|
|
|
|
These wrappers enable source location tracking through the ``@dsl_user_op``
|
|
decorator. The module includes the following functionality:
|
|
|
|
- Core CUDA built-in functions such as ``thread_idx``, ``warp_idx``, ``block_dim``, ``grid_dim``, ``cluster_dim``, and related functions
|
|
- Memory barrier management functions including ``mbarrier_init``, ``mbarrier_arrive``, ``mbarrier_wait``, and associated operations
|
|
- Low-level shared memory (SMEM) management capabilities, with ``SmemAllocator`` as the recommended interface
|
|
- Low-level tensor memory (TMEM) management capabilities, with ``TmemAllocator`` as the recommended interface
|
|
|
|
API documentation
|
|
-----------------
|
|
|
|
.. automodule:: cutlass.cute.arch
|
|
:members:
|
|
:undoc-members:
|
|
:show-inheritance:
|
|
:special-members: __init__
|
|
:private-members:
|