Files
cutlass/media/docs/pythonDSL/cute_dsl_general/debugging.rst
2025-06-06 02:39:20 -04:00

130 lines
3.9 KiB
ReStructuredText

.. _debugging:
Debugging
=========
This page provides an overview of debugging techniques and tools for CuTe DSL programs.
Getting Familiar with the Limitations
-------------------------------------
Before diving into comprehensive debugging capabilities, it's important to understand the limitations of CuTe DSL.
Understanding these limitations will help you avoid potential pitfalls from the start.
Please refer to :doc:`../limitations` for more details.
DSL Debugging
-------------
CuTe DSL provides built-in logging mechanisms to help you understand the code execution flow and
some of the internal state.
Enabling Logging
~~~~~~~~~~~~~~~~
CuTe DSL provides environment variables to control logging level:
.. code:: bash
# Enable console logging (default: False)
export CUTE_DSL_LOG_TO_CONSOLE=1
# Log to file instead of console (default: False)
export CUTE_DSL_LOG_TO_FILE=my_log.txt
# Control log verbosity (0, 10, 20, 30, 40, 50, default: 10)
export CUTE_DSL_LOG_LEVEL=20
Log Categories and Levels
~~~~~~~~~~~~~~~~~~~~~~~~~
Similar to standard Python logging, different log levels provide varying degrees of detail:
+--------+-------------+
| Level | Description |
+========+=============+
| 0 | Disabled |
+--------+-------------+
| 10 | Debug |
+--------+-------------+
| 20 | Info |
+--------+-------------+
| 30 | Warning |
+--------+-------------+
| 40 | Error |
+--------+-------------+
| 50 | Critical |
+--------+-------------+
Dump the generated IR
~~~~~~~~~~~~~~~~~~~~~
For users familiar with MLIR and compilers, CuTe DSL supports dumping the Intermediate Representation (IR).
This helps you verify whether the IR is generated as expected.
.. code:: bash
# Dump Generated CuTe IR (default: False)
export CUTE_DSL_PRINT_IR=1
# Keep Generated CuTe IR in a file (default: False)
export CUTE_DSL_KEEP_IR=1
Kernel Functional Debugging
----------------------------
Using Python's ``print`` and CuTe's ``cute.printf``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CuTe DSL programs can use both Python's native ``print()`` as well as our own ``cute.printf()`` to
print debug information during kernel generation and execution. They differ in a few key ways:
- Python's ``print()`` executes during compile-time only (no effect on the generated kernel) and is
typically used for printing static values (e.g. a fully static layouts).
- ``cute.printf()`` executes at runtime on the GPU itself and changes the PTX being generated. This
can be used for printing values of tensors at runtime for diagnostics, but comes at a performance
overhead similar to that of `printf()` in CUDA C.
For detailed examples of using these functions for debugging, please refer to the associated
notebook referenced in :doc:`notebooks`.
Handling Unresponsive/Hung Kernels
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When a kernel becomes unresponsive and ``SIGINT`` (``CTRL+C``) fails to terminate it,
you can follow these steps to forcefully terminate the process:
1. Use ``CTRL+Z`` to suspend the unresponsive kernel
2. Execute the following command to terminate the suspended process:
.. code:: bash
# Terminate the most recently suspended process
kill -9 $(jobs -p | tail -1)
CuTe DSL can also be debugged using standard NVIDIA CUDA tools.
Using Compute-Sanitizer
~~~~~~~~~~~~~~~~~~~~~~~
For detecting memory errors and race conditions:
.. code:: bash
compute-sanitizer --some_options python your_dsl_code.py
Please refer to the `compute-sanitizer documentation <https://developer.nvidia.com/compute-sanitizer>`_ for more details.
Conclusion
----------
This page covered several key methods for debugging CuTe DSL programs. Effective debugging typically requires a combination of these approaches.
If you encounter issues with DSL, you can enable logging and share the logs with the CUTLASS team as a GitHub issue to report a bug.