{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Your First Program with CuTe DSL\n", "\n", "## Introduction\n", "\n", "Welcome! In this tutorial, we'll write a simple \"Hello World\" program that runs on your GPU using CuTe DSL. This will help you understand the basics of GPU programming with our framework.\n", "\n", "### What You'll Learn\n", "\n", "- How to write code that runs on both the CPU (host) and the GPU (device)\n", "- How to launch a GPU kernel (a function that runs on the GPU)\n", "- Basic CUDA concepts such as threads and thread blocks\n", "\n", "### Step 1: Import Required Libraries\n", "\n", "First, let's import the libraries we need:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import cutlass\n", "import cutlass.cute as cute" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Step 2: Write Our GPU Kernel\n", "A GPU kernel is a function that runs on the GPU. Here's a simple kernel that prints \"Hello world\".\n", "\n", "Key concepts:\n", "- `@cute.kernel`: This decorator tells CUTLASS that this function should run on the GPU\n", "- `cute.arch.thread_idx()`: Returns the (x, y, z) index of the current thread within its thread block (like a worker's ID number)\n", "- We want only one thread (thread 0) to print the message; otherwise every thread in the block would print it" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "@cute.kernel\n", "def kernel():\n", "    # Get the x component of the thread index (y and z components are unused)\n", "    tidx, _, _ = cute.arch.thread_idx()\n", "    # Only the first thread (thread 0) prints the message\n", "    if tidx == 0:\n", "        cute.printf(\"Hello world\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 3: Write Our Host Function\n", "\n", "Now we need a function that sets up the GPU and launches our kernel.\n", "\n", "Key concepts:\n", "- `@cute.jit`: This decorator is for functions that run on the CPU but can launch GPU code\n", "- We need to 
initialize CUDA before using the GPU\n", "- `.launch()` specifies the launch configuration: the grid size (number of thread blocks), the block size (threads per block), shared memory, and so on" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "@cute.jit\n", "def hello_world():\n", "\n", "    # Print hello world from host code\n", "    cute.printf(\"hello world\")\n", "\n", "    # Launch kernel\n", "    kernel().launch(\n", "        grid=(1, 1, 1),   # Single thread block\n", "        block=(32, 1, 1)  # One warp (32 threads) per thread block\n", "    )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 4: Run Our Program\n", "\n", "There are 2 ways we can run our program:\n", "\n", "1. Compile and run immediately.\n", "2. Compile separately, which lets us compile the code once and run it multiple times.\n", "\n", "Note that in the output, the `Compiling...` message from Method 2 appears before the \"Hello world\" printed by the kernel launched in Method 1. Kernel launches are asynchronous, so a GPU print can show up after CPU prints that were issued later." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Running hello_world()...\n", "hello world\n", "Compiling...\n", "Hello world\n", "Running compiled version...\n", "hello world\n" ] } ], "source": [ "# Initialize the CUDA context (with error checking) before launching a kernel.\n", "# Context initialization is explicit so that users control when the context is created\n", "# and can avoid potential issues with multiple contexts.\n", "cutlass.cuda.initialize_cuda_context()\n", "\n", "# Method 1: Just-In-Time (JIT) compilation - compiles and runs the code immediately\n", "print(\"Running hello_world()...\")\n", "hello_world()\n", "\n", "# Method 2: Compile first (useful if you want to run the same code multiple times)\n", "print(\"Compiling...\")\n", "hello_world_compiled = cute.compile(hello_world)\n", "\n", "# Run the pre-compiled version\n", "print(\"Running compiled version...\")\n", "hello_world_compiled()" ] } ], "metadata": { "kernelspec": { 
"display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.5" }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 4 }