cutlass/examples/python/CuTeDSL/notebooks/hello_world.ipynb
2025-06-06 02:39:20 -04:00

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Your First Program with CuTe DSL\n",
"\n",
"## Introduction\n",
"\n",
"Welcome! In this tutorial, we'll write a simple \"Hello World\" program that runs on your GPU using CuTe DSL. This will help you understand the basics of GPU programming with our framework.\n",
"\n",
"### What You'll Learn\n",
"\n",
"- How to write code that runs on both the CPU (host) and the GPU (device)\n",
"- How to launch a GPU kernel (a function that runs on the GPU)\n",
"- Basic CUDA concepts such as threads and thread blocks\n",
"\n",
"### Step 1: Import Required Libraries\n",
"\n",
"First, let's import the libraries we need:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import cutlass\n",
"import cutlass.cute as cute"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 2: Write Our GPU Kernel\n",
"\n",
"A GPU kernel is a function that runs on the GPU. Here's a simple kernel that prints \"Hello world\".\n",
"\n",
"Key concepts:\n",
"- `@cute.kernel`: This decorator tells CuTe DSL that this function should run on the GPU\n",
"- `cute.arch.thread_idx()`: Returns the x, y, and z indices of the current thread within its thread block (like a worker's ID number)\n",
"- Every thread in the block runs the kernel, so we let only thread 0 print to avoid repeated messages"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"@cute.kernel\n",
"def kernel():\n",
"    # Get the x component of the thread index (y and z components are unused)\n",
"    tidx, _, _ = cute.arch.thread_idx()\n",
"    # Only the first thread (thread 0) prints the message\n",
"    if tidx == 0:\n",
"        cute.printf(\"Hello world\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 3: Write Our Host Function\n",
"\n",
"Now we need a function that sets up the GPU and launches our kernel.\n",
"\n",
"Key concepts:\n",
"- `@cute.jit`: This decorator is for functions that run on the CPU but can launch GPU code\n",
"- The CUDA context must be initialized before the GPU is used (we do this in Step 4)\n",
"- `.launch()` sets the launch configuration: the number of thread blocks (`grid`), the threads per block (`block`), shared memory, and so on"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"@cute.jit\n",
"def hello_world():\n",
"\n",
"    # Print hello world from host code\n",
"    cute.printf(\"hello world\")\n",
"\n",
"    # Launch kernel\n",
"    kernel().launch(\n",
"        grid=(1, 1, 1),   # Single thread block\n",
"        block=(32, 1, 1)  # One warp (32 threads) per thread block\n",
"    )"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 4: Run Our Program\n",
"\n",
"There are two ways to run our program:\n",
"\n",
"1. Compile and run immediately (just-in-time compilation)\n",
"2. Compile separately, which lets us compile the code once and run it multiple times\n",
"\n",
"Note that in the output below, the `Compiling...` message from Method 2 appears before the \"Hello world\" printed by the kernel launched in Method 1. This shows the asynchronous behavior between CPU and GPU prints: the host keeps running after it launches the kernel, so its output can appear before the GPU's."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Running hello_world()...\n",
"hello world\n",
"Compiling...\n",
"Hello world\n",
"Running compiled version...\n",
"hello world\n"
]
}
],
"source": [
"# Initialize CUDA context for launching a kernel with error checking\n",
"# We make context initialization explicit to allow users to control the context creation \n",
"# and avoid potential issues with multiple contexts\n",
"cutlass.cuda.initialize_cuda_context()\n",
"\n",
"# Method 1: Just-In-Time (JIT) compilation - compiles and runs the code immediately\n",
"print(\"Running hello_world()...\")\n",
"hello_world()\n",
"\n",
"# Method 2: Compile first (useful if you want to run the same code multiple times)\n",
"print(\"Compiling...\")\n",
"hello_world_compiled = cute.compile(hello_world)\n",
"\n",
"# Run the pre-compiled version\n",
"print(\"Running compiled version...\")\n",
"hello_world_compiled()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"state": {},
"version_major": 2,
"version_minor": 0
}
}
},
"nbformat": 4,
"nbformat_minor": 4
}