From bb4dd682dd7a71d25bb84012edae345c8f352f29 Mon Sep 17 00:00:00 2001
From: milesvant <26556534+milesvant@users.noreply.github.com>
Date: Sun, 20 Apr 2025 21:01:12 -0700
Subject: [PATCH] Fix broken links and alt text in cluster launch control docs (#2234)

* Fix broken links in cluster launch control docs
* Improve titles and alt text
---
media/docs/cpp/blackwell_cluster_launch_control.md | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/media/docs/cpp/blackwell_cluster_launch_control.md b/media/docs/cpp/blackwell_cluster_launch_control.md
index d8a31aaf..a4006f20 100644
--- a/media/docs/cpp/blackwell_cluster_launch_control.md
+++ b/media/docs/cpp/blackwell_cluster_launch_control.md
@@ -6,7 +6,7 @@ A GEMM workload usually consists of three phases: prologue, mainloop and epilogu
Consider a GEMM that has `20x20x1` output tiles, running on a GPU with `100` SMs. There is another kernel occupying all the resources of `20` SMs so only `80` SMs can be used. Assume cluster shape is `1x1x1`. The following diagram shows how the schedule would look like for such a kernel.
-

A beautiful sunset

+

GEMM tiles are evenly divided among available SMs

### Static Scheduler
@@ -14,7 +14,7 @@ CUTLASS has adopted a software technique named **persistent kernels**. Persisten
However, static scheduler is susceptible to workload imbalance if the resources of some SMs are unavailable. The following diagram illustrates this issue.
-

A beautiful sunset

+

GEMM tiles are unevenly divided among available SMs, leading to workload imbalance

### Dynamic Scheduler with Cluster Launch Control
A fundamental limitation of persistent scheduling is that the number of SMs this kernel can utilize is unknown in real time. Some SMs might be occupied by another kernel and thus their resources are unavailable. This makes it challenging to load-balance work across SMs.
@@ -32,7 +32,7 @@ Cluster launch control follows the below rules:
The following diagram shows how the schedule would look like with cluster launch control.
-

A beautiful sunset

+

GEMM tiles are dynamically allocated among available SMs, leading to a balanced workload

## Programming Model
### Pseudo Code