CUTLASS 3.1 Python interface documentation (#917)
* Add 12.1 Dockerfile
* Add 3.1 docs
@@ -12,7 +12,7 @@ A, B, C, D = [np.ones((4096, 4096), dtype=np.float16) for i in range(4)]
 plan.run(A, B, C, D)
 ```

-**NOTE** The CUTLASS Python interface is currently an experimental release. The API may change in the future.
+**NOTE:** The CUTLASS Python interface is currently an experimental release. The API may change in the future.
 We welcome feedback from the community.

 ## Overview
@@ -93,11 +93,11 @@ The CUTLASS Python interface currently supports the following operations:
 We recommend using the CUTLASS Python interface via one of the Docker images located in the [docker](/python/docker) directory.

 ```bash
-docker build -t cutlass-cuda12.0:latest -f docker/Dockerfile-cuda12.0-pytorch .
-docker run --gpus all -it --rm cutlass-cuda12.0:latest
+docker build -t cutlass-cuda12.1:latest -f docker/Dockerfile-cuda12.1-pytorch .
+docker run --gpus all -it --rm cutlass-cuda12.1:latest
 ```

-The CUTLASS Python interface has been tested with CUDA 11.8 and CUDA 12.0 on Python 3.8.10 and 3.9.7.
+The CUTLASS Python interface has been tested with CUDA 11.8, 12.0, and 12.1 on Python 3.8.10 and 3.9.7.

 ### Optional environment variables
 Prior to installing the CUTLASS Python interface, one may optionally set the following environment variables:
||||
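The README hunk above names the CUDA and Python versions the interface has been tested with. As a minimal sketch (not part of this commit), a script could compare its own environment against that list before importing the package. The helper names `is_tested_python` and `is_tested_cuda` are hypothetical, and comparing only the major.minor components is an assumption made here for illustration.

```python
import sys

# Versions taken from the README text in this commit. Only major.minor is
# compared, which is an assumption: the README lists exact patch releases
# for Python (3.8.10 and 3.9.7).
TESTED_PYTHON_MINORS = {(3, 8), (3, 9)}
TESTED_CUDA_VERSIONS = {"11.8", "12.0", "12.1"}


def is_tested_python(version_info=sys.version_info):
    """Return True if the interpreter's major.minor is in the tested set."""
    return tuple(version_info[:2]) in TESTED_PYTHON_MINORS


def is_tested_cuda(cuda_version):
    """Return True if a 'major.minor[.patch]' CUDA version string is in the tested set."""
    major_minor = ".".join(cuda_version.split(".")[:2])
    return major_minor in TESTED_CUDA_VERSIONS


if __name__ == "__main__":
    print("Python version tested:", is_tested_python())
```

When PyTorch is installed, the string in `torch.version.cuda` is one possible input for `is_tested_cuda`. An untested combination may still work, so a warning is probably more appropriate than a hard failure.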
38
python/docker/Dockerfile-cuda12.1-pytorch
Normal file
@@ -0,0 +1,38 @@
+#################################################################################################
+#
+# Copyright (c) 2023 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: BSD-3-Clause
+#
+# Redistribution and use in source and binary forms, with or without
+# modification, are permitted provided that the following conditions are met:
+#
+# 1. Redistributions of source code must retain the above copyright notice, this
+# list of conditions and the following disclaimer.
+#
+# 2. Redistributions in binary form must reproduce the above copyright notice,
+# this list of conditions and the following disclaimer in the documentation
+# and/or other materials provided with the distribution.
+#
+# 3. Neither the name of the copyright holder nor the names of its
+# contributors may be used to endorse or promote products derived from
+# this software without specific prior written permission.
+#
+# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+#
+#################################################################################################
+
+FROM nvcr.io/nvidia/pytorch:23.03-py3
+
+RUN chmod ugo+rwx /home
+ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
+ENV LIBRARY_PATH=/usr/local/cuda/lib64:$LIBRARY_PATH
+ENV CUDA_INSTALL_PATH=/usr/local/cuda
||||
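The Dockerfile above sets three CUDA-related environment variables. As a rough sketch (not part of this commit), the following check could be run inside the container to confirm they look as the Dockerfile sets them. The function name `check_cuda_env` is hypothetical; the paths are copied from the `ENV` lines above.

```python
import os

# Paths copied from the ENV lines of the Dockerfile in this commit.
CUDA_INSTALL_PATH = "/usr/local/cuda"
CUDA_LIB_DIR = "/usr/local/cuda/lib64"


def check_cuda_env(env=None):
    """Return a list of problems; an empty list means the variables match the Dockerfile."""
    env = os.environ if env is None else env
    problems = []
    if env.get("CUDA_INSTALL_PATH") != CUDA_INSTALL_PATH:
        problems.append(f"CUDA_INSTALL_PATH is not {CUDA_INSTALL_PATH}")
    for var in ("LD_LIBRARY_PATH", "LIBRARY_PATH"):
        # Split the colon-separated search path and ignore empty entries.
        entries = [p for p in env.get(var, "").split(":") if p]
        if CUDA_LIB_DIR not in entries:
            problems.append(f"{CUDA_LIB_DIR} is missing from {var}")
    return problems


if __name__ == "__main__":
    for problem in check_cuda_env():
        print("warning:", problem)
```

Passing a plain dictionary as `env` makes the check easy to exercise outside the container.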
4
python/docs/.buildinfo
Normal file
@@ -0,0 +1,4 @@
+# Sphinx build info version 1
+# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
+config: 4a5275a3b68094ba1d8a4b7e4c459321
+tags: 645f666f9bcd5a90fca523b33c5a78b7
923
python/docs/_modules/cutlass/emit/pytorch.html
Normal file
@ -0,0 +1,923 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><link rel="index" title="Index" href="../../../genindex.html" /><link rel="search" title="Search" href="../../../search.html" />
|
||||
<link rel="canonical" href="docs/_modules/cutlass/emit/pytorch.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>cutlass.emit.pytorch - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="../../../index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="../../../index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="../../../_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="../../../_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="../../../search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../../../modules.html">CUTLASS Python API</a><input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="../../../cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../../cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../../cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../../cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../../../examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../../externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../../externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../../externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<h1>Source code for cutlass.emit.pytorch</h1><div class="highlight"><pre>
|
||||
<span></span><span class="c1">#################################################################################################</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># Copyright (c) 2023 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.</span>
|
||||
<span class="c1"># SPDX-License-Identifier: BSD-3-Clause</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># Redistribution and use in source and binary forms, with or without</span>
|
||||
<span class="c1"># modification, are permitted provided that the following conditions are met:</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 1. Redistributions of source code must retain the above copyright notice, this</span>
|
||||
<span class="c1"># list of conditions and the following disclaimer.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 2. Redistributions in binary form must reproduce the above copyright notice,</span>
|
||||
<span class="c1"># this list of conditions and the following disclaimer in the documentation</span>
|
||||
<span class="c1"># and/or other materials provided with the distribution.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 3. Neither the name of the copyright holder nor the names of its</span>
|
||||
<span class="c1"># contributors may be used to endorse or promote products derived from</span>
|
||||
<span class="c1"># this software without specific prior written permission.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"</span>
|
||||
<span class="c1"># AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE</span>
|
||||
<span class="c1"># IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE</span>
|
||||
<span class="c1"># DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE</span>
|
||||
<span class="c1"># FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL</span>
|
||||
<span class="c1"># DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR</span>
|
||||
<span class="c1"># SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER</span>
|
||||
<span class="c1"># CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,</span>
|
||||
<span class="c1"># OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE</span>
|
||||
<span class="c1"># OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1">#################################################################################################</span>
|
||||
|
||||
<span class="sd">"""</span>
|
||||
<span class="sd">Utilities for generating source for building a PyTorch CUDA extension that uses a CUTLASS kernel.</span>
|
||||
<span class="sd">If specified, the extension can be JIT compiled via PyTorch's ``cpp_extension.load`` method.</span>
|
||||
|
||||
<span class="sd">Example usage with JIT compilation:</span>
|
||||
|
||||
<span class="sd">.. highlight:: python</span>
|
||||
<span class="sd">.. code-block:: python</span>
|
||||
|
||||
<span class="sd"> plan = cutlass.op.Gemm(element=torch.float32, layout=cutlass.LayoutType.RowMajor)</span>
|
||||
<span class="sd"> op = plan.construct()</span>
|
||||
<span class="sd"> mod = cutlass.emit.pytorch(op, 'cutlass_gemm', 80, jit=True)</span>
|
||||
|
||||
<span class="sd"> # Generate inputs for the GEMM</span>
|
||||
<span class="sd"> A, B, C = [torch.ones((512, 512)).to('cuda') for _ in range(3)]</span>
|
||||
|
||||
<span class="sd"> # Run the module</span>
|
||||
<span class="sd"> D = mod.run(A, B, C)</span>
|
||||
|
||||
|
||||
<span class="sd">Example usage without JIT compilation:</span>
|
||||
|
||||
<span class="sd">.. highlight:: python</span>
|
||||
<span class="sd">.. code-block:: python</span>
|
||||
|
||||
<span class="sd"> plan = cutlass.op.Gemm(element=torch.float32, layout=cutlass.LayoutType.RowMajor)</span>
|
||||
<span class="sd"> op = plan.construct()</span>
|
||||
<span class="sd"> cutlass.emit.pytorch(op, 'cutlass_gemm', 80, jit=False, sourcedir='output')</span>
|
||||
|
||||
<span class="sd">After this call, the directory ``output`` contains ``setup.py``,</span>
|
||||
<span class="sd">``cutlass_gemm.cpp``, and ``cutlass_gemm_kernel.cu``. The module can be built from</span>
|
||||
<span class="sd">within ``output`` by running: ``TORCH_CUDA_ARCH_LIST="8.0" python setup.py develop --user``.</span>
|
||||
|
||||
<span class="sd">The module can later be used in Python via:</span>
|
||||
|
||||
<span class="sd">.. highlight:: python</span>
|
||||
<span class="sd">.. code-block:: python</span>
|
||||
|
||||
<span class="sd"> import torch</span>
|
||||
<span class="sd"> import cutlass_gemm</span>
|
||||
|
||||
<span class="sd"> # Generate inputs for the GEMM</span>
|
||||
<span class="sd"> A, B, C = [torch.ones((512, 512)).to('cuda') for _ in range(3)]</span>
|
||||
|
||||
<span class="sd"> # Run the module</span>
|
||||
<span class="sd"> D = cutlass_gemm.run(A, B, C)</span>
|
||||
<span class="sd">"""</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">logging</span>
|
||||
<span class="kn">import</span> <span class="nn">os</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">cutlass_bindings</span>
|
||||
|
||||
<span class="kn">from</span> <span class="nn">cutlass</span> <span class="kn">import</span> <span class="n">CUTLASS_PATH</span><span class="p">,</span> <span class="n">logger</span><span class="p">,</span> <span class="n">swizzle</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass.backend.gemm_operation</span> <span class="kn">import</span> <span class="n">GemmOperationGrouped</span><span class="p">,</span> <span class="n">GemmOperationUniversal</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass.backend.library</span> <span class="kn">import</span> <span class="n">ApiVersion</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass.backend.utils.software</span> <span class="kn">import</span> <span class="n">CheckPackages</span><span class="p">,</span> <span class="n">SubstituteTemplate</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass.emit</span> <span class="kn">import</span> <span class="n">common</span>
|
||||
|
||||
<span class="n">torch_available</span> <span class="o">=</span> <span class="n">CheckPackages</span><span class="p">()</span><span class="o">.</span><span class="n">check_torch</span><span class="p">()</span>
|
||||
<span class="k">if</span> <span class="n">torch_available</span><span class="p">:</span>
|
||||
<span class="kn">import</span> <span class="nn">torch</span>
|
||||
|
||||
|
||||
<span class="n">_PYTORCH_CUDA_TEMPLATE</span> <span class="o">=</span> <span class="n">common</span><span class="o">.</span><span class="n">_CSTYLE_AUTOGEN_COMMENT</span> <span class="o">+</span> <span class="s2">"""</span>
|
||||
<span class="s2">#include <torch/extension.h></span>
|
||||
<span class="s2">#include <ATen/ATen.h></span>
|
||||
|
||||
<span class="s2">#include "cutlass/cutlass.h"</span>
|
||||
<span class="s2">#include "cutlass/util/device_memory.h"</span>
|
||||
|
||||
<span class="s2">$</span><span class="si">{includes}</span>
|
||||
<span class="s2">$</span><span class="si">{declaration}</span>
|
||||
<span class="s2">$</span><span class="si">{impl}</span>
|
||||
<span class="s2">"""</span>
|
||||
|
||||
<span class="n">_PYTORCH_GEMM_CPP_TEMPLATE</span> <span class="o">=</span> <span class="n">common</span><span class="o">.</span><span class="n">_CSTYLE_AUTOGEN_COMMENT</span> <span class="o">+</span> <span class="s2">"""</span>
|
||||
<span class="s2">#include <torch/extension.h></span>
|
||||
<span class="s2">#include <ATen/ATen.h></span>
|
||||
<span class="s2">#include <pybind11/stl.h></span>
|
||||
|
||||
<span class="s2">// CUDA forward declarations</span>
|
||||
<span class="s2">at::Tensor $</span><span class="si">{name}</span><span class="s2">_kernel(const at::Tensor& A, const at::Tensor& B, at::optional<const at::Tensor> C=at::nullopt, float alpha=1.f, float beta=0.f);</span>
|
||||
|
||||
<span class="s2">// C++ interface</span>
|
||||
<span class="s2">at::Tensor $</span><span class="si">{name}</span><span class="s2">(const at::Tensor& A, const at::Tensor& B, at::optional<const at::Tensor> C=at::nullopt, float alpha=1.f, float beta=0.f) {</span>
|
||||
<span class="s2"> return $</span><span class="si">{name}</span><span class="s2">_kernel(A, B, C, alpha, beta);</span>
|
||||
<span class="s2">}</span>
|
||||
|
||||
<span class="s2">PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {</span>
|
||||
<span class="s2"> m.def("run", py::overload_cast<const at::Tensor&, const at::Tensor&, at::optional<const at::Tensor>, float, float>(&$</span><span class="si">{name}</span><span class="s2">), py::arg("A"), py::arg("B"), py::arg("C") = nullptr, py::arg("alpha") = 1.f, py::arg("beta") = 0.f);</span>
|
||||
<span class="s2">}</span>
|
||||
<span class="s2">"""</span>
|
||||
|
||||
<span class="n">_PYTORCH_GROUPED_GEMM_CPP_TEMPLATE</span> <span class="o">=</span> <span class="n">common</span><span class="o">.</span><span class="n">_CSTYLE_AUTOGEN_COMMENT</span> <span class="o">+</span> <span class="s2">"""</span>
|
||||
<span class="s2">#include <torch/extension.h></span>
|
||||
<span class="s2">#include <ATen/ATen.h></span>
|
||||
<span class="s2">#include <pybind11/stl.h></span>
|
||||
|
||||
<span class="s2">// CUDA forward declarations</span>
|
||||
<span class="s2">std::vector<at::Tensor> $</span><span class="si">{name}</span><span class="s2">_kernel(const std::vector<at::Tensor>& A, const std::vector<at::Tensor>& B, at::optional<const std::vector<at::Tensor>> C=at::nullopt, float alpha=1.f, float beta=0.f);</span>
|
||||
|
||||
<span class="s2">// C++ interface</span>
|
||||
<span class="s2">std::vector<at::Tensor> $</span><span class="si">{name}</span><span class="s2">(const std::vector<at::Tensor>& A, const std::vector<at::Tensor>& B, at::optional<const std::vector<at::Tensor>> C=at::nullopt, float alpha=1.f, float beta=0.f) {</span>
|
||||
<span class="s2"> return $</span><span class="si">{name}</span><span class="s2">_kernel(A, B, C, alpha, beta);</span>
|
||||
<span class="s2">}</span>
|
||||
|
||||
<span class="s2">PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {</span>
|
||||
<span class="s2"> m.def("run", py::overload_cast<const std::vector<at::Tensor>&, const std::vector<at::Tensor>&, at::optional<const std::vector<at::Tensor>>, float, float>(&$</span><span class="si">{name}</span><span class="s2">),</span>
|
||||
<span class="s2"> py::arg("A"), py::arg("B"), py::arg("C") = nullptr, py::arg("alpha") = 1.f, py::arg("beta") = 0.f);</span>
|
||||
<span class="s2">}</span>
|
||||
<span class="s2">"""</span>
|
||||
|
||||
<span class="n">_PYTORCH_GEMM_INCLUDES</span> <span class="o">=</span> <span class="p">{</span>
|
||||
<span class="n">ApiVersion</span><span class="o">.</span><span class="n">v2x</span><span class="p">:</span> <span class="s2">"""</span>
|
||||
<span class="s2">#include "cutlass/gemm/device/gemm_universal.h"</span>
|
||||
<span class="s2">"""</span><span class="p">,</span>
|
||||
<span class="n">ApiVersion</span><span class="o">.</span><span class="n">v3x</span><span class="p">:</span> <span class="s2">"""</span>
|
||||
<span class="s2">#include "cutlass/gemm/device/gemm_universal_adapter.h"</span>
|
||||
<span class="s2">#include "cutlass/gemm/collective/collective_builder.hpp"</span>
|
||||
<span class="s2">#include "cutlass/gemm/kernel/gemm_universal.hpp"</span>
|
||||
<span class="s2">#include "cutlass/epilogue/collective/default_epilogue.hpp"</span>
|
||||
<span class="s2">#include "cutlass/util/packed_stride.hpp"</span>
|
||||
<span class="s2">"""</span><span class="p">,</span>
|
||||
<span class="p">}</span>
|
||||
|
||||
<span class="n">_PYTORCH_GROUPED_GEMM_INCLUDES</span> <span class="o">=</span> <span class="s2">"""</span>
|
||||
<span class="s2">#include "cutlass/gemm/kernel/default_gemm_grouped.h"</span>
|
||||
<span class="s2">#include "cutlass/gemm/device/gemm_grouped.h"</span>
|
||||
<span class="s2">"""</span>
|
||||
|
||||
<span class="n">_CUTLASS_TYPE_TO_TORCH_TYPE</span> <span class="o">=</span> <span class="p">{</span>
|
||||
<span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">float16</span><span class="p">:</span> <span class="s2">"torch::kF16"</span><span class="p">,</span>
|
||||
<span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">float32</span><span class="p">:</span> <span class="s2">"torch::kF32"</span><span class="p">,</span>
|
||||
<span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">float64</span><span class="p">:</span> <span class="s2">"torch::kF64"</span><span class="p">,</span>
|
||||
<span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">int8</span><span class="p">:</span> <span class="s2">"torch::kI8"</span><span class="p">,</span>
<span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">int32</span><span class="p">:</span> <span class="s2">"torch::kI32"</span><span class="p">,</span>
|
||||
<span class="p">}</span>
|
||||
|
||||
<span class="n">_PYTORCH_GEMM_IMPL_TEMPLATE_2x</span> <span class="o">=</span> <span class="p">(</span>
|
||||
<span class="n">common</span><span class="o">.</span><span class="n">_CUTLASS_KERNEL_RUN_GEMM_2x</span>
|
||||
<span class="o">+</span> <span class="s2">"""</span>
|
||||
<span class="s2">at::Tensor $</span><span class="si">{name}</span><span class="s2">_kernel(const at::Tensor& A, const at::Tensor& B, at::optional<const at::Tensor> C, float alpha, float beta) {</span>
|
||||
<span class="s2"> int M = A.size(0);</span>
|
||||
<span class="s2"> int N = B.size(1);</span>
|
||||
<span class="s2"> int K = A.size(1);</span>
|
||||
|
||||
<span class="s2"> typename DeviceKernel::ElementC* ptrC = (C == at::nullopt) ?</span>
|
||||
<span class="s2"> nullptr :</span>
|
||||
<span class="s2"> reinterpret_cast<typename DeviceKernel::ElementC*>(C->contiguous().data_ptr());</span>
|
||||
<span class="s2"> at::Tensor D = B.new_empty({M, N}, $</span><span class="si">{torch_type_C}</span><span class="s2">);</span>
|
||||
|
||||
<span class="s2"> cutlass::Status status = $</span><span class="si">{name}</span><span class="s2">_kernel_run(M, N, K,</span>
|
||||
<span class="s2"> reinterpret_cast<typename DeviceKernel::ElementA*>(A.contiguous().data_ptr()),</span>
|
||||
<span class="s2"> reinterpret_cast<typename DeviceKernel::ElementB*>(B.contiguous().data_ptr()),</span>
|
||||
<span class="s2"> ptrC,</span>
|
||||
<span class="s2"> reinterpret_cast<typename DeviceKernel::ElementC*>(D.contiguous().data_ptr()),</span>
|
||||
<span class="s2"> ElementCompute(alpha), ElementCompute(beta));</span>
|
||||
|
||||
<span class="s2"> TORCH_CHECK(status == cutlass::Status::kSuccess, "CUTLASS kernel failed");</span>
|
||||
<span class="s2"> return D;</span>
|
||||
<span class="s2">}</span>
|
||||
<span class="s2">"""</span>
|
||||
<span class="p">)</span>
|
||||
|
||||
<span class="n">_PYTORCH_GEMM_IMPL_TEMPLATE_3x</span> <span class="o">=</span> <span class="p">(</span>
|
||||
<span class="n">common</span><span class="o">.</span><span class="n">_CUTLASS_KERNEL_RUN_GEMM_3x</span>
|
||||
<span class="o">+</span> <span class="s2">"""</span>
|
||||
<span class="s2">bool hw_info_queried = false;</span>
|
||||
<span class="s2">cutlass::KernelHardwareInfo hw_info;</span>
|
||||
|
||||
<span class="s2">at::Tensor $</span><span class="si">{name}</span><span class="s2">_kernel(const at::Tensor& A, const at::Tensor& B, at::optional<const at::Tensor> C, float alpha, float beta) {</span>
|
||||
<span class="s2"> int M = A.size(0);</span>
|
||||
<span class="s2"> int N = B.size(1);</span>
|
||||
<span class="s2"> int K = A.size(1);</span>
|
||||
<span class="s2"> int L = 1;</span>
|
||||
|
||||
<span class="s2"> // Query hardware info if we haven't already</span>
|
||||
<span class="s2"> if (!hw_info_queried) {</span>
|
||||
<span class="s2"> hw_info.device_id = 0;</span>
|
||||
<span class="s2"> hw_info.sm_count = cutlass::KernelHardwareInfo::query_device_multiprocessor_count(hw_info.device_id);</span>
|
||||
<span class="s2"> hw_info_queried = true;</span>
|
||||
<span class="s2"> }</span>
|
||||
|
||||
<span class="s2"> typename DeviceKernel::ElementC* ptrC = (C == at::nullopt) ?</span>
|
||||
<span class="s2"> nullptr :</span>
|
||||
<span class="s2"> reinterpret_cast<typename DeviceKernel::ElementC*>(C->contiguous().data_ptr());</span>
|
||||
<span class="s2"> at::Tensor D = B.new_empty({M, N}, $</span><span class="si">{torch_type_C}</span><span class="s2">);</span>
|
||||
|
||||
<span class="s2"> cutlass::Status status = $</span><span class="si">{name}</span><span class="s2">_kernel_run(M, N, K, L,</span>
|
||||
<span class="s2"> reinterpret_cast<typename DeviceKernel::ElementA*>(A.contiguous().data_ptr()),</span>
|
||||
<span class="s2"> reinterpret_cast<typename DeviceKernel::ElementB*>(B.contiguous().data_ptr()),</span>
|
||||
<span class="s2"> ptrC,</span>
|
||||
<span class="s2"> reinterpret_cast<typename DeviceKernel::ElementC*>(D.contiguous().data_ptr()),</span>
|
||||
<span class="s2"> ElementCompute(alpha), ElementCompute(beta),</span>
|
||||
<span class="s2"> hw_info);</span>
|
||||
|
||||
<span class="s2"> TORCH_CHECK(status == cutlass::Status::kSuccess, "CUTLASS kernel failed");</span>
|
||||
<span class="s2"> return D;</span>
|
||||
<span class="s2">}</span>
|
||||
<span class="s2">"""</span>
|
||||
<span class="p">)</span>
|
||||
|
||||
|
||||
<span class="n">_PYTORCH_GROUPED_GEMM_IMPL_TEMPLATE</span> <span class="o">=</span> <span class="p">(</span>
|
||||
<span class="n">common</span><span class="o">.</span><span class="n">_CUTLASS_KERNEL_RUN_GROUPED_GEMM_2x</span>
|
||||
<span class="o">+</span> <span class="s2">"""</span>
|
||||
<span class="s2">std::vector<at::Tensor> $</span><span class="si">{name}</span><span class="s2">_kernel(const std::vector<at::Tensor>& A, const std::vector<at::Tensor>& B, at::optional<const std::vector<at::Tensor>> C, float alpha, float beta) {</span>
|
||||
<span class="s2"> size_t num = A.size();</span>
|
||||
|
||||
<span class="s2"> // To avoid performing many small cudaMallocs and host-to-device copies,</span>
|
||||
<span class="s2"> // we serialize the grouped GEMM arguments on the host, allocate one</span>
|
||||
<span class="s2"> // large chunk of device memory, and perform a single cudaMemcpy to</span>
|
||||
<span class="s2"> // copy the host data to the device. Allocation overheads could be</span>
|
||||
<span class="s2"> // avoided by using a memory pool.</span>
|
||||
|
||||
<span class="s2"> // Calculate the total size of the data to be copied from host to device</span>
|
||||
<span class="s2"> size_t total_size = sizeof(cutlass::gemm::GemmCoord) +</span>
|
||||
<span class="s2"> sizeof(DeviceKernel::ElementA*) +</span>
|
||||
<span class="s2"> sizeof(DeviceKernel::ElementB*) +</span>
|
||||
<span class="s2"> sizeof(DeviceKernel::ElementC*) +</span>
|
||||
<span class="s2"> sizeof(DeviceKernel::ElementC*) +</span>
|
||||
<span class="s2"> sizeof(int64_t) +</span>
|
||||
<span class="s2"> sizeof(int64_t) +</span>
|
||||
<span class="s2"> sizeof(int64_t);</span>
|
||||
<span class="s2"> total_size *= num;</span>
|
||||
|
||||
<span class="s2"> // num * sizeof(cutlass::gemm::GemmCoord) may leave one at a non-multiple</span>
|
||||
<span class="s2"> // of sizeof(DeviceKernel::ElementA*) (which will be 8 bytes on a 64-bit system).</span>
|
||||
<span class="s2"> // To ensure that we don't end up having misaligned loads in the kernel,</span>
|
||||
<span class="s2"> // we pad to the nearest multiple of 8.</span>
|
||||
<span class="s2"> //</span>
|
||||
<span class="s2"> // Note that, even on a 32-bit system (for which sizeof(X*) will not equal</span>
|
||||
<span class="s2"> // sizeof(int64_t)), only padding between the list of GemmCoords and the</span>
|
||||
<span class="s2"> // list of ptr_As is sufficient because the set of four equal-length lists of pointers</span>
|
||||
<span class="s2"> // (A*, B*, C*, D*) will ensure that the first list of int64_ts will always</span>
|
||||
<span class="s2"> // start on a multiple of 8.</span>
|
||||
<span class="s2"> int64_t padding = 8 - (total_size % 8);</span>
|
||||
<span class="s2"> total_size += padding;</span>
|
||||
|
||||
<span class="s2"> uint8_t* host_data = new uint8_t[total_size];</span>
|
||||
<span class="s2"> cutlass::DeviceAllocation<uint8_t> device_data(total_size);</span>
|
||||
|
||||
<span class="s2"> uint8_t* start = host_data;</span>
|
||||
<span class="s2"> cutlass::gemm::GemmCoord* problem_sizes_host = reinterpret_cast<cutlass::gemm::GemmCoord*>(start);</span>
|
||||
|
||||
<span class="s2"> // Apply the padding after the list of GemmCoords</span>
|
||||
<span class="s2"> start += num * sizeof(cutlass::gemm::GemmCoord) + padding;</span>
|
||||
|
||||
<span class="s2"> int64_t ptr_A_offset = start - host_data;</span>
|
||||
<span class="s2"> DeviceKernel::ElementA** ptr_A_host = reinterpret_cast<DeviceKernel::ElementA**>(start);</span>
|
||||
<span class="s2"> start += num * sizeof(DeviceKernel::ElementA*);</span>
|
||||
|
||||
<span class="s2"> int64_t ptr_B_offset = start - host_data;</span>
|
||||
<span class="s2"> DeviceKernel::ElementB** ptr_B_host = reinterpret_cast<DeviceKernel::ElementB**>(start);</span>
|
||||
<span class="s2"> start += num * sizeof(DeviceKernel::ElementB*);</span>
|
||||
|
||||
<span class="s2"> int64_t ptr_C_offset = start - host_data;</span>
|
||||
<span class="s2"> DeviceKernel::ElementC** ptr_C_host = reinterpret_cast<DeviceKernel::ElementC**>(start);</span>
|
||||
<span class="s2"> start += num * sizeof(DeviceKernel::ElementC*);</span>
|
||||
|
||||
<span class="s2"> int64_t ptr_D_offset = start - host_data;</span>
|
||||
<span class="s2"> DeviceKernel::ElementC** ptr_D_host = reinterpret_cast<DeviceKernel::ElementC**>(start);</span>
|
||||
<span class="s2"> start += num * sizeof(DeviceKernel::ElementC*);</span>
|
||||
|
||||
<span class="s2"> int64_t lda_offset = start - host_data;</span>
|
||||
<span class="s2"> int64_t* lda_host = reinterpret_cast<int64_t*>(start);</span>
|
||||
<span class="s2"> start += num * sizeof(int64_t);</span>
|
||||
|
||||
<span class="s2"> int64_t ldb_offset = start - host_data;</span>
|
||||
<span class="s2"> int64_t* ldb_host = reinterpret_cast<int64_t*>(start);</span>
|
||||
<span class="s2"> start += num * sizeof(int64_t);</span>
|
||||
|
||||
<span class="s2"> int64_t ldc_offset = start - host_data;</span>
|
||||
<span class="s2"> int64_t* ldc_host = reinterpret_cast<int64_t*>(start);</span>
|
||||
<span class="s2"> start += num * sizeof(int64_t);</span>
|
||||
|
||||
<span class="s2"> std::vector<at::Tensor> D(num);</span>
|
||||
|
||||
<span class="s2"> bool need_C = (C != at::nullopt) && (beta != 0.f);</span>
|
||||
<span class="s2"> for (size_t i = 0; i < num; ++i) {</span>
|
||||
<span class="s2"> int M = A[i].size(0);</span>
|
||||
<span class="s2"> int N = B[i].size(1);</span>
|
||||
<span class="s2"> int K = A[i].size(1);</span>
|
||||
<span class="s2"> *(problem_sizes_host + i) = {M, N, K};</span>
|
||||
<span class="s2"> *(ptr_A_host + i) = reinterpret_cast<typename DeviceKernel::ElementA*>(A[i].contiguous().data_ptr());</span>
|
||||
<span class="s2"> *(ptr_B_host + i) = reinterpret_cast<typename DeviceKernel::ElementB*>(B[i].contiguous().data_ptr());</span>
|
||||
|
||||
<span class="s2"> if (need_C) {</span>
|
||||
<span class="s2"> *(ptr_C_host + i) = reinterpret_cast<typename DeviceKernel::ElementC*>(C->at(i).contiguous().data_ptr());</span>
|
||||
<span class="s2"> }</span>
|
||||
<span class="s2"> else {</span>
|
||||
<span class="s2"> *(ptr_C_host + i) = nullptr;</span>
|
||||
<span class="s2"> }</span>
|
||||
|
||||
<span class="s2"> D[i] = B[i].new_empty({M, N}, $</span><span class="si">{torch_type_C}</span><span class="s2">);</span>
|
||||
<span class="s2"> *(ptr_D_host + i) = reinterpret_cast<typename DeviceKernel::ElementC*>(D[i].contiguous().data_ptr());</span>
|
||||
|
||||
<span class="s2"> *(lda_host + i) = DeviceKernel::LayoutA::packed({M, K}).stride(0);</span>
|
||||
<span class="s2"> *(ldb_host + i) = DeviceKernel::LayoutB::packed({K, N}).stride(0);</span>
|
||||
<span class="s2"> *(ldc_host + i) = DeviceKernel::LayoutC::packed({M, N}).stride(0);</span>
|
||||
<span class="s2"> }</span>
|
||||
|
||||
<span class="s2"> device_data.copy_from_host(host_data);</span>
|
||||
|
||||
<span class="s2"> cutlass::Status status = $</span><span class="si">{name}</span><span class="s2">_kernel_run(</span>
|
||||
<span class="s2"> num,</span>
|
||||
<span class="s2"> reinterpret_cast<cutlass::gemm::GemmCoord*>(device_data.get()),</span>
|
||||
<span class="s2"> reinterpret_cast<DeviceKernel::ElementA**>(device_data.get() + ptr_A_offset),</span>
|
||||
<span class="s2"> reinterpret_cast<DeviceKernel::ElementB**>(device_data.get() + ptr_B_offset),</span>
|
||||
<span class="s2"> reinterpret_cast<DeviceKernel::ElementC**>(device_data.get() + ptr_C_offset),</span>
|
||||
<span class="s2"> reinterpret_cast<DeviceKernel::ElementC**>(device_data.get() + ptr_D_offset),</span>
|
||||
<span class="s2"> reinterpret_cast<int64_t*>(device_data.get() + lda_offset),</span>
|
||||
<span class="s2"> reinterpret_cast<int64_t*>(device_data.get() + ldb_offset),</span>
|
||||
<span class="s2"> reinterpret_cast<int64_t*>(device_data.get() + ldc_offset),</span>
|
||||
<span class="s2"> reinterpret_cast<int64_t*>(device_data.get() + ldc_offset),</span>
|
||||
<span class="s2"> ElementCompute(alpha), ElementCompute(beta));</span>
|
||||
|
||||
<span class="s2"> delete[] host_data;</span>
|
||||
|
||||
<span class="s2"> TORCH_CHECK(status == cutlass::Status::kSuccess, "CUTLASS kernel failed");</span>
|
||||
<span class="s2"> return D;</span>
|
||||
<span class="s2">}</span>
|
||||
<span class="s2">"""</span>
|
||||
<span class="p">)</span>
|
||||
|
||||
|
||||
<span class="n">_PYTORCH_SETUP_PY</span> <span class="o">=</span> <span class="n">common</span><span class="o">.</span><span class="n">_PYSTYLE_AUTOGEN_COMMENT</span> <span class="o">+</span> <span class="s2">"""</span>
|
||||
<span class="s2">from setuptools import setup</span>
|
||||
<span class="s2">from torch.utils.cpp_extension import BuildExtension, CUDAExtension</span>
|
||||
|
||||
<span class="s2">setup(</span>
|
||||
<span class="s2"> name='$</span><span class="si">{name}</span><span class="s2">',</span>
|
||||
<span class="s2"> ext_modules=[</span>
|
||||
<span class="s2"> CUDAExtension('$</span><span class="si">{name}</span><span class="s2">', [</span>
|
||||
<span class="s2"> '$</span><span class="si">{name}</span><span class="s2">.cpp',</span>
|
||||
<span class="s2"> '$</span><span class="si">{name}</span><span class="s2">_kernel.cu',</span>
|
||||
<span class="s2"> ],</span>
|
||||
<span class="s2"> include_dirs=['$</span><span class="si">{cutlass_path}</span><span class="s2">/include', '$</span><span class="si">{cutlass_path}</span><span class="s2">/tools/util/include'],</span>
|
||||
<span class="s2"> extra_compile_args=['-std=c++17']</span>
|
||||
<span class="s2"> ),</span>
|
||||
<span class="s2"> ],</span>
|
||||
<span class="s2"> cmdclass={</span>
|
||||
<span class="s2"> 'build_ext': BuildExtension</span>
|
||||
<span class="s2"> })</span>
|
||||
|
||||
<span class="s2">"""</span>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_generate_setup</span><span class="p">(</span><span class="n">name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">sourcedir</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Generates a setup.py file for the extension</span>
|
||||
|
||||
<span class="sd"> :param name: name of the module to generate</span>
|
||||
<span class="sd"> :type name: str</span>
|
||||
<span class="sd"> :param sourcedir: directory to which generated source files should be written</span>
|
||||
<span class="sd"> :type sourcedir: str</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">setup_py_file</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">sourcedir</span><span class="p">,</span> <span class="s2">"setup.py"</span><span class="p">)</span>
|
||||
<span class="n">setup_source</span> <span class="o">=</span> <span class="n">SubstituteTemplate</span><span class="p">(</span>
|
||||
<span class="n">_PYTORCH_SETUP_PY</span><span class="p">,</span> <span class="p">{</span><span class="s2">"name"</span><span class="p">:</span> <span class="n">name</span><span class="p">,</span> <span class="s2">"cutlass_path"</span><span class="p">:</span> <span class="n">CUTLASS_PATH</span><span class="p">}</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">setup_py_file</span><span class="p">,</span> <span class="s2">"w"</span><span class="p">)</span> <span class="k">as</span> <span class="n">outfile</span><span class="p">:</span>
|
||||
<span class="n">outfile</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">setup_source</span><span class="p">)</span>
|
||||
|
||||
|
||||
<span class="k">class</span> <span class="nc">_ArchListSetter</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Utility context manager for temporarily setting the value of the ``TORCH_CUDA_ARCH_LIST``</span>
|
||||
<span class="sd"> environment variable when building a PyTorch CUDA module.</span>
|
||||
|
||||
<span class="sd"> ``TORCH_CUDA_ARCH_LIST`` is a space-delimited list of compute capabilities for which a PyTorch</span>
|
||||
<span class="sd"> CUDA module should be compiled.</span>
|
||||
|
||||
<span class="sd"> For example, ``TORCH_CUDA_ARCH_LIST="7.0 8.0"`` would result in the inclusion of</span>
|
||||
<span class="sd"> ``-gencode=arch=compute_70,code=sm_70`` and ``-gencode=arch=compute_80,code=sm_80`` in the</span>
|
||||
<span class="sd"> compilation of the module.</span>
|
||||
|
||||
<span class="sd"> This utility wraps the building of a PyTorch CUDA module with a setting of this environment</span>
|
||||
<span class="sd"> variable according to the current compute capability being targeted.</span>
|
||||
|
||||
<span class="sd"> Example usage:</span>
|
||||
|
||||
<span class="sd"> .. highlight:: python</span>
|
||||
<span class="sd"> .. code-block:: python</span>
|
||||
|
||||
<span class="sd"> # Temporarily set TORCH_CUDA_ARCH_LIST="8.0"</span>
|
||||
<span class="sd"> with _ArchListSetter(80):</span>
|
||||
<span class="sd"> # Perform JIT compilation and loading of the module</span>
|
||||
<span class="sd"> mod = torch.utils.cpp_extension.load(...)</span>
|
||||
|
||||
<span class="sd"> :param cc: compute capability</span>
|
||||
<span class="sd"> :type cc: int</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="n">_TORCH_CUDA_ARCH_LIST</span> <span class="o">=</span> <span class="s2">"TORCH_CUDA_ARCH_LIST"</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">cc</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">cc_str</span> <span class="o">=</span> <span class="s2">"."</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">cc</span><span class="p">)))</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__enter__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Saves the old value of TORCH_CUDA_ARCH_LIST and resets it to the new value based on ``cc``</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">old_arch_list</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">getenv</span><span class="p">(</span><span class="n">_ArchListSetter</span><span class="o">.</span><span class="n">_TORCH_CUDA_ARCH_LIST</span><span class="p">)</span>
|
||||
<span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="n">_ArchListSetter</span><span class="o">.</span><span class="n">_TORCH_CUDA_ARCH_LIST</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">cc_str</span>
|
||||
|
||||
<span class="k">return</span> <span class="bp">self</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__exit__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">exc_type</span><span class="p">,</span> <span class="n">exc_val</span><span class="p">,</span> <span class="n">traceback</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Restores the old value of TORCH_CUDA_ARCH_LIST</span>
|
||||
<span class="sd"> """</span>
|
||||
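<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">old_arch_list</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="c1"># The variable was unset before entering the context, so remove it again</span>
|
||||
<span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="o">.</span><span class="n">pop</span><span class="p">(</span><span class="n">_ArchListSetter</span><span class="o">.</span><span class="n">_TORCH_CUDA_ARCH_LIST</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">os</span><span class="o">.</span><span class="n">environ</span><span class="p">[</span><span class="n">_ArchListSetter</span><span class="o">.</span><span class="n">_TORCH_CUDA_ARCH_LIST</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">old_arch_list</span>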
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_jit</span><span class="p">(</span><span class="n">name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">cc</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">cpp_file</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">cuda_file</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> JIT compiles and loads a PyTorch CUDA extension.</span>
|
||||
|
||||
<span class="sd"> :param name: name of the module to generate</span>
|
||||
<span class="sd"> :type name: str</span>
|
||||
<span class="sd"> :param cc: compute capability of the device the module should target</span>
|
||||
<span class="sd"> :type cc: int</span>
|
||||
<span class="sd"> :param cpp_file: path to file containing extension's C++ interface</span>
|
||||
<span class="sd"> :type cpp_file: str</span>
|
||||
<span class="sd"> :param cuda_file: path to file containing extension's CUDA interface</span>
|
||||
<span class="sd"> :type cuda_file: str</span>
|
||||
|
||||
<span class="sd"> :return: loaded PyTorch module</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="kn">from</span> <span class="nn">torch.utils.cpp_extension</span> <span class="kn">import</span> <span class="n">load</span>
|
||||
|
||||
<span class="n">extra_cuda_cflags</span> <span class="o">=</span> <span class="p">[</span><span class="s2">"-std=c++17"</span><span class="p">]</span>
|
||||
<span class="k">if</span> <span class="n">cc</span> <span class="o">==</span> <span class="mi">90</span><span class="p">:</span>
|
||||
<span class="c1"># PyTorch does not currently add the sm_90a target when compute capability</span>
|
||||
<span class="c1"># 9.0 is set within TORCH_CUDA_ARCH_LIST. Thus, we manually add the sm_90a target.</span>
|
||||
<span class="n">extra_cuda_cflags</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="s2">"-gencode=arch=compute_90a,code=sm_90a"</span><span class="p">)</span>
|
||||
|
||||
<span class="k">with</span> <span class="n">_ArchListSetter</span><span class="p">(</span><span class="n">cc</span><span class="p">):</span>
|
||||
<span class="n">jitmodule</span> <span class="o">=</span> <span class="n">load</span><span class="p">(</span>
|
||||
<span class="n">name</span><span class="p">,</span>
|
||||
<span class="p">[</span><span class="n">cpp_file</span><span class="p">,</span> <span class="n">cuda_file</span><span class="p">],</span>
|
||||
<span class="n">extra_cuda_cflags</span><span class="o">=</span><span class="n">extra_cuda_cflags</span><span class="p">,</span>
|
||||
<span class="n">extra_include_paths</span><span class="o">=</span><span class="p">[</span>
|
||||
<span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">CUTLASS_PATH</span><span class="p">,</span> <span class="s2">"include"</span><span class="p">),</span>
|
||||
<span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">CUTLASS_PATH</span><span class="p">,</span> <span class="s2">"tools/util/include"</span><span class="p">),</span>
|
||||
<span class="p">],</span>
|
||||
<span class="n">verbose</span><span class="o">=</span><span class="p">(</span><span class="n">logger</span><span class="o">.</span><span class="n">level</span> <span class="o">==</span> <span class="n">logging</span><span class="o">.</span><span class="n">DEBUG</span><span class="p">)</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">jitmodule</span>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_pytorch_gemm</span><span class="p">(</span><span class="n">op</span><span class="p">,</span> <span class="n">name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">cc</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">jit</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span> <span class="n">sourcedir</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="s2">""</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Generates source for building a PyTorch CUDA module that leverages the CUTLASS GEMM</span>
|
||||
<span class="sd"> specified by ``op``. If the ``jit`` parameter is set to true, the module is just-in-time</span>
|
||||
<span class="sd"> compiled, loaded, and returned.</span>
|
||||
|
||||
<span class="sd"> :param op: operation to emit in the module</span>
|
||||
<span class="sd"> :param name: name of the module to generate</span>
|
||||
<span class="sd"> :type name: str</span>
|
||||
<span class="sd"> :param cc: compute capability of the device the module should target</span>
|
||||
<span class="sd"> :type cc: int</span>
|
||||
<span class="sd"> :param jit: whether the module should be just-in-time compiled</span>
|
||||
<span class="sd"> :type jit: bool</span>
|
||||
<span class="sd"> :param sourcedir: directory to which generated source files should be written</span>
|
||||
<span class="sd"> :type sourcedir: str</span>
|
||||
|
||||
<span class="sd"> :return: loaded PyTorch module if ``jit=True`` or ``None`` otherwise</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">sourcedir</span> <span class="o">!=</span> <span class="s2">""</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">isdir</span><span class="p">(</span><span class="n">sourcedir</span><span class="p">):</span>
|
||||
<span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="n">sourcedir</span><span class="p">)</span>
|
||||
|
||||
<span class="n">cuda_file</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">sourcedir</span><span class="p">,</span> <span class="n">name</span> <span class="o">+</span> <span class="s2">"_kernel.cu"</span><span class="p">)</span>
|
||||
<span class="n">extra_kw</span> <span class="o">=</span> <span class="p">{}</span>
|
||||
<span class="k">if</span> <span class="n">op</span><span class="o">.</span><span class="n">api</span> <span class="o">==</span> <span class="n">ApiVersion</span><span class="o">.</span><span class="n">v3x</span><span class="p">:</span>
|
||||
<span class="n">impl_template</span> <span class="o">=</span> <span class="n">_PYTORCH_GEMM_IMPL_TEMPLATE_3x</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">impl_template</span> <span class="o">=</span> <span class="n">_PYTORCH_GEMM_IMPL_TEMPLATE_2x</span>
|
||||
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">op</span><span class="o">.</span><span class="n">swizzling_functor</span><span class="p">,</span> <span class="n">swizzle</span><span class="o">.</span><span class="n">ThreadblockSwizzleStreamK</span><span class="p">):</span>
|
||||
<span class="n">extra_kw</span><span class="p">[</span><span class="s2">"args"</span><span class="p">]</span> <span class="o">=</span> <span class="n">common</span><span class="o">.</span><span class="n">_CUTLASS_KERNEL_ARGS_2x_STREAM_K</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">extra_kw</span><span class="p">[</span><span class="s2">"args"</span><span class="p">]</span> <span class="o">=</span> <span class="n">common</span><span class="o">.</span><span class="n">_CUTLASS_KERNEL_ARGS_2x</span>
|
||||
<span class="n">impl_template</span> <span class="o">=</span> <span class="p">(</span>
|
||||
<span class="n">_PYTORCH_GEMM_IMPL_TEMPLATE_3x</span>
|
||||
<span class="k">if</span> <span class="n">op</span><span class="o">.</span><span class="n">api</span> <span class="o">==</span> <span class="n">ApiVersion</span><span class="o">.</span><span class="n">v3x</span>
|
||||
<span class="k">else</span> <span class="n">_PYTORCH_GEMM_IMPL_TEMPLATE_2x</span>
|
||||
<span class="p">)</span>
|
||||
<span class="n">cuda_impl</span> <span class="o">=</span> <span class="n">SubstituteTemplate</span><span class="p">(</span><span class="n">impl_template</span><span class="p">,</span> <span class="p">{</span><span class="s2">"name"</span><span class="p">:</span> <span class="n">name</span><span class="p">,</span> <span class="o">**</span><span class="n">extra_kw</span><span class="p">})</span>
|
||||
<span class="n">cuda_source</span> <span class="o">=</span> <span class="n">SubstituteTemplate</span><span class="p">(</span>
|
||||
<span class="n">_PYTORCH_CUDA_TEMPLATE</span><span class="p">,</span>
|
||||
<span class="p">{</span>
|
||||
<span class="s2">"includes"</span><span class="p">:</span> <span class="n">_PYTORCH_GEMM_INCLUDES</span><span class="p">[</span><span class="n">op</span><span class="o">.</span><span class="n">api</span><span class="p">],</span>
|
||||
<span class="s2">"declaration"</span><span class="p">:</span> <span class="n">op</span><span class="o">.</span><span class="n">rt_module</span><span class="o">.</span><span class="n">emit</span><span class="p">(),</span>
|
||||
<span class="s2">"procedural_name"</span><span class="p">:</span> <span class="n">op</span><span class="o">.</span><span class="n">procedural_name</span><span class="p">(),</span>
|
||||
<span class="s2">"impl"</span><span class="p">:</span> <span class="n">cuda_impl</span><span class="p">,</span>
|
||||
<span class="s2">"torch_type_C"</span><span class="p">:</span> <span class="n">_CUTLASS_TYPE_TO_TORCH_TYPE</span><span class="p">[</span><span class="n">op</span><span class="o">.</span><span class="n">C</span><span class="o">.</span><span class="n">element</span><span class="p">],</span>
|
||||
<span class="p">},</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">cuda_file</span><span class="p">,</span> <span class="s2">"w"</span><span class="p">)</span> <span class="k">as</span> <span class="n">outfile</span><span class="p">:</span>
|
||||
<span class="n">outfile</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">cuda_source</span><span class="p">)</span>
|
||||
|
||||
<span class="n">cpp_file</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">sourcedir</span><span class="p">,</span> <span class="n">name</span> <span class="o">+</span> <span class="s2">".cpp"</span><span class="p">)</span>
|
||||
<span class="n">cpp_source</span> <span class="o">=</span> <span class="n">SubstituteTemplate</span><span class="p">(</span>
|
||||
<span class="n">_PYTORCH_GEMM_CPP_TEMPLATE</span><span class="p">,</span>
|
||||
<span class="p">{</span><span class="s2">"name"</span><span class="p">:</span> <span class="n">name</span><span class="p">,</span> <span class="s2">"description"</span><span class="p">:</span> <span class="sa">f</span><span class="s2">"CUTLASS </span><span class="si">{</span><span class="n">op</span><span class="o">.</span><span class="n">procedural_name</span><span class="p">()</span><span class="si">}</span><span class="s2"> GEMM"</span><span class="p">},</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">cpp_file</span><span class="p">,</span> <span class="s2">"w"</span><span class="p">)</span> <span class="k">as</span> <span class="n">outfile</span><span class="p">:</span>
|
||||
<span class="n">outfile</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">cpp_source</span><span class="p">)</span>
|
||||
|
||||
<span class="n">_generate_setup</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">sourcedir</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">jit</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">_jit</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">cc</span><span class="p">,</span> <span class="n">cpp_file</span><span class="p">,</span> <span class="n">cuda_file</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="kc">None</span>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_pytorch_grouped_gemm</span><span class="p">(</span>
|
||||
<span class="n">op</span><span class="p">,</span> <span class="n">name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">cc</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">jit</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span> <span class="n">sourcedir</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="s2">""</span>
|
||||
<span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Generates source for building a PyTorch CUDA module that leverages the CUTLASS grouped GEMM</span>
|
||||
<span class="sd"> specified by ``op``. If the ``jit`` parameter is set to true, the module is just-in-time</span>
|
||||
<span class="sd"> compiled, loaded, and returned.</span>
|
||||
|
||||
<span class="sd"> :param op: operation to emit in the module</span>
|
||||
<span class="sd"> :param name: name of the module to generate</span>
|
||||
<span class="sd"> :type name: str</span>
|
||||
<span class="sd"> :param cc: compute capability of the device the module should target</span>
|
||||
<span class="sd"> :type cc: int</span>
|
||||
<span class="sd"> :param jit: whether the module should be just-in-time compiled</span>
|
||||
<span class="sd"> :type jit: bool</span>
|
||||
<span class="sd"> :param sourcedir: directory to which generated source files should be written</span>
|
||||
<span class="sd"> :type sourcedir: str</span>
|
||||
|
||||
<span class="sd"> :return: loaded PyTorch module if ``jit=True`` or ``None`` otherwise</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">op</span><span class="o">.</span><span class="n">api</span> <span class="o">!=</span> <span class="n">ApiVersion</span><span class="o">.</span><span class="n">v2x</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s2">"Grouped GEMM is currently only supported for CUTLASS 2.x"</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">sourcedir</span> <span class="o">!=</span> <span class="s2">""</span> <span class="ow">and</span> <span class="ow">not</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">isdir</span><span class="p">(</span><span class="n">sourcedir</span><span class="p">):</span>
|
||||
<span class="n">os</span><span class="o">.</span><span class="n">makedirs</span><span class="p">(</span><span class="n">sourcedir</span><span class="p">)</span>
|
||||
|
||||
<span class="n">cuda_file</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">sourcedir</span><span class="p">,</span> <span class="n">name</span> <span class="o">+</span> <span class="s2">"_kernel.cu"</span><span class="p">)</span>
|
||||
<span class="n">cuda_impl</span> <span class="o">=</span> <span class="n">SubstituteTemplate</span><span class="p">(</span><span class="n">_PYTORCH_GROUPED_GEMM_IMPL_TEMPLATE</span><span class="p">,</span> <span class="p">{</span><span class="s2">"name"</span><span class="p">:</span> <span class="n">name</span><span class="p">})</span>
|
||||
<span class="n">cuda_source</span> <span class="o">=</span> <span class="n">SubstituteTemplate</span><span class="p">(</span>
|
||||
<span class="n">_PYTORCH_CUDA_TEMPLATE</span><span class="p">,</span>
|
||||
<span class="p">{</span>
|
||||
<span class="s2">"includes"</span><span class="p">:</span> <span class="n">_PYTORCH_GROUPED_GEMM_INCLUDES</span><span class="p">,</span>
|
||||
<span class="s2">"declaration"</span><span class="p">:</span> <span class="n">op</span><span class="o">.</span><span class="n">rt_module</span><span class="o">.</span><span class="n">emit</span><span class="p">(),</span>
|
||||
<span class="s2">"procedural_name"</span><span class="p">:</span> <span class="n">op</span><span class="o">.</span><span class="n">procedural_name</span><span class="p">(),</span>
|
||||
<span class="s2">"impl"</span><span class="p">:</span> <span class="n">cuda_impl</span><span class="p">,</span>
|
||||
<span class="s2">"torch_type_C"</span><span class="p">:</span> <span class="n">_CUTLASS_TYPE_TO_TORCH_TYPE</span><span class="p">[</span><span class="n">op</span><span class="o">.</span><span class="n">C</span><span class="o">.</span><span class="n">element</span><span class="p">],</span>
|
||||
<span class="p">},</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">cuda_file</span><span class="p">,</span> <span class="s2">"w"</span><span class="p">)</span> <span class="k">as</span> <span class="n">outfile</span><span class="p">:</span>
|
||||
<span class="n">outfile</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">cuda_source</span><span class="p">)</span>
|
||||
|
||||
<span class="n">cpp_file</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">sourcedir</span><span class="p">,</span> <span class="n">name</span> <span class="o">+</span> <span class="s2">".cpp"</span><span class="p">)</span>
|
||||
<span class="n">cpp_source</span> <span class="o">=</span> <span class="n">SubstituteTemplate</span><span class="p">(</span>
|
||||
<span class="n">_PYTORCH_GROUPED_GEMM_CPP_TEMPLATE</span><span class="p">,</span>
|
||||
<span class="p">{</span><span class="s2">"name"</span><span class="p">:</span> <span class="n">name</span><span class="p">,</span> <span class="s2">"description"</span><span class="p">:</span> <span class="sa">f</span><span class="s2">"CUTLASS </span><span class="si">{</span><span class="n">op</span><span class="o">.</span><span class="n">procedural_name</span><span class="p">()</span><span class="si">}</span><span class="s2"> grouped GEMM"</span><span class="p">},</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">cpp_file</span><span class="p">,</span> <span class="s2">"w"</span><span class="p">)</span> <span class="k">as</span> <span class="n">outfile</span><span class="p">:</span>
|
||||
<span class="n">outfile</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">cpp_source</span><span class="p">)</span>
|
||||
|
||||
<span class="n">_generate_setup</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">sourcedir</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">jit</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">_jit</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">cc</span><span class="p">,</span> <span class="n">cpp_file</span><span class="p">,</span> <span class="n">cuda_file</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="kc">None</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="pytorch"><a class="viewcode-back" href="../../../cutlass.emit.html#cutlass.emit.pytorch.pytorch">[docs]</a><span class="k">def</span> <span class="nf">pytorch</span><span class="p">(</span><span class="n">op</span><span class="p">,</span> <span class="n">name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">cc</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">jit</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">,</span> <span class="n">sourcedir</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="s2">""</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Generates source for building a PyTorch CUDA module that leverages the CUTLASS kernel</span>
|
||||
<span class="sd"> specified by ``op``. If the ``jit`` parameter is set to true, the module is just-in-time</span>
|
||||
<span class="sd"> compiled, loaded, and returned.</span>
|
||||
|
||||
<span class="sd"> The result of this method is files within ``sourcedir`` that can be used for building</span>
|
||||
<span class="sd"> a PyTorch module.</span>
|
||||
|
||||
<span class="sd"> :param op: operation to emit in the module</span>
|
||||
<span class="sd"> :param name: name of the module to generate</span>
|
||||
<span class="sd"> :type name: str</span>
|
||||
<span class="sd"> :param cc: compute capability of the device the module should target</span>
|
||||
<span class="sd"> :type cc: int</span>
|
||||
<span class="sd"> :param jit: whether the module should be just-in-time compiled</span>
|
||||
<span class="sd"> :type jit: bool</span>
|
||||
<span class="sd"> :param sourcedir: directory to which generated source files should be written</span>
|
||||
<span class="sd"> :type sourcedir: str</span>
|
||||
|
||||
<span class="sd"> :return: loaded PyTorch module if ``jit=True`` or ``None`` otherwise</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">device_op</span> <span class="o">=</span> <span class="n">op</span><span class="o">.</span><span class="n">device_op</span><span class="p">()</span>
|
||||
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">op</span><span class="p">,</span> <span class="n">GemmOperationUniversal</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="n">_pytorch_gemm</span><span class="p">(</span><span class="n">device_op</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">cc</span><span class="p">,</span> <span class="n">jit</span><span class="p">,</span> <span class="n">sourcedir</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">op</span><span class="p">,</span> <span class="n">GemmOperationGrouped</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="n">_pytorch_grouped_gemm</span><span class="p">(</span><span class="n">device_op</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">cc</span><span class="p">,</span> <span class="n">jit</span><span class="p">,</span> <span class="n">sourcedir</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span>
|
||||
<span class="sa">f</span><span class="s2">"Operation type </span><span class="si">{</span><span class="nb">type</span><span class="p">(</span><span class="n">op</span><span class="p">)</span><span class="si">}</span><span class="s2"> is not currently supported for PyTorch emission."</span>
|
||||
<span class="p">)</span></div>
|
||||
</pre></div>
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
|
||||
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer no-toc">
|
||||
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
|
||||
<script src="../../../_static/doctools.js"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../../_static/scripts/furo.js"></script>
|
||||
<script src="../../../_static/clipboard.min.js"></script>
|
||||
<script src="../../../_static/copybutton.js"></script>
|
||||
<script src="../../../_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
391
python/docs/_modules/cutlass/epilogue.html
Normal file
@ -0,0 +1,391 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><link rel="index" title="Index" href="../../genindex.html" /><link rel="search" title="Search" href="../../search.html" />
|
||||
<link rel="canonical" href="docs/_modules/cutlass/epilogue.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>cutlass.epilogue - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="../../index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="../../index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="../../_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="../../_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="../../search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../../modules.html">CUTLASS Python API</a><input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="../../cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../../examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<h1>Source code for cutlass.epilogue</h1><div class="highlight"><pre>
|
||||
<span></span><span class="c1">#################################################################################################</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># Copyright (c) 2023 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.</span>
|
||||
<span class="c1"># SPDX-License-Identifier: BSD-3-Clause</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># Redistribution and use in source and binary forms, with or without</span>
|
||||
<span class="c1"># modification, are permitted provided that the following conditions are met:</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 1. Redistributions of source code must retain the above copyright notice, this</span>
|
||||
<span class="c1"># list of conditions and the following disclaimer.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 2. Redistributions in binary form must reproduce the above copyright notice,</span>
|
||||
<span class="c1"># this list of conditions and the following disclaimer in the documentation</span>
|
||||
<span class="c1"># and/or other materials provided with the distribution.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 3. Neither the name of the copyright holder nor the names of its</span>
|
||||
<span class="c1"># contributors may be used to endorse or promote products derived from</span>
|
||||
<span class="c1"># this software without specific prior written permission.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"</span>
|
||||
<span class="c1"># AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE</span>
|
||||
<span class="c1"># IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE</span>
|
||||
<span class="c1"># DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE</span>
|
||||
<span class="c1"># FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL</span>
|
||||
<span class="c1"># DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR</span>
|
||||
<span class="c1"># SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER</span>
|
||||
<span class="c1"># CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,</span>
|
||||
<span class="c1"># OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE</span>
|
||||
<span class="c1"># OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1">#################################################################################################</span>
|
||||
|
||||
<span class="sd">"""</span>
|
||||
<span class="sd">Registry of elementwise epilogues</span>
|
||||
|
||||
<span class="sd">Elementwise epilogues can be added to many CUTLASS kernels in the CUTLASS Python interface via</span>
|
||||
<span class="sd">code like the following for GEMM:</span>
|
||||
|
||||
<span class="sd">.. highlight:: python</span>
|
||||
<span class="sd">.. code-block:: python</span>
|
||||
|
||||
<span class="sd"> plan = cutlass.op.Gemm(element=cutlass.DataType.f32, layout=cutlass.LayoutType.RowMajor)</span>
|
||||
<span class="sd"> plan.activation = cutlass.epilogue.relu</span>
|
||||
<span class="sd">"""</span>
|
||||
|
||||
<span class="kn">from</span> <span class="nn">cutlass.backend</span> <span class="kn">import</span> <span class="n">epilogue</span>
|
||||
|
||||
<span class="n">gelu</span> <span class="o">=</span> <span class="n">epilogue</span><span class="o">.</span><span class="n">gelu</span>
|
||||
<span class="n">hardswish</span> <span class="o">=</span> <span class="n">epilogue</span><span class="o">.</span><span class="n">hardswish</span>
|
||||
<span class="n">identity</span> <span class="o">=</span> <span class="n">epilogue</span><span class="o">.</span><span class="n">identity</span>
|
||||
<span class="n">leaky_relu</span> <span class="o">=</span> <span class="n">epilogue</span><span class="o">.</span><span class="n">leaky_relu</span>
|
||||
<span class="n">relu</span> <span class="o">=</span> <span class="n">epilogue</span><span class="o">.</span><span class="n">relu</span>
|
||||
<span class="n">sigmoid</span> <span class="o">=</span> <span class="n">epilogue</span><span class="o">.</span><span class="n">sigmoid</span>
|
||||
<span class="n">silu</span> <span class="o">=</span> <span class="n">epilogue</span><span class="o">.</span><span class="n">silu</span>
|
||||
<span class="n">tanh</span> <span class="o">=</span> <span class="n">epilogue</span><span class="o">.</span><span class="n">tanh</span>
|
||||
|
||||
|
||||
<span class="n">_activations</span> <span class="o">=</span> <span class="p">[</span><span class="n">gelu</span><span class="p">,</span> <span class="n">hardswish</span><span class="p">,</span> <span class="n">identity</span><span class="p">,</span> <span class="n">leaky_relu</span><span class="p">,</span> <span class="n">relu</span><span class="p">,</span> <span class="n">sigmoid</span><span class="p">,</span> <span class="n">silu</span><span class="p">,</span> <span class="n">tanh</span><span class="p">]</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="get_activations"><a class="viewcode-back" href="../../cutlass.html#cutlass.epilogue.get_activations">[docs]</a><span class="k">def</span> <span class="nf">get_activations</span><span class="p">()</span> <span class="o">-></span> <span class="nb">list</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns a list of available activation functions</span>
|
||||
|
||||
<span class="sd"> :return: list of available activation functions</span>
|
||||
<span class="sd"> :rtype: list</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">_activations</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="get_activation_epilogue"><a class="viewcode-back" href="../../cutlass.html#cutlass.epilogue.get_activation_epilogue">[docs]</a><span class="k">def</span> <span class="nf">get_activation_epilogue</span><span class="p">(</span>
|
||||
<span class="n">activation</span><span class="p">,</span>
|
||||
<span class="n">element_output</span><span class="p">,</span>
|
||||
<span class="n">elements_per_access</span><span class="p">,</span>
|
||||
<span class="n">element_accumulator</span><span class="p">,</span>
|
||||
<span class="n">element_compute</span><span class="p">,</span>
|
||||
<span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Return an epilogue corresponding to the activation function, data types, and alignment</span>
|
||||
<span class="sd"> used in the kernel</span>
|
||||
|
||||
<span class="sd"> :param activation: elementwise activation function to use</span>
|
||||
<span class="sd"> :param element_output: data type of the output</span>
|
||||
<span class="sd"> :param elements_per_access: alignment of operand C of the kernel</span>
|
||||
<span class="sd"> :type elements_per_access: int</span>
|
||||
<span class="sd"> :param element_accumulator: data type of the accumulated output C</span>
|
||||
<span class="sd"> :param element_compute: data type in which compute operations should be performed</span>
|
||||
|
||||
<span class="sd"> :return: epilogue functor</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">activation</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">_activations</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span>
|
||||
<span class="sa">f</span><span class="s2">"Unsupported activation type </span><span class="si">{</span><span class="n">activation</span><span class="si">}</span><span class="s2">. Available activations are: </span><span class="si">{</span><span class="n">_activations</span><span class="si">}</span><span class="s2">"</span>
|
||||
<span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">activation</span> <span class="o">==</span> <span class="n">identity</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">epilogue</span><span class="o">.</span><span class="n">LinearCombination</span><span class="p">(</span>
|
||||
<span class="n">element_output</span><span class="p">,</span> <span class="n">elements_per_access</span><span class="p">,</span> <span class="n">element_accumulator</span><span class="p">,</span> <span class="n">element_compute</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">epilogue</span><span class="o">.</span><span class="n">LinearCombinationGeneric</span><span class="p">(</span>
|
||||
<span class="n">activation</span><span class="p">(</span><span class="n">element_compute</span><span class="p">),</span>
|
||||
<span class="n">element_output</span><span class="p">,</span>
|
||||
<span class="n">elements_per_access</span><span class="p">,</span>
|
||||
<span class="n">element_accumulator</span><span class="p">,</span>
|
||||
<span class="n">element_compute</span><span class="p">,</span>
|
||||
<span class="p">)</span></div>
|
||||
</pre></div>
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
|
||||
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer no-toc">
|
||||
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script>
|
||||
<script src="../../_static/doctools.js"></script>
|
||||
<script src="../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../_static/scripts/furo.js"></script>
|
||||
<script src="../../_static/clipboard.min.js"></script>
|
||||
<script src="../../_static/copybutton.js"></script>
|
||||
<script src="../../_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
729
python/docs/_modules/cutlass/library_defaults.html
Normal file
@ -0,0 +1,729 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><link rel="index" title="Index" href="../../genindex.html" /><link rel="search" title="Search" href="../../search.html" />
|
||||
<link rel="canonical" href="docs/_modules/cutlass/library_defaults.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>cutlass.library_defaults - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="../../index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="../../index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="../../_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="../../_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="../../search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../../modules.html">CUTLASS Python API</a><input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="../../cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../../examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<h1>Source code for cutlass.library_defaults</h1><div class="highlight"><pre>
|
||||
<span></span><span class="c1">#################################################################################################</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># Copyright (c) 2023 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.</span>
|
||||
<span class="c1"># SPDX-License-Identifier: BSD-3-Clause</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># Redistribution and use in source and binary forms, with or without</span>
|
||||
<span class="c1"># modification, are permitted provided that the following conditions are met:</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 1. Redistributions of source code must retain the above copyright notice, this</span>
|
||||
<span class="c1"># list of conditions and the following disclaimer.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 2. Redistributions in binary form must reproduce the above copyright notice,</span>
|
||||
<span class="c1"># this list of conditions and the following disclaimer in the documentation</span>
|
||||
<span class="c1"># and/or other materials provided with the distribution.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 3. Neither the name of the copyright holder nor the names of its</span>
|
||||
<span class="c1"># contributors may be used to endorse or promote products derived from</span>
|
||||
<span class="c1"># this software without specific prior written permission.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"</span>
|
||||
<span class="c1"># AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE</span>
|
||||
<span class="c1"># IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE</span>
|
||||
<span class="c1"># DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE</span>
|
||||
<span class="c1"># FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL</span>
|
||||
<span class="c1"># DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR</span>
|
||||
<span class="c1"># SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER</span>
|
||||
<span class="c1"># CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,</span>
|
||||
<span class="c1"># OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE</span>
|
||||
<span class="c1"># OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1">#################################################################################################</span>
|
||||
|
||||
<span class="sd">"""</span>
|
||||
<span class="sd">Classes containing valid operations for a given compute capability and data types.</span>
|
||||
<span class="sd">"""</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">logging</span>
|
||||
<span class="kn">from</span> <span class="nn">cuda</span> <span class="kn">import</span> <span class="n">__version__</span>
|
||||
|
||||
<span class="c1"># Strip any additional information from the CUDA version</span>
|
||||
<span class="n">_cuda_version</span> <span class="o">=</span> <span class="n">__version__</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">"rc"</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
|
||||
|
||||
<span class="c1"># Imports from CUTLASS profiler generator and manifest scripts</span>
|
||||
<span class="kn">import</span> <span class="nn">generator</span> <span class="k">as</span> <span class="nn">prof_generator</span>
|
||||
<span class="kn">import</span> <span class="nn">manifest</span> <span class="k">as</span> <span class="nn">prof_manifest</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">cutlass</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass.utils.check</span> <span class="kn">import</span> <span class="n">valid_stage_count</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass.utils.datatypes</span> <span class="kn">import</span> <span class="n">td_from_profiler_td</span><span class="p">,</span> <span class="n">td_from_profiler_op</span><span class="p">,</span> <span class="n">has_binding_type</span>
|
||||
|
||||
|
||||
<span class="n">_generator_ccs</span> <span class="o">=</span> <span class="p">[</span><span class="mi">50</span><span class="p">,</span> <span class="mi">60</span><span class="p">,</span> <span class="mi">61</span><span class="p">,</span> <span class="mi">70</span><span class="p">,</span> <span class="mi">75</span><span class="p">,</span> <span class="mi">80</span><span class="p">,</span> <span class="mi">90</span><span class="p">]</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="KernelsForDataType"><a class="viewcode-back" href="../../cutlass.html#cutlass.library_defaults.KernelsForDataType">[docs]</a><span class="k">class</span> <span class="nc">KernelsForDataType</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Container class for keeping track of kernels that correspond to a particular combination</span>
|
||||
<span class="sd"> of data types for operands A, B, and accumulator</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">datatype_comb</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">,</span> <span class="n">layout_comb</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">datatype_comb</span> <span class="o">=</span> <span class="n">datatype_comb</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">layout_comb</span> <span class="o">=</span> <span class="n">layout_comb</span>
|
||||
|
||||
<span class="c1"># Dictionary mapping from alignment (int) to a list of kernels that fit the alignment</span>
|
||||
<span class="c1"># constraint for the data type combination</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">kernels_by_alignment</span> <span class="o">=</span> <span class="p">{}</span>
|
||||
|
||||
<div class="viewcode-block" id="KernelsForDataType.add"><a class="viewcode-back" href="../../cutlass.html#cutlass.library_defaults.KernelsForDataType.add">[docs]</a> <span class="k">def</span> <span class="nf">add</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">operation</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Add an operation to the list of supported kernels</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">alignment</span> <span class="o">=</span> <span class="n">operation</span><span class="o">.</span><span class="n">A</span><span class="o">.</span><span class="n">alignment</span>
|
||||
<span class="k">if</span> <span class="n">alignment</span> <span class="ow">not</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">kernels_by_alignment</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">kernels_by_alignment</span><span class="p">[</span><span class="n">alignment</span><span class="p">]</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">kernels_by_alignment</span><span class="p">[</span><span class="n">alignment</span><span class="p">]</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">operation</span><span class="p">)</span></div>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">alignments</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns an unsorted list of alignments supported by this data type combination</span>
|
||||
|
||||
<span class="sd"> :return: unsorted list of alignments supported by this data type combination</span>
|
||||
<span class="sd"> :rtype: list</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="nb">list</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">kernels_by_alignment</span><span class="o">.</span><span class="n">keys</span><span class="p">())</span>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">all_operations</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns a list of all operations supported by this data type combination</span>
|
||||
|
||||
<span class="sd"> :return: list of all operations supported by this data type combination</span>
|
||||
<span class="sd"> :rtype: list</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">ops</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">_</span><span class="p">,</span> <span class="n">alignment_ops</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">kernels_by_alignment</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
|
||||
<span class="n">ops</span><span class="o">.</span><span class="n">extend</span><span class="p">(</span><span class="n">alignment_ops</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">ops</span>
|
||||
|
||||
<div class="viewcode-block" id="KernelsForDataType.operations"><a class="viewcode-back" href="../../cutlass.html#cutlass.library_defaults.KernelsForDataType.operations">[docs]</a> <span class="k">def</span> <span class="nf">operations</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">alignment</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns operations satisfying the alignment constraint indicated by `alignment`</span>
|
||||
|
||||
<span class="sd"> :param alignment: alignment constraint of operations to return</span>
|
||||
<span class="sd"> :type alignment: int</span>
|
||||
|
||||
<span class="sd"> :return: list of operations</span>
|
||||
<span class="sd"> :rtype: list</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">alignment</span> <span class="ow">not</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">kernels_by_alignment</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span>
|
||||
<span class="sa">f</span><span class="s2">"No operations of alignment </span><span class="si">{</span><span class="n">alignment</span><span class="si">}</span><span class="s2"> found for data type and layout "</span>
|
||||
<span class="sa">f</span><span class="s2">"combination </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">datatype_comb</span><span class="si">}</span><span class="s2"> </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">layout_comb</span><span class="si">}</span><span class="s2">"</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">kernels_by_alignment</span><span class="p">[</span><span class="n">alignment</span><span class="p">]</span></div>
|
||||
|
||||
<div class="viewcode-block" id="KernelsForDataType.find_alignment"><a class="viewcode-back" href="../../cutlass.html#cutlass.library_defaults.KernelsForDataType.find_alignment">[docs]</a> <span class="k">def</span> <span class="nf">find_alignment</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">shape</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">,</span> <span class="n">layout</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="p">)</span> <span class="o">-></span> <span class="nb">int</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the most preferable alignment for a given shape and layout</span>
|
||||
|
||||
<span class="sd"> :param shape: extent of each dimension of the tensor</span>
|
||||
<span class="sd"> :type shape: tuple</span>
|
||||
<span class="sd"> :param layout: layout of the tensor</span>
|
||||
<span class="sd"> :type layout: cutlass.LayoutType</span>
|
||||
|
||||
<span class="sd"> :return: maximum alignment supported by the data type combination and tensor size</span>
|
||||
<span class="sd"> :rtype: int</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="c1"># Determine the leading dimension of the shape</span>
|
||||
<span class="k">if</span> <span class="n">layout</span> <span class="o">==</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">ColumnMajor</span><span class="p">:</span>
|
||||
<span class="n">ld</span> <span class="o">=</span> <span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
|
||||
<span class="k">elif</span> <span class="n">layout</span> <span class="o">==</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">:</span>
|
||||
<span class="n">ld</span> <span class="o">=</span> <span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Unexpected or unsupported layout </span><span class="si">{</span><span class="n">layout</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
|
||||
|
||||
<span class="k">for</span> <span class="n">alignment</span> <span class="ow">in</span> <span class="nb">sorted</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">kernels_by_alignment</span><span class="o">.</span><span class="n">keys</span><span class="p">()),</span> <span class="n">reverse</span><span class="o">=</span><span class="kc">True</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">ld</span> <span class="o">%</span> <span class="n">alignment</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">alignment</span>
|
||||
|
||||
<span class="c1"># Default to alignment of 1 if no others match</span>
|
||||
<span class="k">return</span> <span class="mi">1</span></div>
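The alignment search in `find_alignment` can be tried out without CUTLASS installed. The following is a minimal sketch, not the library's implementation: the `Layout` enum is a hypothetical stand-in for `cutlass.LayoutType`, the list of available alignments is passed in explicitly instead of being read from `kernels_by_alignment`, and the sketch assumes that the contiguous extent is the last dimension for row-major tensors and the second-to-last dimension for column-major tensors.

```python
from enum import Enum, auto


class Layout(Enum):
    # Hypothetical stand-in for cutlass.LayoutType, used only for illustration
    RowMajor = auto()
    ColumnMajor = auto()


def find_alignment(shape, layout, available_alignments):
    """Return the largest available alignment that evenly divides the contiguous extent."""
    # Assumption of this sketch: row-major tensors are contiguous along the
    # last extent, column-major tensors along the second-to-last extent.
    if layout == Layout.RowMajor:
        ld = shape[-1]
    elif layout == Layout.ColumnMajor:
        ld = shape[-2]
    else:
        raise ValueError(f"Unexpected or unsupported layout {layout}")

    # Try the largest alignment first and return the first one that divides ld
    for alignment in sorted(available_alignments, reverse=True):
        if ld % alignment == 0:
            return alignment

    # Default to alignment of 1 if no others match
    return 1


print(find_alignment((4096, 4096), Layout.RowMajor, [1, 2, 4, 8]))
print(find_alignment((4096, 4100), Layout.RowMajor, [1, 2, 4, 8]))
print(find_alignment((4097, 4096), Layout.ColumnMajor, [2, 4, 8]))
```

Searching from the largest alignment downward means the widest vectorized access that the tensor extent permits is preferred, with 1 as the fallback when nothing else divides the extent.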
|
||||
|
||||
<div class="viewcode-block" id="KernelsForDataType.sort"><a class="viewcode-back" href="../../cutlass.html#cutlass.library_defaults.KernelsForDataType.sort">[docs]</a> <span class="k">def</span> <span class="nf">sort</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Sorts each list of kernels in `kernels_by_alignment` in descending order of threadblock tile size</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">key</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">op</span><span class="p">:</span> <span class="p">(</span>
|
||||
<span class="n">op</span><span class="o">.</span><span class="n">tile_description</span><span class="o">.</span><span class="n">threadblock_shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
|
||||
<span class="o">*</span> <span class="n">op</span><span class="o">.</span><span class="n">tile_description</span><span class="o">.</span><span class="n">threadblock_shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
|
||||
<span class="o">*</span> <span class="n">op</span><span class="o">.</span><span class="n">tile_description</span><span class="o">.</span><span class="n">threadblock_shape</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">for</span> <span class="n">alignment</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">kernels_by_alignment</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">kernels_by_alignment</span><span class="p">[</span><span class="n">alignment</span><span class="p">]</span><span class="o">.</span><span class="n">sort</span><span class="p">(</span><span class="n">key</span><span class="o">=</span><span class="n">key</span><span class="p">,</span> <span class="n">reverse</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span></div></div>
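The ordering applied by `sort` can be illustrated in isolation. This is a rough sketch under stated assumptions: `TileDescription` and `Operation` are hypothetical named tuples standing in for the profiler's operation objects, and the kernel names and shapes are made up for the example. Only the sort key, the product of the three threadblock shape extents, mirrors the listing above.

```python
from collections import namedtuple
from math import prod

# Hypothetical stand-ins for the profiler's tile description and operation objects
TileDescription = namedtuple("TileDescription", ["threadblock_shape"])
Operation = namedtuple("Operation", ["name", "tile_description"])


def tile_volume(op):
    """Product of the threadblock shape extents (M * N * K)."""
    return prod(op.tile_description.threadblock_shape)


# Illustrative data only: kernels grouped by alignment, in arbitrary order
kernels_by_alignment = {
    8: [
        Operation("small", TileDescription([64, 64, 32])),
        Operation("large", TileDescription([256, 128, 32])),
        Operation("medium", TileDescription([128, 128, 32])),
    ],
    1: [
        Operation("simt_small", TileDescription([64, 64, 8])),
        Operation("simt_large", TileDescription([128, 128, 8])),
    ],
}

# Sort every per-alignment list in place, largest tile first
for kernels in kernels_by_alignment.values():
    kernels.sort(key=tile_volume, reverse=True)

for alignment, kernels in kernels_by_alignment.items():
    print(alignment, [op.name for op in kernels])
```

After sorting, the first entry of each list is the kernel with the largest threadblock tile for that alignment.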
|
||||
|
||||
|
||||
<div class="viewcode-block" id="ArchOptions"><a class="viewcode-back" href="../../cutlass.html#cutlass.library_defaults.ArchOptions">[docs]</a><span class="k">class</span> <span class="nc">ArchOptions</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Structure for keeping track of kernels available on a given compute capability</span>
|
||||
|
||||
<span class="sd"> :param target_cc: compute capability of the device on which kernels will be run</span>
|
||||
<span class="sd"> :type target_cc: int</span>
|
||||
<span class="sd"> :param kernel_cc: compute capability of the kernels to generate</span>
|
||||
<span class="sd"> :type kernel_cc: int</span>
|
||||
<span class="sd"> :param operation_kind: type of operation to register</span>
|
||||
<span class="sd"> :type operation_kind: cutlass.OperationKind</span>
|
||||
<span class="sd"> :param gemm_kinds: types of GEMM operations that can be included</span>
|
||||
<span class="sd"> :type gemm_kinds: list</span>
|
||||
<span class="sd"> :param allowed_math_operations: types of primitive math operations allowed</span>
|
||||
<span class="sd"> :type allowed_math_operations: list</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="p">,</span>
|
||||
<span class="n">target_cc</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span>
|
||||
<span class="n">kernel_cc</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span>
|
||||
<span class="n">operation_kind</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OperationKind</span><span class="p">,</span>
|
||||
<span class="n">gemm_kinds</span><span class="p">:</span> <span class="nb">list</span><span class="p">,</span>
|
||||
<span class="n">allowed_math_operations</span><span class="p">:</span> <span class="nb">list</span> <span class="o">=</span> <span class="p">[</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">MathOperation</span><span class="o">.</span><span class="n">multiply_add</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">MathOperation</span><span class="o">.</span><span class="n">multiply_add_saturate</span><span class="p">,</span>
|
||||
<span class="p">]</span>
|
||||
<span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">cc</span> <span class="o">=</span> <span class="n">kernel_cc</span>
|
||||
|
||||
<span class="c1"># Dictionary with following structure:</span>
|
||||
<span class="c1"># Key: OpcodeClass</span>
|
||||
<span class="c1"># Value: Dictionary with the following structure:</span>
|
||||
<span class="c1"># Key: tuple of ((DataType, DataType, DataType), (LayoutType, LayoutType)),</span>
|
||||
<span class="c1"># representing ((element_a, element_b, element_accumulator), (layout_a, layout_b))</span>
|
||||
<span class="c1"># Value: KernelsForDataType</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">operations_by_opclass</span> <span class="o">=</span> <span class="p">{}</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">op_class</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">allowed_math_operations</span> <span class="o">=</span> <span class="n">allowed_math_operations</span>
|
||||
|
||||
<span class="c1"># Identify the method within the CUTLASS generator script that generates kernel</span>
|
||||
<span class="c1"># descriptions for the kernel CC (`kernel_cc`)</span>
|
||||
<span class="n">generate_function_name</span> <span class="o">=</span> <span class="s2">"GenerateSM"</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">kernel_cc</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">prof_generator</span><span class="p">,</span> <span class="n">generate_function_name</span><span class="p">):</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">logger</span><span class="o">.</span><span class="n">warning</span><span class="p">(</span><span class="sa">f</span><span class="s2">"No generator found for architecture </span><span class="si">{</span><span class="n">kernel_cc</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
|
||||
<span class="k">return</span>
|
||||
<span class="n">generate_function</span> <span class="o">=</span> <span class="nb">getattr</span><span class="p">(</span><span class="n">prof_generator</span><span class="p">,</span> <span class="n">generate_function_name</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># Initialize a default manifest and populate it with valid kernel descriptions</span>
|
||||
<span class="c1"># for the kernel CC (`kernel_cc`)</span>
|
||||
<span class="n">args</span> <span class="o">=</span> <span class="p">[</span>
|
||||
<span class="s2">"--kernels=all"</span><span class="p">,</span>
|
||||
<span class="sa">f</span><span class="s2">"--log-level=</span><span class="si">{</span><span class="n">logging</span><span class="o">.</span><span class="n">getLevelName</span><span class="p">(</span><span class="n">cutlass</span><span class="o">.</span><span class="n">logger</span><span class="o">.</span><span class="n">level</span><span class="p">)</span><span class="si">}</span><span class="s2">"</span>
|
||||
<span class="p">]</span>
|
||||
<span class="n">manifest_args</span> <span class="o">=</span> <span class="n">prof_generator</span><span class="o">.</span><span class="n">define_parser</span><span class="p">()</span><span class="o">.</span><span class="n">parse_args</span><span class="p">(</span><span class="n">args</span><span class="p">)</span>
|
||||
<span class="n">manifest</span> <span class="o">=</span> <span class="n">prof_manifest</span><span class="o">.</span><span class="n">Manifest</span><span class="p">(</span><span class="n">manifest_args</span><span class="p">)</span>
|
||||
<span class="n">generate_function</span><span class="p">(</span><span class="n">manifest</span><span class="p">,</span> <span class="n">_cuda_version</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">operation_kind</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">manifest</span><span class="o">.</span><span class="n">operations</span><span class="p">:</span>
|
||||
<span class="c1"># No kernels were generated for this architecture. This could be because the</span>
|
||||
<span class="c1"># installed CUDA toolkit does not support operations for this CC</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">logger</span><span class="o">.</span><span class="n">warning</span><span class="p">(</span><span class="sa">f</span><span class="s2">"No operations of type </span><span class="si">{</span><span class="n">operation_kind</span><span class="si">}</span><span class="s2"> found for CC </span><span class="si">{</span><span class="n">kernel_cc</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
|
||||
<span class="k">return</span>
|
||||
|
||||
<span class="c1"># Iterate through the available operations for this operation kind and</span>
|
||||
<span class="c1"># find available opclasses and data types</span>
|
||||
<span class="k">for</span> <span class="n">name</span><span class="p">,</span> <span class="n">op_list</span> <span class="ow">in</span> <span class="n">manifest</span><span class="o">.</span><span class="n">operations</span><span class="p">[</span><span class="n">operation_kind</span><span class="p">]</span><span class="o">.</span><span class="n">items</span><span class="p">():</span>
|
||||
<span class="k">for</span> <span class="n">op</span> <span class="ow">in</span> <span class="n">op_list</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">op</span><span class="o">.</span><span class="n">gemm_kind</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">gemm_kinds</span><span class="p">:</span>
|
||||
<span class="k">continue</span>
|
||||
|
||||
<span class="n">mi</span> <span class="o">=</span> <span class="n">op</span><span class="o">.</span><span class="n">tile_description</span><span class="o">.</span><span class="n">math_instruction</span>
|
||||
<span class="k">if</span> <span class="n">mi</span><span class="o">.</span><span class="n">math_operation</span> <span class="ow">not</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">allowed_math_operations</span><span class="p">:</span>
|
||||
<span class="k">continue</span>
|
||||
|
||||
<span class="n">datatype_comb</span> <span class="o">=</span> <span class="p">(</span><span class="n">mi</span><span class="o">.</span><span class="n">element_a</span><span class="p">,</span> <span class="n">mi</span><span class="o">.</span><span class="n">element_b</span><span class="p">,</span> <span class="n">mi</span><span class="o">.</span><span class="n">element_accumulator</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># Skip any data types that do not currently have conversions via cutlass_bindings</span>
|
||||
<span class="k">if</span> <span class="kc">False</span> <span class="ow">in</span> <span class="p">[</span><span class="n">has_binding_type</span><span class="p">(</span><span class="n">elt</span><span class="p">)</span> <span class="k">for</span> <span class="n">elt</span> <span class="ow">in</span> <span class="n">datatype_comb</span><span class="p">]:</span>
|
||||
<span class="k">continue</span>
|
||||
|
||||
<span class="c1"># Prune operations that don't fit in shared memory</span>
|
||||
<span class="n">td</span> <span class="o">=</span> <span class="n">td_from_profiler_op</span><span class="p">(</span><span class="n">op</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="n">valid_stage_count</span><span class="p">(</span><span class="n">target_cc</span><span class="p">,</span> <span class="n">td</span><span class="p">)[</span><span class="mi">0</span><span class="p">]:</span>
|
||||
<span class="k">continue</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">mi</span><span class="o">.</span><span class="n">opcode_class</span> <span class="ow">not</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">operations_by_opclass</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">operations_by_opclass</span><span class="p">[</span><span class="n">mi</span><span class="o">.</span><span class="n">opcode_class</span><span class="p">]</span> <span class="o">=</span> <span class="p">{}</span>
|
||||
|
||||
<span class="n">datatype_comb</span> <span class="o">=</span> <span class="p">(</span><span class="n">mi</span><span class="o">.</span><span class="n">element_a</span><span class="p">,</span> <span class="n">mi</span><span class="o">.</span><span class="n">element_b</span><span class="p">,</span> <span class="n">mi</span><span class="o">.</span><span class="n">element_accumulator</span><span class="p">)</span>
|
||||
<span class="n">layout_comb</span> <span class="o">=</span> <span class="p">(</span><span class="n">op</span><span class="o">.</span><span class="n">A</span><span class="o">.</span><span class="n">layout</span><span class="p">,</span> <span class="n">op</span><span class="o">.</span><span class="n">B</span><span class="o">.</span><span class="n">layout</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># Register TF32 kernels as F32 to enable F32 -> TF32 conversion + TF32 Tensor Core operations</span>
|
||||
<span class="k">if</span> <span class="n">datatype_comb</span> <span class="o">==</span> <span class="p">(</span><span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">tf32</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">tf32</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">):</span>
|
||||
<span class="c1"># TF32 kernels only supported on SM80 and beyond</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">cc</span> <span class="o"><</span> <span class="mi">80</span><span class="p">:</span>
|
||||
<span class="k">continue</span>
|
||||
<span class="k">elif</span> <span class="bp">self</span><span class="o">.</span><span class="n">cc</span> <span class="o">==</span> <span class="mi">90</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="p">(</span><span class="n">op</span><span class="o">.</span><span class="n">A</span><span class="o">.</span><span class="n">element</span> <span class="o">!=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span>
|
||||
<span class="ow">or</span> <span class="n">op</span><span class="o">.</span><span class="n">B</span><span class="o">.</span><span class="n">element</span> <span class="o">!=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span>
|
||||
<span class="ow">or</span> <span class="n">op</span><span class="o">.</span><span class="n">C</span><span class="o">.</span><span class="n">element</span> <span class="o">!=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">):</span>
|
||||
<span class="k">continue</span>
|
||||
|
||||
<span class="n">datatype_comb</span> <span class="o">=</span> <span class="p">(</span><span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">)</span>
|
||||
|
||||
<span class="n">opclass_dict</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">operations_by_opclass</span><span class="p">[</span><span class="n">mi</span><span class="o">.</span><span class="n">opcode_class</span><span class="p">]</span>
|
||||
<span class="n">key</span> <span class="o">=</span> <span class="p">(</span><span class="n">datatype_comb</span><span class="p">,</span> <span class="n">layout_comb</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">key</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">opclass_dict</span><span class="p">:</span>
|
||||
<span class="n">opclass_dict</span><span class="p">[</span><span class="n">key</span><span class="p">]</span> <span class="o">=</span> <span class="n">KernelsForDataType</span><span class="p">(</span><span class="n">datatype_comb</span><span class="p">,</span> <span class="n">layout_comb</span><span class="p">)</span>
|
||||
<span class="n">opclass_dict</span><span class="p">[</span><span class="n">key</span><span class="p">]</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">op</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># Set the default opclass to TensorOp, if available. Otherwise default to SIMT</span>
|
||||
<span class="k">if</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="o">.</span><span class="n">TensorOp</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">operations_by_opclass</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">op_class</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="o">.</span><span class="n">TensorOp</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">op_class</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="o">.</span><span class="n">Simt</span>
|
||||
|
||||
<span class="c1"># The profiler's generator may generate only a limited set of combinations of operands for SIMT kernels.</span>
|
||||
<span class="c1"># Here, we generate additional versions via a generic TileDescription.</span>
|
||||
<span class="k">if</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="o">.</span><span class="n">Simt</span> <span class="ow">not</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">operations_by_opclass</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">operations_by_opclass</span><span class="p">[</span><span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="o">.</span><span class="n">Simt</span><span class="p">]</span> <span class="o">=</span> <span class="p">{}</span>
|
||||
|
||||
<span class="n">types</span> <span class="o">=</span> <span class="p">[</span>
|
||||
<span class="p">(</span><span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">s8</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">s8</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">s8</span><span class="p">),</span>
|
||||
<span class="p">(</span><span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">s8</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">s8</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">s32</span><span class="p">),</span>
|
||||
<span class="p">(</span><span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f16</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f16</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f16</span><span class="p">),</span>
|
||||
<span class="p">(</span><span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f16</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f16</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">),</span>
|
||||
<span class="p">(</span><span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">),</span>
|
||||
<span class="p">(</span><span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f64</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f64</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f64</span><span class="p">),</span>
|
||||
<span class="p">]</span>
|
||||
|
||||
<span class="n">layouts</span> <span class="o">=</span> <span class="p">[</span>
|
||||
<span class="p">(</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">),</span>
|
||||
<span class="p">(</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">ColumnMajor</span><span class="p">),</span>
|
||||
<span class="p">(</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">ColumnMajor</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">),</span>
|
||||
<span class="p">(</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">ColumnMajor</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">ColumnMajor</span><span class="p">),</span>
|
||||
<span class="p">]</span>
|
||||
<span class="n">alignment</span> <span class="o">=</span> <span class="mi">1</span>
|
||||
<span class="n">epilogue_functor</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">EpilogueFunctor</span><span class="o">.</span><span class="n">LinearCombination</span>
|
||||
<span class="n">swizzling_functor</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">SwizzlingFunctor</span><span class="o">.</span><span class="n">Identity8</span>
|
||||
<span class="k">for</span> <span class="n">type_comb</span> <span class="ow">in</span> <span class="n">types</span><span class="p">:</span>
|
||||
<span class="k">for</span> <span class="n">layout_comb</span> <span class="ow">in</span> <span class="n">layouts</span><span class="p">:</span>
|
||||
<span class="n">comb</span> <span class="o">=</span> <span class="p">(</span><span class="n">type_comb</span><span class="p">,</span> <span class="n">layout_comb</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">comb</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">operations_by_opclass</span><span class="p">[</span><span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="o">.</span><span class="n">Simt</span><span class="p">]:</span>
|
||||
<span class="k">continue</span>
|
||||
|
||||
<span class="n">A</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">TensorDescription</span><span class="p">(</span><span class="n">type_comb</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">layout_comb</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">alignment</span><span class="p">)</span>
|
||||
<span class="n">B</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">TensorDescription</span><span class="p">(</span><span class="n">type_comb</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">layout_comb</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">alignment</span><span class="p">)</span>
|
||||
<span class="n">C</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">TensorDescription</span><span class="p">(</span><span class="n">type_comb</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">ColumnMajor</span><span class="p">,</span> <span class="n">alignment</span><span class="p">)</span>
|
||||
<span class="n">math_inst</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">MathInstruction</span><span class="p">(</span>
|
||||
<span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span>
|
||||
<span class="n">type_comb</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span>
|
||||
<span class="n">type_comb</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span>
|
||||
<span class="n">type_comb</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="o">.</span><span class="n">Simt</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">MathOperation</span><span class="o">.</span><span class="n">multiply_add</span>
|
||||
<span class="p">)</span>
|
||||
|
||||
<span class="n">td</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">TileDescription</span><span class="p">(</span>
|
||||
<span class="p">[</span><span class="mi">128</span><span class="p">,</span> <span class="mi">128</span><span class="p">,</span> <span class="mi">8</span><span class="p">],</span> <span class="mi">2</span><span class="p">,</span> <span class="p">[</span><span class="mi">4</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">math_inst</span><span class="p">,</span> <span class="mi">50</span><span class="p">,</span> <span class="mi">1024</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># Prune operations that don't fit in shared memory</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="n">valid_stage_count</span><span class="p">(</span><span class="n">target_cc</span><span class="p">,</span> <span class="n">td_from_profiler_td</span><span class="p">(</span><span class="n">td</span><span class="p">))[</span><span class="mi">0</span><span class="p">]:</span>
|
||||
<span class="k">continue</span>
|
||||
|
||||
<span class="n">new_operation</span> <span class="o">=</span> <span class="n">prof_manifest</span><span class="o">.</span><span class="n">GemmOperation</span><span class="p">(</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">GemmKind</span><span class="o">.</span><span class="n">Universal</span><span class="p">,</span> <span class="n">td</span><span class="o">.</span><span class="n">minimum_compute_capability</span><span class="p">,</span>
|
||||
<span class="n">td</span><span class="p">,</span> <span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">type_comb</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="n">epilogue_functor</span><span class="p">,</span> <span class="n">swizzling_functor</span><span class="p">)</span>
|
||||
|
||||
<span class="n">new_kernels</span> <span class="o">=</span> <span class="n">KernelsForDataType</span><span class="p">(</span><span class="n">type_comb</span><span class="p">,</span> <span class="n">layout_comb</span><span class="p">)</span>
|
||||
<span class="n">new_kernels</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">new_operation</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">operations_by_opclass</span><span class="p">[</span><span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="o">.</span><span class="n">Simt</span><span class="p">][</span><span class="n">comb</span><span class="p">]</span> <span class="o">=</span> <span class="n">new_kernels</span>
|
||||
|
||||
<span class="c1"># Sort all operations</span>
|
||||
<span class="k">for</span> <span class="n">oc</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">operations_by_opclass</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span>
|
||||
<span class="k">for</span> <span class="n">comb</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">operations_by_opclass</span><span class="p">[</span><span class="n">oc</span><span class="p">]</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">operations_by_opclass</span><span class="p">[</span><span class="n">oc</span><span class="p">][</span><span class="n">comb</span><span class="p">]</span><span class="o">.</span><span class="n">sort</span><span class="p">()</span>
|
||||
|
||||
<div class="viewcode-block" id="ArchOptions.opclass_supports_combination"><a class="viewcode-back" href="../../cutlass.html#cutlass.library_defaults.ArchOptions.opclass_supports_combination">[docs]</a> <span class="k">def</span> <span class="nf">opclass_supports_combination</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="p">,</span> <span class="n">op_class</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="p">,</span> <span class="n">datatype_comb</span><span class="p">:</span> <span class="nb">tuple</span><span class="p">,</span> <span class="n">layout_comb</span><span class="p">:</span> <span class="nb">tuple</span>
|
||||
<span class="p">)</span> <span class="o">-></span> <span class="nb">bool</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns whether the provided operation class supports the provided data type and layout combination</span>
|
||||
|
||||
<span class="sd"> :param op_class: operation class to consider</span>
|
||||
<span class="sd"> :type op_class: cutlass.OpcodeClass</span>
|
||||
<span class="sd"> :param datatype_comb: tuple of data types for (element_A, element_B, element_accumulator)</span>
|
||||
<span class="sd"> :type datatype_comb: tuple[cutlass.DataType]</span>
|
||||
<span class="sd"> :param layout_comb: tuple of layouts for (layout_A, layout_B)</span>
|
||||
<span class="sd"> :type layout_comb: tuple[cutlass.LayoutType]</span>
|
||||
|
||||
<span class="sd"> :return: True if the operation class supports the provided data type and layout combination, False otherwise</span>
|
||||
<span class="sd"> :rtype: bool</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">op_class</span> <span class="ow">not</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">operations_by_opclass</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Unexpected or unsupported operation class </span><span class="si">{</span><span class="n">op_class</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="p">(</span><span class="n">datatype_comb</span><span class="p">,</span> <span class="n">layout_comb</span><span class="p">)</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">operations_by_opclass</span><span class="p">[</span><span class="n">op_class</span><span class="p">]</span></div>
|
||||
|
||||
<div class="viewcode-block" id="ArchOptions.supporting_opclasses"><a class="viewcode-back" href="../../cutlass.html#cutlass.library_defaults.ArchOptions.supporting_opclasses">[docs]</a> <span class="k">def</span> <span class="nf">supporting_opclasses</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="p">,</span>
|
||||
<span class="n">element_a</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="p">,</span>
|
||||
<span class="n">element_b</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="p">,</span>
|
||||
<span class="n">element_accumulator</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="p">,</span>
|
||||
<span class="n">layout_a</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="p">,</span>
|
||||
<span class="n">layout_b</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="p">,</span>
|
||||
<span class="p">)</span> <span class="o">-></span> <span class="nb">set</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns a set of operation classes that support the provided data type and layout combination</span>
|
||||
|
||||
<span class="sd"> :param element_a: data type of operand A</span>
|
||||
<span class="sd"> :type element_a: cutlass.DataType</span>
|
||||
<span class="sd"> :param element_b: data type of operand B</span>
|
||||
<span class="sd"> :type element_b: cutlass.DataType</span>
|
||||
<span class="sd"> :param element_accumulator: data type of accumulator</span>
|
||||
<span class="sd"> :type element_accumulator: cutlass.DataType</span>
|
||||
<span class="sd"> :param layout_a: layout of operand A</span>
|
||||
<span class="sd"> :type layout_a: cutlass.LayoutType</span>
|
||||
<span class="sd"> :param layout_b: layout of operand B</span>
|
||||
<span class="sd"> :type layout_b: cutlass.LayoutType</span>
|
||||
|
||||
<span class="sd"> :return: set of operation classes that support the provided data type and layout combination</span>
|
||||
<span class="sd"> :rtype: set</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">supporting_op_classes</span> <span class="o">=</span> <span class="nb">set</span><span class="p">()</span>
|
||||
<span class="n">datatype_comb</span> <span class="o">=</span> <span class="p">(</span><span class="n">element_a</span><span class="p">,</span> <span class="n">element_b</span><span class="p">,</span> <span class="n">element_accumulator</span><span class="p">)</span>
|
||||
<span class="n">layout_comb</span> <span class="o">=</span> <span class="p">(</span><span class="n">layout_a</span><span class="p">,</span> <span class="n">layout_b</span><span class="p">)</span>
|
||||
|
||||
<span class="k">for</span> <span class="n">op_class</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">operations_by_opclass</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">opclass_supports_combination</span><span class="p">(</span><span class="n">op_class</span><span class="p">,</span> <span class="n">datatype_comb</span><span class="p">,</span> <span class="n">layout_comb</span><span class="p">):</span>
|
||||
<span class="n">supporting_op_classes</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">op_class</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">supporting_op_classes</span></div>
|
||||
|
||||
<div class="viewcode-block" id="ArchOptions.operations"><a class="viewcode-back" href="../../cutlass.html#cutlass.library_defaults.ArchOptions.operations">[docs]</a> <span class="k">def</span> <span class="nf">operations</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="p">,</span>
|
||||
<span class="n">op_class</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="p">,</span>
|
||||
<span class="n">element_a</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="p">,</span>
|
||||
<span class="n">element_b</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="p">,</span>
|
||||
<span class="n">element_accumulator</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="p">,</span>
|
||||
<span class="n">layout_a</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="p">,</span>
|
||||
<span class="n">layout_b</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="p">,</span>
|
||||
<span class="p">)</span> <span class="o">-></span> <span class="n">KernelsForDataType</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the kernels of the provided operation class that support the provided data type and layout combination</span>
|
||||
|
||||
<span class="sd"> :param op_class: operation class to consider</span>
|
||||
<span class="sd"> :type op_class: cutlass.OpcodeClass</span>
|
||||
<span class="sd"> :param element_a: data type of operand A</span>
|
||||
<span class="sd"> :type element_a: cutlass.DataType</span>
|
||||
<span class="sd"> :param element_b: data type of operand B</span>
|
||||
<span class="sd"> :type element_b: cutlass.DataType</span>
|
||||
<span class="sd"> :param element_accumulator: data type of accumulator</span>
|
||||
<span class="sd"> :type element_accumulator: cutlass.DataType</span>
|
||||
<span class="sd"> :param layout_a: layout of operand A</span>
|
||||
<span class="sd"> :type layout_a: cutlass.LayoutType</span>
|
||||
<span class="sd"> :param layout_b: layout of operand B</span>
|
||||
<span class="sd"> :type layout_b: cutlass.LayoutType</span>
|
||||
|
||||
<span class="sd"> :return: container of kernels by alignment supported by the provided combination of parameters</span>
|
||||
<span class="sd"> :rtype: KernelsForDataType</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">datatype_comb</span> <span class="o">=</span> <span class="p">(</span><span class="n">element_a</span><span class="p">,</span> <span class="n">element_b</span><span class="p">,</span> <span class="n">element_accumulator</span><span class="p">)</span>
|
||||
<span class="n">layout_comb</span> <span class="o">=</span> <span class="p">(</span><span class="n">layout_a</span><span class="p">,</span> <span class="n">layout_b</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">opclass_supports_combination</span><span class="p">(</span><span class="n">op_class</span><span class="p">,</span> <span class="n">datatype_comb</span><span class="p">,</span> <span class="n">layout_comb</span><span class="p">):</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span>
|
||||
<span class="sa">f</span><span class="s2">"Data type layout combination </span><span class="si">{</span><span class="n">datatype_comb</span><span class="si">}</span><span class="s2">, </span><span class="si">{</span><span class="n">layout_comb</span><span class="si">}</span><span class="s2"> "</span>
|
||||
<span class="sa">f</span><span class="s2">"is not supported by opcode class </span><span class="si">{</span><span class="n">op_class</span><span class="si">}</span><span class="s2"> on CC </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">cc</span><span class="si">}</span><span class="s2">."</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">operations_by_opclass</span><span class="p">[</span><span class="n">op_class</span><span class="p">][(</span><span class="n">datatype_comb</span><span class="p">,</span> <span class="n">layout_comb</span><span class="p">)]</span></div></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="OptionRegistry"><a class="viewcode-back" href="../../cutlass.html#cutlass.library_defaults.OptionRegistry">[docs]</a><span class="k">class</span> <span class="nc">OptionRegistry</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Container of all architecture-specific options</span>
|
||||
|
||||
<span class="sd"> :param target_cc: compute capability of the device on which operations will be run</span>
|
||||
<span class="sd"> :type target_cc: int</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">target_cc</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">registry</span> <span class="o">=</span> <span class="p">{}</span>
|
||||
|
||||
<span class="n">gemm_kinds</span> <span class="o">=</span> <span class="p">[</span><span class="n">cutlass</span><span class="o">.</span><span class="n">GemmKind</span><span class="o">.</span><span class="n">Universal</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">GemmKind</span><span class="o">.</span><span class="n">Universal3x</span><span class="p">]</span>
|
||||
<span class="c1"># Construct options for each CC</span>
|
||||
<span class="k">for</span> <span class="n">kernel_cc</span> <span class="ow">in</span> <span class="n">_generator_ccs</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">registry</span><span class="p">[</span><span class="n">kernel_cc</span><span class="p">]</span> <span class="o">=</span> <span class="n">ArchOptions</span><span class="p">(</span><span class="n">target_cc</span><span class="p">,</span> <span class="n">kernel_cc</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OperationKind</span><span class="o">.</span><span class="n">Gemm</span><span class="p">,</span> <span class="n">gemm_kinds</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="OptionRegistry.options_for_cc"><a class="viewcode-back" href="../../cutlass.html#cutlass.library_defaults.OptionRegistry.options_for_cc">[docs]</a> <span class="k">def</span> <span class="nf">options_for_cc</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">cc</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="n">ArchOptions</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">registry</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">cc</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span></div></div>
|
||||
</pre></div>
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
|
||||
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer no-toc">
|
||||
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script>
|
||||
<script src="../../_static/doctools.js"></script>
|
||||
<script src="../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../_static/scripts/furo.js"></script>
|
||||
<script src="../../_static/clipboard.min.js"></script>
|
||||
<script src="../../_static/copybutton.js"></script>
|
||||
<script src="../../_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
980
python/docs/_modules/cutlass/op/gemm.html
Normal file
@ -0,0 +1,980 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><link rel="index" title="Index" href="../../../genindex.html" /><link rel="search" title="Search" href="../../../search.html" />
|
||||
<link rel="canonical" href="docs/_modules/cutlass/op/gemm.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>cutlass.op.gemm - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="../../../index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="../../../index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="../../../_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="../../../_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="../../../search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../../../modules.html">CUTLASS Python API</a><input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="../../../cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../../cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../../cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../../cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../../../examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../../externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../../externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../../externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">Github</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<article role="main">
|
||||
<h1>Source code for cutlass.op.gemm</h1><div class="highlight"><pre>
|
||||
<span></span><span class="c1">#################################################################################################</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># Copyright (c) 2023 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.</span>
|
||||
<span class="c1"># SPDX-License-Identifier: BSD-3-Clause</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># Redistribution and use in source and binary forms, with or without</span>
|
||||
<span class="c1"># modification, are permitted provided that the following conditions are met:</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 1. Redistributions of source code must retain the above copyright notice, this</span>
|
||||
<span class="c1"># list of conditions and the following disclaimer.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 2. Redistributions in binary form must reproduce the above copyright notice,</span>
|
||||
<span class="c1"># this list of conditions and the following disclaimer in the documentation</span>
|
||||
<span class="c1"># and/or other materials provided with the distribution.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 3. Neither the name of the copyright holder nor the names of its</span>
|
||||
<span class="c1"># contributors may be used to endorse or promote products derived from</span>
|
||||
<span class="c1"># this software without specific prior written permission.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"</span>
|
||||
<span class="c1"># AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE</span>
|
||||
<span class="c1"># IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE</span>
|
||||
<span class="c1"># DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE</span>
|
||||
<span class="c1"># FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL</span>
|
||||
<span class="c1"># DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR</span>
|
||||
<span class="c1"># SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER</span>
|
||||
<span class="c1"># CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,</span>
|
||||
<span class="c1"># OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE</span>
|
||||
<span class="c1"># OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1">#################################################################################################</span>
|
||||
|
||||
<span class="sd">"""</span>
|
||||
<span class="sd"> Ease-of-use interface for constructing, compiling, and running GEMMs.</span>
|
||||
|
||||
<span class="sd"> The ``Gemm`` interface is meant to allow one to easily instantiate, compile, and run</span>
|
||||
<span class="sd"> GEMM operations in CUTLASS via Python, without specifying many configuration parameters.</span>
|
||||
<span class="sd"> Under the hood, the interface will select sensible default parameters for the many template</span>
|
||||
<span class="sd"> parameters for CUTLASS GEMMs.</span>
|
||||
|
||||
<span class="sd"> Note: optimal performance is not to be expected from this interface. To achieve optimal</span>
|
||||
<span class="sd"> performance, one should specify and tune each configuration parameter.</span>
|
||||
|
||||
<span class="sd"> The simplest example of using this interface is the following:</span>
|
||||
|
||||
<span class="sd"> .. highlight:: python</span>
|
||||
<span class="sd"> .. code-block:: python</span>
|
||||
|
||||
<span class="sd"> # A, B, C, and D are torch/numpy/cupy tensor objects</span>
|
||||
<span class="sd"> plan = cutlass.op.Gemm(A, B, C, D)</span>
|
||||
<span class="sd"> plan.run()</span>
|
||||
|
||||
|
||||
<span class="sd"> One can also use the interface by specifying data types of operands at construction</span>
|
||||
<span class="sd"> and using different tensor objects with these data types at runtime:</span>
|
||||
|
||||
<span class="sd"> .. highlight:: python</span>
|
||||
<span class="sd"> .. code-block:: python</span>
|
||||
|
||||
<span class="sd"> # The following is shorthand for:</span>
|
||||
<span class="sd"> # cutlass.op.Gemm(element_A=torch.float32, element_B=torch.float32,</span>
|
||||
<span class="sd"> # element_C=torch.float32, element_D=torch.float32,</span>
|
||||
<span class="sd"> # element_accumulator=torch.float32,</span>
|
||||
<span class="sd"> # layout=cutlass.LayoutType.RowMajor)</span>
|
||||
<span class="sd"> plan = cutlass.op.Gemm(element=torch.float32, layout=cutlass.LayoutType.RowMajor)</span>
|
||||
|
||||
<span class="sd"> A0 = torch.rand((128, 256), device='cuda')</span>
|
||||
<span class="sd"> B0 = torch.rand((256, 64), device='cuda')</span>
|
||||
<span class="sd"> C0 = torch.zeros((128, 64), device='cuda')</span>
|
||||
<span class="sd"> D0 = torch.zeros((128, 64), device=&#39;cuda&#39;)</span>
|
||||
<span class="sd"> plan.run(A0, B0, C0, D0)</span>
|
||||
|
||||
<span class="sd"> A = torch.rand((32, 128), device='cuda')</span>
|
||||
<span class="sd"> B = torch.rand((128, 256), device='cuda')</span>
|
||||
<span class="sd"> C = torch.zeros((32, 256), device='cuda')</span>
|
||||
<span class="sd"> D = torch.zeros((32, 256), device=&#39;cuda&#39;)</span>
|
||||
<span class="sd"> plan.run(A, B, C, D)</span>
|
||||
|
||||
<span class="sd"> The interface additionally enables one to decouple the compilation of the underlying CUTLASS</span>
|
||||
<span class="sd"> kernel from its execution:</span>
|
||||
|
||||
<span class="sd"> .. highlight:: python</span>
|
||||
<span class="sd"> .. code-block:: python</span>
|
||||
|
||||
<span class="sd"> plan = cutlass.op.Gemm(element=np.float32, layout=cutlass.LayoutType.RowMajor)</span>
|
||||
<span class="sd"> plan.compile()</span>
|
||||
|
||||
<span class="sd"> # Do other work...</span>
|
||||
|
||||
<span class="sd"> plan.run(A0, B0, C0, D0)</span>
|
||||
|
||||
<span class="sd"> # Do other work...</span>
|
||||
|
||||
<span class="sd"> plan.run(A1, B1, C1, D1)</span>
|
||||
|
||||
<span class="sd"> Elementwise activation functions are easily fused to the GEMM via the interface:</span>
|
||||
|
||||
<span class="sd"> .. highlight:: python</span>
|
||||
<span class="sd"> .. code-block:: python</span>
|
||||
|
||||
<span class="sd"> plan = cutlass.op.Gemm(element=np.float32, layout=cutlass.LayoutType.RowMajor)</span>
|
||||
<span class="sd"> plan.activation = cutlass.epilogue.relu</span>
|
||||
|
||||
<span class="sd"> Operations can also be run asynchronously:</span>
|
||||
|
||||
<span class="sd"> .. highlight:: python</span>
|
||||
<span class="sd"> .. code-block:: python</span>
|
||||
|
||||
<span class="sd"> plan = cutlass.op.Gemm(element=np.float32, layout=cutlass.LayoutType.RowMajor)</span>
|
||||
<span class="sd"> args = plan.run(A, B, C, D, sync=False)</span>
|
||||
|
||||
<span class="sd"> # Do other work...</span>
|
||||
|
||||
<span class="sd"> args.sync()</span>
|
||||
<span class="sd">"""</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">cutlass_bindings</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">cutlass</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass</span> <span class="kn">import</span> <span class="n">epilogue</span><span class="p">,</span> <span class="n">swizzle</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass.backend</span> <span class="kn">import</span> <span class="n">compiler</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass.backend.gemm_operation</span> <span class="kn">import</span> <span class="n">GemmArguments</span><span class="p">,</span> <span class="n">GemmOperationUniversal</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass.backend.library</span> <span class="kn">import</span> <span class="n">TensorDescription</span><span class="p">,</span> <span class="n">TileDescription</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass.op.op</span> <span class="kn">import</span> <span class="n">OperationBase</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass.utils</span> <span class="kn">import</span> <span class="n">check</span><span class="p">,</span> <span class="n">datatypes</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="Gemm"><a class="viewcode-back" href="../../../cutlass.op.html#cutlass.op.gemm.Gemm">[docs]</a><span class="k">class</span> <span class="nc">Gemm</span><span class="p">(</span><span class="n">OperationBase</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Constructs a ``Gemm`` object.</span>
|
||||
|
||||
<span class="sd"> The data types and layouts of operands A, B, and C, along with the data type of output D</span>
|
||||
<span class="sd"> and that used for accumulation, are bound to the ``Gemm`` object throughout its lifetime --</span>
|
||||
<span class="sd"> these are not to be changed after a ``Gemm`` has been constructed.</span>
|
||||
|
||||
<span class="sd"> The constructor accepts optional arguments for flexibly setting these properties. The following</span>
|
||||
<span class="sd"> constructors are equivalent:</span>
|
||||
|
||||
<span class="sd"> .. highlight:: python</span>
|
||||
<span class="sd"> .. code-block:: python</span>
|
||||
|
||||
<span class="sd"> # Use F32 for A, B, C, D, and accumulation. All operands are row major.</span>
|
||||
|
||||
<span class="sd"> # Use the generic ``element`` and ``layout`` parameters to concisely set all data types and layouts</span>
|
||||
<span class="sd"> # for operands to the same values.</span>
|
||||
<span class="sd"> Gemm(element=cutlass.DataType.f32, layout=cutlass.LayoutType.RowMajor)</span>
|
||||
|
||||
<span class="sd"> # Explicitly specify the data types to use for A, B, C, and D. Use the generic ``layout``.</span>
|
||||
<span class="sd"> Gemm(element_A=cutlass.DataType.f32, element_B=cutlass.DataType.f32, element_C=cutlass.DataType.f32,</span>
|
||||
<span class="sd"> element_D=cutlass.DataType.f32, layout=cutlass.LayoutType.RowMajor)</span>
|
||||
|
||||
<span class="sd"> # Set the data types and layouts from existing tensors. Note that one can use different tensors when</span>
|
||||
<span class="sd"> # executing GEMM via the ``run()`` method than passed in here (though those passed in to ``run()`` must</span>
|
||||
<span class="sd"> # have the same data type and layout as those passed in here).</span>
|
||||
<span class="sd"> # A, B, C, and D are row-major torch.Tensor objects of type torch.float32</span>
|
||||
<span class="sd"> Gemm(A=A, B=B, C=C, D=D)</span>
|
||||
|
||||
<span class="sd"> # Use the generic ``element`` and explicitly specify the layouts to use for A, B, and C (layout of D is</span>
|
||||
<span class="sd"> # the same as that for C, at present)</span>
|
||||
<span class="sd"> Gemm(element=cutlass.DataType.f32, layout_A=cutlass.LayoutType.RowMajor,</span>
|
||||
<span class="sd"> layout_B=cutlass.LayoutType.RowMajor, layout_C=cutlass.LayoutType.RowMajor)</span>
|
||||
|
||||
<span class="sd"> # Explicitly specify the data type and layout for only some of A, B, C, and D. Unspecified data types</span>
|
||||
<span class="sd"> # and layouts will inherit those passed in via the generic ``element`` and ``layout``</span>
|
||||
<span class="sd"> Gemm(element_A=cutlass.DataType.f32, layout_B=cutlass.LayoutType.RowMajor,</span>
|
||||
<span class="sd"> element=cutlass.DataType.f32, layout=cutlass.LayoutType.RowMajor)</span>
|
||||
|
||||
<span class="sd"> The order of precedence for the setting of the data type and layout for a given operand/output is as follows:</span>
|
||||
<span class="sd"> 1) If the tensor is specified (e.g., ``A``), use the data type and layout inferred from this tensor</span>
|
||||
<span class="sd"> 2) Otherwise, if the data type/layout (e.g., ``element_A``, ``layout_A``) is specified, use those</span>
|
||||
<span class="sd"> 3) Otherwise, use the generic values (e.g., ``element``, ``layout``)</span>
|
||||
|
||||
<span class="sd"> :param cc: compute capability of device for which kernels should be compiled. For example, if running on H100, this should be set to 90</span>
|
||||
<span class="sd"> :type cc: int</span>
|
||||
<span class="sd"> :param kernel_cc: compute capability of kernels to generate. For example, if running on SM90, but desiring to use a CUTLASS 2.x-style Ampere kernel, this should be set to 80</span>
|
||||
<span class="sd"> :type kernel_cc: int</span>
|
||||
<span class="sd"> :param A: tensor representing data type and layout of operand A</span>
|
||||
<span class="sd"> :param B: tensor representing data type and layout of operand B</span>
|
||||
<span class="sd"> :param C: tensor representing data type and layout of operand C</span>
|
||||
<span class="sd"> :param D: tensor representing data type and layout of operand D</span>
|
||||
<span class="sd"> :param alpha: scalar parameter alpha from GEMM computation that scales the product of operands A and B</span>
|
||||
<span class="sd"> :param beta: scalar parameter beta from GEMM operation that scales operand C</span>
|
||||
<span class="sd"> :param element_accumulator: data type to be used in accumulation of the product of operands A and B</span>
|
||||
<span class="sd"> :type element_accumulator: cutlass.DataType</span>
|
||||
<span class="sd"> :param element: generic data type to be used for operands A, B, C, D, as well as the accumulation data type</span>
|
||||
<span class="sd"> :type element: cutlass.DataType</span>
|
||||
<span class="sd"> :param layout: generic layout type to be used for operands A, B, C, and D</span>
|
||||
<span class="sd"> :type layout: cutlass.LayoutType</span>
|
||||
<span class="sd"> :param element_A: data type to be used for operand A</span>
|
||||
<span class="sd"> :type element_A: cutlass.DataType</span>
|
||||
<span class="sd"> :param element_B: data type to be used for operand B</span>
|
||||
<span class="sd"> :type element_B: cutlass.DataType</span>
|
||||
<span class="sd"> :param element_C: data type to be used for operand C</span>
|
||||
<span class="sd"> :type element_C: cutlass.DataType</span>
|
||||
<span class="sd"> :param element_D: data type to be used for operand D</span>
|
||||
<span class="sd"> :type element_D: cutlass.DataType</span>
|
||||
<span class="sd"> :param layout_A: layout of operand A</span>
|
||||
<span class="sd"> :type layout_A: cutlass.LayoutType</span>
|
||||
<span class="sd"> :param layout_B: layout of operand B</span>
|
||||
<span class="sd"> :type layout_B: cutlass.LayoutType</span>
|
||||
<span class="sd"> :param layout_C: layout of operand C</span>
|
||||
<span class="sd"> :type layout_C: cutlass.LayoutType</span>
|
||||
<span class="sd"> :param layout_D: layout of operand D</span>
|
||||
<span class="sd"> :type layout_D: cutlass.LayoutType</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="p">,</span> <span class="n">A</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">B</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">C</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">D</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">alpha</span><span class="o">=</span><span class="mf">1.0</span><span class="p">,</span> <span class="n">beta</span><span class="o">=</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">element_accumulator</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">element</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">element_A</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">element_B</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">element_C</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">element_D</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">layout_A</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">layout_B</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">layout_C</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">cc</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="n">kernel_cc</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
<span class="p">):</span>
|
||||
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span><span class="n">cc</span><span class="o">=</span><span class="n">cc</span><span class="p">,</span> <span class="n">kernel_cc</span><span class="o">=</span><span class="n">kernel_cc</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="s2">"gemm"</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">compiled</span> <span class="o">=</span> <span class="kc">False</span>
|
||||
|
||||
<span class="n">elements</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="n">layouts</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
|
||||
<span class="c1"># Check that at least one of the following is set for each tensor (illustrated assuming tensor A):</span>
|
||||
<span class="c1"># ``A``, ``element_A``, ``element`` and ``A``, ``layout_A``, ``layout``</span>
|
||||
<span class="k">for</span> <span class="n">elt</span><span class="p">,</span> <span class="n">lay</span><span class="p">,</span> <span class="n">tens</span><span class="p">,</span> <span class="n">name</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">([</span><span class="n">element_A</span><span class="p">,</span> <span class="n">element_B</span><span class="p">,</span> <span class="n">element_C</span><span class="p">,</span> <span class="n">element_D</span><span class="p">],</span>
|
||||
<span class="p">[</span><span class="n">layout_A</span><span class="p">,</span> <span class="n">layout_B</span><span class="p">,</span> <span class="n">layout_C</span><span class="p">,</span> <span class="n">layout_C</span><span class="p">],</span>
|
||||
<span class="p">[</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">],</span>
|
||||
<span class="p">[</span><span class="s2">"A"</span><span class="p">,</span> <span class="s2">"B"</span><span class="p">,</span> <span class="s2">"C"</span><span class="p">,</span> <span class="s2">"D"</span><span class="p">]):</span>
|
||||
<span class="k">if</span> <span class="n">elt</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="ow">and</span> <span class="n">tens</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Must not specify both element_</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s1"> and tensor </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">lay</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="ow">and</span> <span class="n">tens</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Must not specify both layout_</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s1"> and tensor </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">elt</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">and</span> <span class="n">tens</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">and</span> <span class="n">element</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Must specify one of element_</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s1">, tensor </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s1">, or generic element.'</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">lay</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">and</span> <span class="n">tens</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">and</span> <span class="n">layout</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Must specify one of layout_</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s1">, tensor </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s1">, or generic layout.'</span><span class="p">)</span>
|
||||
|
||||
<span class="n">elt_to_set</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
<span class="n">lay_to_set</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
<span class="k">if</span> <span class="n">tens</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">elt_to_set</span><span class="p">,</span> <span class="n">lay_to_set</span> <span class="o">=</span> <span class="n">datatypes</span><span class="o">.</span><span class="n">get_datatype_and_layout</span><span class="p">(</span><span class="n">tens</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">elt_to_set</span> <span class="o">=</span> <span class="n">elt</span> <span class="k">if</span> <span class="n">elt</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="k">else</span> <span class="n">element</span>
|
||||
<span class="n">lay_to_set</span> <span class="o">=</span> <span class="n">lay</span> <span class="k">if</span> <span class="n">lay</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="k">else</span> <span class="n">layout</span>
|
||||
|
||||
<span class="n">elements</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">datatypes</span><span class="o">.</span><span class="n">library_type</span><span class="p">(</span><span class="n">elt_to_set</span><span class="p">))</span>
|
||||
<span class="n">layouts</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">datatypes</span><span class="o">.</span><span class="n">library_layout</span><span class="p">(</span><span class="n">lay_to_set</span><span class="p">))</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_element_a</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_b</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_c</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_d</span> <span class="o">=</span> <span class="n">elements</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_layout_a</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_b</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_c</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_d</span> <span class="o">=</span> <span class="n">layouts</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">element_accumulator</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_element_accumulator</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_c</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_element_accumulator</span> <span class="o">=</span> <span class="n">datatypes</span><span class="o">.</span><span class="n">library_type</span><span class="p">(</span><span class="n">element_accumulator</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">A</span> <span class="o">=</span> <span class="n">A</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">B</span> <span class="o">=</span> <span class="n">B</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">C</span> <span class="o">=</span> <span class="n">C</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">D</span> <span class="o">=</span> <span class="n">D</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">alpha</span> <span class="o">=</span> <span class="n">alpha</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">beta</span> <span class="o">=</span> <span class="n">beta</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">epilogue_functor</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">op_class</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_reset_operations</span><span class="p">()</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_swizzling_functor</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">swizzle</span><span class="o">.</span><span class="n">IdentitySwizzle1</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_reset_operations</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">reset_epilogue</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">True</span><span class="p">):</span>
|
||||
<span class="c1"># Set the default op class</span>
|
||||
<span class="n">datatype_comb</span> <span class="o">=</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_a</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_b</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_accumulator</span><span class="p">)</span>
|
||||
<span class="n">layout_comb</span> <span class="o">=</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_layout_a</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_b</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">possible_op_classes</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">options</span><span class="o">.</span><span class="n">supporting_opclasses</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_element_a</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_b</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_accumulator</span><span class="p">,</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_layout_a</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_b</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="o">.</span><span class="n">TensorOp</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">possible_op_classes</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">opclass</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="o">.</span><span class="n">TensorOp</span>
|
||||
<span class="k">elif</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="o">.</span><span class="n">Simt</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">possible_op_classes</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">opclass</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="o">.</span><span class="n">Simt</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s1">'No kernel configuration found for supported data type and layout '</span>
|
||||
<span class="sa">f</span><span class="s1">'combination </span><span class="si">{</span><span class="n">datatype_comb</span><span class="si">}</span><span class="s1">x</span><span class="si">{</span><span class="n">layout_comb</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">reset_epilogue</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_reset_epilogue_functor_activation</span><span class="p">(</span><span class="n">epilogue</span><span class="o">.</span><span class="n">identity</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_reset_epilogue_functor_activation</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">activation</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">epilogue_functor</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">op_class</span> <span class="o">==</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="o">.</span><span class="n">Simt</span><span class="p">:</span>
|
||||
<span class="n">elements_per_access</span> <span class="o">=</span> <span class="mi">1</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">elements_per_access</span> <span class="o">=</span> <span class="mi">128</span> <span class="o">//</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataTypeSize</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_c</span><span class="p">]</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">elements_per_access</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">epilogue_functor</span><span class="o">.</span><span class="n">epilogue_vector_length</span>
|
||||
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="bp">self</span><span class="o">.</span><span class="n">specified_kernel_cc</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_cc</span> <span class="o">==</span> <span class="mi">90</span> <span class="ow">and</span> <span class="n">activation</span> <span class="o">!=</span> <span class="n">epilogue</span><span class="o">.</span><span class="n">identity</span><span class="p">:</span>
|
||||
<span class="c1"># CUTLASS 3.0 kernels currently only support identity activation. If one requests a non-identity activation,</span>
|
||||
<span class="c1"># revert to using a CUTLASS 2.x kernel by using SM80-tagged kernels.</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">logger</span><span class="o">.</span><span class="n">warning</span><span class="p">(</span><span class="s2">"Reverting to using SM80-tagged kernel. Opclass may change."</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_reset_options</span><span class="p">(</span><span class="mi">80</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_reset_operations</span><span class="p">(</span><span class="n">reset_epilogue</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">cc</span> <span class="o">==</span> <span class="mi">90</span> <span class="ow">and</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_cc</span> <span class="o">!=</span> <span class="mi">90</span> <span class="ow">and</span> <span class="n">activation</span> <span class="o">==</span> <span class="n">epilogue</span><span class="o">.</span><span class="n">identity</span><span class="p">):</span>
|
||||
<span class="c1"># SM80 fallback kernels are currently used. Since an identity activation is requested,</span>
|
||||
<span class="c1"># we can switch back to using SM90 kernels.</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_reset_options</span><span class="p">(</span><span class="mi">90</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_reset_operations</span><span class="p">(</span><span class="n">reset_epilogue</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_cc</span> <span class="o">==</span> <span class="mi">90</span> <span class="ow">and</span> <span class="n">activation</span> <span class="o">!=</span> <span class="n">epilogue</span><span class="o">.</span><span class="n">identity</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s2">"Epilogues with elementwise fusion are not currently supported "</span>
|
||||
<span class="s2">"in the Python interface for 3.x kernels. To use 2.x kernels "</span>
|
||||
<span class="s2">"with fused elementwise epilogues, do not set the `kernel_cc` "</span>
|
||||
<span class="s2">"parameter when constructing the Gemm object."</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">epilogue_functor</span> <span class="o">=</span> <span class="n">epilogue</span><span class="o">.</span><span class="n">get_activation_epilogue</span><span class="p">(</span>
|
||||
<span class="n">activation</span><span class="p">,</span>
|
||||
<span class="n">datatypes</span><span class="o">.</span><span class="n">binding_type</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_c</span><span class="p">),</span>
|
||||
<span class="n">elements_per_access</span><span class="p">,</span>
|
||||
<span class="n">datatypes</span><span class="o">.</span><span class="n">binding_type</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_accumulator</span><span class="p">),</span>
|
||||
<span class="n">datatypes</span><span class="o">.</span><span class="n">binding_type</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_accumulator</span><span class="p">),</span>
|
||||
<span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_reset_epilogue_functor_alignment</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">alignment</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">epilogue_functor</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="ow">not</span> <span class="nb">hasattr</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">epilogue_functor</span><span class="p">,</span> <span class="s1">'activation_functor'</span><span class="p">):</span>
|
||||
<span class="n">activation</span> <span class="o">=</span> <span class="n">epilogue</span><span class="o">.</span><span class="n">identity</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">activation</span> <span class="o">=</span> <span class="nb">type</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">epilogue_functor</span><span class="o">.</span><span class="n">activation_functor</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">epilogue_functor</span> <span class="o">=</span> <span class="n">epilogue</span><span class="o">.</span><span class="n">get_activation_epilogue</span><span class="p">(</span>
|
||||
<span class="n">activation</span><span class="p">,</span>
|
||||
<span class="n">datatypes</span><span class="o">.</span><span class="n">binding_type</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_c</span><span class="p">),</span>
|
||||
<span class="n">alignment</span><span class="p">,</span>
|
||||
<span class="n">datatypes</span><span class="o">.</span><span class="n">binding_type</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_accumulator</span><span class="p">),</span>
|
||||
<span class="n">datatypes</span><span class="o">.</span><span class="n">binding_type</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_accumulator</span><span class="p">),</span>
|
||||
<span class="p">)</span>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">activation</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the type of the activation function currently in use</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="nb">type</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">epilogue_functor</span><span class="o">.</span><span class="n">activation_functor</span><span class="p">)</span>
|
||||
|
||||
<span class="nd">@activation</span><span class="o">.</span><span class="n">setter</span>
|
||||
<span class="k">def</span> <span class="nf">activation</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">act</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Sets the type of the activation function to use</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_reset_epilogue_functor_activation</span><span class="p">(</span><span class="n">act</span><span class="p">)</span>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">opclass</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-></span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the opcode class currently in use by the GEMM</span>
|
||||
|
||||
<span class="sd"> :return: opcode class currently in use</span>
|
||||
<span class="sd"> :rtype: cutlass.OpcodeClass</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">op_class</span>
|
||||
|
||||
<span class="nd">@opclass</span><span class="o">.</span><span class="n">setter</span>
|
||||
<span class="k">def</span> <span class="nf">opclass</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">oc</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Sets the opcode class to use in the GEMM. If the opcode class is not supported under</span>
|
||||
<span class="sd"> the given compute capability and element/layout combinations of the GEMM, an exception is raised.</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">oc</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">possible_op_classes</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">op_class</span> <span class="o">=</span> <span class="n">oc</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span>
|
||||
<span class="sa">f</span><span class="s1">'Unsupported operation class </span><span class="si">{</span><span class="n">oc</span><span class="si">}</span><span class="s1"> for CC </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">cc</span><span class="si">}</span><span class="s1"> and data type combination '</span>
|
||||
<span class="sa">f</span><span class="s1">'(</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_a</span><span class="si">}</span><span class="s1">, </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_b</span><span class="si">}</span><span class="s1">, </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_accumulator</span><span class="si">}</span><span class="s1">) and '</span>
|
||||
<span class="sa">f</span><span class="s1">'layout combination (</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">_layout_a</span><span class="si">}</span><span class="s1">, </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">_layout_b</span><span class="si">}</span><span class="s1">).'</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># Changing the op class changes the elements per access in the epilogue. Reset this.</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">op_class</span> <span class="o">==</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="o">.</span><span class="n">Simt</span><span class="p">:</span>
|
||||
<span class="n">elements_per_access</span> <span class="o">=</span> <span class="mi">1</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">elements_per_access</span> <span class="o">=</span> <span class="mi">128</span> <span class="o">//</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataTypeSize</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_c</span><span class="p">]</span>
|
||||
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">epilogue_functor</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_reset_epilogue_functor_alignment</span><span class="p">(</span><span class="n">elements_per_access</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># Changing the op class also changes the possible operations available. Reset these.</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">possible_operations</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">options</span><span class="o">.</span><span class="n">operations</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">op_class</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_a</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_b</span><span class="p">,</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_element_accumulator</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_a</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_b</span><span class="p">)</span>
|
||||
|
||||
<span class="nd">@property</span>
|
||||
<span class="k">def</span> <span class="nf">swizzling_functor</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the type of the swizzling functor currently being used by the GEMM</span>
|
||||
|
||||
<span class="sd"> :return: swizzling functor type</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">_swizzling_functor</span>
|
||||
|
||||
<span class="nd">@swizzling_functor</span><span class="o">.</span><span class="n">setter</span>
|
||||
<span class="k">def</span> <span class="nf">swizzling_functor</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">swizzling_functor</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Sets the swizzling functor to the type specified by `swizzling_functor`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">swizzling_functor</span> <span class="o">==</span> <span class="n">swizzle</span><span class="o">.</span><span class="n">ThreadblockSwizzleStreamK</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">op_class</span> <span class="o">==</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="o">.</span><span class="n">Simt</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'ThreadblockSwizzleStreamK is currently only supported with opcode class TensorOp'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_cc</span> <span class="o">==</span> <span class="mi">90</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'ThreadblockSwizzleStreamK is currently unsupported on SM90'</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_swizzling_functor</span> <span class="o">=</span> <span class="n">swizzling_functor</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_valid_tile_description</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">td</span><span class="p">:</span> <span class="n">TileDescription</span><span class="p">)</span> <span class="o">-></span> <span class="nb">tuple</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Checks whether the provided tile description is valid for the given compute capability. At present,</span>
|
||||
<span class="sd"> this checks the following:</span>
|
||||
|
||||
<span class="sd"> - Does the tile description use a number of stages supported by the compute capability in question?</span>
|
||||
<span class="sd"> - Does the tile size requested fit within shared memory?</span>
|
||||
<span class="sd"> - Are the requested cluster dimensions outside the valid range for the architecture (e.g.,</span>
|
||||
<span class="sd"> non-unit cluster dimensions on pre-SM90 architectures)?</span>
|
||||
<span class="sd"> - Is the kernel schedule being used supported on the architecture in question?</span>
|
||||
|
||||
<span class="sd"> :param td: tile description to validate</span>
|
||||
<span class="sd"> :type td: cutlass.backend.TileDescription</span>
|
||||
<span class="sd"> :return: tuple in which the first element is a bool indicating whether the tile description is valid</span>
|
||||
<span class="sd"> and the second element is a string providing an optional error message.</span>
|
||||
<span class="sd"> :rtype: tuple</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="c1"># Check stage count based on the CC to which we are compiling (self.cc), rather</span>
|
||||
<span class="c1"># than the CC from which we find kernels (self.current_cc)</span>
|
||||
<span class="n">valid</span><span class="p">,</span> <span class="n">msg</span> <span class="o">=</span> <span class="n">check</span><span class="o">.</span><span class="n">valid_stage_count</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">cc</span><span class="p">,</span> <span class="n">td</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="n">valid</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="n">valid</span><span class="p">,</span> <span class="n">msg</span><span class="p">)</span>
|
||||
|
||||
<span class="n">valid</span><span class="p">,</span> <span class="n">msg</span> <span class="o">=</span> <span class="n">check</span><span class="o">.</span><span class="n">valid_cluster_shape</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">current_cc</span><span class="p">,</span> <span class="n">td</span><span class="o">.</span><span class="n">cluster_shape</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="n">valid</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="n">valid</span><span class="p">,</span> <span class="n">msg</span><span class="p">)</span>
|
||||
|
||||
<span class="n">valid</span><span class="p">,</span> <span class="n">msg</span> <span class="o">=</span> <span class="n">check</span><span class="o">.</span><span class="n">valid_kernel_schedule</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">current_cc</span><span class="p">,</span> <span class="n">td</span><span class="o">.</span><span class="n">kernel_schedule</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">valid</span><span class="p">,</span> <span class="n">msg</span>
|
||||
|
||||
<div class="viewcode-block" id="Gemm.tile_descriptions"><a class="viewcode-back" href="../../../cutlass.op.html#cutlass.op.gemm.Gemm.tile_descriptions">[docs]</a> <span class="k">def</span> <span class="nf">tile_descriptions</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-></span> <span class="nb">list</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns a list of valid tile descriptions for the operations</span>
|
||||
|
||||
<span class="sd"> :returns: list of valid tile descriptions for the operations</span>
|
||||
<span class="sd"> :rtype: list</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="p">[</span><span class="n">datatypes</span><span class="o">.</span><span class="n">td_from_profiler_op</span><span class="p">(</span><span class="n">op</span><span class="p">)</span> <span class="k">for</span> <span class="n">op</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">possible_operations</span><span class="o">.</span><span class="n">all_operations</span><span class="p">]</span></div>
|
||||
|
||||
<div class="viewcode-block" id="Gemm.construct"><a class="viewcode-back" href="../../../cutlass.op.html#cutlass.op.gemm.Gemm.construct">[docs]</a> <span class="k">def</span> <span class="nf">construct</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="p">,</span> <span class="n">tile_description</span><span class="p">:</span> <span class="n">TileDescription</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">alignment_A</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="n">alignment_B</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="n">alignment_C</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="kc">None</span><span class="p">)</span> <span class="o">-></span> <span class="n">GemmOperationUniversal</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Constructs a ``cutlass.backend.GemmOperationUniversal`` based on the input parameters and current</span>
|
||||
<span class="sd"> kernel specification of the ``Gemm`` object.</span>
|
||||
|
||||
<span class="sd"> :param tile_description: tile description specifying shapes and operand types to use in the kernel</span>
|
||||
<span class="sd"> :type tile_description: cutlass.backend.TileDescription</span>
|
||||
<span class="sd"> :param alignment_A: alignment of operand A</span>
|
||||
<span class="sd"> :type alignment_A: int</span>
|
||||
<span class="sd"> :param alignment_B: alignment of operand B</span>
|
||||
<span class="sd"> :type alignment_B: int</span>
|
||||
<span class="sd"> :param alignment_C: alignment of operand C</span>
|
||||
<span class="sd"> :type alignment_C: int</span>
|
||||
|
||||
<span class="sd"> :return: operation that was constructed</span>
|
||||
<span class="sd"> :rtype: cutlass.backend.GemmOperationUniversal</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">alignment_pref_A</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="mi">128</span> <span class="o">//</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataTypeSize</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_a</span><span class="p">],</span> <span class="nb">max</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">possible_operations</span><span class="o">.</span><span class="n">alignments</span><span class="p">))</span>
|
||||
<span class="n">alignment_pref_B</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="mi">128</span> <span class="o">//</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataTypeSize</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_b</span><span class="p">],</span> <span class="nb">max</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">possible_operations</span><span class="o">.</span><span class="n">alignments</span><span class="p">))</span>
|
||||
<span class="n">alignment_pref_C</span> <span class="o">=</span> <span class="nb">min</span><span class="p">(</span><span class="mi">128</span> <span class="o">//</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataTypeSize</span><span class="p">[</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_c</span><span class="p">],</span> <span class="nb">max</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">possible_operations</span><span class="o">.</span><span class="n">alignments</span><span class="p">))</span>
|
||||
<span class="n">alignment_A</span> <span class="o">=</span> <span class="n">check</span><span class="o">.</span><span class="n">alignment_or_default</span><span class="p">(</span><span class="n">alignment_A</span><span class="p">,</span> <span class="n">alignment_pref_A</span><span class="p">)</span>
|
||||
<span class="n">alignment_B</span> <span class="o">=</span> <span class="n">check</span><span class="o">.</span><span class="n">alignment_or_default</span><span class="p">(</span><span class="n">alignment_B</span><span class="p">,</span> <span class="n">alignment_pref_B</span><span class="p">)</span>
|
||||
<span class="n">alignment_C</span> <span class="o">=</span> <span class="n">check</span><span class="o">.</span><span class="n">alignment_or_default</span><span class="p">(</span><span class="n">alignment_C</span><span class="p">,</span> <span class="n">alignment_pref_C</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_reset_epilogue_functor_alignment</span><span class="p">(</span><span class="n">alignment_C</span><span class="p">)</span>
|
||||
|
||||
<span class="n">tensor_A</span> <span class="o">=</span> <span class="n">TensorDescription</span><span class="p">(</span>
|
||||
<span class="n">datatypes</span><span class="o">.</span><span class="n">binding_type</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_a</span><span class="p">),</span>
|
||||
<span class="n">datatypes</span><span class="o">.</span><span class="n">binding_layout</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_layout_a</span><span class="p">),</span>
|
||||
<span class="n">alignment_A</span>
|
||||
<span class="p">)</span>
|
||||
<span class="n">tensor_B</span> <span class="o">=</span> <span class="n">TensorDescription</span><span class="p">(</span>
|
||||
<span class="n">datatypes</span><span class="o">.</span><span class="n">binding_type</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_b</span><span class="p">),</span>
|
||||
<span class="n">datatypes</span><span class="o">.</span><span class="n">binding_layout</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_layout_b</span><span class="p">),</span>
|
||||
<span class="n">alignment_B</span>
|
||||
<span class="p">)</span>
|
||||
<span class="n">tensor_C</span> <span class="o">=</span> <span class="n">TensorDescription</span><span class="p">(</span>
|
||||
<span class="n">datatypes</span><span class="o">.</span><span class="n">binding_type</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_c</span><span class="p">),</span>
|
||||
<span class="n">datatypes</span><span class="o">.</span><span class="n">binding_layout</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_layout_c</span><span class="p">),</span>
|
||||
<span class="n">alignment_C</span>
|
||||
<span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">tile_description</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">op</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">possible_operations</span><span class="o">.</span><span class="n">operations</span><span class="p">(</span><span class="n">alignment_A</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
|
||||
<span class="n">tile_description</span> <span class="o">=</span> <span class="n">datatypes</span><span class="o">.</span><span class="n">td_from_profiler_op</span><span class="p">(</span><span class="n">op</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">valid</span><span class="p">,</span> <span class="n">err_str</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_valid_tile_description</span><span class="p">(</span><span class="n">tile_description</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="n">valid</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Invalid tile description. </span><span class="si">{</span><span class="n">err_str</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">tile_description</span> <span class="o">=</span> <span class="n">tile_description</span>
|
||||
|
||||
<span class="n">operation</span> <span class="o">=</span> <span class="n">GemmOperationUniversal</span><span class="p">(</span>
|
||||
<span class="n">arch</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">current_cc</span><span class="p">,</span>
|
||||
<span class="n">tile_description</span><span class="o">=</span><span class="n">tile_description</span><span class="p">,</span>
|
||||
<span class="n">A</span><span class="o">=</span><span class="n">tensor_A</span><span class="p">,</span> <span class="n">B</span><span class="o">=</span><span class="n">tensor_B</span><span class="p">,</span> <span class="n">C</span><span class="o">=</span><span class="n">tensor_C</span><span class="p">,</span>
|
||||
<span class="n">epilogue_functor</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">epilogue_functor</span><span class="p">,</span>
|
||||
<span class="n">swizzling_functor</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">_swizzling_functor</span><span class="p">,</span>
|
||||
<span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">operation</span></div>
|
||||
|
||||
<div class="viewcode-block" id="Gemm.compile"><a class="viewcode-back" href="../../../cutlass.op.html#cutlass.op.gemm.Gemm.compile">[docs]</a> <span class="k">def</span> <span class="nf">compile</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tile_description</span><span class="p">:</span> <span class="n">TileDescription</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">alignment_A</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="n">alignment_B</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="n">alignment_C</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">print_module</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">)</span> <span class="o">-></span> <span class="n">cutlass</span><span class="o">.</span><span class="n">backend</span><span class="o">.</span><span class="n">GemmOperationUniversal</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Emits and compiles the kernel currently specified. If ``tile_description`` and any</span>
|
||||
<span class="sd"> of the ``alignment`` parameters are set, the kernel will be chosen using this</span>
|
||||
<span class="sd"> tile description and alignments. Otherwise, a default tile description and alignment</span>
|
||||
<span class="sd"> will be used.</span>
|
||||
|
||||
<span class="sd"> :param tile_description: tile description specifying shapes and operand types to use in the kernel</span>
|
||||
<span class="sd"> :type tile_description: cutlass.backend.TileDescription</span>
|
||||
<span class="sd"> :param alignment_A: alignment of operand A</span>
|
||||
<span class="sd"> :type alignment_A: int</span>
|
||||
<span class="sd"> :param alignment_B: alignment of operand B</span>
|
||||
<span class="sd"> :type alignment_B: int</span>
|
||||
<span class="sd"> :param alignment_C: alignment of operand C</span>
|
||||
<span class="sd"> :type alignment_C: int</span>
|
||||
<span class="sd"> :param print_module: whether to print the emitted C++ code</span>
|
||||
<span class="sd"> :type print_module: bool</span>
|
||||
|
||||
<span class="sd"> :return: operation that was compiled</span>
|
||||
<span class="sd"> :rtype: cutlass.backend.GemmOperationUniversal</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">operation</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">construct</span><span class="p">(</span><span class="n">tile_description</span><span class="p">,</span> <span class="n">alignment_A</span><span class="p">,</span> <span class="n">alignment_B</span><span class="p">,</span> <span class="n">alignment_C</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">print_module</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">operation</span><span class="o">.</span><span class="n">rt_module</span><span class="o">.</span><span class="n">emit</span><span class="p">())</span>
|
||||
|
||||
<span class="n">compiler</span><span class="o">.</span><span class="n">add_module</span><span class="p">([</span><span class="bp">self</span><span class="o">.</span><span class="n">operation</span><span class="p">,])</span>
|
||||
<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">operation</span></div>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_verify_type_and_layout</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tensor</span><span class="p">,</span> <span class="n">ref_type</span><span class="p">,</span> <span class="n">ref_layout</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Verifies that ``tensor`` has data type ``ref_type`` and layout ``ref_layout``. An exception</span>
|
||||
<span class="sd"> is raised if it does not.</span>
|
||||
|
||||
<span class="sd"> :param tensor: object representing a tensor passed in to verify, or ``None`` if no tensor was passed in</span>
|
||||
<span class="sd"> :type tensor: numpy/cupy/torch array/tensor object</span>
|
||||
<span class="sd"> :param ref_type: data type for the tensor that this object was initialized to</span>
|
||||
<span class="sd"> :param ref_layout: layout for the tensor that this object was initialized to</span>
|
||||
<span class="sd"> :param name: identifier of the tensor to verify. Used in raising exceptions</span>
|
||||
<span class="sd"> :type name: str</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">dtype</span><span class="p">,</span> <span class="n">layout</span> <span class="o">=</span> <span class="n">datatypes</span><span class="o">.</span><span class="n">get_datatype_and_layout</span><span class="p">(</span><span class="n">tensor</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">dtype</span> <span class="o">!=</span> <span class="n">ref_type</span> <span class="ow">or</span> <span class="n">layout</span> <span class="o">!=</span> <span class="n">ref_layout</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Tensor </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s1"> with type and layout (</span><span class="si">{</span><span class="n">dtype</span><span class="si">}</span><span class="s1">, </span><span class="si">{</span><span class="n">layout</span><span class="si">}</span><span class="s1">) '</span>
|
||||
<span class="sa">f</span><span class="s1">'does not match the expected type and '</span>
|
||||
<span class="sa">f</span><span class="s1">'layout of (</span><span class="si">{</span><span class="n">ref_type</span><span class="si">}</span><span class="s1">, </span><span class="si">{</span><span class="n">ref_layout</span><span class="si">}</span><span class="s1">).'</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_verify_tensor</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tensor</span><span class="p">,</span> <span class="n">ref_tensor</span><span class="p">,</span> <span class="n">ref_dtype</span><span class="p">,</span> <span class="n">ref_layout</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Verifies the following properties:</span>
|
||||
<span class="sd"> 1) Either ``tensor`` or ``ref_tensor`` must be set (i.e., not ``None``)</span>
|
||||
<span class="sd"> 2) If ``tensor`` is not ``None``, its datatype and layout must match the current versions</span>
|
||||
<span class="sd"> set by the plan (i.e., those in ``ref_dtype`` and ``ref_layout``)</span>
|
||||
|
||||
<span class="sd"> If either of these properties does not hold, an exception is raised. If these properties hold and</span>
|
||||
<span class="sd"> ``tensor`` is not ``None``, ``tensor`` is returned. Otherwise, ``ref_tensor`` is returned.</span>
|
||||
|
||||
<span class="sd"> :param tensor: object representing a tensor passed in to verify, or ``None`` if no tensor was passed in</span>
|
||||
<span class="sd"> :type tensor: numpy/cupy/torch array/tensor object</span>
|
||||
<span class="sd"> :param ref_tensor: object representing a tensor passed in on construction of this object, or ``None`` if no tensor was passed in</span>
|
||||
<span class="sd"> :type ref_tensor: numpy/cupy/torch array/tensor object</span>
|
||||
<span class="sd"> :param ref_dtype: data type for the tensor that this object was initialized to</span>
|
||||
<span class="sd"> :param ref_layout: layout for the tensor that this object was initialized to</span>
|
||||
<span class="sd"> :param name: identifier of the tensor to verify. Used in raising exceptions</span>
|
||||
<span class="sd"> :type name: str</span>
|
||||
|
||||
<span class="sd"> :return: valid tensor object to use</span>
|
||||
<span class="sd"> :rtype: numpy/cupy/torch array/tensor object</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">tensor</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">ref_tensor</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Tensor </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s2"> must be set."</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">ref_tensor</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_verify_type_and_layout</span><span class="p">(</span><span class="n">tensor</span><span class="p">,</span> <span class="n">ref_dtype</span><span class="p">,</span> <span class="n">ref_layout</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">tensor</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_verify_scalar</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">scalar</span><span class="p">,</span> <span class="n">ref_scalar</span><span class="p">,</span> <span class="n">ref_dtype</span><span class="p">,</span> <span class="n">name</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Verifies the following properties:</span>
|
||||
<span class="sd"> 1) Either ``scalar`` or ``ref_scalar`` must be set (i.e., not ``None``)</span>
|
||||
<span class="sd"> 2) If ``scalar`` is not ``None``, its datatype must match the current version</span>
|
||||
<span class="sd"> set by the plan (i.e., those in ``ref_dtype``)</span>
|
||||
|
||||
<span class="sd"> If either of these properties does not hold, an exception is raised. If these properties hold and</span>
|
||||
<span class="sd"> ``scalar`` is not ``None``, ``scalar`` is returned. Otherwise, ``ref_scalar`` is returned.</span>
|
||||
|
||||
<span class="sd"> :param scalar: object representing a scalar passed in to verify, or ``None`` if no scalar was passed in</span>
|
||||
<span class="sd"> :type scalar: numpy/cupy/torch scalar</span>
|
||||
<span class="sd"> :param ref_scalar: object representing a scalar passed in on construction of this object, or ``None`` if no scalar was passed in</span>
|
||||
<span class="sd"> :type ref_scalar: numpy/cupy/torch scalar</span>
|
||||
<span class="sd"> :param ref_dtype: data type for the scalar that this object was initialized to</span>
|
||||
<span class="sd"> :param name: identifier of the scalar to verify. Used in raising exceptions</span>
|
||||
<span class="sd"> :type name: str</span>
|
||||
|
||||
<span class="sd"> :return: valid scalar to use</span>
|
||||
<span class="sd"> :rtype: numpy/cupy/torch scalar</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">scalar</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">ref_scalar</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Scalar </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s2"> must be set."</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">ref_scalar</span>
|
||||
<span class="n">dtype</span> <span class="o">=</span> <span class="n">datatypes</span><span class="o">.</span><span class="n">library_type</span><span class="p">(</span><span class="n">scalar</span><span class="o">.</span><span class="n">dtype</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">dtype</span> <span class="o">!=</span> <span class="n">ref_dtype</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span>
|
||||
<span class="sa">f</span><span class="s2">"Scalar </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s2"> with type </span><span class="si">{</span><span class="n">dtype</span><span class="si">}</span><span class="s2"> does not match expected type </span><span class="si">{</span><span class="n">ref_dtype</span><span class="si">}</span><span class="s2">."</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">scalar</span>
|
||||
|
||||
<div class="viewcode-block" id="Gemm.run"><a class="viewcode-back" href="../../../cutlass.op.html#cutlass.op.gemm.Gemm.run">[docs]</a> <span class="k">def</span> <span class="nf">run</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">A</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">B</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">C</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">D</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">alpha</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">beta</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">batch_count</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1</span><span class="p">,</span>
|
||||
<span class="n">sync</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">True</span><span class="p">,</span> <span class="n">print_module</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">)</span> <span class="o">-></span> <span class="n">GemmArguments</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Runs the kernel currently specified. If it has not already been, the kernel is emitted and</span>
|
||||
<span class="sd"> compiled. Tensors holding operands and outputs of the kernel are sourced either from the</span>
|
||||
<span class="sd"> ``A``, ``B``, ``C``, ``D``, ``alpha``, and ``beta``</span>
|
||||
<span class="sd"> parameters provided in this call, or from those</span>
|
||||
<span class="sd"> passed in on the construction of this object -- one of the two must be specified.</span>
|
||||
|
||||
<span class="sd"> By default, this call returns only once the kernel has completed. To launch the kernel</span>
|
||||
<span class="sd"> and immediately return, set ``sync=False``. In this case, it is the responsibility of the</span>
|
||||
<span class="sd"> caller to synchronize the results of the kernel before attempting to access outputs</span>
|
||||
<span class="sd"> by calling ``sync()`` on the arguments returned from this call.</span>
|
||||
|
||||
<span class="sd"> :param A: tensor representing data type and layout of operand A</span>
|
||||
<span class="sd"> :param B: tensor representing data type and layout of operand B</span>
|
||||
<span class="sd"> :param C: tensor representing data type and layout of operand C</span>
|
||||
<span class="sd"> :param D: tensor representing data type and layout of operand D</span>
|
||||
<span class="sd"> :param alpha: scalar parameter alpha from GEMM computation that scales the product of operands A and B</span>
|
||||
<span class="sd"> :param beta: scalar parameter beta from GEMM operation that scales operand C</span>
|
||||
<span class="sd"> :param batch_count: number of GEMMs in the batch</span>
|
||||
<span class="sd"> :type batch_count: int</span>
|
||||
<span class="sd"> :param sync: whether the call should wait for the kernel to complete before returning</span>
|
||||
<span class="sd"> :type sync: bool</span>
|
||||
<span class="sd"> :param print_module: whether to print the emitted C++ code</span>
|
||||
<span class="sd"> :type print_module: bool</span>
|
||||
|
||||
<span class="sd"> :return: arguments passed in to the kernel</span>
|
||||
<span class="sd"> :rtype: cutlass.backend.GemmArguments</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">batch_count</span> <span class="o"><</span> <span class="mi">1</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Invalid batch count </span><span class="si">{</span><span class="n">batch_count</span><span class="si">}</span><span class="s2">. Value must be an integer >= 1."</span><span class="p">)</span>
|
||||
|
||||
<span class="n">A</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_verify_tensor</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">A</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_a</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_a</span><span class="p">,</span> <span class="s2">"A"</span><span class="p">)</span>
|
||||
<span class="n">B</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_verify_tensor</span><span class="p">(</span><span class="n">B</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">B</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_b</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_b</span><span class="p">,</span> <span class="s2">"B"</span><span class="p">)</span>
|
||||
<span class="n">C</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_verify_tensor</span><span class="p">(</span><span class="n">C</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">C</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_c</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_c</span><span class="p">,</span> <span class="s2">"C"</span><span class="p">)</span>
|
||||
<span class="n">D</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_verify_tensor</span><span class="p">(</span><span class="n">D</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">D</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_d</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_d</span><span class="p">,</span> <span class="s2">"D"</span><span class="p">)</span>
|
||||
<span class="n">alpha</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_verify_scalar</span><span class="p">(</span><span class="n">alpha</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">alpha</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_c</span><span class="p">,</span> <span class="s2">"alpha"</span><span class="p">)</span>
|
||||
<span class="n">beta</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_verify_scalar</span><span class="p">(</span><span class="n">beta</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">beta</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_c</span><span class="p">,</span> <span class="s2">"beta"</span><span class="p">)</span>
|
||||
|
||||
<span class="n">alignment_a</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">possible_operations</span><span class="o">.</span><span class="n">find_alignment</span><span class="p">(</span><span class="n">A</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_a</span><span class="p">)</span>
|
||||
<span class="n">alignment_b</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">possible_operations</span><span class="o">.</span><span class="n">find_alignment</span><span class="p">(</span><span class="n">B</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_b</span><span class="p">)</span>
|
||||
<span class="n">alignment_c</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">possible_operations</span><span class="o">.</span><span class="n">find_alignment</span><span class="p">(</span><span class="n">C</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_c</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">tile_description</span><span class="p">,</span> <span class="n">alignment_A</span><span class="o">=</span><span class="n">alignment_a</span><span class="p">,</span> <span class="n">alignment_B</span><span class="o">=</span><span class="n">alignment_b</span><span class="p">,</span>
|
||||
<span class="n">alignment_C</span><span class="o">=</span><span class="n">alignment_c</span><span class="p">,</span> <span class="n">print_module</span><span class="o">=</span><span class="n">print_module</span><span class="p">)</span>
|
||||
|
||||
<span class="n">problem_size</span> <span class="o">=</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">gemm</span><span class="o">.</span><span class="n">GemmCoord</span><span class="p">(</span><span class="n">A</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">B</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">A</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">batch_count</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span>
|
||||
<span class="n">mode</span> <span class="o">=</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">gemm</span><span class="o">.</span><span class="n">Mode</span><span class="o">.</span><span class="n">Gemm</span>
|
||||
<span class="n">kwargs</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'split_k_slices'</span><span class="p">:</span> <span class="mi">1</span><span class="p">}</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">mode</span> <span class="o">=</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">gemm</span><span class="o">.</span><span class="n">Mode</span><span class="o">.</span><span class="n">Batched</span>
|
||||
<span class="n">kwargs</span> <span class="o">=</span> <span class="p">{</span><span class="s1">'batch'</span><span class="p">:</span> <span class="n">batch_count</span><span class="p">}</span>
|
||||
|
||||
<span class="n">arguments</span> <span class="o">=</span> <span class="n">GemmArguments</span><span class="p">(</span>
|
||||
<span class="n">operation</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">operation</span><span class="p">,</span> <span class="n">problem_size</span><span class="o">=</span><span class="n">problem_size</span><span class="p">,</span>
|
||||
<span class="n">A</span><span class="o">=</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="o">=</span><span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="o">=</span><span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="o">=</span><span class="n">D</span><span class="p">,</span>
|
||||
<span class="n">output_op</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">operation</span><span class="o">.</span><span class="n">epilogue_type</span><span class="p">(</span><span class="n">alpha</span><span class="p">,</span> <span class="n">beta</span><span class="p">),</span>
|
||||
<span class="n">gemm_mode</span><span class="o">=</span><span class="n">mode</span><span class="p">,</span>
|
||||
<span class="o">**</span><span class="n">kwargs</span>
|
||||
<span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">operation</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">arguments</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">sync</span><span class="p">:</span>
|
||||
<span class="n">arguments</span><span class="o">.</span><span class="n">sync</span><span class="p">()</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">arguments</span></div></div>
|
||||
</pre></div>
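The way ``Gemm.run`` in the listing above derives the problem size and picks the GEMM mode can be illustrated with a small standalone sketch. It does not import or call CUTLASS. The helper names ``gemm_problem_size`` and ``select_mode`` are hypothetical, the mode is represented by a plain string rather than ``cutlass_bindings.gemm.Mode``, operands are assumed to be 2-D, and the inner-dimension check is an addition for illustration that does not appear in the listing.

```python
from typing import Dict, Tuple


def gemm_problem_size(a_shape: Tuple[int, int], b_shape: Tuple[int, int]) -> Tuple[int, int, int]:
    """Return (M, N, K) the way the listing builds its GemmCoord:
    M = A.shape[0], N = B.shape[1], K = A.shape[1]."""
    # Illustrative check only; the listing itself does not perform it.
    if a_shape[1] != b_shape[0]:
        raise ValueError(f"Inner dimensions differ: {a_shape[1]} vs {b_shape[0]}")
    return (a_shape[0], b_shape[1], a_shape[1])


def select_mode(batch_count: int) -> Tuple[str, Dict[str, int]]:
    """Mirror the branch in the listing: a single GEMM uses one split-K slice,
    anything larger is launched as a batched GEMM."""
    if batch_count >= 1:
        if batch_count == 1:
            return "Gemm", {"split_k_slices": 1}
        return "Batched", {"batch": batch_count}
    raise ValueError(f"Invalid batch count {batch_count}. Value must be an integer >= 1.")


if __name__ == "__main__":
    print(gemm_problem_size((4096, 512), (512, 1024)))
    print(select_mode(1))
    print(select_mode(4))
```

In the listing, the resulting mode and keyword arguments are forwarded to ``GemmArguments`` together with the verified operands.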
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
|
||||
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer no-toc">
|
||||
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
|
||||
<script src="../../../_static/doctools.js"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../../_static/scripts/furo.js"></script>
|
||||
<script src="../../../_static/clipboard.min.js"></script>
|
||||
<script src="../../../_static/copybutton.js"></script>
|
||||
<script src="../../../_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
554
python/docs/_modules/cutlass/op/gemm_grouped.html
Normal file
@ -0,0 +1,554 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><link rel="index" title="Index" href="../../../genindex.html" /><link rel="search" title="Search" href="../../../search.html" />
|
||||
<link rel="canonical" href="docs/_modules/cutlass/op/gemm_grouped.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>cutlass.op.gemm_grouped - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="../../../index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="../../../index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="../../../_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="../../../_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="../../../search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../../../modules.html">CUTLASS Python API</a><input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="../../../cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../../cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../../cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../../cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../../../examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../../externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../../externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../../externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<h1>Source code for cutlass.op.gemm_grouped</h1><div class="highlight"><pre>
|
||||
<span></span><span class="c1">#################################################################################################</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># Copyright (c) 2023 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.</span>
|
||||
<span class="c1"># SPDX-License-Identifier: BSD-3-Clause</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># Redistribution and use in source and binary forms, with or without</span>
|
||||
<span class="c1"># modification, are permitted provided that the following conditions are met:</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 1. Redistributions of source code must retain the above copyright notice, this</span>
|
||||
<span class="c1"># list of conditions and the following disclaimer.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 2. Redistributions in binary form must reproduce the above copyright notice,</span>
|
||||
<span class="c1"># this list of conditions and the following disclaimer in the documentation</span>
|
||||
<span class="c1"># and/or other materials provided with the distribution.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 3. Neither the name of the copyright holder nor the names of its</span>
|
||||
<span class="c1"># contributors may be used to endorse or promote products derived from</span>
|
||||
<span class="c1"># this software without specific prior written permission.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"</span>
|
||||
<span class="c1"># AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE</span>
|
||||
<span class="c1"># IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE</span>
|
||||
<span class="c1"># DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE</span>
|
||||
<span class="c1"># FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL</span>
|
||||
<span class="c1"># DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR</span>
|
||||
<span class="c1"># SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER</span>
|
||||
<span class="c1"># CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,</span>
|
||||
<span class="c1"># OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE</span>
|
||||
<span class="c1"># OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1">#################################################################################################</span>
|
||||
|
||||
<span class="sd">"""</span>
|
||||
<span class="sd"> Ease-of-use interface for constructing, compiling, and running grouped GEMMs.</span>
|
||||
|
||||
<span class="sd"> The ``GroupedGemm`` interface is meant to allow one to easily instantiate, compile, and run</span>
|
||||
<span class="sd"> grouped GEMM operations in CUTLASS via Python, without specifying many configuration parameters.</span>
|
||||
<span class="sd"> Under the hood, the interface will select sensible default parameters for the many template</span>
|
||||
<span class="sd"> parameters for CUTLASS grouped GEMMs.</span>
|
||||
|
||||
<span class="sd"> Note: optimal performance is not to be expected from this interface. To achieve optimal</span>
|
||||
<span class="sd"> performance, one should specify and tune each configuration parameter.</span>
|
||||
|
||||
<span class="sd"> The simplest example of using this interface is the following:</span>
|
||||
|
||||
<span class="sd"> .. highlight:: python</span>
|
||||
<span class="sd"> .. code-block:: python</span>
|
||||
|
||||
<span class="sd"> # A0, A1, B0, B1, C0, C1, D0, and D1 are torch/numpy/cupy tensor objects</span>
|
||||
<span class="sd"> plan = cutlass.op.GroupedGemm(element=cutlass.DataType.f16, layout=cutlass.LayoutType.RowMajor)</span>
|
||||
<span class="sd"> plan.run([A0, A1], [B0, B1], [C0, C1], [D0, D1])</span>
|
||||
<span class="sd">"""</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">cutlass_bindings</span>
|
||||
|
||||
<span class="kn">from</span> <span class="nn">cutlass.backend.gemm_operation</span> <span class="kn">import</span> <span class="p">(</span>
|
||||
<span class="n">GemmGroupedArguments</span><span class="p">,</span>
|
||||
<span class="n">GemmOperationGrouped</span><span class="p">,</span>
|
||||
<span class="p">)</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass.backend.library</span> <span class="kn">import</span> <span class="p">(</span>
|
||||
<span class="n">DataTypeSize</span><span class="p">,</span>
|
||||
<span class="n">SchedulerMode</span><span class="p">,</span>
|
||||
<span class="n">TensorDescription</span><span class="p">,</span>
|
||||
<span class="n">TileDescription</span><span class="p">,</span>
|
||||
<span class="p">)</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass.op.gemm</span> <span class="kn">import</span> <span class="n">Gemm</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass.utils</span> <span class="kn">import</span> <span class="n">check</span><span class="p">,</span> <span class="n">datatypes</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="GroupedGemm"><a class="viewcode-back" href="../../../cutlass.op.html#cutlass.op.gemm_grouped.GroupedGemm">[docs]</a><span class="k">class</span> <span class="nc">GroupedGemm</span><span class="p">(</span><span class="n">Gemm</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Constructs a ``GroupedGemm`` object.</span>
|
||||
|
||||
<span class="sd"> The data types and layouts of operands A, B, and C, along with the data type of output D</span>
|
||||
<span class="sd"> and that used for accumulation, are bound to the ``GroupedGemm`` object throughout its lifetime --</span>
|
||||
<span class="sd"> these are not to be changed after a ``GroupedGemm`` has been constructed.</span>
|
||||
|
||||
<span class="sd"> The constructor accepts optional arguments for flexibly setting these properties. Please see the constructor</span>
|
||||
<span class="sd"> for ``Gemm`` for examples of these.</span>
|
||||
|
||||
<span class="sd"> :param cc: compute capability of device to generate kernels for</span>
|
||||
<span class="sd"> :type cc: int</span>
|
||||
<span class="sd"> :param A: tensor representing data type and layout of operands A</span>
|
||||
<span class="sd"> :param B: tensor representing data type and layout of operands B</span>
|
||||
<span class="sd"> :param C: tensor representing data type and layout of operands C</span>
|
||||
<span class="sd"> :param D: tensor representing data type and layout of operands D</span>
|
||||
<span class="sd"> :param alpha: scalar parameter alpha from GEMM computation that scales the product of operands A and B</span>
|
||||
<span class="sd"> :param beta: scalar parameter beta from GEMM operation that scales operand C</span>
|
||||
<span class="sd"> :param element_accumulator: data type to be used in accumulation of the product of operands A and B</span>
|
||||
<span class="sd"> :type element_accumulator: cutlass.DataType</span>
|
||||
<span class="sd"> :param element: generic data type to be used for operands A, B, C, D, as well as the accumulation data type</span>
|
||||
<span class="sd"> :type element: cutlass.DataType</span>
|
||||
<span class="sd"> :param layout: generic layout type to be used for operands A, B, C, and D</span>
|
||||
<span class="sd"> :type layout: cutlass.LayoutType</span>
|
||||
<span class="sd"> :param element_A: data type to be used for operand A</span>
|
||||
<span class="sd"> :type element_A: cutlass.DataType</span>
|
||||
<span class="sd"> :param element_B: data type to be used for operand B</span>
|
||||
<span class="sd"> :type element_B: cutlass.DataType</span>
|
||||
<span class="sd"> :param element_C: data type to be used for operand C</span>
|
||||
<span class="sd"> :type element_C: cutlass.DataType</span>
|
||||
<span class="sd"> :param element_D: data type to be used for operand D</span>
|
||||
<span class="sd"> :type element_D: cutlass.DataType</span>
|
||||
<span class="sd"> :param layout_A: layout of operand A</span>
|
||||
<span class="sd"> :type layout_A: cutlass.LayoutType</span>
|
||||
<span class="sd"> :param layout_B: layout of operand B</span>
|
||||
<span class="sd"> :type layout_B: cutlass.LayoutType</span>
|
||||
<span class="sd"> :param layout_C: layout of operand C</span>
|
||||
<span class="sd"> :type layout_C: cutlass.LayoutType</span>
|
||||
<span class="sd"> :param layout_D: layout of operand D</span>
|
||||
<span class="sd"> :type layout_D: cutlass.LayoutType</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span>
|
||||
<span class="bp">self</span><span class="p">,</span> <span class="n">A</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">B</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">C</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">D</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">alpha</span><span class="o">=</span><span class="mf">1.0</span><span class="p">,</span> <span class="n">beta</span><span class="o">=</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">element_accumulator</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">element</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">element_A</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">element_B</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">element_C</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">element_D</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">layout_A</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">layout_B</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">layout_C</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">cc</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
|
||||
<span class="p">):</span>
|
||||
<span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">(</span>
|
||||
<span class="n">A</span><span class="o">=</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="o">=</span><span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="o">=</span><span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="o">=</span><span class="n">D</span><span class="p">,</span>
|
||||
<span class="n">alpha</span><span class="o">=</span><span class="n">alpha</span><span class="p">,</span> <span class="n">beta</span><span class="o">=</span><span class="n">beta</span><span class="p">,</span>
|
||||
<span class="n">element_accumulator</span><span class="o">=</span><span class="n">element_accumulator</span><span class="p">,</span>
|
||||
<span class="n">element</span><span class="o">=</span><span class="n">element</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="n">layout</span><span class="p">,</span>
|
||||
<span class="n">element_A</span><span class="o">=</span><span class="n">element_A</span><span class="p">,</span> <span class="n">element_B</span><span class="o">=</span><span class="n">element_B</span><span class="p">,</span>
|
||||
<span class="n">element_C</span><span class="o">=</span><span class="n">element_C</span><span class="p">,</span> <span class="n">element_D</span><span class="o">=</span><span class="n">element_D</span><span class="p">,</span>
|
||||
<span class="n">layout_A</span><span class="o">=</span><span class="n">layout_A</span><span class="p">,</span> <span class="n">layout_B</span><span class="o">=</span><span class="n">layout_B</span><span class="p">,</span> <span class="n">layout_C</span><span class="o">=</span><span class="n">layout_C</span><span class="p">,</span>
|
||||
<span class="n">cc</span><span class="o">=</span><span class="n">cc</span>
|
||||
<span class="p">)</span>
|
||||
|
||||
<span class="c1"># Grouped GEMM specializations for SM90 are currently unavailable. Revert to using SM80</span>
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_cc</span> <span class="o">==</span> <span class="mi">90</span><span class="p">:</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_reset_options</span><span class="p">(</span><span class="mi">80</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_reset_operations</span><span class="p">(</span><span class="n">reset_epilogue</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="s2">"grouped_gemm"</span>
|
||||
|
||||
<span class="nd">@Gemm</span><span class="o">.</span><span class="n">swizzling_functor</span><span class="o">.</span><span class="n">setter</span>
|
||||
<span class="k">def</span> <span class="nf">swizzling_functor</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">swizzling_functor</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Not currently supported for grouped GEMM: raises an exception instead of setting the swizzling functor to `swizzling_functor`</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">'Grouped GEMM does not currently support different swizzling functors'</span><span class="p">)</span>
|
||||
|
||||
<div class="viewcode-block" id="GroupedGemm.construct"><a class="viewcode-back" href="../../../cutlass.op.html#cutlass.op.gemm_grouped.GroupedGemm.construct">[docs]</a> <span class="k">def</span> <span class="nf">construct</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">tile_description</span><span class="p">:</span> <span class="n">TileDescription</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">alignment_A</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">alignment_B</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span>
|
||||
<span class="n">alignment_C</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="kc">None</span><span class="p">)</span> <span class="o">-></span> <span class="n">GemmOperationGrouped</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Constructs a ``cutlass.backend.GemmOperationGrouped`` based on the input parameters and current</span>
|
||||
<span class="sd"> kernel specification of the ``GroupedGemm`` object.</span>
|
||||
|
||||
<span class="sd"> :param tile_description: tile description specifying shapes and operand types to use in the kernel</span>
|
||||
<span class="sd"> :type tile_description: cutlass.backend.TileDescription</span>
|
||||
<span class="sd"> :param alignment_A: alignment of operand A</span>
|
||||
<span class="sd"> :type alignment_A: int</span>
|
||||
<span class="sd"> :param alignment_B: alignment of operand B</span>
|
||||
<span class="sd"> :type alignment_B: int</span>
|
||||
<span class="sd"> :param alignment_C: alignment of operand C</span>
|
||||
<span class="sd"> :type alignment_C: int</span>
|
||||
|
||||
<span class="sd"> :return: operation that was constructed</span>
|
||||
<span class="sd"> :rtype: cutlass.backend.GemmOperationGrouped</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">alignment_preference</span> <span class="o">=</span> <span class="nb">max</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">possible_operations</span><span class="o">.</span><span class="n">alignments</span><span class="p">)</span>
|
||||
<span class="n">alignment_A</span> <span class="o">=</span> <span class="n">check</span><span class="o">.</span><span class="n">alignment_or_default</span><span class="p">(</span><span class="n">alignment_A</span><span class="p">,</span> <span class="n">alignment_preference</span><span class="p">)</span>
|
||||
<span class="n">alignment_B</span> <span class="o">=</span> <span class="n">check</span><span class="o">.</span><span class="n">alignment_or_default</span><span class="p">(</span><span class="n">alignment_B</span><span class="p">,</span> <span class="n">alignment_preference</span><span class="p">)</span>
|
||||
<span class="n">alignment_C</span> <span class="o">=</span> <span class="n">check</span><span class="o">.</span><span class="n">alignment_or_default</span><span class="p">(</span><span class="n">alignment_C</span><span class="p">,</span> <span class="n">alignment_preference</span><span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">_reset_epilogue_functor_alignment</span><span class="p">(</span><span class="n">alignment_C</span><span class="p">)</span>
|
||||
|
||||
<span class="n">tensor_A</span> <span class="o">=</span> <span class="n">TensorDescription</span><span class="p">(</span>
|
||||
<span class="n">datatypes</span><span class="o">.</span><span class="n">binding_type</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_a</span><span class="p">),</span>
|
||||
<span class="n">datatypes</span><span class="o">.</span><span class="n">binding_layout</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_layout_a</span><span class="p">),</span>
|
||||
<span class="n">alignment_A</span>
|
||||
<span class="p">)</span>
|
||||
<span class="n">tensor_B</span> <span class="o">=</span> <span class="n">TensorDescription</span><span class="p">(</span>
|
||||
<span class="n">datatypes</span><span class="o">.</span><span class="n">binding_type</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_b</span><span class="p">),</span>
|
||||
<span class="n">datatypes</span><span class="o">.</span><span class="n">binding_layout</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_layout_b</span><span class="p">),</span>
|
||||
<span class="n">alignment_B</span>
|
||||
<span class="p">)</span>
|
||||
<span class="n">tensor_C</span> <span class="o">=</span> <span class="n">TensorDescription</span><span class="p">(</span>
|
||||
<span class="n">datatypes</span><span class="o">.</span><span class="n">binding_type</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_element_c</span><span class="p">),</span>
|
||||
<span class="n">datatypes</span><span class="o">.</span><span class="n">binding_layout</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">_layout_c</span><span class="p">),</span>
|
||||
<span class="n">alignment_C</span>
|
||||
<span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">tile_description</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="n">op</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">possible_operations</span><span class="o">.</span><span class="n">operations</span><span class="p">(</span><span class="n">alignment_A</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
|
||||
<span class="n">tile_description</span> <span class="o">=</span> <span class="n">datatypes</span><span class="o">.</span><span class="n">td_from_profiler_op</span><span class="p">(</span><span class="n">op</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="n">valid</span><span class="p">,</span> <span class="n">err_str</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_valid_tile_description</span><span class="p">(</span><span class="n">tile_description</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="n">valid</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Invalid tile description. </span><span class="si">{</span><span class="n">err_str</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">tile_description</span> <span class="o">=</span> <span class="n">tile_description</span>
|
||||
|
||||
<span class="n">operation</span> <span class="o">=</span> <span class="n">GemmOperationGrouped</span><span class="p">(</span>
|
||||
<span class="n">arch</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">current_cc</span><span class="p">,</span>
|
||||
<span class="n">tile_description</span><span class="o">=</span><span class="n">tile_description</span><span class="p">,</span>
|
||||
<span class="n">A</span><span class="o">=</span><span class="n">tensor_A</span><span class="p">,</span> <span class="n">B</span><span class="o">=</span><span class="n">tensor_B</span><span class="p">,</span> <span class="n">C</span><span class="o">=</span><span class="n">tensor_C</span><span class="p">,</span>
|
||||
<span class="n">epilogue_functor</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">epilogue_functor</span><span class="p">,</span>
|
||||
<span class="n">swizzling_functor</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">_swizzling_functor</span><span class="p">,</span>
|
||||
<span class="n">precompute_mode</span><span class="o">=</span><span class="n">SchedulerMode</span><span class="o">.</span><span class="n">Device</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">operation</span></div>
|
||||
|
||||
<div class="viewcode-block" id="GroupedGemm.run"><a class="viewcode-back" href="../../../cutlass.op.html#cutlass.op.gemm_grouped.GroupedGemm.run">[docs]</a> <span class="k">def</span> <span class="nf">run</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">,</span>
|
||||
<span class="n">alpha</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">beta</span><span class="o">=</span><span class="kc">None</span><span class="p">,</span> <span class="n">sync</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">True</span><span class="p">,</span>
|
||||
<span class="n">print_module</span><span class="p">:</span> <span class="nb">bool</span> <span class="o">=</span> <span class="kc">False</span><span class="p">)</span> <span class="o">-></span> <span class="n">GemmGroupedArguments</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Runs the kernel currently specified.</span>
|
||||
|
||||
<span class="sd"> By default, this call returns only once the kernel has completed. To launch the kernel</span>
|
||||
<span class="sd"> and immediately return, set ``sync=False``. In this case, it is the responsibility of the</span>
|
||||
<span class="sd">        caller to synchronize the results of the kernel before attempting to access outputs</span>
|
||||
<span class="sd"> by calling ``sync()`` on the arguments returned from this call.</span>
|
||||
|
||||
<span class="sd"> :param A: list of tensors representing data type and layout of operand A</span>
|
||||
<span class="sd"> :type A: list</span>
|
||||
<span class="sd"> :param B: list of tensors representing data type and layout of operand B</span>
|
||||
<span class="sd"> :type B: list</span>
|
||||
<span class="sd"> :param C: list of tensors representing data type and layout of operand C</span>
|
||||
<span class="sd"> :type C: list</span>
|
||||
<span class="sd"> :param D: list of tensors representing data type and layout of operand D</span>
|
||||
<span class="sd"> :type D: list</span>
|
||||
<span class="sd">        :param alpha: scalar parameter alpha from GEMM computation that scales the product of operands A and B</span>
|
||||
<span class="sd"> :param beta: scalar parameter beta from GEMM operation that scales operand C</span>
|
||||
<span class="sd"> :param sync: whether the call should wait for the kernel to complete before returning</span>
|
||||
<span class="sd"> :type sync: bool</span>
|
||||
<span class="sd"> :param print_module: whether to print the emitted C++ code</span>
|
||||
<span class="sd"> :type print_module: bool</span>
|
||||
|
||||
<span class="sd"> :return: arguments passed in to the kernel</span>
|
||||
<span class="sd"> :rtype: cutlass.backend.GemmGroupedArguments</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">A</span><span class="p">)</span> <span class="o">!=</span> <span class="nb">len</span><span class="p">(</span><span class="n">B</span><span class="p">)</span> <span class="ow">or</span> <span class="nb">len</span><span class="p">(</span><span class="n">A</span><span class="p">)</span> <span class="o">!=</span> <span class="nb">len</span><span class="p">(</span><span class="n">C</span><span class="p">)</span> <span class="ow">or</span> <span class="nb">len</span><span class="p">(</span><span class="n">A</span><span class="p">)</span> <span class="o">!=</span> <span class="nb">len</span><span class="p">(</span><span class="n">D</span><span class="p">):</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s2">"Lengths of A, B, C, and D lists must be equal"</span><span class="p">)</span>
|
||||
|
||||
<span class="n">problem_sizes</span> <span class="o">=</span> <span class="p">[]</span>
|
||||
<span class="n">As</span><span class="p">,</span> <span class="n">Bs</span><span class="p">,</span> <span class="n">Cs</span><span class="p">,</span> <span class="n">Ds</span> <span class="o">=</span> <span class="p">([</span><span class="kc">None</span><span class="p">]</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">A</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">))</span>
|
||||
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">A</span><span class="p">)):</span>
|
||||
<span class="n">As</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_verify_tensor</span><span class="p">(</span><span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="bp">self</span><span class="o">.</span><span class="n">A</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_a</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_a</span><span class="p">,</span> <span class="s2">"A"</span><span class="p">)</span>
|
||||
<span class="n">Bs</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_verify_tensor</span><span class="p">(</span><span class="n">B</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="bp">self</span><span class="o">.</span><span class="n">B</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_b</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_b</span><span class="p">,</span> <span class="s2">"B"</span><span class="p">)</span>
|
||||
<span class="n">Cs</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_verify_tensor</span><span class="p">(</span><span class="n">C</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="bp">self</span><span class="o">.</span><span class="n">C</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_c</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_c</span><span class="p">,</span> <span class="s2">"C"</span><span class="p">)</span>
|
||||
<span class="n">Ds</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_verify_tensor</span><span class="p">(</span><span class="n">D</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="bp">self</span><span class="o">.</span><span class="n">D</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_d</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_d</span><span class="p">,</span> <span class="s2">"D"</span><span class="p">)</span>
|
||||
<span class="n">problem_sizes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">gemm</span><span class="o">.</span><span class="n">GemmCoord</span><span class="p">(</span><span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">B</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">A</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span>
|
||||
|
||||
<span class="n">alpha</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_verify_scalar</span><span class="p">(</span><span class="n">alpha</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">alpha</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_c</span><span class="p">,</span> <span class="s2">"alpha"</span><span class="p">)</span>
|
||||
<span class="n">beta</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_verify_scalar</span><span class="p">(</span><span class="n">beta</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">beta</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_element_c</span><span class="p">,</span> <span class="s2">"beta"</span><span class="p">)</span>
|
||||
|
||||
<span class="n">alignment_a</span> <span class="o">=</span> <span class="nb">min</span><span class="p">((</span><span class="bp">self</span><span class="o">.</span><span class="n">possible_operations</span><span class="o">.</span><span class="n">find_alignment</span><span class="p">(</span><span class="n">A</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_a</span><span class="p">)</span> <span class="k">for</span> <span class="n">A</span> <span class="ow">in</span> <span class="n">As</span><span class="p">))</span>
|
||||
<span class="n">alignment_b</span> <span class="o">=</span> <span class="nb">min</span><span class="p">((</span><span class="bp">self</span><span class="o">.</span><span class="n">possible_operations</span><span class="o">.</span><span class="n">find_alignment</span><span class="p">(</span><span class="n">B</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_b</span><span class="p">)</span> <span class="k">for</span> <span class="n">B</span> <span class="ow">in</span> <span class="n">Bs</span><span class="p">))</span>
|
||||
<span class="n">alignment_c</span> <span class="o">=</span> <span class="nb">min</span><span class="p">((</span><span class="bp">self</span><span class="o">.</span><span class="n">possible_operations</span><span class="o">.</span><span class="n">find_alignment</span><span class="p">(</span><span class="n">C</span><span class="o">.</span><span class="n">shape</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">_layout_c</span><span class="p">)</span> <span class="k">for</span> <span class="n">C</span> <span class="ow">in</span> <span class="n">Cs</span><span class="p">))</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">tile_description</span><span class="p">,</span> <span class="n">alignment_A</span><span class="o">=</span><span class="n">alignment_a</span><span class="p">,</span> <span class="n">alignment_B</span><span class="o">=</span><span class="n">alignment_b</span><span class="p">,</span>
|
||||
<span class="n">alignment_C</span><span class="o">=</span><span class="n">alignment_c</span><span class="p">,</span> <span class="n">print_module</span><span class="o">=</span><span class="n">print_module</span><span class="p">)</span>
|
||||
|
||||
<span class="n">arguments</span> <span class="o">=</span> <span class="n">GemmGroupedArguments</span><span class="p">(</span>
|
||||
<span class="n">operation</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">operation</span><span class="p">,</span>
|
||||
<span class="n">problem_sizes</span><span class="o">=</span><span class="n">problem_sizes</span><span class="p">,</span>
|
||||
<span class="n">A</span><span class="o">=</span><span class="n">As</span><span class="p">,</span> <span class="n">B</span><span class="o">=</span><span class="n">Bs</span><span class="p">,</span> <span class="n">C</span><span class="o">=</span><span class="n">Cs</span><span class="p">,</span> <span class="n">D</span><span class="o">=</span><span class="n">Ds</span><span class="p">,</span>
|
||||
<span class="n">output_op</span><span class="o">=</span><span class="bp">self</span><span class="o">.</span><span class="n">operation</span><span class="o">.</span><span class="n">epilogue_type</span><span class="p">(</span><span class="n">alpha</span><span class="p">,</span> <span class="n">beta</span><span class="p">)</span>
|
||||
<span class="p">)</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">operation</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">arguments</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">sync</span><span class="p">:</span>
|
||||
<span class="n">arguments</span><span class="o">.</span><span class="n">sync</span><span class="p">()</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">arguments</span></div></div>
|
||||
</pre></div>
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
|
||||
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer no-toc">
|
||||
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
|
||||
<script src="../../../_static/doctools.js"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../../_static/scripts/furo.js"></script>
|
||||
<script src="../../../_static/clipboard.min.js"></script>
|
||||
<script src="../../../_static/copybutton.js"></script>
|
||||
<script src="../../../_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
400
python/docs/_modules/cutlass/op/op.html
Normal file
@ -0,0 +1,400 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><link rel="index" title="Index" href="../../../genindex.html" /><link rel="search" title="Search" href="../../../search.html" />
|
||||
<link rel="canonical" href="docs/_modules/cutlass/op/op.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>cutlass.op.op - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="../../../index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="../../../index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="../../../_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="../../../_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="../../../search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../../../modules.html">CUTLASS Python API</a><input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="../../../cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../../cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../../cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../../cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../../../examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../../externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../../externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../../externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<h1>Source code for cutlass.op.op</h1><div class="highlight"><pre>
|
||||
<span></span><span class="c1">#################################################################################################</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># Copyright (c) 2023 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.</span>
|
||||
<span class="c1"># SPDX-License-Identifier: BSD-3-Clause</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># Redistribution and use in source and binary forms, with or without</span>
|
||||
<span class="c1"># modification, are permitted provided that the following conditions are met:</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 1. Redistributions of source code must retain the above copyright notice, this</span>
|
||||
<span class="c1"># list of conditions and the following disclaimer.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 2. Redistributions in binary form must reproduce the above copyright notice,</span>
|
||||
<span class="c1"># this list of conditions and the following disclaimer in the documentation</span>
|
||||
<span class="c1"># and/or other materials provided with the distribution.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 3. Neither the name of the copyright holder nor the names of its</span>
|
||||
<span class="c1"># contributors may be used to endorse or promote products derived from</span>
|
||||
<span class="c1"># this software without specific prior written permission.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"</span>
|
||||
<span class="c1"># AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE</span>
|
||||
<span class="c1"># IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE</span>
|
||||
<span class="c1"># DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE</span>
|
||||
<span class="c1"># FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL</span>
|
||||
<span class="c1"># DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR</span>
|
||||
<span class="c1"># SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER</span>
|
||||
<span class="c1"># CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,</span>
|
||||
<span class="c1"># OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE</span>
|
||||
<span class="c1"># OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1">#################################################################################################</span>
|
||||
|
||||
<span class="sd">"""</span>
|
||||
<span class="sd">Base operation used for defining high-level CUTLASS operations (e.g., GEMM, Conv2d)</span>
|
||||
<span class="sd">"""</span>
|
||||
|
||||
<span class="kn">from</span> <span class="nn">bisect</span> <span class="kn">import</span> <span class="n">bisect_left</span>
|
||||
|
||||
<span class="kn">from</span> <span class="nn">cutlass</span> <span class="kn">import</span> <span class="n">option_registry</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass.backend.utils.device</span> <span class="kn">import</span> <span class="n">device_cc</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass.epilogue</span> <span class="kn">import</span> <span class="n">get_activations</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass.library_defaults</span> <span class="kn">import</span> <span class="n">_generator_ccs</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass.swizzle</span> <span class="kn">import</span> <span class="n">get_swizzling_functors</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="OperationBase"><a class="viewcode-back" href="../../../cutlass.op.html#cutlass.op.op.OperationBase">[docs]</a><span class="k">class</span> <span class="nc">OperationBase</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Base operation used for defining high-level CUTLASS operations (e.g., GEMM, Conv2d)</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">cc</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="kc">None</span><span class="p">,</span> <span class="n">kernel_cc</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="kc">None</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> :param cc: compute capability of device for which kernels should be compiled. For example, if running on H100, this should be set to 90</span>
|
||||
<span class="sd"> :type cc: int</span>
|
||||
<span class="sd"> :param kernel_cc: compute capability of kernels to generate. For example, when running on an SM90 device but targeting a CUTLASS 2.x-style Ampere kernel, this should be set to 80</span>
|
||||
<span class="sd"> :type kernel_cc: int</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">cc</span> <span class="o">=</span> <span class="n">cc</span> <span class="k">if</span> <span class="n">cc</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="k">else</span> <span class="n">device_cc</span><span class="p">()</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">specified_kernel_cc</span> <span class="o">=</span> <span class="n">kernel_cc</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">current_cc</span> <span class="o">=</span> <span class="n">kernel_cc</span> <span class="k">if</span> <span class="n">kernel_cc</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="k">else</span> <span class="bp">self</span><span class="o">.</span><span class="n">_find_closest_cc</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">cc</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">tile_description</span> <span class="o">=</span> <span class="kc">None</span>
|
||||
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">options</span> <span class="o">=</span> <span class="n">option_registry</span><span class="o">.</span><span class="n">options_for_cc</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">current_cc</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="bp">self</span><span class="o">.</span><span class="n">options</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Invalid or unsupported compute capability: </span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">current_cc</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_find_closest_cc</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">cc</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="nb">int</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the closest CC in _generator_ccs less than or equal to `cc`</span>
|
||||
|
||||
<span class="sd"> :param cc: compute capability to query</span>
|
||||
<span class="sd"> :type cc: int</span>
|
||||
|
||||
<span class="sd"> :returns: closest CC in _generator_ccs less than or equal to `cc`</span>
|
||||
<span class="sd"> :rtype: int</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">cc</span> <span class="ow">in</span> <span class="n">_generator_ccs</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">cc</span>
|
||||
|
||||
<span class="c1"># Find the closest supported CC below this CC (assumes _generator_ccs is sorted in ascending order)</span>
|
||||
<span class="n">idx</span> <span class="o">=</span> <span class="n">bisect_left</span><span class="p">(</span><span class="n">_generator_ccs</span><span class="p">,</span> <span class="n">cc</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">idx</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s1">'No valid CC to fall back to for </span><span class="si">{</span><span class="n">cc</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">_generator_ccs</span><span class="p">[</span><span class="n">idx</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span>
|
||||
|
||||
<div class="viewcode-block" id="OperationBase.activations"><a class="viewcode-back" href="../../../cutlass.op.html#cutlass.op.op.OperationBase.activations">[docs]</a> <span class="k">def</span> <span class="nf">activations</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-></span> <span class="nb">list</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns possible activation functions that can be used</span>
|
||||
|
||||
<span class="sd"> :return: list of activation functions that can be used</span>
|
||||
<span class="sd"> :rtype: list</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">get_activations</span><span class="p">()</span></div>
|
||||
|
||||
<div class="viewcode-block" id="OperationBase.swizzling_functors"><a class="viewcode-back" href="../../../cutlass.op.html#cutlass.op.op.OperationBase.swizzling_functors">[docs]</a> <span class="k">def</span> <span class="nf">swizzling_functors</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-></span> <span class="nb">list</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns possible swizzling functions that can be used</span>
|
||||
|
||||
<span class="sd"> :return: list of swizzling functions that can be used</span>
|
||||
<span class="sd"> :rtype: list</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">get_swizzling_functors</span><span class="p">()</span></div>
|
||||
|
||||
<span class="k">def</span> <span class="nf">_reset_options</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">cc</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Resets the kernel options based on cc</span>
|
||||
|
||||
<span class="sd"> :param cc: compute capability to reset to</span>
|
||||
<span class="sd"> :type cc: int</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">cc</span> <span class="o">!=</span> <span class="bp">self</span><span class="o">.</span><span class="n">current_cc</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">cc</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">_generator_ccs</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Invalid CC for CUTLASS kernels: </span><span class="si">{</span><span class="n">cc</span><span class="si">}</span><span class="s1">.'</span><span class="p">)</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">current_cc</span> <span class="o">=</span> <span class="n">cc</span>
|
||||
<span class="bp">self</span><span class="o">.</span><span class="n">options</span> <span class="o">=</span> <span class="n">option_registry</span><span class="o">.</span><span class="n">options_for_cc</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">current_cc</span><span class="p">)</span></div>
|
||||
</pre></div>
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
|
||||
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer no-toc">
|
||||
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
|
||||
<script src="../../../_static/doctools.js"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../../_static/scripts/furo.js"></script>
|
||||
<script src="../../../_static/clipboard.min.js"></script>
|
||||
<script src="../../../_static/copybutton.js"></script>
|
||||
<script src="../../../_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
350
python/docs/_modules/cutlass/swizzle.html
Normal file
@ -0,0 +1,350 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><link rel="index" title="Index" href="../../genindex.html" /><link rel="search" title="Search" href="../../search.html" />
|
||||
<link rel="canonical" href="docs/_modules/cutlass/swizzle.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>cutlass.swizzle - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="../../index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="../../index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="../../_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="../../_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="../../search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../../modules.html">CUTLASS Python API</a><input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="../../cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../../examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<h1>Source code for cutlass.swizzle</h1><div class="highlight"><pre>
|
||||
<span></span><span class="c1">#################################################################################################</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># Copyright (c) 2023 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.</span>
|
||||
<span class="c1"># SPDX-License-Identifier: BSD-3-Clause</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># Redistribution and use in source and binary forms, with or without</span>
|
||||
<span class="c1"># modification, are permitted provided that the following conditions are met:</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 1. Redistributions of source code must retain the above copyright notice, this</span>
|
||||
<span class="c1"># list of conditions and the following disclaimer.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 2. Redistributions in binary form must reproduce the above copyright notice,</span>
|
||||
<span class="c1"># this list of conditions and the following disclaimer in the documentation</span>
|
||||
<span class="c1"># and/or other materials provided with the distribution.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 3. Neither the name of the copyright holder nor the names of its</span>
|
||||
<span class="c1"># contributors may be used to endorse or promote products derived from</span>
|
||||
<span class="c1"># this software without specific prior written permission.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"</span>
|
||||
<span class="c1"># AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE</span>
|
||||
<span class="c1"># IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE</span>
|
||||
<span class="c1"># DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE</span>
|
||||
<span class="c1"># FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL</span>
|
||||
<span class="c1"># DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR</span>
|
||||
<span class="c1"># SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER</span>
|
||||
<span class="c1"># CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,</span>
|
||||
<span class="c1"># OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE</span>
|
||||
<span class="c1"># OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1">#################################################################################################</span>
|
||||
|
||||
<span class="sd">"""</span>
|
||||
<span class="sd">Registry of swizzling functions</span>
|
||||
<span class="sd">"""</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">cutlass_bindings</span>
|
||||
|
||||
<span class="n">IdentitySwizzle1</span> <span class="o">=</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">IdentitySwizzle1</span>
|
||||
<span class="n">IdentitySwizzle2</span> <span class="o">=</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">IdentitySwizzle2</span>
|
||||
<span class="n">IdentitySwizzle4</span> <span class="o">=</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">IdentitySwizzle4</span>
|
||||
<span class="n">IdentitySwizzle8</span> <span class="o">=</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">IdentitySwizzle8</span>
|
||||
<span class="n">HorizontalSwizzle</span> <span class="o">=</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">HorizontalSwizzle</span>
|
||||
<span class="n">BatchedIdentitySwizzle</span> <span class="o">=</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">BatchedIdentitySwizzle</span>
|
||||
<span class="n">ThreadblockSwizzleStreamK</span> <span class="o">=</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">ThreadblockSwizzleStreamK</span>
|
||||
<span class="n">StridedDgradIdentitySwizzle1</span> <span class="o">=</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">StridedDgradIdentitySwizzle1</span>
|
||||
<span class="n">StridedDgradIdentitySwizzle4</span> <span class="o">=</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">StridedDgradIdentitySwizzle4</span>
|
||||
<span class="n">StridedDgradHorizontalSwizzle</span> <span class="o">=</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">StridedDgradHorizontalSwizzle</span>
|
||||
|
||||
|
||||
<span class="n">_swizzling_functors</span> <span class="o">=</span> <span class="p">[</span>
|
||||
<span class="n">IdentitySwizzle1</span><span class="p">,</span>
|
||||
<span class="n">IdentitySwizzle2</span><span class="p">,</span>
|
||||
<span class="n">IdentitySwizzle4</span><span class="p">,</span>
|
||||
<span class="n">IdentitySwizzle8</span><span class="p">,</span>
|
||||
<span class="n">HorizontalSwizzle</span><span class="p">,</span>
|
||||
<span class="n">BatchedIdentitySwizzle</span><span class="p">,</span>
|
||||
<span class="n">ThreadblockSwizzleStreamK</span><span class="p">,</span>
|
||||
<span class="n">StridedDgradIdentitySwizzle1</span><span class="p">,</span>
|
||||
<span class="n">StridedDgradIdentitySwizzle4</span><span class="p">,</span>
|
||||
<span class="n">StridedDgradHorizontalSwizzle</span><span class="p">,</span>
|
||||
<span class="p">]</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="get_swizzling_functors"><a class="viewcode-back" href="../../cutlass.html#cutlass.swizzle.get_swizzling_functors">[docs]</a><span class="k">def</span> <span class="nf">get_swizzling_functors</span><span class="p">():</span>
|
||||
<span class="k">return</span> <span class="n">_swizzling_functors</span></div>
|
||||
</pre></div>
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
|
||||
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer no-toc">
|
||||
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="../../" id="documentation_options" src="../../_static/documentation_options.js"></script>
|
||||
<script src="../../_static/doctools.js"></script>
|
||||
<script src="../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../_static/scripts/furo.js"></script>
|
||||
<script src="../../_static/clipboard.min.js"></script>
|
||||
<script src="../../_static/copybutton.js"></script>
|
||||
<script src="../../_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
476
python/docs/_modules/cutlass/utils/check.html
Normal file
@ -0,0 +1,476 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><link rel="index" title="Index" href="../../../genindex.html" /><link rel="search" title="Search" href="../../../search.html" />
|
||||
<link rel="canonical" href="docs/_modules/cutlass/utils/check.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>cutlass.utils.check - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="../../../index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="../../../index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="../../../_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="../../../_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="../../../search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../../../modules.html">CUTLASS Python API</a><input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="../../../cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../../cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../../cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../../cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../../../examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../../externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../../externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../../externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<h1>Source code for cutlass.utils.check</h1><div class="highlight"><pre>
|
||||
<span></span><span class="c1">#################################################################################################</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># Copyright (c) 2023 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.</span>
|
||||
<span class="c1"># SPDX-License-Identifier: BSD-3-Clause</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># Redistribution and use in source and binary forms, with or without</span>
|
||||
<span class="c1"># modification, are permitted provided that the following conditions are met:</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 1. Redistributions of source code must retain the above copyright notice, this</span>
|
||||
<span class="c1"># list of conditions and the following disclaimer.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 2. Redistributions in binary form must reproduce the above copyright notice,</span>
|
||||
<span class="c1"># this list of conditions and the following disclaimer in the documentation</span>
|
||||
<span class="c1"># and/or other materials provided with the distribution.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 3. Neither the name of the copyright holder nor the names of its</span>
|
||||
<span class="c1"># contributors may be used to endorse or promote products derived from</span>
|
||||
<span class="c1"># this software without specific prior written permission.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"</span>
|
||||
<span class="c1"># AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE</span>
|
||||
<span class="c1"># IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE</span>
|
||||
<span class="c1"># DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE</span>
|
||||
<span class="c1"># FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL</span>
|
||||
<span class="c1"># DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR</span>
|
||||
<span class="c1"># SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER</span>
|
||||
<span class="c1"># CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,</span>
|
||||
<span class="c1"># OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE</span>
|
||||
<span class="c1"># OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1">#################################################################################################</span>
|
||||
|
||||
<span class="sd">"""</span>
|
||||
<span class="sd">Utility functions for checking constraints on kernels and calculating kernel attributes</span>
|
||||
<span class="sd">"""</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">ctypes</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">cutlass_bindings</span>
|
||||
<span class="kn">import</span> <span class="nn">cutlass</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass.backend.library</span> <span class="kn">import</span> <span class="n">DataTypeSize</span><span class="p">,</span> <span class="n">TileDescription</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="calculate_smem_usage_per_stage"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.check.calculate_smem_usage_per_stage">[docs]</a><span class="k">def</span> <span class="nf">calculate_smem_usage_per_stage</span><span class="p">(</span><span class="n">tile_description</span><span class="p">,</span> <span class="n">operation_kind</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the amount of shared memory in bytes consumed in a single stage of a kernel.</span>
|
||||
|
||||
<span class="sd"> :return: number of bytes of shared memory consumed by a single stage</span>
|
||||
<span class="sd"> :rtype: int</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">,</span> <span class="n">k</span> <span class="o">=</span> <span class="n">tile_description</span><span class="o">.</span><span class="n">threadblock_shape</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">operation_kind</span> <span class="o">==</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OperationKind</span><span class="o">.</span><span class="n">Gemm</span><span class="p">:</span>
|
||||
<span class="n">stage_barrier_bytes</span> <span class="o">=</span> <span class="mi">32</span>
|
||||
<span class="k">return</span> <span class="p">(</span>
|
||||
<span class="p">(</span><span class="n">DataTypeSize</span><span class="p">[</span><span class="n">tile_description</span><span class="o">.</span><span class="n">math_instruction</span><span class="o">.</span><span class="n">element_a</span><span class="p">]</span> <span class="o">*</span> <span class="n">m</span> <span class="o">*</span> <span class="n">k</span> <span class="o">//</span> <span class="mi">8</span><span class="p">)</span>
|
||||
<span class="o">+</span> <span class="p">(</span><span class="n">DataTypeSize</span><span class="p">[</span><span class="n">tile_description</span><span class="o">.</span><span class="n">math_instruction</span><span class="o">.</span><span class="n">element_b</span><span class="p">]</span> <span class="o">*</span> <span class="n">k</span> <span class="o">*</span> <span class="n">n</span> <span class="o">//</span> <span class="mi">8</span><span class="p">)</span>
|
||||
<span class="o">+</span> <span class="n">stage_barrier_bytes</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"No available shared memory calculation for operation kind </span><span class="si">{</span><span class="n">operation_kind</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="calculate_smem_usage"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.check.calculate_smem_usage">[docs]</a><span class="k">def</span> <span class="nf">calculate_smem_usage</span><span class="p">(</span><span class="n">operation</span><span class="p">):</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns the amount of shared memory in bytes consumed by a kernel.</span>
|
||||
|
||||
<span class="sd"> :return: number of bytes of shared memory consumed by the operation</span>
|
||||
<span class="sd"> :rtype: int</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">_per_stage</span> <span class="o">=</span> <span class="n">calculate_smem_usage_per_stage</span><span class="p">(</span><span class="n">operation</span><span class="o">.</span><span class="n">tile_description</span><span class="p">,</span> <span class="n">operation</span><span class="o">.</span><span class="n">operation_kind</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">_per_stage</span> <span class="o">*</span> <span class="n">operation</span><span class="o">.</span><span class="n">tile_description</span><span class="o">.</span><span class="n">stages</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="valid_stage_count"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.check.valid_stage_count">[docs]</a><span class="k">def</span> <span class="nf">valid_stage_count</span><span class="p">(</span><span class="n">cc</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">td</span><span class="p">:</span> <span class="n">TileDescription</span><span class="p">)</span> <span class="o">-></span> <span class="nb">tuple</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Checks whether a device with `cc` supports the number of stages within `td`, both</span>
|
||||
<span class="sd"> based on raw limits on the number of stages and based on shared memory capacity</span>
|
||||
|
||||
<span class="sd"> :param cc: compute capability of device in question</span>
|
||||
<span class="sd"> :type cc: int</span>
|
||||
<span class="sd"> :param td: tile description to check</span>
|
||||
<span class="sd"> :type td: TileDescription</span>
|
||||
|
||||
<span class="sd"> :return: tuple with the first element indicating whether the provided tile description is</span>
|
||||
<span class="sd"> valid for the provided device and the second element being an error message</span>
|
||||
<span class="sd"> :rtype: tuple</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">cc</span> <span class="o">==</span> <span class="mi">90</span> <span class="ow">and</span> <span class="p">(</span><span class="n">td</span><span class="o">.</span><span class="n">stages</span> <span class="ow">is</span> <span class="kc">None</span> <span class="ow">or</span> <span class="n">td</span><span class="o">.</span><span class="n">stages</span> <span class="o">==</span> <span class="mi">0</span><span class="p">):</span>
|
||||
<span class="c1"># Stage count of None or 0 for SM90 indicates that the CollectiveBuilder automatically</span>
|
||||
<span class="c1"># determines the stage count to use. Thus, all settings are valid in these scenarios.</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="kc">True</span><span class="p">,</span> <span class="s2">""</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">td</span><span class="o">.</span><span class="n">stages</span> <span class="o"><=</span> <span class="mi">0</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="kc">False</span><span class="p">,</span> <span class="sa">f</span><span class="s2">"Stage counts must be positive integers. Tile description has stage count of </span><span class="si">{</span><span class="n">td</span><span class="o">.</span><span class="n">stages</span><span class="si">}</span><span class="s2">."</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">cc</span> <span class="o"><</span> <span class="mi">80</span> <span class="ow">and</span> <span class="n">td</span><span class="o">.</span><span class="n">stages</span> <span class="o">!=</span> <span class="mi">2</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="kc">False</span><span class="p">,</span> <span class="sa">f</span><span class="s2">"Tile description has stage count of </span><span class="si">{</span><span class="n">td</span><span class="o">.</span><span class="n">stages</span><span class="si">}</span><span class="s2">, "</span>
|
||||
<span class="sa">f</span><span class="s2">"but only 2 stages are supported on SM</span><span class="si">{</span><span class="n">cc</span><span class="si">}</span><span class="s2">."</span><span class="p">)</span>
|
||||
|
||||
<span class="n">smem_per_stage</span> <span class="o">=</span> <span class="n">calculate_smem_usage_per_stage</span><span class="p">(</span><span class="n">td</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OperationKind</span><span class="o">.</span><span class="n">Gemm</span><span class="p">)</span>
|
||||
<span class="n">smem_arch</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">SharedMemPerCC</span><span class="p">[</span><span class="n">cc</span><span class="p">]</span> <span class="o"><<</span> <span class="mi">10</span>
|
||||
<span class="k">if</span> <span class="p">(</span><span class="n">smem_per_stage</span> <span class="o">*</span> <span class="n">td</span><span class="o">.</span><span class="n">stages</span><span class="p">)</span> <span class="o">></span> <span class="n">smem_arch</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="kc">False</span><span class="p">,</span>
|
||||
<span class="s2">"Configuration uses too much shared memory. Consider reducing stage count or tile shape.</span><span class="se">\n</span><span class="s2">"</span>
|
||||
<span class="sa">f</span><span class="s2">"Details: configuration uses </span><span class="si">{</span><span class="n">smem_per_stage</span><span class="si">}</span><span class="s2"> bytes of shared memory per stage, and "</span>
|
||||
<span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">td</span><span class="o">.</span><span class="n">stages</span><span class="si">}</span><span class="s2"> stages for a total of </span><span class="si">{</span><span class="n">smem_per_stage</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">td</span><span class="o">.</span><span class="n">stages</span><span class="si">}</span><span class="s2"> bytes.</span><span class="se">\n</span><span class="s2">"</span>
|
||||
<span class="sa">f</span><span class="s2">"The maximum amount of shared memory that can be used per block on CC </span><span class="si">{</span><span class="n">cc</span><span class="si">}</span><span class="s2"> is </span><span class="si">{</span><span class="n">smem_arch</span><span class="si">}</span><span class="s2"> bytes."</span><span class="p">)</span>
|
||||
|
||||
<span class="k">return</span> <span class="p">(</span><span class="kc">True</span><span class="p">,</span> <span class="s2">""</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="valid_cluster_shape"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.check.valid_cluster_shape">[docs]</a><span class="k">def</span> <span class="nf">valid_cluster_shape</span><span class="p">(</span><span class="n">cc</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">cluster_shape</span><span class="p">:</span> <span class="nb">list</span><span class="p">)</span> <span class="o">-></span> <span class="nb">tuple</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Checks whether a device with `cc` supports a thread block cluster of shape `cluster_shape`.</span>
|
||||
|
||||
<span class="sd"> :param cc: compute capability of device in question</span>
|
||||
<span class="sd"> :type cc: int</span>
|
||||
<span class="sd"> :param cluster_shape: dimensions of thread block cluster shape to check</span>
|
||||
<span class="sd"> :type cluster_shape: list</span>
|
||||
|
||||
<span class="sd"> :return: tuple with the first element indicating whether the provided cluster shape is</span>
|
||||
<span class="sd"> valid for the provided device and the second element being an error message</span>
|
||||
<span class="sd"> :rtype: tuple</span>
|
||||
<span class="sd"> """</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">cc</span> <span class="o"><</span> <span class="mi">90</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">cluster_shape</span> <span class="o">!=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">]:</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="kc">False</span><span class="p">,</span>
|
||||
<span class="sa">f</span><span class="s2">"Cluster shape for pre-SM90 architectures must be [1, 1, 1]. Received cluster shape of "</span>
|
||||
<span class="sa">f</span><span class="s2">"</span><span class="si">{</span><span class="n">cluster_shape</span><span class="si">}</span><span class="s2"> for SM</span><span class="si">{</span><span class="n">cc</span><span class="si">}</span><span class="s2">."</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="kc">True</span><span class="p">,</span> <span class="s2">""</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">cluster_shape</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">3</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="kc">False</span><span class="p">,</span>
|
||||
<span class="sa">f</span><span class="s2">"Cluster shapes must be rank-3. Received </span><span class="si">{</span><span class="n">cluster_shape</span><span class="si">}</span><span class="s2"> (rank </span><span class="si">{</span><span class="nb">len</span><span class="p">(</span><span class="n">cluster_shape</span><span class="p">)</span><span class="si">}</span><span class="s2">)."</span><span class="p">)</span>
|
||||
|
||||
<span class="k">if</span> <span class="n">cluster_shape</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">!=</span> <span class="mi">1</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="kc">False</span><span class="p">,</span>
|
||||
<span class="s2">"CUTLASS kernels currently require the third dimension of cluster shape to be 1. "</span>
|
||||
<span class="sa">f</span><span class="s2">"Received cluster shape of </span><span class="si">{</span><span class="n">cluster_shape</span><span class="si">}</span><span class="s2">."</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># The CUDA programming guide currently defines a maximum of 8 thread blocks per cluster</span>
|
||||
<span class="c1"># as being portably supported (https://docs.nvidia.com/cuda/cuda-c-programming-guide/#thread-block-clusters).</span>
|
||||
<span class="c1"># Current CUTLASS kernels only have non-unit cluster dimensions within the first two dimensions,</span>
|
||||
<span class="c1"># so we check that the first two dimensions of the cluster shape do not exceed 8 thread blocks in total.</span>
|
||||
<span class="n">blocks_in_2d</span> <span class="o">=</span> <span class="n">cluster_shape</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="n">cluster_shape</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
|
||||
<span class="k">if</span> <span class="n">blocks_in_2d</span> <span class="o">></span> <span class="mi">8</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="kc">False</span><span class="p">,</span>
|
||||
<span class="sa">f</span><span class="s2">"Thread block clusters with more than 8 thread blocks are currently unsupported on SM</span><span class="si">{</span><span class="n">cc</span><span class="si">}</span><span class="s2">. "</span>
|
||||
<span class="sa">f</span><span class="s2">"Received cluster shape </span><span class="si">{</span><span class="n">cluster_shape</span><span class="si">}</span><span class="s2">, which has </span><span class="si">{</span><span class="n">blocks_in_2d</span><span class="si">}</span><span class="s2"> thread blocks."</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="kc">True</span><span class="p">,</span> <span class="s2">""</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="valid_kernel_schedule"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.check.valid_kernel_schedule">[docs]</a><span class="k">def</span> <span class="nf">valid_kernel_schedule</span><span class="p">(</span><span class="n">cc</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">kernel_schedule</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">KernelScheduleType</span><span class="p">)</span> <span class="o">-></span> <span class="nb">tuple</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Checks whether a device with ``cc`` supports ``kernel_schedule``.</span>
|
||||
|
||||
<span class="sd"> :param cc: compute capability of device in question</span>
|
||||
<span class="sd"> :type cc: int</span>
|
||||
<span class="sd"> :param kernel_schedule: kernel schedule type</span>
|
||||
<span class="sd"> :type kernel_schedule: cutlass.KernelScheduleType</span>
|
||||
|
||||
<span class="sd"> :return: tuple with the first element indicating whether the provided kernel schedule is</span>
|
||||
<span class="sd"> valid for the provided device and the second element being an error message</span>
|
||||
<span class="sd"> :rtype: tuple</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">kernel_schedule</span> <span class="o">!=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">KernelScheduleType</span><span class="o">.</span><span class="n">ScheduleAuto</span> <span class="ow">and</span> <span class="n">cc</span> <span class="o"><</span> <span class="mi">90</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="kc">False</span><span class="p">,</span> <span class="s2">"Non-default kernel schedules are only supported on SM90 and beyond"</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="kc">True</span><span class="p">,</span> <span class="s2">""</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="alignment_or_default"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.check.alignment_or_default">[docs]</a><span class="k">def</span> <span class="nf">alignment_or_default</span><span class="p">(</span><span class="n">alignment_provided</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">default_alignment</span><span class="p">:</span> <span class="nb">int</span><span class="p">)</span> <span class="o">-></span> <span class="nb">int</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Returns `alignment_provided` if it is not None, otherwise `default_alignment`. Raises an</span>
|
||||
<span class="sd"> exception if `alignment_provided` exceeds `default_alignment`.</span>
|
||||
|
||||
<span class="sd"> :param alignment_provided: alignment preference specified. Can be None.</span>
|
||||
<span class="sd"> :type alignment_provided: int</span>
|
||||
<span class="sd"> :param default_alignment: alignment to use if `alignment_provided` is None</span>
|
||||
<span class="sd"> :type default_alignment: int</span>
|
||||
|
||||
<span class="sd"> :return: alignment to use</span>
|
||||
<span class="sd"> :rtype: int</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">if</span> <span class="n">alignment_provided</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">alignment_provided</span> <span class="o">></span> <span class="n">default_alignment</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Alignment </span><span class="si">{</span><span class="n">alignment_provided</span><span class="si">}</span><span class="s2"> exceeds the maximum supported of </span><span class="si">{</span><span class="n">default_alignment</span><span class="si">}</span><span class="s2">."</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">alignment_provided</span>
|
||||
|
||||
<span class="k">return</span> <span class="n">default_alignment</span></div>
|
||||
</pre></div>
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
|
||||
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer no-toc">
|
||||
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
|
||||
<script src="../../../_static/doctools.js"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../../_static/scripts/furo.js"></script>
|
||||
<script src="../../../_static/clipboard.min.js"></script>
|
||||
<script src="../../../_static/copybutton.js"></script>
|
||||
<script src="../../../_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
623
python/docs/_modules/cutlass/utils/datatypes.html
Normal file
@ -0,0 +1,623 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><link rel="index" title="Index" href="../../../genindex.html" /><link rel="search" title="Search" href="../../../search.html" />
|
||||
<link rel="canonical" href="docs/_modules/cutlass/utils/datatypes.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>cutlass.utils.datatypes - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../../../_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="../../../index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="../../../index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="../../../_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="../../../_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="../../../search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../../../contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../../../modules.html">CUTLASS Python API</a><input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="../../../cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../../cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../../cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../../../cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../../../examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../../externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../../externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../../../externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<h1>Source code for cutlass.utils.datatypes</h1><div class="highlight"><pre>
|
||||
<span></span><span class="c1">#################################################################################################</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># Copyright (c) 2023 - 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.</span>
|
||||
<span class="c1"># SPDX-License-Identifier: BSD-3-Clause</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># Redistribution and use in source and binary forms, with or without</span>
|
||||
<span class="c1"># modification, are permitted provided that the following conditions are met:</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 1. Redistributions of source code must retain the above copyright notice, this</span>
|
||||
<span class="c1"># list of conditions and the following disclaimer.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 2. Redistributions in binary form must reproduce the above copyright notice,</span>
|
||||
<span class="c1"># this list of conditions and the following disclaimer in the documentation</span>
|
||||
<span class="c1"># and/or other materials provided with the distribution.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># 3. Neither the name of the copyright holder nor the names of its</span>
|
||||
<span class="c1"># contributors may be used to endorse or promote products derived from</span>
|
||||
<span class="c1"># this software without specific prior written permission.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1"># THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"</span>
|
||||
<span class="c1"># AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE</span>
|
||||
<span class="c1"># IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE</span>
|
||||
<span class="c1"># DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE</span>
|
||||
<span class="c1"># FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL</span>
|
||||
<span class="c1"># DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR</span>
|
||||
<span class="c1"># SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER</span>
|
||||
<span class="c1"># CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,</span>
|
||||
<span class="c1"># OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE</span>
|
||||
<span class="c1"># OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.</span>
|
||||
<span class="c1">#</span>
|
||||
<span class="c1">#################################################################################################</span>
|
||||
|
||||
<span class="sd">"""</span>
|
||||
<span class="sd">Utility functions for converting between frontend datatypes and CUTLASS datatypes</span>
|
||||
<span class="sd">"""</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">cutlass_bindings</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">cutlass</span>
|
||||
<span class="kn">from</span> <span class="nn">cutlass.backend.library</span> <span class="kn">import</span> <span class="p">(</span>
|
||||
<span class="n">DataTypeSize</span><span class="p">,</span>
|
||||
<span class="n">MathInstruction</span><span class="p">,</span>
|
||||
<span class="n">MathOperation</span><span class="p">,</span>
|
||||
<span class="n">ShortLayoutTypeNames</span><span class="p">,</span>
|
||||
<span class="n">TileDescription</span><span class="p">,</span>
|
||||
<span class="p">)</span>
|
||||
|
||||
<span class="k">try</span><span class="p">:</span>
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
|
||||
<span class="n">numpy_available</span> <span class="o">=</span> <span class="kc">True</span>
|
||||
<span class="n">_library_to_numpy_dict</span> <span class="o">=</span> <span class="p">{</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f16</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">float16</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f64</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">float64</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">s8</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">int8</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">s32</span><span class="p">:</span> <span class="n">np</span><span class="o">.</span><span class="n">int32</span><span class="p">,</span>
|
||||
<span class="p">}</span>
|
||||
<span class="k">except</span> <span class="ne">ImportError</span><span class="p">:</span>
|
||||
<span class="n">numpy_available</span> <span class="o">=</span> <span class="kc">False</span>
|
||||
<span class="n">_library_to_numpy_dict</span> <span class="o">=</span> <span class="p">{}</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="numpy_library_type"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.numpy_library_type">[docs]</a><span class="k">def</span> <span class="nf">numpy_library_type</span><span class="p">(</span><span class="n">inp</span><span class="p">)</span> <span class="o">-></span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">numpy_available</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">inp</span> <span class="o">==</span> <span class="n">np</span><span class="o">.</span><span class="n">float16</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f16</span>
|
||||
<span class="k">elif</span> <span class="n">inp</span> <span class="o">==</span> <span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span>
|
||||
<span class="k">elif</span> <span class="n">inp</span> <span class="o">==</span> <span class="n">np</span><span class="o">.</span><span class="n">float64</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f64</span>
|
||||
<span class="k">elif</span> <span class="n">inp</span> <span class="o">==</span> <span class="n">np</span><span class="o">.</span><span class="n">int8</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">s8</span>
|
||||
<span class="k">elif</span> <span class="n">inp</span> <span class="o">==</span> <span class="n">np</span><span class="o">.</span><span class="n">int32</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">s32</span>
|
||||
<span class="k">return</span> <span class="kc">None</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="numpy_type"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.numpy_type">[docs]</a><span class="k">def</span> <span class="nf">numpy_type</span><span class="p">(</span><span class="n">inp</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="n">_library_to_numpy_dict</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">inp</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<span class="k">try</span><span class="p">:</span>
|
||||
<span class="kn">import</span> <span class="nn">cupy</span> <span class="k">as</span> <span class="nn">cp</span>
|
||||
|
||||
<span class="n">cupy_available</span> <span class="o">=</span> <span class="kc">True</span>
|
||||
<span class="n">_library_to_cupy_dict</span> <span class="o">=</span> <span class="p">{</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f16</span><span class="p">:</span> <span class="n">cp</span><span class="o">.</span><span class="n">float16</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">:</span> <span class="n">cp</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f64</span><span class="p">:</span> <span class="n">cp</span><span class="o">.</span><span class="n">float64</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">s8</span><span class="p">:</span> <span class="n">cp</span><span class="o">.</span><span class="n">int8</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">s32</span><span class="p">:</span> <span class="n">cp</span><span class="o">.</span><span class="n">int32</span><span class="p">,</span>
|
||||
<span class="p">}</span>
|
||||
<span class="k">except</span> <span class="ne">ImportError</span><span class="p">:</span>
|
||||
<span class="n">cupy_available</span> <span class="o">=</span> <span class="kc">False</span>
|
||||
<span class="n">_library_to_cupy_dict</span> <span class="o">=</span> <span class="p">{}</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="cupy_library_type"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.cupy_library_type">[docs]</a><span class="k">def</span> <span class="nf">cupy_library_type</span><span class="p">(</span><span class="n">inp</span><span class="p">)</span> <span class="o">-></span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">cupy_available</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">inp</span> <span class="o">==</span> <span class="n">cp</span><span class="o">.</span><span class="n">float16</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f16</span>
|
||||
<span class="k">elif</span> <span class="n">inp</span> <span class="o">==</span> <span class="n">cp</span><span class="o">.</span><span class="n">float32</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span>
|
||||
<span class="k">elif</span> <span class="n">inp</span> <span class="o">==</span> <span class="n">cp</span><span class="o">.</span><span class="n">float64</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f64</span>
|
||||
<span class="k">return</span> <span class="kc">None</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="cupy_type"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.cupy_type">[docs]</a><span class="k">def</span> <span class="nf">cupy_type</span><span class="p">(</span><span class="n">inp</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="n">_library_to_cupy_dict</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">inp</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<span class="k">try</span><span class="p">:</span>
|
||||
<span class="kn">import</span> <span class="nn">torch</span>
|
||||
|
||||
<span class="n">torch_available</span> <span class="o">=</span> <span class="kc">True</span>
|
||||
<span class="n">_torch_to_library_dict</span> <span class="o">=</span> <span class="p">{</span>
|
||||
<span class="n">torch</span><span class="o">.</span><span class="n">half</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f16</span><span class="p">,</span>
|
||||
<span class="n">torch</span><span class="o">.</span><span class="n">float16</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f16</span><span class="p">,</span>
|
||||
<span class="n">torch</span><span class="o">.</span><span class="n">float</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">,</span>
|
||||
<span class="n">torch</span><span class="o">.</span><span class="n">float32</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">,</span>
|
||||
<span class="n">torch</span><span class="o">.</span><span class="n">double</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f64</span><span class="p">,</span>
|
||||
<span class="n">torch</span><span class="o">.</span><span class="n">float64</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f64</span><span class="p">,</span>
|
||||
<span class="p">}</span>
|
||||
|
||||
<span class="n">_library_to_torch_dict</span> <span class="o">=</span> <span class="p">{</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f16</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">half</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f16</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">float16</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">float</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f64</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">double</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f64</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">float64</span><span class="p">,</span>
|
||||
<span class="p">}</span>
|
||||
<span class="k">except</span> <span class="ne">ImportError</span><span class="p">:</span>
|
||||
<span class="n">torch_available</span> <span class="o">=</span> <span class="kc">False</span>
|
||||
<span class="n">_torch_to_library_dict</span> <span class="o">=</span> <span class="p">{}</span>
|
||||
<span class="n">_library_to_torch_dict</span> <span class="o">=</span> <span class="p">{}</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="torch_library_type"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.torch_library_type">[docs]</a><span class="k">def</span> <span class="nf">torch_library_type</span><span class="p">(</span><span class="n">inp</span><span class="p">)</span> <span class="o">-></span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">_torch_to_library_dict</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">inp</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="torch_type"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.torch_type">[docs]</a><span class="k">def</span> <span class="nf">torch_type</span><span class="p">(</span><span class="n">inp</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="n">_library_to_torch_dict</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">inp</span><span class="p">,</span> <span class="kc">None</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<span class="k">try</span><span class="p">:</span>
|
||||
<span class="kn">import</span> <span class="nn">bfloat16</span>
|
||||
|
||||
<span class="n">bfloat16_available</span> <span class="o">=</span> <span class="kc">True</span>
|
||||
<span class="k">except</span> <span class="ne">ImportError</span><span class="p">:</span>
|
||||
<span class="n">bfloat16_available</span> <span class="o">=</span> <span class="kc">False</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="bfloat16_library_type"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.bfloat16_library_type">[docs]</a><span class="k">def</span> <span class="nf">bfloat16_library_type</span><span class="p">(</span><span class="n">inp</span><span class="p">)</span> <span class="o">-></span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">bfloat16_available</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">inp</span> <span class="o">==</span> <span class="n">bfloat16</span><span class="o">.</span><span class="n">bfloat16</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">bf16</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="bfloat16_type"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.bfloat16_type">[docs]</a><span class="k">def</span> <span class="nf">bfloat16_type</span><span class="p">(</span><span class="n">inp</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">bfloat16_available</span><span class="p">:</span>
|
||||
<span class="k">if</span> <span class="n">inp</span> <span class="o">==</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">bf16</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">bfloat16</span><span class="o">.</span><span class="n">bfloat16</span></div>
|
||||
|
||||
|
||||
<span class="c1"># Mapping from library data type to Python-bound CUTLASS data type</span>
|
||||
<span class="n">library_to_binding_dict</span> <span class="o">=</span> <span class="p">{</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">s8</span><span class="p">:</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">int8</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">s32</span><span class="p">:</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">int32</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f16</span><span class="p">:</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">float16</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">bf16</span><span class="p">:</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">bfloat16</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">:</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f64</span><span class="p">:</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">float64</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">tf32</span><span class="p">:</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">tfloat32</span><span class="p">,</span>
|
||||
<span class="p">}</span>
|
||||
|
||||
<span class="c1"># Mapping from Python-bound CUTLASS data type to library data type</span>
|
||||
<span class="n">binding_to_library</span> <span class="o">=</span> <span class="p">{</span>
|
||||
<span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">int8</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">s8</span><span class="p">,</span>
|
||||
<span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">int32</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">s32</span><span class="p">,</span>
|
||||
<span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">float16</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f16</span><span class="p">,</span>
|
||||
<span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">bfloat16</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">bf16</span><span class="p">,</span>
|
||||
<span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">float32</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">,</span>
|
||||
<span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">float64</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f64</span><span class="p">,</span>
|
||||
<span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">tfloat32</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">tf32</span><span class="p">,</span>
|
||||
<span class="p">}</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="binding_library_type"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.binding_library_type">[docs]</a><span class="k">def</span> <span class="nf">binding_library_type</span><span class="p">(</span><span class="n">inp</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">inp</span> <span class="ow">in</span> <span class="n">binding_to_library</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">binding_to_library</span><span class="p">[</span><span class="n">inp</span><span class="p">]</span>
|
||||
<span class="k">return</span> <span class="kc">None</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="has_binding_type"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.has_binding_type">[docs]</a><span class="k">def</span> <span class="nf">has_binding_type</span><span class="p">(</span><span class="n">inp</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="n">inp</span> <span class="ow">in</span> <span class="n">library_to_binding_dict</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="library_to_binding"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.library_to_binding">[docs]</a><span class="k">def</span> <span class="nf">library_to_binding</span><span class="p">(</span><span class="n">inp</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="ow">not</span> <span class="n">has_binding_type</span><span class="p">(</span><span class="n">inp</span><span class="p">):</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"No available conversion from library type </span><span class="si">{</span><span class="n">inp</span><span class="si">}</span><span class="s2"> to Python-bound CUTLASS type"</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">library_to_binding_dict</span><span class="p">[</span><span class="n">inp</span><span class="p">]</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="library_type"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.library_type">[docs]</a><span class="k">def</span> <span class="nf">library_type</span><span class="p">(</span><span class="n">inp</span><span class="p">):</span>
|
||||
<span class="c1"># Return inp unchanged if it is already a library type; otherwise try each converter in turn</span>
|
||||
<span class="k">if</span> <span class="n">inp</span> <span class="ow">in</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">DataTypeSize</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span>
|
||||
<span class="k">return</span> <span class="n">inp</span>
|
||||
|
||||
<span class="k">for</span> <span class="n">cvt_fn</span> <span class="ow">in</span> <span class="p">[</span>
|
||||
<span class="n">bfloat16_library_type</span><span class="p">,</span>
|
||||
<span class="n">cupy_library_type</span><span class="p">,</span>
|
||||
<span class="n">numpy_library_type</span><span class="p">,</span>
|
||||
<span class="n">torch_library_type</span><span class="p">,</span>
|
||||
<span class="n">binding_library_type</span><span class="p">,</span>
|
||||
<span class="p">]:</span>
|
||||
<span class="n">out</span> <span class="o">=</span> <span class="n">cvt_fn</span><span class="p">(</span><span class="n">inp</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">out</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">out</span>
|
||||
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"No available conversion from type </span><span class="si">{</span><span class="n">inp</span><span class="si">}</span><span class="s2"> to a library type."</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="library_layout"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.library_layout">[docs]</a><span class="k">def</span> <span class="nf">library_layout</span><span class="p">(</span><span class="n">layout</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">layout</span> <span class="ow">in</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutTag</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span>
|
||||
<span class="k">return</span> <span class="n">layout</span>
|
||||
|
||||
<span class="c1"># Convert Python-bound CUTLASS layout to profiler library layout</span>
|
||||
<span class="k">if</span> <span class="n">layout</span> <span class="o">==</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span>
|
||||
<span class="k">elif</span> <span class="n">layout</span> <span class="o">==</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">ColumnMajor</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">ColumnMajor</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"No conversion available for layout </span><span class="si">{</span><span class="n">layout</span><span class="si">}</span><span class="s2"> to library layout."</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="binding_type"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.binding_type">[docs]</a><span class="k">def</span> <span class="nf">binding_type</span><span class="p">(</span><span class="n">inp</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">inp</span> <span class="ow">in</span> <span class="n">DataTypeSize</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span>
|
||||
<span class="k">return</span> <span class="n">inp</span>
|
||||
|
||||
<span class="n">libtype</span> <span class="o">=</span> <span class="n">library_type</span><span class="p">(</span><span class="n">inp</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">library_to_binding</span><span class="p">(</span><span class="n">libtype</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="binding_layout"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.binding_layout">[docs]</a><span class="k">def</span> <span class="nf">binding_layout</span><span class="p">(</span><span class="n">layout</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">layout</span> <span class="ow">in</span> <span class="n">ShortLayoutTypeNames</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span>
|
||||
<span class="k">return</span> <span class="n">layout</span>
|
||||
<span class="k">elif</span> <span class="n">layout</span> <span class="o">==</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">RowMajor</span>
|
||||
<span class="k">elif</span> <span class="n">layout</span> <span class="o">==</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">ColumnMajor</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">ColumnMajor</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"No conversion available for layout </span><span class="si">{</span><span class="n">layout</span><span class="si">}</span><span class="s2"> to Python-bound CUTLASS layout."</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_tensor_from_numpy</span><span class="p">(</span><span class="n">np_tensor</span><span class="p">):</span>
|
||||
<span class="n">dtype</span> <span class="o">=</span> <span class="n">library_type</span><span class="p">(</span><span class="n">np_tensor</span><span class="o">.</span><span class="n">dtype</span><span class="p">)</span>
|
||||
<span class="k">if</span> <span class="n">np_tensor</span><span class="o">.</span><span class="n">flags</span><span class="o">.</span><span class="n">c_contiguous</span><span class="p">:</span>
|
||||
<span class="n">layout</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span>
|
||||
<span class="k">elif</span> <span class="n">np_tensor</span><span class="o">.</span><span class="n">flags</span><span class="o">.</span><span class="n">f_contiguous</span><span class="p">:</span>
|
||||
<span class="n">layout</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">ColumnMajor</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s2">"Unable to infer layout of a tensor that is neither C- nor Fortran-contiguous."</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="n">dtype</span><span class="p">,</span> <span class="n">layout</span><span class="p">)</span>
|
||||
|
||||
|
||||
<span class="k">def</span> <span class="nf">_tensor_from_torch</span><span class="p">(</span><span class="n">pt_tensor</span><span class="p">):</span>
|
||||
<span class="c1"># Always reports a row-major layout; the tensor's strides are not inspected</span>
|
||||
<span class="n">dtype</span> <span class="o">=</span> <span class="n">library_type</span><span class="p">(</span><span class="n">pt_tensor</span><span class="o">.</span><span class="n">dtype</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="p">(</span><span class="n">dtype</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">)</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="get_datatype_and_layout"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.get_datatype_and_layout">[docs]</a><span class="k">def</span> <span class="nf">get_datatype_and_layout</span><span class="p">(</span><span class="n">tensor</span><span class="p">):</span>
|
||||
<span class="c1"># CuPy arrays expose the same dtype and flags attributes as NumPy arrays, so both share one code path</span>
|
||||
<span class="k">if</span> <span class="p">(</span><span class="n">numpy_available</span> <span class="ow">and</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">tensor</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">ndarray</span><span class="p">))</span> <span class="ow">or</span> <span class="p">(</span>
|
||||
<span class="n">cupy_available</span> <span class="ow">and</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">tensor</span><span class="p">,</span> <span class="n">cp</span><span class="o">.</span><span class="n">ndarray</span><span class="p">)</span>
|
||||
<span class="p">):</span>
|
||||
<span class="k">return</span> <span class="n">_tensor_from_numpy</span><span class="p">(</span><span class="n">tensor</span><span class="p">)</span>
|
||||
<span class="k">elif</span> <span class="n">torch_available</span> <span class="ow">and</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">tensor</span><span class="p">,</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">):</span>
|
||||
<span class="k">return</span> <span class="n">_tensor_from_torch</span><span class="p">(</span><span class="n">tensor</span><span class="p">)</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Unable to convert tensor of type </span><span class="si">{</span><span class="nb">type</span><span class="p">(</span><span class="n">tensor</span><span class="p">)</span><span class="si">}</span><span class="s2"> to Python-bound CUTLASS datatype and layout."</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="binding_opclass"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.binding_opclass">[docs]</a><span class="k">def</span> <span class="nf">binding_opclass</span><span class="p">(</span><span class="n">opclass</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">opclass</span> <span class="o">==</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="o">.</span><span class="n">TensorOp</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">OpClass</span><span class="o">.</span><span class="n">TensorOp</span>
|
||||
<span class="k">elif</span> <span class="n">opclass</span> <span class="o">==</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="o">.</span><span class="n">Simt</span><span class="p">:</span>
|
||||
<span class="k">return</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">OpClass</span><span class="o">.</span><span class="n">Simt</span>
|
||||
<span class="k">else</span><span class="p">:</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Unable to convert opcode class of type </span><span class="si">{</span><span class="n">opclass</span><span class="si">}</span><span class="s2"> to Python-bound CUTLASS opcode class."</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<span class="n">_math_operation_value_map</span> <span class="o">=</span> <span class="p">{</span><span class="n">x</span><span class="o">.</span><span class="n">value</span><span class="p">:</span> <span class="n">x</span> <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">MathOperation</span><span class="p">}</span>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="backend_math_operation"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.backend_math_operation">[docs]</a><span class="k">def</span> <span class="nf">backend_math_operation</span><span class="p">(</span><span class="n">math_op</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">MathOperation</span><span class="p">):</span>
|
||||
<span class="k">if</span> <span class="n">math_op</span><span class="o">.</span><span class="n">value</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">_math_operation_value_map</span><span class="o">.</span><span class="n">keys</span><span class="p">():</span>
|
||||
<span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Unable to convert math operation of type </span><span class="si">{</span><span class="n">math_op</span><span class="si">}</span><span class="s2"> to backend math operation."</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">_math_operation_value_map</span><span class="p">[</span><span class="n">math_op</span><span class="o">.</span><span class="n">value</span><span class="p">]</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="construct_backend_td"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.construct_backend_td">[docs]</a><span class="k">def</span> <span class="nf">construct_backend_td</span><span class="p">(</span><span class="n">td</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">TileDescription</span><span class="p">,</span>
|
||||
<span class="n">kernel_schedule</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">KernelScheduleType</span><span class="p">)</span> <span class="o">-></span> <span class="n">TileDescription</span><span class="p">:</span>
|
||||
<span class="n">mi</span> <span class="o">=</span> <span class="n">td</span><span class="o">.</span><span class="n">math_instruction</span>
|
||||
<span class="n">backend_mi</span> <span class="o">=</span> <span class="n">MathInstruction</span><span class="p">(</span>
|
||||
<span class="n">mi</span><span class="o">.</span><span class="n">instruction_shape</span><span class="p">,</span>
|
||||
<span class="n">binding_type</span><span class="p">(</span><span class="n">mi</span><span class="o">.</span><span class="n">element_a</span><span class="p">),</span>
|
||||
<span class="n">binding_type</span><span class="p">(</span><span class="n">mi</span><span class="o">.</span><span class="n">element_b</span><span class="p">),</span>
|
||||
<span class="n">binding_type</span><span class="p">(</span><span class="n">mi</span><span class="o">.</span><span class="n">element_accumulator</span><span class="p">),</span>
|
||||
<span class="n">binding_opclass</span><span class="p">(</span><span class="n">mi</span><span class="o">.</span><span class="n">opcode_class</span><span class="p">),</span>
|
||||
<span class="n">backend_math_operation</span><span class="p">(</span><span class="n">mi</span><span class="o">.</span><span class="n">math_operation</span><span class="p">)</span>
|
||||
<span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">TileDescription</span><span class="p">(</span><span class="n">td</span><span class="o">.</span><span class="n">threadblock_shape</span><span class="p">,</span> <span class="n">td</span><span class="o">.</span><span class="n">stages</span><span class="p">,</span> <span class="n">td</span><span class="o">.</span><span class="n">warp_count</span><span class="p">,</span>
|
||||
<span class="n">backend_mi</span><span class="p">,</span> <span class="n">td</span><span class="o">.</span><span class="n">cluster_shape</span><span class="p">,</span> <span class="n">kernel_schedule</span><span class="p">)</span></div>
|
||||
|
||||
|
||||
<div class="viewcode-block" id="td_from_profiler_op"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.td_from_profiler_op">[docs]</a><span class="k">def</span> <span class="nf">td_from_profiler_op</span><span class="p">(</span><span class="n">op</span><span class="p">)</span> <span class="o">-></span> <span class="n">TileDescription</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Converts the profiler's TileDescription in ``op`` into the backend TileDescription</span>
|
||||
|
||||
<span class="sd"> :param op: profiler Operation</span>
|
||||
|
||||
<span class="sd"> :returns: backend TileDescription</span>
|
||||
<span class="sd"> :rtype: cutlass.backend.TileDescription</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="n">schedule</span> <span class="o">=</span> <span class="n">op</span><span class="o">.</span><span class="n">kernel_schedule</span> <span class="k">if</span> <span class="nb">hasattr</span><span class="p">(</span><span class="n">op</span><span class="p">,</span> <span class="s1">'kernel_schedule'</span><span class="p">)</span> <span class="k">else</span> <span class="kc">None</span>
|
||||
<span class="k">return</span> <span class="n">construct_backend_td</span><span class="p">(</span><span class="n">op</span><span class="o">.</span><span class="n">tile_description</span><span class="p">,</span> <span class="n">schedule</span><span class="p">)</span></div>
|
||||
|
||||
<div class="viewcode-block" id="td_from_profiler_td"><a class="viewcode-back" href="../../../cutlass.utils.html#cutlass.utils.datatypes.td_from_profiler_td">[docs]</a><span class="k">def</span> <span class="nf">td_from_profiler_td</span><span class="p">(</span><span class="n">td</span><span class="p">:</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">TileDescription</span><span class="p">)</span> <span class="o">-></span> <span class="n">TileDescription</span><span class="p">:</span>
|
||||
<span class="w"> </span><span class="sd">"""</span>
|
||||
<span class="sd"> Converts the profiler's TileDescription into the backend TileDescription</span>
|
||||
|
||||
<span class="sd"> :param td: profiler TileDescription</span>
|
||||
<span class="sd"> :type td: cutlass.TileDescription</span>
|
||||
|
||||
<span class="sd"> :returns: backend TileDescription</span>
|
||||
<span class="sd"> :rtype: cutlass.backend.TileDescription</span>
|
||||
<span class="sd"> """</span>
|
||||
<span class="k">return</span> <span class="n">construct_backend_td</span><span class="p">(</span><span class="n">td</span><span class="p">,</span> <span class="n">kernel_schedule</span><span class="o">=</span><span class="kc">None</span><span class="p">)</span></div>
|
||||
</pre></div>
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
|
||||
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer no-toc">
|
||||
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="../../../" id="documentation_options" src="../../../_static/documentation_options.js"></script>
|
||||
<script src="../../../_static/doctools.js"></script>
|
||||
<script src="../../../_static/sphinx_highlight.js"></script>
|
||||
<script src="../../../_static/scripts/furo.js"></script>
|
||||
<script src="../../../_static/clipboard.min.js"></script>
|
||||
<script src="../../../_static/copybutton.js"></script>
|
||||
<script src="../../../_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
293
python/docs/_modules/index.html
Normal file
@ -0,0 +1,293 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><link rel="index" title="Index" href="../genindex.html" /><link rel="search" title="Search" href="../search.html" />
|
||||
<link rel="canonical" href="docs/_modules/index.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>Overview: module code - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="../index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="../index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="../_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="../_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="../search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../modules.html">CUTLASS Python API</a><input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="../cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="../externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<h1>All modules for which code is available</h1>
|
||||
<ul><li><a href="cutlass/emit/pytorch.html">cutlass.emit.pytorch</a></li>
|
||||
<li><a href="cutlass/epilogue.html">cutlass.epilogue</a></li>
|
||||
<li><a href="cutlass/library_defaults.html">cutlass.library_defaults</a></li>
|
||||
<li><a href="cutlass/op/gemm.html">cutlass.op.gemm</a></li>
|
||||
<li><a href="cutlass/op/gemm_grouped.html">cutlass.op.gemm_grouped</a></li>
|
||||
<li><a href="cutlass/op/op.html">cutlass.op.op</a></li>
|
||||
<li><a href="cutlass/swizzle.html">cutlass.swizzle</a></li>
|
||||
<li><a href="cutlass/utils/check.html">cutlass.utils.check</a></li>
|
||||
<li><a href="cutlass/utils/datatypes.html">cutlass.utils.datatypes</a></li>
|
||||
</ul>
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
|
||||
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer no-toc">
|
||||
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
|
||||
<script src="../_static/doctools.js"></script>
|
||||
<script src="../_static/sphinx_highlight.js"></script>
|
||||
<script src="../_static/scripts/furo.js"></script>
|
||||
<script src="../_static/clipboard.min.js"></script>
|
||||
<script src="../_static/copybutton.js"></script>
|
||||
<script src="../_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
9
python/docs/_sources/contribute.md.txt
Normal file
@ -0,0 +1,9 @@
# Contributing

Thank you for your interest in contributing to the CUTLASS Python interface. Contributions generally fall into one of two categories:

1. You want to report a bug, request a feature, or flag a documentation issue
   - File an [issue](https://github.com/NVIDIA/cutlass/issues/new/choose) describing what you encountered or what you want to see changed.
   - The CUTLASS team will evaluate the issues and triage them, scheduling them for a release. If you believe the issue needs priority attention, comment on the issue to notify the team.
2. You want to implement a feature or bug fix
   - We welcome contributions from the community. We recommend that you contribute via a [pull request](https://github.com/NVIDIA/cutlass/pulls). If you have questions about CUTLASS, consider asking a question via the [Discussions](https://github.com/NVIDIA/cutlass/discussions) tab. Please be sure to search through both existing issues and discussions to see whether your question has already been answered.
18
python/docs/_sources/cutlass.emit.rst.txt
Normal file
@ -0,0 +1,18 @@
Emitters
========

Common
------

.. automodule:: cutlass.emit.common
   :members:
   :undoc-members:
   :show-inheritance:

PyTorch
-------

.. automodule:: cutlass.emit.pytorch
   :members:
   :undoc-members:
   :show-inheritance:
26
python/docs/_sources/cutlass.op.rst.txt
Normal file
@ -0,0 +1,26 @@
Operations
==========

GEMM
----

.. automodule:: cutlass.op.gemm
   :members:
   :undoc-members:
   :show-inheritance:

Grouped GEMM
------------

.. automodule:: cutlass.op.gemm_grouped
   :members:
   :undoc-members:
   :show-inheritance:

Operation
---------

.. automodule:: cutlass.op.op
   :members:
   :undoc-members:
   :show-inheritance:
36
python/docs/_sources/cutlass.rst.txt
Normal file
@ -0,0 +1,36 @@
CUTLASS
=======

Subpackages
-----------

.. toctree::
   :maxdepth: 1

   cutlass.emit
   cutlass.op
   cutlass.utils

Epilogue
--------

.. automodule:: cutlass.epilogue
   :members:
   :undoc-members:
   :show-inheritance:

Library Defaults
----------------

.. automodule:: cutlass.library_defaults
   :members:
   :undoc-members:
   :show-inheritance:

Swizzle
-------

.. automodule:: cutlass.swizzle
   :members:
   :undoc-members:
   :show-inheritance:
18
python/docs/_sources/cutlass.utils.rst.txt
Normal file
@ -0,0 +1,18 @@
Utilities
=========

Checks
------

.. automodule:: cutlass.utils.check
   :members:
   :undoc-members:
   :show-inheritance:

Data Types
----------

.. automodule:: cutlass.utils.datatypes
   :members:
   :undoc-members:
   :show-inheritance:
9
python/docs/_sources/examples.rst.txt
Normal file
@ -0,0 +1,9 @@
Examples
==================

.. toctree::
   :maxdepth: 5

   Basic GEMM <externals/00_basic_gemm.nblink>
   Epilogue <externals/01_epilogue.nblink>
   PyTorch Extension <externals/02_pytorch_extension_grouped_gemm.nblink>
3
python/docs/_sources/externals/00_basic_gemm.nblink.txt
vendored
Normal file
@ -0,0 +1,3 @@
{
    "path": "./../../../../examples/python/00_basic_gemm.ipynb"
}
3
python/docs/_sources/externals/01_epilogue.nblink.txt
vendored
Normal file
@ -0,0 +1,3 @@
{
    "path": "./../../../../examples/python/01_epilogue.ipynb"
}
3
python/docs/_sources/externals/02_pytorch_extension_grouped_gemm.nblink.txt
vendored
Normal file
@ -0,0 +1,3 @@
{
    "path": "./../../../../examples/python/02_pytorch_extension_grouped_gemm.ipynb"
}
55
python/docs/_sources/index.rst.txt
Normal file
@ -0,0 +1,55 @@
.. CUTLASS Python interface documentation master file, created by
   sphinx-quickstart on Mon Feb 13 17:57:39 2023.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

.. include:: ../../README.md
   :start-line: 1
   :parser: markdown

.. toctree::
   :hidden:

   Home <self>

.. toctree::
   :hidden:
   :caption: Getting Started:

   install.md
   Getting Started <externals/00_basic_gemm.nblink>
   contribute.md

.. toctree::
   :hidden:
   :caption: Python Documentation:

   modules.rst

.. toctree::
   :hidden:
   :caption: Examples and Tutorials:

   examples.rst

.. toctree::
   :hidden:
   :caption: Advanced:

.. toctree::
   :hidden:
   :caption: FAQ:

.. toctree::
   :hidden:
   :caption: Reference:

   GitHub <https://github.com/NVIDIA/cutlass>


Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
36
python/docs/_sources/install.md.txt
Normal file
@ -0,0 +1,36 @@
# Installation

## Installing from source

Installing from source requires the latest CUDA Toolkit whose major.minor version matches that of the installed CUDA Python (`cuda-python`) package.

Prior to installing the CUTLASS Python interface, one may optionally set the following environment variables:
* `CUTLASS_PATH`: the path to the cloned CUTLASS repository
* `CUDA_INSTALL_PATH`: the path to the installation of CUDA

If these environment variables are not set, the installation process will infer them to be the following:
* `CUTLASS_PATH`: one directory level above the current directory (i.e., `$(pwd)/..`)
* `CUDA_INSTALL_PATH`: the directory holding `/bin/nvcc` for the first version of `nvcc` on `$PATH` (i.e., `which nvcc | awk -F'/bin/nvcc' '{print $1}'`)
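
The defaults described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the installer's actual code: `infer_install_paths` is a hypothetical helper, and treating an empty variable as unset is an assumption.

```python
import os
from typing import Mapping, Optional, Tuple


def infer_install_paths(
    env: Mapping[str, str],
    cwd: str,
    nvcc_path: Optional[str],
) -> Tuple[str, Optional[str]]:
    """Hypothetical helper mirroring the documented defaults; not part of CUTLASS."""
    # CUTLASS_PATH: use the variable if set, else one directory level above cwd.
    cutlass_path = env.get("CUTLASS_PATH") or os.path.dirname(cwd.rstrip("/"))

    # CUDA_INSTALL_PATH: use the variable if set, else the prefix that
    # precedes "/bin/nvcc" in the path of the first nvcc found on PATH.
    cuda_path = env.get("CUDA_INSTALL_PATH") or None
    if cuda_path is None and nvcc_path:
        cuda_path = nvcc_path.split("/bin/nvcc")[0]
    return cutlass_path, cuda_path
```

Calling it with `os.environ`, `os.getcwd()`, and `shutil.which("nvcc")` would report the values the defaults resolve to on the current machine; the second element is `None` when `nvcc` cannot be found.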
|
||||
|
||||
**NOTE:** The version of `cuda-python` installed must match the CUDA version in `CUDA_INSTALL_PATH`.
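
As a rough illustration of the defaults described above, the following Python sketch mirrors the documented inference rules. It is not the installer's actual code: the function names `infer_cutlass_path`, `cuda_root_from_nvcc`, and `infer_cuda_install_path` are hypothetical, and the sketch normalizes the resulting path, which the shell expression `$(pwd)/..` does not.

```python
import os
import shutil
from typing import Optional


def infer_cutlass_path(cwd: Optional[str] = None) -> str:
    """Return CUTLASS_PATH if set, else one directory above the current one."""
    env = os.environ.get("CUTLASS_PATH")
    if env:
        return env
    base = cwd if cwd is not None else os.getcwd()
    # Equivalent in intent to `$(pwd)/..`, but normalized to an absolute path.
    return os.path.abspath(os.path.join(base, os.pardir))


def cuda_root_from_nvcc(nvcc_path: str) -> str:
    """Drop a trailing /bin/nvcc, following the intent of the awk one-liner."""
    suffix = "/bin/nvcc"
    if nvcc_path.endswith(suffix):
        return nvcc_path[: -len(suffix)]
    return nvcc_path


def infer_cuda_install_path() -> Optional[str]:
    """Return CUDA_INSTALL_PATH if set, else derive it from the first nvcc on PATH."""
    env = os.environ.get("CUDA_INSTALL_PATH")
    if env:
        return env
    nvcc = shutil.which("nvcc")  # first nvcc found on PATH, or None
    return cuda_root_from_nvcc(nvcc) if nvcc else None


if __name__ == "__main__":
    print("CUTLASS_PATH      =", infer_cutlass_path())
    print("CUDA_INSTALL_PATH =", infer_cuda_install_path())
```

Running the script before installation shows which values would be picked up, which can help when several CUDA toolkits are installed side by side.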

### Installing a developer-mode package
The CUTLASS Python interface can currently be installed via:
```bash
python setup.py develop --user
```
In developer mode, changes to the Python interface source take effect without reinstalling the package.

We plan to add support for installing via `python setup.py install` in a future release.

## Docker
To ensure that you have all of the Python modules needed to run the examples that use the
CUTLASS Python interface, we recommend using one of the Docker images located in the `docker` directory.

For example, to build and launch a container that uses CUDA 12.1 via an NGC PyTorch container, run:
```bash
docker build -t cutlass-cuda12.1:latest -f docker/Dockerfile-cuda12.1-pytorch .
docker run --gpus all -it --rm cutlass-cuda12.1:latest
```

The CUTLASS Python interface has been tested with CUDA 11.8, 12.0, and 12.1 on Python 3.8.10 and 3.9.7.
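
The tested CUDA versions listed above, together with the requirement that `cuda-python` match the CUDA toolkit, can be checked with a short script. The sketch below is an illustration and not part of the CUTLASS package: the helper names are hypothetical, the comparison assumes that "match" means equal major.minor versions, and the sample `nvcc --version` line is illustrative.

```python
import re
from typing import Optional, Tuple

# CUDA versions listed as tested in the sentence above.
TESTED_CUDA = {(11, 8), (12, 0), (12, 1)}


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '12.1.0'."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def parse_nvcc_release(nvcc_output: str) -> Optional[Tuple[int, int]]:
    """Find 'release X.Y' in the text printed by `nvcc --version`."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_environment(cuda_python_version: str, nvcc_output: str) -> Tuple[bool, bool]:
    """Return (versions_match, cuda_is_tested) for the given inputs."""
    toolkit = parse_nvcc_release(nvcc_output)
    versions_match = toolkit is not None and major_minor(cuda_python_version) == toolkit
    return versions_match, toolkit in TESTED_CUDA


if __name__ == "__main__":
    # Illustrative sample; in practice, pass the real output of `nvcc --version`
    # and the version reported by `pip show cuda-python`.
    sample = "Cuda compilation tools, release 12.1, V12.1.66"
    print(check_environment("12.1.0", sample))
```

Keeping the parsing separate from the comparison makes it possible to test the logic on machines without a CUDA toolkit installed.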
7
python/docs/_sources/modules.rst.txt
Normal file
@ -0,0 +1,7 @@
CUTLASS Python API
==================

.. toctree::
   :maxdepth: 5

   cutlass
903
python/docs/_static/basic.css
vendored
Normal file
@ -0,0 +1,903 @@
/*
 * basic.css
 * ~~~~~~~~~
 *
 * Sphinx stylesheet -- basic theme.
 *
 * :copyright: Copyright 2007-2023 by the Sphinx team, see AUTHORS.
 * :license: BSD, see LICENSE for details.
 *
 */

/* -- main layout ----------------------------------------------------------- */

div.clearer {
    clear: both;
}

div.section::after {
    display: block;
    content: '';
    clear: left;
}

/* -- relbar ---------------------------------------------------------------- */

div.related {
    width: 100%;
    font-size: 90%;
}

div.related h3 {
    display: none;
}

div.related ul {
    margin: 0;
    padding: 0 0 0 10px;
    list-style: none;
}

div.related li {
    display: inline;
}

div.related li.right {
    float: right;
    margin-right: 5px;
}

/* -- sidebar --------------------------------------------------------------- */

div.sphinxsidebarwrapper {
    padding: 10px 5px 0 10px;
}

div.sphinxsidebar {
    float: left;
    width: 230px;
    margin-left: -100%;
    font-size: 90%;
    word-wrap: break-word;
    overflow-wrap : break-word;
}

div.sphinxsidebar ul {
    list-style: none;
}

div.sphinxsidebar ul ul,
div.sphinxsidebar ul.want-points {
    margin-left: 20px;
    list-style: square;
}

div.sphinxsidebar ul ul {
    margin-top: 0;
    margin-bottom: 0;
}

div.sphinxsidebar form {
    margin-top: 10px;
}

div.sphinxsidebar input {
    border: 1px solid #98dbcc;
    font-family: sans-serif;
    font-size: 1em;
}

div.sphinxsidebar #searchbox form.search {
    overflow: hidden;
}

div.sphinxsidebar #searchbox input[type="text"] {
    float: left;
    width: 80%;
    padding: 0.25em;
    box-sizing: border-box;
}

div.sphinxsidebar #searchbox input[type="submit"] {
    float: left;
    width: 20%;
    border-left: none;
    padding: 0.25em;
    box-sizing: border-box;
}


img {
    border: 0;
    max-width: 100%;
}

/* -- search page ----------------------------------------------------------- */

ul.search {
    margin: 10px 0 0 20px;
    padding: 0;
}

ul.search li {
    padding: 5px 0 5px 20px;
    background-image: url(file.png);
    background-repeat: no-repeat;
    background-position: 0 7px;
}

ul.search li a {
    font-weight: bold;
}

ul.search li p.context {
    color: #888;
    margin: 2px 0 0 30px;
    text-align: left;
}

ul.keywordmatches li.goodmatch a {
    font-weight: bold;
}

/* -- index page ------------------------------------------------------------ */

table.contentstable {
    width: 90%;
    margin-left: auto;
    margin-right: auto;
}

table.contentstable p.biglink {
    line-height: 150%;
}

a.biglink {
    font-size: 1.3em;
}

span.linkdescr {
    font-style: italic;
    padding-top: 5px;
    font-size: 90%;
}

/* -- general index --------------------------------------------------------- */

table.indextable {
    width: 100%;
}

table.indextable td {
    text-align: left;
    vertical-align: top;
}

table.indextable ul {
    margin-top: 0;
    margin-bottom: 0;
    list-style-type: none;
}

table.indextable > tbody > tr > td > ul {
    padding-left: 0em;
}

table.indextable tr.pcap {
    height: 10px;
}

table.indextable tr.cap {
    margin-top: 10px;
    background-color: #f2f2f2;
}

img.toggler {
    margin-right: 3px;
    margin-top: 3px;
    cursor: pointer;
}

div.modindex-jumpbox {
    border-top: 1px solid #ddd;
    border-bottom: 1px solid #ddd;
    margin: 1em 0 1em 0;
    padding: 0.4em;
}

div.genindex-jumpbox {
    border-top: 1px solid #ddd;
    border-bottom: 1px solid #ddd;
    margin: 1em 0 1em 0;
    padding: 0.4em;
}

/* -- domain module index --------------------------------------------------- */

table.modindextable td {
    padding: 2px;
    border-collapse: collapse;
}

/* -- general body styles --------------------------------------------------- */

div.body {
    min-width: 360px;
    max-width: 800px;
}

div.body p, div.body dd, div.body li, div.body blockquote {
    -moz-hyphens: auto;
    -ms-hyphens: auto;
    -webkit-hyphens: auto;
    hyphens: auto;
}

a.headerlink {
    visibility: hidden;
}

h1:hover > a.headerlink,
h2:hover > a.headerlink,
h3:hover > a.headerlink,
h4:hover > a.headerlink,
h5:hover > a.headerlink,
h6:hover > a.headerlink,
dt:hover > a.headerlink,
caption:hover > a.headerlink,
p.caption:hover > a.headerlink,
div.code-block-caption:hover > a.headerlink {
    visibility: visible;
}

div.body p.caption {
    text-align: inherit;
}

div.body td {
    text-align: left;
}

.first {
    margin-top: 0 !important;
}

p.rubric {
    margin-top: 30px;
    font-weight: bold;
}

img.align-left, figure.align-left, .figure.align-left, object.align-left {
    clear: left;
    float: left;
    margin-right: 1em;
}

img.align-right, figure.align-right, .figure.align-right, object.align-right {
    clear: right;
    float: right;
    margin-left: 1em;
}

img.align-center, figure.align-center, .figure.align-center, object.align-center {
    display: block;
    margin-left: auto;
    margin-right: auto;
}

img.align-default, figure.align-default, .figure.align-default {
    display: block;
    margin-left: auto;
    margin-right: auto;
}

.align-left {
    text-align: left;
}

.align-center {
    text-align: center;
}

.align-default {
    text-align: center;
}

.align-right {
    text-align: right;
}

/* -- sidebars -------------------------------------------------------------- */

div.sidebar,
aside.sidebar {
    margin: 0 0 0.5em 1em;
    border: 1px solid #ddb;
    padding: 7px;
    background-color: #ffe;
    width: 40%;
    float: right;
    clear: right;
    overflow-x: auto;
}

p.sidebar-title {
    font-weight: bold;
}

nav.contents,
aside.topic,
div.admonition, div.topic, blockquote {
    clear: left;
}

/* -- topics ---------------------------------------------------------------- */

nav.contents,
aside.topic,
div.topic {
    border: 1px solid #ccc;
    padding: 7px;
    margin: 10px 0 10px 0;
}

p.topic-title {
    font-size: 1.1em;
    font-weight: bold;
    margin-top: 10px;
}

/* -- admonitions ----------------------------------------------------------- */

div.admonition {
    margin-top: 10px;
    margin-bottom: 10px;
    padding: 7px;
}

div.admonition dt {
    font-weight: bold;
}

p.admonition-title {
    margin: 0px 10px 5px 0px;
    font-weight: bold;
}

div.body p.centered {
    text-align: center;
    margin-top: 25px;
}

/* -- content of sidebars/topics/admonitions -------------------------------- */

div.sidebar > :last-child,
aside.sidebar > :last-child,
nav.contents > :last-child,
aside.topic > :last-child,
div.topic > :last-child,
div.admonition > :last-child {
    margin-bottom: 0;
}

div.sidebar::after,
aside.sidebar::after,
nav.contents::after,
aside.topic::after,
div.topic::after,
div.admonition::after,
blockquote::after {
    display: block;
    content: '';
    clear: both;
}

/* -- tables ---------------------------------------------------------------- */

table.docutils {
    margin-top: 10px;
    margin-bottom: 10px;
    border: 0;
    border-collapse: collapse;
}

table.align-center {
    margin-left: auto;
    margin-right: auto;
}

table.align-default {
    margin-left: auto;
    margin-right: auto;
}

table caption span.caption-number {
    font-style: italic;
}

table caption span.caption-text {
}

table.docutils td, table.docutils th {
    padding: 1px 8px 1px 5px;
    border-top: 0;
    border-left: 0;
    border-right: 0;
    border-bottom: 1px solid #aaa;
}

th {
    text-align: left;
    padding-right: 5px;
}

table.citation {
    border-left: solid 1px gray;
    margin-left: 1px;
}

table.citation td {
    border-bottom: none;
}

th > :first-child,
td > :first-child {
    margin-top: 0px;
}

th > :last-child,
td > :last-child {
    margin-bottom: 0px;
}

/* -- figures --------------------------------------------------------------- */

div.figure, figure {
    margin: 0.5em;
    padding: 0.5em;
}

div.figure p.caption, figcaption {
    padding: 0.3em;
}

div.figure p.caption span.caption-number,
figcaption span.caption-number {
    font-style: italic;
}

div.figure p.caption span.caption-text,
figcaption span.caption-text {
}

/* -- field list styles ----------------------------------------------------- */

table.field-list td, table.field-list th {
    border: 0 !important;
}

.field-list ul {
    margin: 0;
    padding-left: 1em;
}

.field-list p {
    margin: 0;
}

.field-name {
    -moz-hyphens: manual;
    -ms-hyphens: manual;
    -webkit-hyphens: manual;
    hyphens: manual;
}

/* -- hlist styles ---------------------------------------------------------- */

table.hlist {
    margin: 1em 0;
}

table.hlist td {
    vertical-align: top;
}

/* -- object description styles --------------------------------------------- */

.sig {
    font-family: 'Consolas', 'Menlo', 'DejaVu Sans Mono', 'Bitstream Vera Sans Mono', monospace;
}

.sig-name, code.descname {
    background-color: transparent;
    font-weight: bold;
}

.sig-name {
    font-size: 1.1em;
}

code.descname {
    font-size: 1.2em;
}

.sig-prename, code.descclassname {
    background-color: transparent;
}

.optional {
    font-size: 1.3em;
}

.sig-paren {
    font-size: larger;
}

.sig-param.n {
    font-style: italic;
}

/* C++ specific styling */

.sig-inline.c-texpr,
.sig-inline.cpp-texpr {
    font-family: unset;
}

.sig.c   .k, .sig.c   .kt,
.sig.cpp .k, .sig.cpp .kt {
    color: #0033B3;
}

.sig.c   .m,
.sig.cpp .m {
    color: #1750EB;
}

.sig.c   .s, .sig.c   .sc,
.sig.cpp .s, .sig.cpp .sc {
    color: #067D17;
}


/* -- other body styles ----------------------------------------------------- */

ol.arabic {
    list-style: decimal;
}

ol.loweralpha {
    list-style: lower-alpha;
}

ol.upperalpha {
    list-style: upper-alpha;
}

ol.lowerroman {
    list-style: lower-roman;
}

ol.upperroman {
    list-style: upper-roman;
}

:not(li) > ol > li:first-child > :first-child,
:not(li) > ul > li:first-child > :first-child {
    margin-top: 0px;
}

:not(li) > ol > li:last-child > :last-child,
:not(li) > ul > li:last-child > :last-child {
    margin-bottom: 0px;
}

ol.simple ol p,
ol.simple ul p,
ul.simple ol p,
ul.simple ul p {
    margin-top: 0;
}

ol.simple > li:not(:first-child) > p,
ul.simple > li:not(:first-child) > p {
    margin-top: 0;
}

ol.simple p,
ul.simple p {
    margin-bottom: 0;
}

aside.footnote > span,
div.citation > span {
    float: left;
}
aside.footnote > span:last-of-type,
div.citation > span:last-of-type {
    padding-right: 0.5em;
}
aside.footnote > p {
    margin-left: 2em;
}
div.citation > p {
    margin-left: 4em;
}
aside.footnote > p:last-of-type,
div.citation > p:last-of-type {
    margin-bottom: 0em;
}
aside.footnote > p:last-of-type:after,
div.citation > p:last-of-type:after {
    content: "";
    clear: both;
}

dl.field-list {
    display: grid;
    grid-template-columns: fit-content(30%) auto;
}

dl.field-list > dt {
    font-weight: bold;
    word-break: break-word;
    padding-left: 0.5em;
    padding-right: 5px;
}

dl.field-list > dd {
    padding-left: 0.5em;
    margin-top: 0em;
    margin-left: 0em;
    margin-bottom: 0em;
}

dl {
    margin-bottom: 15px;
}

dd > :first-child {
    margin-top: 0px;
}

dd ul, dd table {
    margin-bottom: 10px;
}

dd {
    margin-top: 3px;
    margin-bottom: 10px;
    margin-left: 30px;
}

dl > dd:last-child,
dl > dd:last-child > :last-child {
    margin-bottom: 0;
}

dt:target, span.highlighted {
    background-color: #fbe54e;
}

rect.highlighted {
    fill: #fbe54e;
}

dl.glossary dt {
    font-weight: bold;
    font-size: 1.1em;
}

.versionmodified {
    font-style: italic;
}

.system-message {
    background-color: #fda;
    padding: 5px;
    border: 3px solid red;
}

.footnote:target {
    background-color: #ffa;
}

.line-block {
    display: block;
    margin-top: 1em;
    margin-bottom: 1em;
}

.line-block .line-block {
    margin-top: 0;
    margin-bottom: 0;
    margin-left: 1.5em;
}

.guilabel, .menuselection {
    font-family: sans-serif;
}

.accelerator {
    text-decoration: underline;
}

.classifier {
    font-style: oblique;
}

.classifier:before {
    font-style: normal;
    margin: 0 0.5em;
    content: ":";
    display: inline-block;
}

abbr, acronym {
    border-bottom: dotted 1px;
    cursor: help;
}

/* -- code displays --------------------------------------------------------- */

pre {
    overflow: auto;
    overflow-y: hidden;  /* fixes display issues on Chrome browsers */
}

pre, div[class*="highlight-"] {
    clear: both;
}

span.pre {
    -moz-hyphens: none;
    -ms-hyphens: none;
    -webkit-hyphens: none;
    hyphens: none;
    white-space: nowrap;
}

div[class*="highlight-"] {
    margin: 1em 0;
}

td.linenos pre {
    border: 0;
    background-color: transparent;
    color: #aaa;
}

table.highlighttable {
    display: block;
}

table.highlighttable tbody {
    display: block;
}

table.highlighttable tr {
    display: flex;
}

table.highlighttable td {
    margin: 0;
    padding: 0;
}

table.highlighttable td.linenos {
    padding-right: 0.5em;
}

table.highlighttable td.code {
    flex: 1;
    overflow: hidden;
}

.highlight .hll {
    display: block;
}

div.highlight pre,
table.highlighttable pre {
    margin: 0;
}

div.code-block-caption + div {
    margin-top: 0;
}

div.code-block-caption {
    margin-top: 1em;
    padding: 2px 5px;
    font-size: small;
}

div.code-block-caption code {
    background-color: transparent;
}

table.highlighttable td.linenos,
span.linenos,
div.highlight span.gp {  /* gp: Generic.Prompt */
    user-select: none;
    -webkit-user-select: text; /* Safari fallback only */
    -webkit-user-select: none; /* Chrome/Safari */
    -moz-user-select: none; /* Firefox */
    -ms-user-select: none; /* IE10+ */
}

div.code-block-caption span.caption-number {
    padding: 0.1em 0.3em;
    font-style: italic;
}

div.code-block-caption span.caption-text {
}

div.literal-block-wrapper {
    margin: 1em 0;
}

code.xref, a code {
    background-color: transparent;
    font-weight: bold;
}

h1 code, h2 code, h3 code, h4 code, h5 code, h6 code {
    background-color: transparent;
}

.viewcode-link {
    float: right;
}

.viewcode-back {
    float: right;
    font-family: sans-serif;
}

div.viewcode-block:target {
    margin: -1px -10px;
    padding: 0 10px;
}

/* -- math display ---------------------------------------------------------- */

img.math {
    vertical-align: middle;
}

div.body div.math p {
    text-align: center;
}

span.eqno {
    float: right;
}

span.eqno a.headerlink {
    position: absolute;
    z-index: 1;
}

div.math:hover a.headerlink {
    visibility: visible;
}

/* -- printout stylesheet --------------------------------------------------- */

@media print {
    div.document,
    div.documentwrapper,
    div.bodywrapper {
        margin: 0 !important;
        width: 100%;
    }

    div.sphinxsidebar,
    div.related,
    div.footer,
    #top-link {
        display: none;
    }
}
4
python/docs/_static/check-solid.svg
vendored
Normal file
@ -0,0 +1,4 @@
<svg xmlns="http://www.w3.org/2000/svg" class="icon icon-tabler icon-tabler-check" width="44" height="44" viewBox="0 0 24 24" stroke-width="2" stroke="#22863a" fill="none" stroke-linecap="round" stroke-linejoin="round">
  <path stroke="none" d="M0 0h24v24H0z" fill="none"/>
  <path d="M5 12l5 5l10 -10" />
</svg>
7
python/docs/_static/clipboard.min.js
vendored
Normal file
5
python/docs/_static/copy-button.svg
vendored
Normal file
@ -0,0 +1,5 @@
<svg xmlns="http://www.w3.org/2000/svg" class="icon icon-tabler icon-tabler-copy" width="44" height="44" viewBox="0 0 24 24" stroke-width="1.5" stroke="#000000" fill="none" stroke-linecap="round" stroke-linejoin="round">
  <path stroke="none" d="M0 0h24v24H0z" fill="none"/>
  <rect x="8" y="8" width="12" height="12" rx="2" />
  <path d="M16 8v-2a2 2 0 0 0 -2 -2h-8a2 2 0 0 0 -2 2v8a2 2 0 0 0 2 2h2" />
</svg>
94
python/docs/_static/copybutton.css
vendored
Normal file
@ -0,0 +1,94 @@
/* Copy buttons */
button.copybtn {
    position: absolute;
    display: flex;
    top: .3em;
    right: .3em;
    width: 1.7em;
    height: 1.7em;
    opacity: 0;
    transition: opacity 0.3s, border .3s, background-color .3s;
    user-select: none;
    padding: 0;
    border: none;
    outline: none;
    border-radius: 0.4em;
    /* The colors that GitHub uses */
    border: #1b1f2426 1px solid;
    background-color: #f6f8fa;
    color: #57606a;
}

button.copybtn.success {
    border-color: #22863a;
    color: #22863a;
}

button.copybtn svg {
    stroke: currentColor;
    width: 1.5em;
    height: 1.5em;
    padding: 0.1em;
}

div.highlight {
    position: relative;
}

/* Show the copybutton */
.highlight:hover button.copybtn, button.copybtn.success {
    opacity: 1;
}

.highlight button.copybtn:hover {
    background-color: rgb(235, 235, 235);
}

.highlight button.copybtn:active {
    background-color: rgb(187, 187, 187);
}

/**
 * A minimal CSS-only tooltip copied from:
 *   https://codepen.io/mildrenben/pen/rVBrpK
 *
 * To use, write HTML like the following:
 *
 * <p class="o-tooltip--left" data-tooltip="Hey">Short</p>
 */
.o-tooltip--left {
    position: relative;
}

.o-tooltip--left:after {
    opacity: 0;
    visibility: hidden;
    position: absolute;
    content: attr(data-tooltip);
    padding: .2em;
    font-size: .8em;
    left: -.2em;
    background: grey;
    color: white;
    white-space: nowrap;
    z-index: 2;
    border-radius: 2px;
    transform: translateX(-102%) translateY(0);
    transition: opacity 0.2s cubic-bezier(0.64, 0.09, 0.08, 1), transform 0.2s cubic-bezier(0.64, 0.09, 0.08, 1);
}

.o-tooltip--left:hover:after {
    display: block;
    opacity: 1;
    visibility: visible;
    transform: translateX(-100%) translateY(0);
    transition: opacity 0.2s cubic-bezier(0.64, 0.09, 0.08, 1), transform 0.2s cubic-bezier(0.64, 0.09, 0.08, 1);
    transition-delay: .5s;
}

/* By default the copy button shouldn't show up when printing a page */
@media print {
    button.copybtn {
        display: none;
    }
}
248
python/docs/_static/copybutton.js
vendored
Normal file
@ -0,0 +1,248 @@
|
||||
// Localization support
|
||||
const messages = {
|
||||
'en': {
|
||||
'copy': 'Copy',
|
||||
'copy_to_clipboard': 'Copy to clipboard',
|
||||
'copy_success': 'Copied!',
|
||||
'copy_failure': 'Failed to copy',
|
||||
},
|
||||
'es' : {
|
||||
'copy': 'Copiar',
|
||||
'copy_to_clipboard': 'Copiar al portapapeles',
|
||||
'copy_success': '¡Copiado!',
|
||||
'copy_failure': 'Error al copiar',
|
||||
},
|
||||
'de' : {
|
||||
'copy': 'Kopieren',
|
||||
'copy_to_clipboard': 'In die Zwischenablage kopieren',
|
||||
'copy_success': 'Kopiert!',
|
||||
'copy_failure': 'Fehler beim Kopieren',
|
||||
},
|
||||
'fr' : {
|
||||
'copy': 'Copier',
|
||||
'copy_to_clipboard': 'Copier dans le presse-papier',
|
||||
'copy_success': 'Copié !',
|
||||
'copy_failure': 'Échec de la copie',
|
||||
},
|
||||
'ru': {
|
||||
'copy': 'Скопировать',
|
||||
'copy_to_clipboard': 'Скопировать в буфер',
|
||||
'copy_success': 'Скопировано!',
|
||||
'copy_failure': 'Не удалось скопировать',
|
||||
},
|
||||
'zh-CN': {
|
||||
'copy': '复制',
|
||||
'copy_to_clipboard': '复制到剪贴板',
|
||||
'copy_success': '复制成功!',
|
||||
'copy_failure': '复制失败',
|
||||
},
|
||||
'it' : {
|
||||
'copy': 'Copiare',
|
||||
'copy_to_clipboard': 'Copiato negli appunti',
|
||||
'copy_success': 'Copiato!',
|
||||
'copy_failure': 'Errore durante la copia',
|
||||
}
|
||||
}
|
||||
|
||||
let locale = 'en'
|
||||
if( document.documentElement.lang !== undefined
|
||||
&& messages[document.documentElement.lang] !== undefined ) {
|
||||
locale = document.documentElement.lang
|
||||
}
|
||||
|
||||
let doc_url_root = DOCUMENTATION_OPTIONS.URL_ROOT;
|
||||
if (doc_url_root == '#') {
|
||||
doc_url_root = '';
|
||||
}
|
||||
|
||||
/**
|
||||
* SVG files for our copy buttons
|
||||
*/
|
||||
let iconCheck = `<svg xmlns="http://www.w3.org/2000/svg" class="icon icon-tabler icon-tabler-check" width="44" height="44" viewBox="0 0 24 24" stroke-width="2" stroke="#22863a" fill="none" stroke-linecap="round" stroke-linejoin="round">
|
||||
<title>${messages[locale]['copy_success']}</title>
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<path d="M5 12l5 5l10 -10" />
|
||||
</svg>`
|
||||
|
||||
// If the user specified their own SVG use that, otherwise use the default
|
||||
let iconCopy = ``;
|
||||
if (!iconCopy) {
|
||||
iconCopy = `<svg xmlns="http://www.w3.org/2000/svg" class="icon icon-tabler icon-tabler-copy" width="44" height="44" viewBox="0 0 24 24" stroke-width="1.5" stroke="#000000" fill="none" stroke-linecap="round" stroke-linejoin="round">
|
||||
<title>${messages[locale]['copy_to_clipboard']}</title>
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<rect x="8" y="8" width="12" height="12" rx="2" />
|
||||
<path d="M16 8v-2a2 2 0 0 0 -2 -2h-8a2 2 0 0 0 -2 2v8a2 2 0 0 0 2 2h2" />
|
||||
</svg>`
|
||||
}
|
||||
|
||||
/**
|
||||
* Set up copy/paste for code blocks
|
||||
*/
|
||||
|
||||
const runWhenDOMLoaded = cb => {
|
||||
if (document.readyState != 'loading') {
|
||||
cb()
|
||||
} else if (document.addEventListener) {
|
||||
document.addEventListener('DOMContentLoaded', cb)
|
||||
} else {
|
||||
document.attachEvent('onreadystatechange', function() {
|
||||
if (document.readyState == 'complete') cb()
|
||||
})
|
||||
}
|
||||
}
|
||||
|
||||
const codeCellId = index => `codecell${index}`
|
||||
|
||||
// Clears selected text since ClipboardJS will select the text when copying
|
||||
const clearSelection = () => {
|
||||
if (window.getSelection) {
|
||||
window.getSelection().removeAllRanges()
|
||||
} else if (document.selection) {
|
||||
document.selection.empty()
|
||||
}
|
||||
}
|
||||
|
||||
// Changes tooltip text for a moment, then changes it back
|
||||
// We want the timeout of our `success` class to be a bit shorter than the
|
||||
// tooltip and icon change, so that we can hide the icon before changing back.
|
||||
var timeoutIcon = 2000;
|
||||
var timeoutSuccessClass = 1500;
|
||||
|
||||
const temporarilyChangeTooltip = (el, oldText, newText) => {
|
||||
el.setAttribute('data-tooltip', newText)
|
||||
el.classList.add('success')
|
||||
// Remove success a little bit sooner than we change the tooltip
|
||||
// So that we can use CSS to hide the copybutton first
|
||||
setTimeout(() => el.classList.remove('success'), timeoutSuccessClass)
|
||||
setTimeout(() => el.setAttribute('data-tooltip', oldText), timeoutIcon)
|
||||
}
|
||||
|
||||
// Changes the copy button icon for two seconds, then changes it back
|
||||
const temporarilyChangeIcon = (el) => {
|
||||
el.innerHTML = iconCheck;
|
||||
setTimeout(() => {el.innerHTML = iconCopy}, timeoutIcon)
|
||||
}
|
||||
|
||||
const addCopyButtonToCodeCells = () => {
|
||||
// If ClipboardJS hasn't loaded, wait a bit and try again. This
|
||||
// happens because we load ClipboardJS asynchronously.
|
||||
if (window.ClipboardJS === undefined) {
|
||||
setTimeout(addCopyButtonToCodeCells, 250)
|
||||
return
|
||||
}
|
||||
|
||||
// Add copybuttons to all of our code cells
|
||||
const COPYBUTTON_SELECTOR = 'div.highlight pre';
|
||||
const codeCells = document.querySelectorAll(COPYBUTTON_SELECTOR)
|
||||
codeCells.forEach((codeCell, index) => {
|
||||
const id = codeCellId(index)
|
||||
codeCell.setAttribute('id', id)
|
||||
|
||||
const clipboardButton = id =>
|
||||
`<button class="copybtn o-tooltip--left" data-tooltip="${messages[locale]['copy']}" data-clipboard-target="#${id}">
|
||||
${iconCopy}
|
||||
</button>`
|
||||
codeCell.insertAdjacentHTML('afterend', clipboardButton(id))
|
||||
})
|
||||
|
||||
function escapeRegExp(string) {
|
||||
return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
|
||||
}
|
||||
|
||||
/**
|
||||
* Removes excluded text from a Node.
|
||||
*
|
||||
* @param {Node} target Node to filter.
|
||||
* @param {string} exclude CSS selector of nodes to exclude.
|
||||
* @returns {DOMString} Text from `target` with text removed.
|
||||
*/
|
||||
function filterText(target, exclude) {
|
||||
const clone = target.cloneNode(true); // clone as to not modify the live DOM
|
||||
if (exclude) {
|
||||
// remove excluded nodes
|
||||
clone.querySelectorAll(exclude).forEach(node => node.remove());
|
||||
}
|
||||
return clone.innerText;
|
||||
}
|
||||
|
||||
// Callback when a copy button is clicked. Will be passed the node that was clicked
|
||||
// should then grab the text and replace pieces of text that shouldn't be used in output
|
||||
function formatCopyText(textContent, copybuttonPromptText, isRegexp = false, onlyCopyPromptLines = true, removePrompts = true, copyEmptyLines = true, lineContinuationChar = "", hereDocDelim = "") {
|
||||
var regexp;
|
||||
var match;
|
||||
|
||||
// Do we check for line continuation characters and "HERE-documents"?
|
||||
var useLineCont = !!lineContinuationChar
|
||||
var useHereDoc = !!hereDocDelim
|
||||
|
||||
// create regexp to capture prompt and remaining line
|
||||
if (isRegexp) {
|
||||
regexp = new RegExp('^(' + copybuttonPromptText + ')(.*)')
|
||||
} else {
|
||||
regexp = new RegExp('^(' + escapeRegExp(copybuttonPromptText) + ')(.*)')
|
||||
}
|
||||
|
||||
const outputLines = [];
|
||||
var promptFound = false;
|
||||
var gotLineCont = false;
|
||||
var gotHereDoc = false;
|
||||
const lineGotPrompt = [];
|
||||
for (const line of textContent.split('\n')) {
|
||||
match = line.match(regexp)
|
||||
if (match || gotLineCont || gotHereDoc) {
|
||||
promptFound = regexp.test(line)
|
||||
lineGotPrompt.push(promptFound)
|
||||
if (removePrompts && promptFound) {
|
||||
outputLines.push(match[2])
|
||||
} else {
|
||||
outputLines.push(line)
|
||||
}
|
||||
gotLineCont = line.endsWith(lineContinuationChar) && useLineCont
|
||||
if (line.includes(hereDocDelim) && useHereDoc)
|
||||
gotHereDoc = !gotHereDoc
|
||||
} else if (!onlyCopyPromptLines) {
|
||||
outputLines.push(line)
|
||||
} else if (copyEmptyLines && line.trim() === '') {
|
||||
outputLines.push(line)
|
||||
}
|
||||
}
|
||||
|
||||
// If no lines with the prompt were found then just use original lines
|
||||
if (lineGotPrompt.some(v => v === true)) {
|
||||
textContent = outputLines.join('\n');
|
||||
}
|
||||
|
||||
// Remove a trailing newline to avoid auto-running when pasting
|
||||
if (textContent.endsWith("\n")) {
|
||||
textContent = textContent.slice(0, -1)
|
||||
}
|
||||
return textContent
|
||||
}
|
||||
|
||||
|
||||
var copyTargetText = (trigger) => {
|
||||
var target = document.querySelector(trigger.attributes['data-clipboard-target'].value);
|
||||
|
||||
// get filtered text
|
||||
let exclude = '.linenos';
|
||||
|
||||
let text = filterText(target, exclude);
|
||||
return formatCopyText(text, '', false, true, true, true, '', '')
|
||||
}
|
||||
|
||||
// Initialize with a callback so we can modify the text before copy
|
||||
const clipboard = new ClipboardJS('.copybtn', {text: copyTargetText})
|
||||
|
||||
// Update UI with error/success messages
|
||||
clipboard.on('success', event => {
|
||||
clearSelection()
|
||||
temporarilyChangeTooltip(event.trigger, messages[locale]['copy'], messages[locale]['copy_success'])
|
||||
temporarilyChangeIcon(event.trigger)
|
||||
})
|
||||
|
||||
clipboard.on('error', event => {
|
||||
temporarilyChangeTooltip(event.trigger, messages[locale]['copy'], messages[locale]['copy_failure'])
|
||||
})
|
||||
}
|
||||
|
||||
runWhenDOMLoaded(addCopyButtonToCodeCells)
|
||||
73
python/docs/_static/copybutton_funcs.js
vendored
Normal file
@ -0,0 +1,73 @@
|
||||
function escapeRegExp(string) {
|
||||
return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& means the whole matched string
|
||||
}
|
||||
|
||||
/**
|
||||
* Removes excluded text from a Node.
|
||||
*
|
||||
* @param {Node} target Node to filter.
|
||||
* @param {string} exclude CSS selector of nodes to exclude.
|
||||
* @returns {DOMString} Text from `target` with excluded text removed.
|
||||
*/
|
||||
export function filterText(target, exclude) {
|
||||
const clone = target.cloneNode(true); // clone so as not to modify the live DOM
|
||||
if (exclude) {
|
||||
// remove excluded nodes
|
||||
clone.querySelectorAll(exclude).forEach(node => node.remove());
|
||||
}
|
||||
return clone.innerText;
|
||||
}
|
||||
|
||||
// Callback when a copy button is clicked. Will be passed the node that was clicked
|
||||
// should then grab the text and replace pieces of text that shouldn't be used in output
|
||||
export function formatCopyText(textContent, copybuttonPromptText, isRegexp = false, onlyCopyPromptLines = true, removePrompts = true, copyEmptyLines = true, lineContinuationChar = "", hereDocDelim = "") {
|
||||
var regexp;
|
||||
var match;
|
||||
|
||||
// Do we check for line continuation characters and "HERE-documents"?
|
||||
var useLineCont = !!lineContinuationChar
|
||||
var useHereDoc = !!hereDocDelim
|
||||
|
||||
// create regexp to capture prompt and remaining line
|
||||
if (isRegexp) {
|
||||
regexp = new RegExp('^(' + copybuttonPromptText + ')(.*)')
|
||||
} else {
|
||||
regexp = new RegExp('^(' + escapeRegExp(copybuttonPromptText) + ')(.*)')
|
||||
}
|
||||
|
||||
const outputLines = [];
|
||||
var promptFound = false;
|
||||
var gotLineCont = false;
|
||||
var gotHereDoc = false;
|
||||
const lineGotPrompt = [];
|
||||
for (const line of textContent.split('\n')) {
|
||||
match = line.match(regexp)
|
||||
if (match || gotLineCont || gotHereDoc) {
|
||||
promptFound = regexp.test(line)
|
||||
lineGotPrompt.push(promptFound)
|
||||
if (removePrompts && promptFound) {
|
||||
outputLines.push(match[2])
|
||||
} else {
|
||||
outputLines.push(line)
|
||||
}
|
||||
gotLineCont = line.endsWith(lineContinuationChar) && useLineCont
|
||||
if (line.includes(hereDocDelim) && useHereDoc)
|
||||
gotHereDoc = !gotHereDoc
|
||||
} else if (!onlyCopyPromptLines) {
|
||||
outputLines.push(line)
|
||||
} else if (copyEmptyLines && line.trim() === '') {
|
||||
outputLines.push(line)
|
||||
}
|
||||
}
|
||||
|
||||
// If no lines with the prompt were found then just use original lines
|
||||
if (lineGotPrompt.some(v => v === true)) {
|
||||
textContent = outputLines.join('\n');
|
||||
}
|
||||
|
||||
// Remove a trailing newline to avoid auto-running when pasting
|
||||
if (textContent.endsWith("\n")) {
|
||||
textContent = textContent.slice(0, -1)
|
||||
}
|
||||
return textContent
|
||||
}
|
||||
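The prompt handling in `formatCopyText` above can be illustrated with a reduced sketch. `stripPrompts` is a hypothetical helper written for this note, not part of sphinx-copybutton; it assumes a literal (non-regexp) prompt and the default flags (`onlyCopyPromptLines`, `removePrompts`, and `copyEmptyLines` all true), and it ignores line continuations and here-documents.

```javascript
// Reduced sketch of the prompt-stripping idea in formatCopyText.
// stripPrompts is a hypothetical name; only the default-flag path is covered.
function escapeRegExp(string) {
  return string.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); // $& is the whole match
}

function stripPrompts(textContent, promptText) {
  // Group 1 captures the prompt, group 2 the rest of the line.
  const regexp = new RegExp('^(' + escapeRegExp(promptText) + ')(.*)');
  const outputLines = [];
  let promptSeen = false;

  for (const line of textContent.split('\n')) {
    const match = line.match(regexp);
    if (match) {
      promptSeen = true;
      outputLines.push(match[2]); // keep the command, drop the prompt
    } else if (line.trim() === '') {
      outputLines.push(line); // empty lines are kept
    }
    // Any other line is treated as command output and skipped.
  }

  // If no line carried the prompt, fall back to the original text.
  let result = promptSeen ? outputLines.join('\n') : textContent;

  // Drop one trailing newline so pasting does not auto-run the last command.
  if (result.endsWith('\n')) {
    result = result.slice(0, -1);
  }
  return result;
}

const sample = '$ echo hello\nhello\n$ ls\n';
console.log(stripPrompts(sample, '$ '));
```

Note that the vendored `copybutton.js` calls `formatCopyText` with an empty prompt string, in which case every line matches and the text is copied essentially unchanged.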
BIN
python/docs/_static/cutlass-logo-small.png
vendored
Normal file
|
After Width: | Height: | Size: 1.5 KiB |
69
python/docs/_static/debug.css
vendored
Normal file
@ -0,0 +1,69 @@
|
||||
/*
|
||||
This CSS file should be overridden by the theme authors. It's
|
||||
meant for debugging and developing the skeleton that this theme provides.
|
||||
*/
|
||||
body {
|
||||
font-family: -apple-system, "Segoe UI", Roboto, Helvetica, Arial, sans-serif,
|
||||
"Apple Color Emoji", "Segoe UI Emoji";
|
||||
background: lavender;
|
||||
}
|
||||
.sb-announcement {
|
||||
background: rgb(131, 131, 131);
|
||||
}
|
||||
.sb-announcement__inner {
|
||||
background: black;
|
||||
color: white;
|
||||
}
|
||||
.sb-header {
|
||||
background: lightskyblue;
|
||||
}
|
||||
.sb-header__inner {
|
||||
background: royalblue;
|
||||
color: white;
|
||||
}
|
||||
.sb-header-secondary {
|
||||
background: lightcyan;
|
||||
}
|
||||
.sb-header-secondary__inner {
|
||||
background: cornflowerblue;
|
||||
color: white;
|
||||
}
|
||||
.sb-sidebar-primary {
|
||||
background: lightgreen;
|
||||
}
|
||||
.sb-main {
|
||||
background: blanchedalmond;
|
||||
}
|
||||
.sb-main__inner {
|
||||
background: antiquewhite;
|
||||
}
|
||||
.sb-header-article {
|
||||
background: lightsteelblue;
|
||||
}
|
||||
.sb-article-container {
|
||||
background: snow;
|
||||
}
|
||||
.sb-article-main {
|
||||
background: white;
|
||||
}
|
||||
.sb-footer-article {
|
||||
background: lightpink;
|
||||
}
|
||||
.sb-sidebar-secondary {
|
||||
background: lightgoldenrodyellow;
|
||||
}
|
||||
.sb-footer-content {
|
||||
background: plum;
|
||||
}
|
||||
.sb-footer-content__inner {
|
||||
background: palevioletred;
|
||||
}
|
||||
.sb-footer {
|
||||
background: pink;
|
||||
}
|
||||
.sb-footer__inner {
|
||||
background: salmon;
|
||||
}
|
||||
.sb-article {
|
||||
background: white;
|
||||
}
|
||||
156
python/docs/_static/doctools.js
vendored
Normal file
@ -0,0 +1,156 @@
|
||||
/*
|
||||
* doctools.js
|
||||
* ~~~~~~~~~~~
|
||||
*
|
||||
* Base JavaScript utilities for all Sphinx HTML documentation.
|
||||
*
|
||||
* :copyright: Copyright 2007-2023 by the Sphinx team, see AUTHORS.
|
||||
* :license: BSD, see LICENSE for details.
|
||||
*
|
||||
*/
|
||||
"use strict";
|
||||
|
||||
const BLACKLISTED_KEY_CONTROL_ELEMENTS = new Set([
|
||||
"TEXTAREA",
|
||||
"INPUT",
|
||||
"SELECT",
|
||||
"BUTTON",
|
||||
]);
|
||||
|
||||
const _ready = (callback) => {
|
||||
if (document.readyState !== "loading") {
|
||||
callback();
|
||||
} else {
|
||||
document.addEventListener("DOMContentLoaded", callback);
|
||||
}
|
||||
};
|
||||
|
||||
/**
|
||||
* Small JavaScript module for the documentation.
|
||||
*/
|
||||
const Documentation = {
|
||||
init: () => {
|
||||
Documentation.initDomainIndexTable();
|
||||
Documentation.initOnKeyListeners();
|
||||
},
|
||||
|
||||
/**
|
||||
* i18n support
|
||||
*/
|
||||
TRANSLATIONS: {},
|
||||
PLURAL_EXPR: (n) => (n === 1 ? 0 : 1),
|
||||
LOCALE: "unknown",
|
||||
|
||||
// gettext and ngettext don't access this so that the functions
|
||||
// can safely be bound to a different name (_ = Documentation.gettext)
|
||||
gettext: (string) => {
|
||||
const translated = Documentation.TRANSLATIONS[string];
|
||||
switch (typeof translated) {
|
||||
case "undefined":
|
||||
return string; // no translation
|
||||
case "string":
|
||||
return translated; // translation exists
|
||||
default:
|
||||
return translated[0]; // (singular, plural) translation tuple exists
|
||||
}
|
||||
},
|
||||
|
||||
ngettext: (singular, plural, n) => {
|
||||
const translated = Documentation.TRANSLATIONS[singular];
|
||||
if (typeof translated !== "undefined")
|
||||
return translated[Documentation.PLURAL_EXPR(n)];
|
||||
return n === 1 ? singular : plural;
|
||||
},
|
||||
|
||||
addTranslations: (catalog) => {
|
||||
Object.assign(Documentation.TRANSLATIONS, catalog.messages);
|
||||
Documentation.PLURAL_EXPR = new Function(
|
||||
"n",
|
||||
`return (${catalog.plural_expr})`
|
||||
);
|
||||
Documentation.LOCALE = catalog.locale;
|
||||
},
|
||||
|
||||
/**
|
||||
* helper function to focus on search bar
|
||||
*/
|
||||
focusSearchBar: () => {
|
||||
document.querySelectorAll("input[name=q]")[0]?.focus();
|
||||
},
|
||||
|
||||
/**
|
||||
* Initialise the domain index toggle buttons
|
||||
*/
|
||||
initDomainIndexTable: () => {
|
||||
const toggler = (el) => {
|
||||
const idNumber = el.id.substr(7);
|
||||
const toggledRows = document.querySelectorAll(`tr.cg-${idNumber}`);
|
||||
if (el.src.substr(-9) === "minus.png") {
|
||||
el.src = `${el.src.substr(0, el.src.length - 9)}plus.png`;
|
||||
toggledRows.forEach((el) => (el.style.display = "none"));
|
||||
} else {
|
||||
el.src = `${el.src.substr(0, el.src.length - 8)}minus.png`;
|
||||
toggledRows.forEach((el) => (el.style.display = ""));
|
||||
}
|
||||
};
|
||||
|
||||
const togglerElements = document.querySelectorAll("img.toggler");
|
||||
togglerElements.forEach((el) =>
|
||||
el.addEventListener("click", (event) => toggler(event.currentTarget))
|
||||
);
|
||||
togglerElements.forEach((el) => (el.style.display = ""));
|
||||
if (DOCUMENTATION_OPTIONS.COLLAPSE_INDEX) togglerElements.forEach(toggler);
|
||||
},
|
||||
|
||||
initOnKeyListeners: () => {
|
||||
// only install a listener if it is really needed
|
||||
if (
|
||||
!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS &&
|
||||
!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS
|
||||
)
|
||||
return;
|
||||
|
||||
document.addEventListener("keydown", (event) => {
|
||||
// bail for input elements
|
||||
if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return;
|
||||
// bail with special keys
|
||||
if (event.altKey || event.ctrlKey || event.metaKey) return;
|
||||
|
||||
if (!event.shiftKey) {
|
||||
switch (event.key) {
|
||||
case "ArrowLeft":
|
||||
if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break;
|
||||
|
||||
const prevLink = document.querySelector('link[rel="prev"]');
|
||||
if (prevLink && prevLink.href) {
|
||||
window.location.href = prevLink.href;
|
||||
event.preventDefault();
|
||||
}
|
||||
break;
|
||||
case "ArrowRight":
|
||||
if (!DOCUMENTATION_OPTIONS.NAVIGATION_WITH_KEYS) break;
|
||||
|
||||
const nextLink = document.querySelector('link[rel="next"]');
|
||||
if (nextLink && nextLink.href) {
|
||||
window.location.href = nextLink.href;
|
||||
event.preventDefault();
|
||||
}
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
// some keyboard layouts may need Shift to get /
|
||||
switch (event.key) {
|
||||
case "/":
|
||||
if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) break;
|
||||
Documentation.focusSearchBar();
|
||||
event.preventDefault();
|
||||
}
|
||||
});
|
||||
},
|
||||
};
|
||||
|
||||
// quick alias for translations
|
||||
const _ = Documentation.gettext;
|
||||
|
||||
_ready(Documentation.init);
|
||||
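The translation lookup in `Documentation.gettext` and `Documentation.ngettext` can be tried outside the browser with a standalone sketch. The `i18n` object and the two catalog entries below are invented for illustration; real catalogs are generated by Sphinx and registered through `Documentation.addTranslations`.

```javascript
// Standalone sketch of the lookup logic in Documentation.gettext/ngettext.
// The i18n object and its catalog entries are hypothetical.
const i18n = {
  TRANSLATIONS: {},
  // Default plural rule: index 0 for n === 1, index 1 otherwise.
  PLURAL_EXPR: (n) => (n === 1 ? 0 : 1),

  gettext(string) {
    const translated = i18n.TRANSLATIONS[string];
    if (typeof translated === "undefined") return string; // no translation
    if (typeof translated === "string") return translated; // plain translation
    return translated[0]; // [singular, plural] entry: use the singular form
  },

  ngettext(singular, plural, n) {
    const translated = i18n.TRANSLATIONS[singular];
    if (typeof translated !== "undefined")
      return translated[i18n.PLURAL_EXPR(n)];
    return n === 1 ? singular : plural; // untranslated fallback
  },
};

// Made-up catalog entries, merged the same way addTranslations merges messages.
Object.assign(i18n.TRANSLATIONS, {
  "Search": "Suche",
  "%s result": ["%s Ergebnis", "%s Ergebnisse"],
});

console.log(i18n.gettext("Search"));
console.log(i18n.gettext("Unknown"));
console.log(i18n.ngettext("%s result", "%s results", 3));
```

Because the functions refer to `i18n` by name rather than through `this`, they can be aliased (as `doctools.js` does with `_`) without losing access to the catalog.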
14
python/docs/_static/documentation_options.js
vendored
Normal file
@ -0,0 +1,14 @@
|
||||
var DOCUMENTATION_OPTIONS = {
|
||||
URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'),
|
||||
VERSION: '3.1.0',
|
||||
LANGUAGE: 'en',
|
||||
COLLAPSE_INDEX: false,
|
||||
BUILDER: 'html',
|
||||
FILE_SUFFIX: '.html',
|
||||
LINK_SUFFIX: '.html',
|
||||
HAS_SOURCE: true,
|
||||
SOURCELINK_SUFFIX: '.txt',
|
||||
NAVIGATION_WITH_KEYS: false,
|
||||
SHOW_SEARCH_SUMMARY: true,
|
||||
ENABLE_SEARCH_SHORTCUTS: true,
|
||||
};
|
||||
BIN
python/docs/_static/file.png
vendored
Normal file
|
After Width: | Height: | Size: 286 B |
199
python/docs/_static/language_data.js
vendored
Normal file
@ -0,0 +1,199 @@
|
||||
/*
|
||||
* language_data.js
|
||||
* ~~~~~~~~~~~~~~~~
|
||||
*
|
||||
* This script contains the language-specific data used by searchtools.js,
|
||||
* namely the list of stopwords, stemmer, scorer and splitter.
|
||||
*
|
||||
* :copyright: Copyright 2007-2023 by the Sphinx team, see AUTHORS.
|
||||
* :license: BSD, see LICENSE for details.
|
||||
*
|
||||
*/
|
||||
|
||||
var stopwords = ["a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "near", "no", "not", "of", "on", "or", "such", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"];
|
||||
|
||||
|
||||
/* Non-minified version is copied as a separate JS file, if available */
|
||||
|
||||
/**
|
||||
* Porter Stemmer
|
||||
*/
|
||||
var Stemmer = function() {
|
||||
|
||||
var step2list = {
|
||||
ational: 'ate',
|
||||
tional: 'tion',
|
||||
enci: 'ence',
|
||||
anci: 'ance',
|
||||
izer: 'ize',
|
||||
bli: 'ble',
|
||||
alli: 'al',
|
||||
entli: 'ent',
|
||||
eli: 'e',
|
||||
ousli: 'ous',
|
||||
ization: 'ize',
|
||||
ation: 'ate',
|
||||
ator: 'ate',
|
||||
alism: 'al',
|
||||
iveness: 'ive',
|
||||
fulness: 'ful',
|
||||
ousness: 'ous',
|
||||
aliti: 'al',
|
||||
iviti: 'ive',
|
||||
biliti: 'ble',
|
||||
logi: 'log'
|
||||
};
|
||||
|
||||
var step3list = {
|
||||
icate: 'ic',
|
||||
ative: '',
|
||||
alize: 'al',
|
||||
iciti: 'ic',
|
||||
ical: 'ic',
|
||||
ful: '',
|
||||
ness: ''
|
||||
};
|
||||
|
||||
var c = "[^aeiou]"; // consonant
|
||||
var v = "[aeiouy]"; // vowel
|
||||
var C = c + "[^aeiouy]*"; // consonant sequence
|
||||
var V = v + "[aeiou]*"; // vowel sequence
|
||||
|
||||
var mgr0 = "^(" + C + ")?" + V + C; // [C]VC... is m>0
|
||||
var meq1 = "^(" + C + ")?" + V + C + "(" + V + ")?$"; // [C]VC[V] is m=1
|
||||
var mgr1 = "^(" + C + ")?" + V + C + V + C; // [C]VCVC... is m>1
|
||||
var s_v = "^(" + C + ")?" + v; // vowel in stem
|
||||
|
||||
this.stemWord = function (w) {
|
||||
var stem;
|
||||
var suffix;
|
||||
var firstch;
|
||||
var origword = w;
|
||||
|
||||
if (w.length < 3)
|
||||
return w;
|
||||
|
||||
var re;
|
||||
var re2;
|
||||
var re3;
|
||||
var re4;
|
||||
|
||||
firstch = w.substr(0,1);
|
||||
if (firstch == "y")
|
||||
w = firstch.toUpperCase() + w.substr(1);
|
||||
|
||||
// Step 1a
|
||||
re = /^(.+?)(ss|i)es$/;
|
||||
re2 = /^(.+?)([^s])s$/;
|
||||
|
||||
if (re.test(w))
|
||||
w = w.replace(re,"$1$2");
|
||||
else if (re2.test(w))
|
||||
w = w.replace(re2,"$1$2");
|
||||
|
||||
// Step 1b
|
||||
re = /^(.+?)eed$/;
|
||||
re2 = /^(.+?)(ed|ing)$/;
|
||||
if (re.test(w)) {
|
||||
var fp = re.exec(w);
|
||||
re = new RegExp(mgr0);
|
||||
if (re.test(fp[1])) {
|
||||
re = /.$/;
|
||||
w = w.replace(re,"");
|
||||
}
|
||||
}
|
||||
else if (re2.test(w)) {
|
||||
var fp = re2.exec(w);
|
||||
stem = fp[1];
|
||||
re2 = new RegExp(s_v);
|
||||
if (re2.test(stem)) {
|
||||
w = stem;
|
||||
re2 = /(at|bl|iz)$/;
|
||||
re3 = new RegExp("([^aeiouylsz])\\1$");
|
||||
re4 = new RegExp("^" + C + v + "[^aeiouwxy]$");
|
||||
if (re2.test(w))
|
||||
w = w + "e";
|
||||
else if (re3.test(w)) {
|
||||
re = /.$/;
|
||||
w = w.replace(re,"");
|
||||
}
|
||||
else if (re4.test(w))
|
||||
w = w + "e";
|
||||
}
|
||||
}
|
||||
|
||||
// Step 1c
|
||||
re = /^(.+?)y$/;
|
||||
if (re.test(w)) {
|
||||
var fp = re.exec(w);
|
||||
stem = fp[1];
|
||||
re = new RegExp(s_v);
|
||||
if (re.test(stem))
|
||||
w = stem + "i";
|
||||
}
|
||||
|
||||
// Step 2
|
||||
re = /^(.+?)(ational|tional|enci|anci|izer|bli|alli|entli|eli|ousli|ization|ation|ator|alism|iveness|fulness|ousness|aliti|iviti|biliti|logi)$/;
|
||||
if (re.test(w)) {
|
||||
var fp = re.exec(w);
|
||||
stem = fp[1];
|
||||
suffix = fp[2];
|
||||
re = new RegExp(mgr0);
|
||||
if (re.test(stem))
|
||||
w = stem + step2list[suffix];
|
||||
}
|
||||
|
||||
// Step 3
|
||||
re = /^(.+?)(icate|ative|alize|iciti|ical|ful|ness)$/;
|
||||
if (re.test(w)) {
|
||||
var fp = re.exec(w);
|
||||
stem = fp[1];
|
||||
suffix = fp[2];
|
||||
re = new RegExp(mgr0);
|
||||
if (re.test(stem))
|
||||
w = stem + step3list[suffix];
|
||||
}
|
||||
|
||||
// Step 4
|
||||
re = /^(.+?)(al|ance|ence|er|ic|able|ible|ant|ement|ment|ent|ou|ism|ate|iti|ous|ive|ize)$/;
|
||||
re2 = /^(.+?)(s|t)(ion)$/;
|
||||
if (re.test(w)) {
|
||||
var fp = re.exec(w);
|
||||
stem = fp[1];
|
||||
re = new RegExp(mgr1);
|
||||
if (re.test(stem))
|
||||
w = stem;
|
||||
}
|
||||
else if (re2.test(w)) {
|
||||
var fp = re2.exec(w);
|
||||
stem = fp[1] + fp[2];
|
||||
re2 = new RegExp(mgr1);
|
||||
if (re2.test(stem))
|
||||
w = stem;
|
||||
}
|
||||
|
||||
// Step 5
|
||||
re = /^(.+?)e$/;
|
||||
if (re.test(w)) {
|
||||
var fp = re.exec(w);
|
||||
stem = fp[1];
|
||||
re = new RegExp(mgr1);
|
||||
re2 = new RegExp(meq1);
|
||||
re3 = new RegExp("^" + C + v + "[^aeiouwxy]$");
|
||||
if (re.test(stem) || (re2.test(stem) && !(re3.test(stem))))
|
||||
w = stem;
|
||||
}
|
||||
re = /ll$/;
|
||||
re2 = new RegExp(mgr1);
|
||||
if (re.test(w) && re2.test(w)) {
|
||||
re = /.$/;
|
||||
w = w.replace(re,"");
|
||||
}
|
||||
|
||||
// and turn initial Y back to y
|
||||
if (firstch == "y")
|
||||
w = firstch.toLowerCase() + w.substr(1);
|
||||
return w;
|
||||
}
|
||||
}
|
||||
|
||||
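The measure patterns that the stemmer builds from strings (`mgr0`, `mgr1`) decide whether a suffix may be removed. The sketch below rebuilds the same patterns with the definitions shown above and probes them with a few sample words; the word choices and the `const` names wrapping the patterns are assumptions made for illustration.

```javascript
// Sketch: the consonant/vowel building blocks used by the Porter stemmer above,
// compiled into regular expressions so they can be tested on their own.
const c = "[^aeiou]";          // consonant
const v = "[aeiouy]";          // vowel
const C = c + "[^aeiouy]*";    // consonant sequence
const V = v + "[aeiou]*";      // vowel sequence

// [C]VC...   means the measure m is greater than 0
const mgr0 = new RegExp("^(" + C + ")?" + V + C);
// [C]VCVC... means the measure m is greater than 1
const mgr1 = new RegExp("^(" + C + ")?" + V + C + V + C);

for (const word of ["tree", "trouble", "troubles"]) {
  console.log(word, "m>0:", mgr0.test(word), "m>1:", mgr1.test(word));
}
```

A word such as "tree" has no vowel-consonant pair after its leading consonants, so neither pattern matches, while "troubles" contains two such pairs and matches both.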
BIN
python/docs/_static/logo-dark-mode.png
vendored
Normal file
|
After Width: | Height: | Size: 49 KiB |
BIN
python/docs/_static/logo-light-mode.png
vendored
Normal file
|
After Width: | Height: | Size: 48 KiB |
BIN
python/docs/_static/minus.png
vendored
Normal file
|
After Width: | Height: | Size: 90 B |
9
python/docs/_static/nbsphinx-broken-thumbnail.svg
vendored
Normal file
@ -0,0 +1,9 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
|
||||
<style>
|
||||
svg { fill: lightcoral; }
|
||||
@media (prefers-color-scheme: dark) {
|
||||
svg { fill: crimson; }
|
||||
}
|
||||
</style>
|
||||
<path d="m 80.856887,38.34474 h 2.903414 l 7.066397,13.332221 V 38.34474 h 2.092166 V 54.281494 H 90.01545 L 82.949053,40.949273 v 13.332221 h -2.092166 z m -13.812565,0 h 10.076555 v 1.814633 h -7.920343 v 4.718048 h 7.589439 v 1.814634 h -7.589439 v 5.774805 h 8.11248 v 1.814634 H 67.044322 Z m -14.335606,0 h 2.156212 v 6.735493 L 62.01672,38.34474 h 2.775323 l -7.909669,7.429324 8.475407,8.50743 h -2.839368 l -7.653485,-7.674833 v 7.674833 h -2.156212 z m -10.738365,1.462381 q -2.348349,0 -3.73601,1.750588 -1.376987,1.750588 -1.376987,4.771419 0,3.010158 1.376987,4.760746 1.387661,1.750587 3.73601,1.750587 2.34835,0 3.714663,-1.750587 1.376986,-1.750588 1.376986,-4.760746 0,-3.020831 -1.376986,-4.771419 -1.366313,-1.750588 -3.714663,-1.750588 z m 0,-1.750588 q 3.351736,0 5.358507,2.252281 2.006772,2.241606 2.006772,6.020314 0,3.768034 -2.006772,6.020315 -2.006771,2.241606 -5.358507,2.241606 -3.362409,0 -5.379855,-2.241606 -2.006772,-2.241607 -2.006772,-6.020315 0,-3.778708 2.006772,-6.020314 2.017446,-2.252281 5.379855,-2.252281 z m -14.100771,8.75294 q 0.693831,0.234835 1.344964,1.003385 0.661808,0.768551 1.323615,2.113515 l 2.188235,4.355121 H 30.410068 L 28.371273,50.193231 Q 27.581374,48.592083 26.834172,48.069042 26.097644,47.546 24.816726,47.546 h -2.34835 v 6.735494 H 20.312164 V 38.34474 h 4.867489 q 2.732625,0 4.077588,1.142152 1.344964,1.142151 1.344964,3.447804 0,1.505078 -0.704505,2.49779 -0.69383,0.992711 -2.02812,1.376987 z m -5.401204,-6.692797 v 5.657388 h 2.711277 q 1.55845,0 2.348349,-0.715179 0.800574,-0.725854 0.800574,-2.124189 0,-1.398336 -0.800574,-2.102841 -0.789899,-0.715179 -2.348349,-0.715179 z M 7.4709616,46.670707 v 5.838851 H 10.92944 q 1.739914,0 2.57251,-0.71518 0.843271,-0.725853 0.843271,-2.209583 0,-1.494404 -0.843271,-2.198909 -0.832596,-0.715179 -2.57251,-0.715179 z m 0,-6.554031 v 4.803443 h 3.1916204 q 1.579799,0 2.34835,-0.587088 0.779225,-0.597761 0.779225,-1.814634 0,-1.206197 -0.779225,-1.803959 -0.768551,-0.597762 
-2.34835,-0.597762 z M 5.3147497,38.34474 h 5.5079473 q 2.465767,0 3.800057,1.024734 1.334289,1.024734 1.334289,2.914088 0,1.462382 -0.683156,2.327001 -0.683156,0.86462 -2.006772,1.078106 1.590474,0.341579 2.465768,1.430359 0.885968,1.078106 0.885968,2.700602 0,2.134863 -1.451707,3.298364 -1.451707,1.1635 -4.130961,1.1635 H 5.3147497 Z M 65.365328,74.550032 H 67.52154 V 88.67217 h 7.760238 v 1.814634 h -9.91645 z m -6.447298,0 h 2.156214 V 90.486804 H 58.91803 Z m -9.955303,2.124192 -2.924765,7.931025 h 5.860208 z m -1.216872,-2.124192 h 2.444423 l 6.073692,15.936772 h -2.241606 l -1.451709,-4.088268 h -7.183824 l -1.451707,4.088268 H 41.66149 Z m -21.264514,0 h 2.903417 L 36.45116,87.882271 V 74.550032 h 2.092171 V 90.486804 H 35.639914 L 28.573506,77.154569 v 13.332235 h -2.092165 z m 53.454813,-9.781663 v 5.838858 h 3.458479 q 1.739919,0 2.572513,-0.71518 0.843272,-0.725854 0.843272,-2.209586 0,-1.494406 -0.843272,-2.198912 -0.832594,-0.71518 -2.572513,-0.71518 z m 0,-6.554037 v 4.803447 h 3.191622 q 1.579803,0 2.348355,-0.587087 0.779225,-0.597763 0.779225,-1.814636 0,-1.206199 -0.779225,-1.803962 -0.768552,-0.597762 -2.348355,-0.597762 z m -2.156212,-1.771939 h 5.507949 q 2.465772,0 3.800065,1.024736 1.334288,1.024735 1.334288,2.914092 0,1.462383 -0.683157,2.327003 -0.683157,0.864621 -2.006771,1.078107 1.590471,0.341579 2.46577,1.43036 0.885967,1.078107 0.885967,2.700605 0,2.134866 -1.451708,3.298368 -1.451709,1.163502 -4.130963,1.163502 h -5.72144 z m -18.86154,0 h 3.212969 l 4.066921,10.845118 4.088269,-10.845118 H 73.49953 V 72.379166 H 71.396692 V 58.385121 L 67.287075,69.315634 H 65.120183 L 61.010567,58.385121 v 13.994045 h -2.092165 z m -16.246333,0 h 2.166892 v 9.681616 q 0,2.561839 0.928668,3.693318 0.928667,1.120804 3.010159,1.120804 2.070819,0 2.999486,-1.120804 0.928667,-1.131479 0.928667,-3.693318 v -9.681616 h 2.166886 v 9.948474 q 0,3.116904 -1.547776,4.70738 -1.537104,1.590474 -4.547263,1.590474 -3.020833,0 -4.568615,-1.590474 
-1.537104,-1.590476 -1.537104,-4.70738 z m -16.192953,0 h 2.156213 v 6.532689 h 7.83496 v -6.532689 h 2.156213 v 15.936773 h -2.156213 v -7.589448 h -7.83496 v 7.589448 h -2.156213 z m -15.563168,0 h 13.481671 v 1.814636 H 18.740226 V 72.379166 H 16.573339 V 58.257029 h -5.657391 z" />
|
||||
</svg>
|
||||
|
After Width: | Height: | Size: 4.4 KiB |
253
python/docs/_static/nbsphinx-code-cells.css
vendored
Normal file
@ -0,0 +1,253 @@
|
||||
/* remove conflicting styling from Sphinx themes */
|
||||
div.nbinput.container div.prompt *,
|
||||
div.nboutput.container div.prompt *,
|
||||
div.nbinput.container div.input_area pre,
|
||||
div.nboutput.container div.output_area pre,
|
||||
div.nbinput.container div.input_area .highlight,
|
||||
div.nboutput.container div.output_area .highlight {
|
||||
border: none;
|
||||
padding: 0;
|
||||
margin: 0;
|
||||
box-shadow: none;
|
||||
}
|
||||
|
||||
div.nbinput.container > div[class*=highlight],
|
||||
div.nboutput.container > div[class*=highlight] {
|
||||
margin: 0;
|
||||
}
|
||||
|
||||
div.nbinput.container div.prompt *,
|
||||
div.nboutput.container div.prompt * {
|
||||
background: none;
|
||||
}
|
||||
|
||||
div.nboutput.container div.output_area .highlight,
|
||||
div.nboutput.container div.output_area pre {
|
||||
background: unset;
|
||||
}
|
||||
|
||||
div.nboutput.container div.output_area div.highlight {
|
||||
color: unset; /* override Pygments text color */
|
||||
}
|
||||
|
||||
/* avoid gaps between output lines */
|
||||
div.nboutput.container div[class*=highlight] pre {
|
||||
line-height: normal;
|
||||
}
|
||||
|
||||
/* input/output containers */
|
||||
div.nbinput.container,
|
||||
div.nboutput.container {
|
||||
display: -webkit-flex;
|
||||
display: flex;
|
||||
align-items: flex-start;
|
||||
margin: 0;
|
||||
width: 100%;
|
||||
}
|
||||
@media (max-width: 540px) {
|
||||
div.nbinput.container,
|
||||
div.nboutput.container {
|
||||
flex-direction: column;
|
||||
}
|
||||
}
|
||||
|
||||
/* input container */
|
||||
div.nbinput.container {
|
||||
padding-top: 5px;
|
||||
}
|
||||
|
||||
/* last container */
|
||||
div.nblast.container {
|
||||
padding-bottom: 5px;
|
||||
}
|
||||
|
||||
/* input prompt */
|
||||
div.nbinput.container div.prompt pre {
|
||||
color: #307FC1;
|
||||
}
|
||||
|
||||
/* output prompt */
|
||||
div.nboutput.container div.prompt pre {
|
||||
color: #BF5B3D;
|
||||
}
|
||||
|
||||
/* all prompts */
|
||||
div.nbinput.container div.prompt,
|
||||
div.nboutput.container div.prompt {
|
||||
width: 4.5ex;
|
||||
padding-top: 5px;
|
||||
position: relative;
|
||||
user-select: none;
|
||||
}
|
||||
|
||||
div.nbinput.container div.prompt > div,
|
||||
div.nboutput.container div.prompt > div {
|
||||
position: absolute;
|
||||
right: 0;
|
||||
margin-right: 0.3ex;
|
||||
}
|
||||
|
||||
@media (max-width: 540px) {
|
||||
div.nbinput.container div.prompt,
|
||||
div.nboutput.container div.prompt {
|
||||
width: unset;
|
||||
text-align: left;
|
||||
padding: 0.4em;
|
||||
}
|
||||
div.nboutput.container div.prompt.empty {
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
div.nbinput.container div.prompt > div,
|
||||
div.nboutput.container div.prompt > div {
|
||||
position: unset;
|
||||
}
|
||||
}
|
||||
|
||||
/* disable scrollbars and line breaks on prompts */
|
||||
div.nbinput.container div.prompt pre,
|
||||
div.nboutput.container div.prompt pre {
|
||||
overflow: hidden;
|
||||
white-space: pre;
|
||||
}
|
||||
|
||||
/* input/output area */
|
||||
div.nbinput.container div.input_area,
|
||||
div.nboutput.container div.output_area {
|
||||
-webkit-flex: 1;
|
||||
flex: 1;
|
||||
overflow: auto;
|
||||
}
|
||||
@media (max-width: 540px) {
|
||||
div.nbinput.container div.input_area,
|
||||
div.nboutput.container div.output_area {
|
||||
width: 100%;
|
||||
}
|
||||
}
|
||||
|
||||
/* input area */
|
||||
div.nbinput.container div.input_area {
|
||||
border: 1px solid #e0e0e0;
|
||||
border-radius: 2px;
|
||||
/*background: #f5f5f5;*/
|
||||
}
|
||||
|
||||
/* override MathJax center alignment in output cells */
|
||||
div.nboutput.container div[class*=MathJax] {
|
||||
text-align: left !important;
|
||||
}
|
||||
|
||||
/* override sphinx.ext.imgmath center alignment in output cells */
|
||||
div.nboutput.container div.math p {
|
||||
text-align: left;
|
||||
}
|
||||
|
||||
/* standard error */
|
||||
div.nboutput.container div.output_area.stderr {
|
||||
background: #fdd;
|
||||
}
|
||||
|
||||
/* ANSI colors */
|
||||
.ansi-black-fg { color: #3E424D; }
|
||||
.ansi-black-bg { background-color: #3E424D; }
|
||||
.ansi-black-intense-fg { color: #282C36; }
|
||||
.ansi-black-intense-bg { background-color: #282C36; }
|
||||
.ansi-red-fg { color: #E75C58; }
|
||||
.ansi-red-bg { background-color: #E75C58; }
|
||||
.ansi-red-intense-fg { color: #B22B31; }
|
||||
.ansi-red-intense-bg { background-color: #B22B31; }
|
||||
.ansi-green-fg { color: #00A250; }
|
||||
.ansi-green-bg { background-color: #00A250; }
|
||||
.ansi-green-intense-fg { color: #007427; }
|
||||
.ansi-green-intense-bg { background-color: #007427; }
|
||||
.ansi-yellow-fg { color: #DDB62B; }
|
||||
.ansi-yellow-bg { background-color: #DDB62B; }
|
||||
.ansi-yellow-intense-fg { color: #B27D12; }
|
||||
.ansi-yellow-intense-bg { background-color: #B27D12; }
|
||||
.ansi-blue-fg { color: #208FFB; }
|
||||
.ansi-blue-bg { background-color: #208FFB; }
|
||||
.ansi-blue-intense-fg { color: #0065CA; }
|
||||
.ansi-blue-intense-bg { background-color: #0065CA; }
|
||||
.ansi-magenta-fg { color: #D160C4; }
|
||||
.ansi-magenta-bg { background-color: #D160C4; }
|
||||
.ansi-magenta-intense-fg { color: #A03196; }
|
||||
.ansi-magenta-intense-bg { background-color: #A03196; }
|
||||
.ansi-cyan-fg { color: #60C6C8; }
|
||||
.ansi-cyan-bg { background-color: #60C6C8; }
|
||||
.ansi-cyan-intense-fg { color: #258F8F; }
|
||||
.ansi-cyan-intense-bg { background-color: #258F8F; }
|
||||
.ansi-white-fg { color: #C5C1B4; }
|
||||
.ansi-white-bg { background-color: #C5C1B4; }
|
||||
.ansi-white-intense-fg { color: #A1A6B2; }
|
||||
.ansi-white-intense-bg { background-color: #A1A6B2; }
|
||||
|
||||
.ansi-default-inverse-fg { color: #FFFFFF; }
|
||||
.ansi-default-inverse-bg { background-color: #000000; }
|
||||
|
||||
.ansi-bold { font-weight: bold; }
|
||||
.ansi-underline { text-decoration: underline; }
|
||||
|
||||
|
||||
div.nbinput.container div.input_area div[class*=highlight] > pre,
|
||||
div.nboutput.container div.output_area div[class*=highlight] > pre,
|
||||
div.nboutput.container div.output_area div[class*=highlight].math,
|
||||
div.nboutput.container div.output_area.rendered_html,
|
||||
div.nboutput.container div.output_area > div.output_javascript,
|
||||
div.nboutput.container div.output_area:not(.rendered_html) > img{
|
||||
padding: 5px;
|
||||
margin: 0;
|
||||
}
|
||||
|
||||
/* fix copybtn overflow problem in chromium (needed for 'sphinx_copybutton') */
|
||||
div.nbinput.container div.input_area > div[class^='highlight'],
|
||||
div.nboutput.container div.output_area > div[class^='highlight']{
|
||||
overflow-y: hidden;
|
||||
}
|
||||
|
||||
/* hide copybtn icon on prompts (needed for 'sphinx_copybutton') */
|
||||
.prompt .copybtn {
|
||||
display: none;
|
||||
}
|
||||
|
||||
/* Some additional styling taken from the Jupyter notebook CSS */
|
||||
.jp-RenderedHTMLCommon table,
|
||||
div.rendered_html table {
|
||||
border: none;
|
||||
border-collapse: collapse;
|
||||
border-spacing: 0;
|
||||
color: black;
|
||||
font-size: 12px;
|
||||
table-layout: fixed;
|
||||
}
|
||||
.jp-RenderedHTMLCommon thead,
|
||||
div.rendered_html thead {
|
||||
border-bottom: 1px solid black;
|
||||
vertical-align: bottom;
|
||||
}
|
||||
.jp-RenderedHTMLCommon tr,
|
||||
.jp-RenderedHTMLCommon th,
|
||||
.jp-RenderedHTMLCommon td,
|
||||
div.rendered_html tr,
|
||||
div.rendered_html th,
|
||||
div.rendered_html td {
|
||||
text-align: right;
|
||||
vertical-align: middle;
|
||||
padding: 0.5em 0.5em;
|
||||
line-height: normal;
|
||||
white-space: normal;
|
||||
max-width: none;
|
||||
border: none;
|
||||
}
|
||||
.jp-RenderedHTMLCommon th,
|
||||
div.rendered_html th {
|
||||
font-weight: bold;
|
||||
}
|
||||
.jp-RenderedHTMLCommon tbody tr:nth-child(odd),
|
||||
div.rendered_html tbody tr:nth-child(odd) {
|
||||
background: #f5f5f5;
|
||||
}
|
||||
.jp-RenderedHTMLCommon tbody tr:hover,
|
||||
div.rendered_html tbody tr:hover {
|
||||
background: rgba(66, 165, 245, 0.2);
|
||||
}
|
||||
|
||||
31
python/docs/_static/nbsphinx-gallery.css
vendored
Normal file
@ -0,0 +1,31 @@
|
||||
.nbsphinx-gallery {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(auto-fill, minmax(160px, 1fr));
|
||||
gap: 5px;
|
||||
margin-top: 1em;
|
||||
margin-bottom: 1em;
|
||||
}
|
||||
|
||||
.nbsphinx-gallery > a {
|
||||
padding: 5px;
|
||||
border: 1px dotted currentColor;
|
||||
border-radius: 2px;
|
||||
text-align: center;
|
||||
}
|
||||
|
||||
.nbsphinx-gallery > a:hover {
|
||||
border-style: solid;
|
||||
}
|
||||
|
||||
.nbsphinx-gallery img {
|
||||
max-width: 100%;
|
||||
max-height: 100%;
|
||||
}
|
||||
|
||||
.nbsphinx-gallery > a > div:first-child {
|
||||
display: flex;
|
||||
align-items: start;
|
||||
justify-content: center;
|
||||
height: 120px;
|
||||
margin-bottom: 5px;
|
||||
}
|
||||
9
python/docs/_static/nbsphinx-no-thumbnail.svg
vendored
Normal file
@ -0,0 +1,9 @@
|
||||
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
|
||||
<style>
|
||||
svg { fill: #ccc; }
|
||||
@media (prefers-color-scheme: dark) {
|
||||
svg { fill: #999; }
|
||||
}
|
||||
</style>
|
||||
<path d="M 65.365328,74.550032 H 67.52154 V 88.67217 h 7.760238 v 1.814634 h -9.91645 z m -6.447298,0 h 2.156214 V 90.486804 H 58.91803 Z m -9.955303,2.124192 -2.924765,7.931025 h 5.860208 z m -1.216872,-2.124192 h 2.444423 l 6.073692,15.936772 h -2.241606 l -1.451709,-4.088268 h -7.183824 l -1.451707,4.088268 H 41.66149 Z m -21.264514,0 h 2.903417 L 36.45116,87.882271 V 74.550032 h 2.092171 V 90.486804 H 35.639914 L 28.573506,77.154569 v 13.332235 h -2.092165 z m 53.454813,-9.781663 v 5.838858 h 3.458479 q 1.739919,0 2.572513,-0.71518 0.843272,-0.725854 0.843272,-2.209586 0,-1.494406 -0.843272,-2.198912 -0.832594,-0.71518 -2.572513,-0.71518 z m 0,-6.554037 v 4.803447 h 3.191622 q 1.579803,0 2.348355,-0.587087 0.779225,-0.597763 0.779225,-1.814636 0,-1.206199 -0.779225,-1.803962 -0.768552,-0.597762 -2.348355,-0.597762 z m -2.156212,-1.771939 h 5.507949 q 2.465772,0 3.800065,1.024736 1.334288,1.024735 1.334288,2.914092 0,1.462383 -0.683157,2.327003 -0.683157,0.864621 -2.006771,1.078107 1.590471,0.341579 2.46577,1.43036 0.885967,1.078107 0.885967,2.700605 0,2.134866 -1.451708,3.298368 -1.451709,1.163502 -4.130963,1.163502 h -5.72144 z m -18.86154,0 h 3.212969 l 4.066921,10.845118 4.088269,-10.845118 H 73.49953 V 72.379166 H 71.396692 V 58.385121 L 67.287075,69.315634 H 65.120183 L 61.010567,58.385121 v 13.994045 h -2.092165 z m -16.246333,0 h 2.166892 v 9.681616 q 0,2.561839 0.928668,3.693318 0.928667,1.120804 3.010159,1.120804 2.070819,0 2.999486,-1.120804 0.928667,-1.131479 0.928667,-3.693318 v -9.681616 h 2.166886 v 9.948474 q 0,3.116904 -1.547776,4.70738 -1.537104,1.590474 -4.547263,1.590474 -3.020833,0 -4.568615,-1.590474 -1.537104,-1.590476 -1.537104,-4.70738 z m -16.192953,0 h 2.156213 v 6.532689 h 7.83496 v -6.532689 h 2.156213 v 15.936773 h -2.156213 v -7.589448 h -7.83496 v 7.589448 h -2.156213 z m -15.563168,0 h 13.481671 v 1.814636 H 18.740226 V 72.379166 H 16.573339 V 58.257029 H 10.915948 Z M 65.497127,39.809967 q -2.34835,0 -3.736011,1.750588 
-1.376987,1.750588 -1.376987,4.771419 0,3.010157 1.376987,4.760745 1.387661,1.750588 3.736011,1.750588 2.348349,0 3.714662,-1.750588 1.376986,-1.750588 1.376986,-4.760745 0,-3.020831 -1.376986,-4.771419 -1.366313,-1.750588 -3.714662,-1.750588 z m 0,-1.750588 q 3.351735,0 5.358506,2.252281 2.006772,2.241606 2.006772,6.020314 0,3.768034 -2.006772,6.020315 -2.006771,2.241606 -5.358506,2.241606 -3.36241,0 -5.379856,-2.241606 -2.006771,-2.241607 -2.006771,-6.020315 0,-3.778708 2.006771,-6.020314 2.017446,-2.252281 5.379856,-2.252281 z m -22.821688,0.288206 h 2.903414 L 52.64525,51.679807 V 38.347585 h 2.092166 V 54.28434 H 51.834001 L 44.767605,40.952119 V 54.28434 h -2.092166 z" />
|
||||
</svg>
|
||||
|
Size: 2.8 KiB
BIN
python/docs/_static/plus.png
vendored
Normal file
|
Size: 90 B
246
python/docs/_static/pygments.css
vendored
Normal file
@ -0,0 +1,246 @@
|
||||
.highlight pre { line-height: 125%; }
|
||||
.highlight td.linenos .normal { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; }
|
||||
.highlight span.linenos { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; }
|
||||
.highlight td.linenos .special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; }
|
||||
.highlight span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; }
|
||||
.highlight .hll { background-color: #ffffcc }
|
||||
.highlight { background: #eeffcc; }
|
||||
.highlight .c { color: #408090; font-style: italic } /* Comment */
|
||||
.highlight .err { border: 1px solid #FF0000 } /* Error */
|
||||
.highlight .k { color: #007020; font-weight: bold } /* Keyword */
|
||||
.highlight .o { color: #666666 } /* Operator */
|
||||
.highlight .ch { color: #408090; font-style: italic } /* Comment.Hashbang */
|
||||
.highlight .cm { color: #408090; font-style: italic } /* Comment.Multiline */
|
||||
.highlight .cp { color: #007020 } /* Comment.Preproc */
|
||||
.highlight .cpf { color: #408090; font-style: italic } /* Comment.PreprocFile */
|
||||
.highlight .c1 { color: #408090; font-style: italic } /* Comment.Single */
|
||||
.highlight .cs { color: #408090; background-color: #fff0f0 } /* Comment.Special */
|
||||
.highlight .gd { color: #A00000 } /* Generic.Deleted */
|
||||
.highlight .ge { font-style: italic } /* Generic.Emph */
|
||||
.highlight .gr { color: #FF0000 } /* Generic.Error */
|
||||
.highlight .gh { color: #000080; font-weight: bold } /* Generic.Heading */
|
||||
.highlight .gi { color: #00A000 } /* Generic.Inserted */
|
||||
.highlight .go { color: #333333 } /* Generic.Output */
|
||||
.highlight .gp { color: #c65d09; font-weight: bold } /* Generic.Prompt */
|
||||
.highlight .gs { font-weight: bold } /* Generic.Strong */
|
||||
.highlight .gu { color: #800080; font-weight: bold } /* Generic.Subheading */
|
||||
.highlight .gt { color: #0044DD } /* Generic.Traceback */
|
||||
.highlight .kc { color: #007020; font-weight: bold } /* Keyword.Constant */
|
||||
.highlight .kd { color: #007020; font-weight: bold } /* Keyword.Declaration */
|
||||
.highlight .kn { color: #007020; font-weight: bold } /* Keyword.Namespace */
|
||||
.highlight .kp { color: #007020 } /* Keyword.Pseudo */
|
||||
.highlight .kr { color: #007020; font-weight: bold } /* Keyword.Reserved */
|
||||
.highlight .kt { color: #902000 } /* Keyword.Type */
|
||||
.highlight .m { color: #208050 } /* Literal.Number */
|
||||
.highlight .s { color: #4070a0 } /* Literal.String */
|
||||
.highlight .na { color: #4070a0 } /* Name.Attribute */
|
||||
.highlight .nb { color: #007020 } /* Name.Builtin */
|
||||
.highlight .nc { color: #0e84b5; font-weight: bold } /* Name.Class */
|
||||
.highlight .no { color: #60add5 } /* Name.Constant */
|
||||
.highlight .nd { color: #555555; font-weight: bold } /* Name.Decorator */
|
||||
.highlight .ni { color: #d55537; font-weight: bold } /* Name.Entity */
|
||||
.highlight .ne { color: #007020 } /* Name.Exception */
|
||||
.highlight .nf { color: #06287e } /* Name.Function */
|
||||
.highlight .nl { color: #002070; font-weight: bold } /* Name.Label */
|
||||
.highlight .nn { color: #0e84b5; font-weight: bold } /* Name.Namespace */
|
||||
.highlight .nt { color: #062873; font-weight: bold } /* Name.Tag */
|
||||
.highlight .nv { color: #bb60d5 } /* Name.Variable */
|
||||
.highlight .ow { color: #007020; font-weight: bold } /* Operator.Word */
|
||||
.highlight .w { color: #bbbbbb } /* Text.Whitespace */
|
||||
.highlight .mb { color: #208050 } /* Literal.Number.Bin */
|
||||
.highlight .mf { color: #208050 } /* Literal.Number.Float */
|
||||
.highlight .mh { color: #208050 } /* Literal.Number.Hex */
|
||||
.highlight .mi { color: #208050 } /* Literal.Number.Integer */
|
||||
.highlight .mo { color: #208050 } /* Literal.Number.Oct */
|
||||
.highlight .sa { color: #4070a0 } /* Literal.String.Affix */
|
||||
.highlight .sb { color: #4070a0 } /* Literal.String.Backtick */
|
||||
.highlight .sc { color: #4070a0 } /* Literal.String.Char */
|
||||
.highlight .dl { color: #4070a0 } /* Literal.String.Delimiter */
|
||||
.highlight .sd { color: #4070a0; font-style: italic } /* Literal.String.Doc */
|
||||
.highlight .s2 { color: #4070a0 } /* Literal.String.Double */
|
||||
.highlight .se { color: #4070a0; font-weight: bold } /* Literal.String.Escape */
|
||||
.highlight .sh { color: #4070a0 } /* Literal.String.Heredoc */
|
||||
.highlight .si { color: #70a0d0; font-style: italic } /* Literal.String.Interpol */
|
||||
.highlight .sx { color: #c65d09 } /* Literal.String.Other */
|
||||
.highlight .sr { color: #235388 } /* Literal.String.Regex */
|
||||
.highlight .s1 { color: #4070a0 } /* Literal.String.Single */
|
||||
.highlight .ss { color: #517918 } /* Literal.String.Symbol */
|
||||
.highlight .bp { color: #007020 } /* Name.Builtin.Pseudo */
|
||||
.highlight .fm { color: #06287e } /* Name.Function.Magic */
|
||||
.highlight .vc { color: #bb60d5 } /* Name.Variable.Class */
|
||||
.highlight .vg { color: #bb60d5 } /* Name.Variable.Global */
|
||||
.highlight .vi { color: #bb60d5 } /* Name.Variable.Instance */
|
||||
.highlight .vm { color: #bb60d5 } /* Name.Variable.Magic */
|
||||
.highlight .il { color: #208050 } /* Literal.Number.Integer.Long */
|
||||
@media not print {
|
||||
body[data-theme="dark"] .highlight pre { line-height: 125%; }
|
||||
body[data-theme="dark"] .highlight td.linenos .normal { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; }
|
||||
body[data-theme="dark"] .highlight span.linenos { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; }
|
||||
body[data-theme="dark"] .highlight td.linenos .special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; }
|
||||
body[data-theme="dark"] .highlight span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; }
|
||||
body[data-theme="dark"] .highlight .hll { background-color: #49483e }
|
||||
body[data-theme="dark"] .highlight { background: #272822; color: #f8f8f2 }
|
||||
body[data-theme="dark"] .highlight .c { color: #75715e } /* Comment */
|
||||
body[data-theme="dark"] .highlight .err { color: #960050; background-color: #1e0010 } /* Error */
|
||||
body[data-theme="dark"] .highlight .esc { color: #f8f8f2 } /* Escape */
|
||||
body[data-theme="dark"] .highlight .g { color: #f8f8f2 } /* Generic */
|
||||
body[data-theme="dark"] .highlight .k { color: #66d9ef } /* Keyword */
|
||||
body[data-theme="dark"] .highlight .l { color: #ae81ff } /* Literal */
|
||||
body[data-theme="dark"] .highlight .n { color: #f8f8f2 } /* Name */
|
||||
body[data-theme="dark"] .highlight .o { color: #f92672 } /* Operator */
|
||||
body[data-theme="dark"] .highlight .x { color: #f8f8f2 } /* Other */
|
||||
body[data-theme="dark"] .highlight .p { color: #f8f8f2 } /* Punctuation */
|
||||
body[data-theme="dark"] .highlight .ch { color: #75715e } /* Comment.Hashbang */
|
||||
body[data-theme="dark"] .highlight .cm { color: #75715e } /* Comment.Multiline */
|
||||
body[data-theme="dark"] .highlight .cp { color: #75715e } /* Comment.Preproc */
|
||||
body[data-theme="dark"] .highlight .cpf { color: #75715e } /* Comment.PreprocFile */
|
||||
body[data-theme="dark"] .highlight .c1 { color: #75715e } /* Comment.Single */
|
||||
body[data-theme="dark"] .highlight .cs { color: #75715e } /* Comment.Special */
|
||||
body[data-theme="dark"] .highlight .gd { color: #f92672 } /* Generic.Deleted */
|
||||
body[data-theme="dark"] .highlight .ge { color: #f8f8f2; font-style: italic } /* Generic.Emph */
|
||||
body[data-theme="dark"] .highlight .gr { color: #f8f8f2 } /* Generic.Error */
|
||||
body[data-theme="dark"] .highlight .gh { color: #f8f8f2 } /* Generic.Heading */
|
||||
body[data-theme="dark"] .highlight .gi { color: #a6e22e } /* Generic.Inserted */
|
||||
body[data-theme="dark"] .highlight .go { color: #66d9ef } /* Generic.Output */
|
||||
body[data-theme="dark"] .highlight .gp { color: #f92672; font-weight: bold } /* Generic.Prompt */
|
||||
body[data-theme="dark"] .highlight .gs { color: #f8f8f2; font-weight: bold } /* Generic.Strong */
|
||||
body[data-theme="dark"] .highlight .gu { color: #75715e } /* Generic.Subheading */
|
||||
body[data-theme="dark"] .highlight .gt { color: #f8f8f2 } /* Generic.Traceback */
|
||||
body[data-theme="dark"] .highlight .kc { color: #66d9ef } /* Keyword.Constant */
|
||||
body[data-theme="dark"] .highlight .kd { color: #66d9ef } /* Keyword.Declaration */
|
||||
body[data-theme="dark"] .highlight .kn { color: #f92672 } /* Keyword.Namespace */
|
||||
body[data-theme="dark"] .highlight .kp { color: #66d9ef } /* Keyword.Pseudo */
|
||||
body[data-theme="dark"] .highlight .kr { color: #66d9ef } /* Keyword.Reserved */
|
||||
body[data-theme="dark"] .highlight .kt { color: #66d9ef } /* Keyword.Type */
|
||||
body[data-theme="dark"] .highlight .ld { color: #e6db74 } /* Literal.Date */
|
||||
body[data-theme="dark"] .highlight .m { color: #ae81ff } /* Literal.Number */
|
||||
body[data-theme="dark"] .highlight .s { color: #e6db74 } /* Literal.String */
|
||||
body[data-theme="dark"] .highlight .na { color: #a6e22e } /* Name.Attribute */
|
||||
body[data-theme="dark"] .highlight .nb { color: #f8f8f2 } /* Name.Builtin */
|
||||
body[data-theme="dark"] .highlight .nc { color: #a6e22e } /* Name.Class */
|
||||
body[data-theme="dark"] .highlight .no { color: #66d9ef } /* Name.Constant */
|
||||
body[data-theme="dark"] .highlight .nd { color: #a6e22e } /* Name.Decorator */
|
||||
body[data-theme="dark"] .highlight .ni { color: #f8f8f2 } /* Name.Entity */
|
||||
body[data-theme="dark"] .highlight .ne { color: #a6e22e } /* Name.Exception */
|
||||
body[data-theme="dark"] .highlight .nf { color: #a6e22e } /* Name.Function */
|
||||
body[data-theme="dark"] .highlight .nl { color: #f8f8f2 } /* Name.Label */
|
||||
body[data-theme="dark"] .highlight .nn { color: #f8f8f2 } /* Name.Namespace */
|
||||
body[data-theme="dark"] .highlight .nx { color: #a6e22e } /* Name.Other */
|
||||
body[data-theme="dark"] .highlight .py { color: #f8f8f2 } /* Name.Property */
|
||||
body[data-theme="dark"] .highlight .nt { color: #f92672 } /* Name.Tag */
|
||||
body[data-theme="dark"] .highlight .nv { color: #f8f8f2 } /* Name.Variable */
|
||||
body[data-theme="dark"] .highlight .ow { color: #f92672 } /* Operator.Word */
|
||||
body[data-theme="dark"] .highlight .pm { color: #f8f8f2 } /* Punctuation.Marker */
|
||||
body[data-theme="dark"] .highlight .w { color: #f8f8f2 } /* Text.Whitespace */
|
||||
body[data-theme="dark"] .highlight .mb { color: #ae81ff } /* Literal.Number.Bin */
|
||||
body[data-theme="dark"] .highlight .mf { color: #ae81ff } /* Literal.Number.Float */
|
||||
body[data-theme="dark"] .highlight .mh { color: #ae81ff } /* Literal.Number.Hex */
|
||||
body[data-theme="dark"] .highlight .mi { color: #ae81ff } /* Literal.Number.Integer */
|
||||
body[data-theme="dark"] .highlight .mo { color: #ae81ff } /* Literal.Number.Oct */
|
||||
body[data-theme="dark"] .highlight .sa { color: #e6db74 } /* Literal.String.Affix */
|
||||
body[data-theme="dark"] .highlight .sb { color: #e6db74 } /* Literal.String.Backtick */
|
||||
body[data-theme="dark"] .highlight .sc { color: #e6db74 } /* Literal.String.Char */
|
||||
body[data-theme="dark"] .highlight .dl { color: #e6db74 } /* Literal.String.Delimiter */
|
||||
body[data-theme="dark"] .highlight .sd { color: #e6db74 } /* Literal.String.Doc */
|
||||
body[data-theme="dark"] .highlight .s2 { color: #e6db74 } /* Literal.String.Double */
|
||||
body[data-theme="dark"] .highlight .se { color: #ae81ff } /* Literal.String.Escape */
|
||||
body[data-theme="dark"] .highlight .sh { color: #e6db74 } /* Literal.String.Heredoc */
|
||||
body[data-theme="dark"] .highlight .si { color: #e6db74 } /* Literal.String.Interpol */
|
||||
body[data-theme="dark"] .highlight .sx { color: #e6db74 } /* Literal.String.Other */
|
||||
body[data-theme="dark"] .highlight .sr { color: #e6db74 } /* Literal.String.Regex */
|
||||
body[data-theme="dark"] .highlight .s1 { color: #e6db74 } /* Literal.String.Single */
|
||||
body[data-theme="dark"] .highlight .ss { color: #e6db74 } /* Literal.String.Symbol */
|
||||
body[data-theme="dark"] .highlight .bp { color: #f8f8f2 } /* Name.Builtin.Pseudo */
|
||||
body[data-theme="dark"] .highlight .fm { color: #a6e22e } /* Name.Function.Magic */
|
||||
body[data-theme="dark"] .highlight .vc { color: #f8f8f2 } /* Name.Variable.Class */
|
||||
body[data-theme="dark"] .highlight .vg { color: #f8f8f2 } /* Name.Variable.Global */
|
||||
body[data-theme="dark"] .highlight .vi { color: #f8f8f2 } /* Name.Variable.Instance */
|
||||
body[data-theme="dark"] .highlight .vm { color: #f8f8f2 } /* Name.Variable.Magic */
|
||||
body[data-theme="dark"] .highlight .il { color: #ae81ff } /* Literal.Number.Integer.Long */
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) .highlight pre { line-height: 125%; }
|
||||
body:not([data-theme="light"]) .highlight td.linenos .normal { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; }
|
||||
body:not([data-theme="light"]) .highlight span.linenos { color: inherit; background-color: transparent; padding-left: 5px; padding-right: 5px; }
|
||||
body:not([data-theme="light"]) .highlight td.linenos .special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; }
|
||||
body:not([data-theme="light"]) .highlight span.linenos.special { color: #000000; background-color: #ffffc0; padding-left: 5px; padding-right: 5px; }
|
||||
body:not([data-theme="light"]) .highlight .hll { background-color: #49483e }
|
||||
body:not([data-theme="light"]) .highlight { background: #272822; color: #f8f8f2 }
|
||||
body:not([data-theme="light"]) .highlight .c { color: #75715e } /* Comment */
|
||||
body:not([data-theme="light"]) .highlight .err { color: #960050; background-color: #1e0010 } /* Error */
|
||||
body:not([data-theme="light"]) .highlight .esc { color: #f8f8f2 } /* Escape */
|
||||
body:not([data-theme="light"]) .highlight .g { color: #f8f8f2 } /* Generic */
|
||||
body:not([data-theme="light"]) .highlight .k { color: #66d9ef } /* Keyword */
|
||||
body:not([data-theme="light"]) .highlight .l { color: #ae81ff } /* Literal */
|
||||
body:not([data-theme="light"]) .highlight .n { color: #f8f8f2 } /* Name */
|
||||
body:not([data-theme="light"]) .highlight .o { color: #f92672 } /* Operator */
|
||||
body:not([data-theme="light"]) .highlight .x { color: #f8f8f2 } /* Other */
|
||||
body:not([data-theme="light"]) .highlight .p { color: #f8f8f2 } /* Punctuation */
|
||||
body:not([data-theme="light"]) .highlight .ch { color: #75715e } /* Comment.Hashbang */
|
||||
body:not([data-theme="light"]) .highlight .cm { color: #75715e } /* Comment.Multiline */
|
||||
body:not([data-theme="light"]) .highlight .cp { color: #75715e } /* Comment.Preproc */
|
||||
body:not([data-theme="light"]) .highlight .cpf { color: #75715e } /* Comment.PreprocFile */
|
||||
body:not([data-theme="light"]) .highlight .c1 { color: #75715e } /* Comment.Single */
|
||||
body:not([data-theme="light"]) .highlight .cs { color: #75715e } /* Comment.Special */
|
||||
body:not([data-theme="light"]) .highlight .gd { color: #f92672 } /* Generic.Deleted */
|
||||
body:not([data-theme="light"]) .highlight .ge { color: #f8f8f2; font-style: italic } /* Generic.Emph */
|
||||
body:not([data-theme="light"]) .highlight .gr { color: #f8f8f2 } /* Generic.Error */
|
||||
body:not([data-theme="light"]) .highlight .gh { color: #f8f8f2 } /* Generic.Heading */
|
||||
body:not([data-theme="light"]) .highlight .gi { color: #a6e22e } /* Generic.Inserted */
|
||||
body:not([data-theme="light"]) .highlight .go { color: #66d9ef } /* Generic.Output */
|
||||
body:not([data-theme="light"]) .highlight .gp { color: #f92672; font-weight: bold } /* Generic.Prompt */
|
||||
body:not([data-theme="light"]) .highlight .gs { color: #f8f8f2; font-weight: bold } /* Generic.Strong */
|
||||
body:not([data-theme="light"]) .highlight .gu { color: #75715e } /* Generic.Subheading */
|
||||
body:not([data-theme="light"]) .highlight .gt { color: #f8f8f2 } /* Generic.Traceback */
|
||||
body:not([data-theme="light"]) .highlight .kc { color: #66d9ef } /* Keyword.Constant */
|
||||
body:not([data-theme="light"]) .highlight .kd { color: #66d9ef } /* Keyword.Declaration */
|
||||
body:not([data-theme="light"]) .highlight .kn { color: #f92672 } /* Keyword.Namespace */
|
||||
body:not([data-theme="light"]) .highlight .kp { color: #66d9ef } /* Keyword.Pseudo */
|
||||
body:not([data-theme="light"]) .highlight .kr { color: #66d9ef } /* Keyword.Reserved */
|
||||
body:not([data-theme="light"]) .highlight .kt { color: #66d9ef } /* Keyword.Type */
|
||||
body:not([data-theme="light"]) .highlight .ld { color: #e6db74 } /* Literal.Date */
|
||||
body:not([data-theme="light"]) .highlight .m { color: #ae81ff } /* Literal.Number */
|
||||
body:not([data-theme="light"]) .highlight .s { color: #e6db74 } /* Literal.String */
|
||||
body:not([data-theme="light"]) .highlight .na { color: #a6e22e } /* Name.Attribute */
|
||||
body:not([data-theme="light"]) .highlight .nb { color: #f8f8f2 } /* Name.Builtin */
|
||||
body:not([data-theme="light"]) .highlight .nc { color: #a6e22e } /* Name.Class */
|
||||
body:not([data-theme="light"]) .highlight .no { color: #66d9ef } /* Name.Constant */
|
||||
body:not([data-theme="light"]) .highlight .nd { color: #a6e22e } /* Name.Decorator */
|
||||
body:not([data-theme="light"]) .highlight .ni { color: #f8f8f2 } /* Name.Entity */
|
||||
body:not([data-theme="light"]) .highlight .ne { color: #a6e22e } /* Name.Exception */
|
||||
body:not([data-theme="light"]) .highlight .nf { color: #a6e22e } /* Name.Function */
|
||||
body:not([data-theme="light"]) .highlight .nl { color: #f8f8f2 } /* Name.Label */
|
||||
body:not([data-theme="light"]) .highlight .nn { color: #f8f8f2 } /* Name.Namespace */
|
||||
body:not([data-theme="light"]) .highlight .nx { color: #a6e22e } /* Name.Other */
|
||||
body:not([data-theme="light"]) .highlight .py { color: #f8f8f2 } /* Name.Property */
|
||||
body:not([data-theme="light"]) .highlight .nt { color: #f92672 } /* Name.Tag */
|
||||
body:not([data-theme="light"]) .highlight .nv { color: #f8f8f2 } /* Name.Variable */
|
||||
body:not([data-theme="light"]) .highlight .ow { color: #f92672 } /* Operator.Word */
|
||||
body:not([data-theme="light"]) .highlight .pm { color: #f8f8f2 } /* Punctuation.Marker */
|
||||
body:not([data-theme="light"]) .highlight .w { color: #f8f8f2 } /* Text.Whitespace */
|
||||
body:not([data-theme="light"]) .highlight .mb { color: #ae81ff } /* Literal.Number.Bin */
|
||||
body:not([data-theme="light"]) .highlight .mf { color: #ae81ff } /* Literal.Number.Float */
|
||||
body:not([data-theme="light"]) .highlight .mh { color: #ae81ff } /* Literal.Number.Hex */
|
||||
body:not([data-theme="light"]) .highlight .mi { color: #ae81ff } /* Literal.Number.Integer */
|
||||
body:not([data-theme="light"]) .highlight .mo { color: #ae81ff } /* Literal.Number.Oct */
|
||||
body:not([data-theme="light"]) .highlight .sa { color: #e6db74 } /* Literal.String.Affix */
|
||||
body:not([data-theme="light"]) .highlight .sb { color: #e6db74 } /* Literal.String.Backtick */
|
||||
body:not([data-theme="light"]) .highlight .sc { color: #e6db74 } /* Literal.String.Char */
|
||||
body:not([data-theme="light"]) .highlight .dl { color: #e6db74 } /* Literal.String.Delimiter */
|
||||
body:not([data-theme="light"]) .highlight .sd { color: #e6db74 } /* Literal.String.Doc */
|
||||
body:not([data-theme="light"]) .highlight .s2 { color: #e6db74 } /* Literal.String.Double */
|
||||
body:not([data-theme="light"]) .highlight .se { color: #ae81ff } /* Literal.String.Escape */
|
||||
body:not([data-theme="light"]) .highlight .sh { color: #e6db74 } /* Literal.String.Heredoc */
|
||||
body:not([data-theme="light"]) .highlight .si { color: #e6db74 } /* Literal.String.Interpol */
|
||||
body:not([data-theme="light"]) .highlight .sx { color: #e6db74 } /* Literal.String.Other */
|
||||
body:not([data-theme="light"]) .highlight .sr { color: #e6db74 } /* Literal.String.Regex */
|
||||
body:not([data-theme="light"]) .highlight .s1 { color: #e6db74 } /* Literal.String.Single */
|
||||
body:not([data-theme="light"]) .highlight .ss { color: #e6db74 } /* Literal.String.Symbol */
|
||||
body:not([data-theme="light"]) .highlight .bp { color: #f8f8f2 } /* Name.Builtin.Pseudo */
|
||||
body:not([data-theme="light"]) .highlight .fm { color: #a6e22e } /* Name.Function.Magic */
|
||||
body:not([data-theme="light"]) .highlight .vc { color: #f8f8f2 } /* Name.Variable.Class */
|
||||
body:not([data-theme="light"]) .highlight .vg { color: #f8f8f2 } /* Name.Variable.Global */
|
||||
body:not([data-theme="light"]) .highlight .vi { color: #f8f8f2 } /* Name.Variable.Instance */
|
||||
body:not([data-theme="light"]) .highlight .vm { color: #f8f8f2 } /* Name.Variable.Magic */
|
||||
body:not([data-theme="light"]) .highlight .il { color: #ae81ff } /* Literal.Number.Integer.Long */
|
||||
}
|
||||
}
|
||||
0
python/docs/_static/scripts/furo-extensions.js
vendored
Normal file
3
python/docs/_static/scripts/furo.js
vendored
Normal file
7
python/docs/_static/scripts/furo.js.LICENSE.txt
vendored
Normal file
@ -0,0 +1,7 @@
|
||||
/*!
|
||||
* gumshoejs v5.1.2 (patched by @pradyunsg)
|
||||
* A simple, framework-agnostic scrollspy script.
|
||||
* (c) 2019 Chris Ferdinandi
|
||||
* MIT License
|
||||
* http://github.com/cferdinandi/gumshoe
|
||||
*/
|
||||
1
python/docs/_static/scripts/furo.js.map
vendored
Normal file
566
python/docs/_static/searchtools.js
vendored
Normal file
@ -0,0 +1,566 @@
|
||||
/*
|
||||
* searchtools.js
|
||||
* ~~~~~~~~~~~~~~~~
|
||||
*
|
||||
* Sphinx JavaScript utilities for the full-text search.
|
||||
*
|
||||
* :copyright: Copyright 2007-2023 by the Sphinx team, see AUTHORS.
|
||||
* :license: BSD, see LICENSE for details.
|
||||
*
|
||||
*/
|
||||
"use strict";
|
||||
|
||||
/**
|
||||
* Simple result scoring code.
|
||||
*/
|
||||
if (typeof Scorer === "undefined") {
|
||||
var Scorer = {
|
||||
// Implement the following function to further tweak the score for each result
|
||||
// The function takes a result array [docname, title, anchor, descr, score, filename]
|
||||
// and returns the new score.
|
||||
/*
|
||||
score: result => {
|
||||
const [docname, title, anchor, descr, score, filename] = result
|
||||
return score
|
||||
},
|
||||
*/
|
||||
|
||||
// query matches the full name of an object
|
||||
objNameMatch: 11,
|
||||
// or matches in the last dotted part of the object name
|
||||
objPartialMatch: 6,
|
||||
// Additive scores depending on the priority of the object
|
||||
objPrio: {
|
||||
0: 15, // used to be importantResults
|
||||
1: 5, // used to be objectResults
|
||||
2: -5, // used to be unimportantResults
|
||||
},
|
||||
// Used when the priority is not in the mapping.
|
||||
objPrioDefault: 0,
|
||||
|
||||
// query found in title
|
||||
title: 15,
|
||||
partialTitle: 7,
|
||||
// query found in terms
|
||||
term: 5,
|
||||
partialTerm: 2,
|
||||
};
|
||||
}
|
||||
|
||||
const _removeChildren = (element) => {
|
||||
while (element && element.lastChild) element.removeChild(element.lastChild);
|
||||
};
|
||||
|
||||
/**
|
||||
* See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#escaping
|
||||
*/
|
||||
const _escapeRegExp = (string) =>
|
||||
string.replace(/[.*+\-?^${}()|[\]\\]/g, "\\$&"); // $& means the whole matched string
|
||||
|
||||
const _displayItem = (item, searchTerms) => {
|
||||
const docBuilder = DOCUMENTATION_OPTIONS.BUILDER;
|
||||
const docUrlRoot = DOCUMENTATION_OPTIONS.URL_ROOT;
|
||||
const docFileSuffix = DOCUMENTATION_OPTIONS.FILE_SUFFIX;
|
||||
const docLinkSuffix = DOCUMENTATION_OPTIONS.LINK_SUFFIX;
|
||||
const showSearchSummary = DOCUMENTATION_OPTIONS.SHOW_SEARCH_SUMMARY;
|
||||
|
||||
const [docName, title, anchor, descr, score, _filename] = item;
|
||||
|
||||
let listItem = document.createElement("li");
|
||||
let requestUrl;
|
||||
let linkUrl;
|
||||
if (docBuilder === "dirhtml") {
|
||||
// dirhtml builder
|
||||
let dirname = docName + "/";
|
||||
if (dirname.match(/\/index\/$/))
|
||||
dirname = dirname.substring(0, dirname.length - 6);
|
||||
else if (dirname === "index/") dirname = "";
|
||||
requestUrl = docUrlRoot + dirname;
|
||||
linkUrl = requestUrl;
|
||||
} else {
|
||||
// normal html builders
|
||||
requestUrl = docUrlRoot + docName + docFileSuffix;
|
||||
linkUrl = docName + docLinkSuffix;
|
||||
}
|
||||
let linkEl = listItem.appendChild(document.createElement("a"));
|
||||
linkEl.href = linkUrl + anchor;
|
||||
linkEl.dataset.score = score;
|
||||
linkEl.innerHTML = title;
|
||||
if (descr)
|
||||
listItem.appendChild(document.createElement("span")).innerHTML =
|
||||
" (" + descr + ")";
|
||||
else if (showSearchSummary)
|
||||
fetch(requestUrl)
|
||||
.then((responseData) => responseData.text())
|
||||
.then((data) => {
|
||||
if (data)
|
||||
listItem.appendChild(
|
||||
Search.makeSearchSummary(data, searchTerms)
|
||||
);
|
||||
});
|
||||
Search.output.appendChild(listItem);
|
||||
};
|
||||
const _finishSearch = (resultCount) => {
|
||||
Search.stopPulse();
|
||||
Search.title.innerText = _("Search Results");
|
||||
if (!resultCount)
|
||||
Search.status.innerText = Documentation.gettext(
|
||||
"Your search did not match any documents. Please make sure that all words are spelled correctly and that you've selected enough categories."
|
||||
);
|
||||
else
|
||||
Search.status.innerText = _(
|
||||
`Search finished, found ${resultCount} page(s) matching the search query.`
|
||||
);
|
||||
};
|
||||
const _displayNextItem = (
|
||||
results,
|
||||
resultCount,
|
||||
searchTerms
|
||||
) => {
|
||||
// results left, load the summary and display it
|
||||
// this is intended to be dynamic (don't substitute resultCount)
|
||||
if (results.length) {
|
||||
_displayItem(results.pop(), searchTerms);
|
||||
setTimeout(
|
||||
() => _displayNextItem(results, resultCount, searchTerms),
|
||||
5
|
||||
);
|
||||
}
|
||||
// search finished, update title and status message
|
||||
else _finishSearch(resultCount);
|
||||
};
|
||||
|
||||
/**
|
||||
* Default splitQuery function. Can be overridden in ``sphinx.search`` with a
|
||||
* custom function per language.
|
||||
*
|
||||
* The regular expression works by splitting the string on consecutive characters
|
||||
* that are not Unicode letters, numbers, underscores, or emoji characters.
|
||||
* This is the same as ``\W+`` in Python, preserving the surrogate pair area.
|
||||
*/
|
||||
if (typeof splitQuery === "undefined") {
|
||||
var splitQuery = (query) => query
|
||||
.split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}]+/gu)
|
||||
.filter(term => term) // remove remaining empty strings
|
||||
}
|
||||
|
||||
/**
|
||||
* Search Module
|
||||
*/
|
||||
const Search = {
|
||||
_index: null,
|
||||
_queued_query: null,
|
||||
_pulse_status: -1,
|
||||
|
||||
htmlToText: (htmlString) => {
|
||||
const htmlElement = new DOMParser().parseFromString(htmlString, 'text/html');
|
||||
htmlElement.querySelectorAll(".headerlink").forEach((el) => { el.remove() });
|
||||
const docContent = htmlElement.querySelector('[role="main"]');
|
||||
if (docContent) return docContent.textContent; // querySelector returns null, not undefined, when nothing matches
|
||||
console.warn(
|
||||
"Content block not found. Sphinx search tries to obtain it via '[role=main]'. Check your theme or template."
|
||||
);
|
||||
return "";
|
||||
},
|
||||
|
||||
init: () => {
|
||||
const query = new URLSearchParams(window.location.search).get("q");
|
||||
document
|
||||
.querySelectorAll('input[name="q"]')
|
||||
.forEach((el) => (el.value = query));
|
||||
if (query) Search.performSearch(query);
|
||||
},
|
||||
|
||||
loadIndex: (url) =>
|
||||
(document.body.appendChild(document.createElement("script")).src = url),
|
||||
|
||||
setIndex: (index) => {
|
||||
Search._index = index;
|
||||
if (Search._queued_query !== null) {
|
||||
const query = Search._queued_query;
|
||||
Search._queued_query = null;
|
||||
Search.query(query);
|
||||
}
|
||||
},
|
||||
|
||||
hasIndex: () => Search._index !== null,
|
||||
|
||||
deferQuery: (query) => (Search._queued_query = query),
|
||||
|
||||
stopPulse: () => (Search._pulse_status = -1),
|
||||
|
||||
startPulse: () => {
|
||||
if (Search._pulse_status >= 0) return;
|
||||
|
||||
const pulse = () => {
|
||||
Search._pulse_status = (Search._pulse_status + 1) % 4;
|
||||
Search.dots.innerText = ".".repeat(Search._pulse_status);
|
||||
if (Search._pulse_status >= 0) window.setTimeout(pulse, 500);
|
||||
};
|
||||
pulse();
|
||||
},
|
||||
|
||||
/**
|
||||
* perform a search for something (or wait until index is loaded)
|
||||
*/
|
||||
performSearch: (query) => {
|
||||
// create the required interface elements
|
||||
const searchText = document.createElement("h2");
|
||||
searchText.textContent = _("Searching");
|
||||
const searchSummary = document.createElement("p");
|
||||
searchSummary.classList.add("search-summary");
|
||||
searchSummary.innerText = "";
|
||||
const searchList = document.createElement("ul");
|
||||
searchList.classList.add("search");
|
||||
|
||||
const out = document.getElementById("search-results");
|
||||
Search.title = out.appendChild(searchText);
|
||||
Search.dots = Search.title.appendChild(document.createElement("span"));
|
||||
Search.status = out.appendChild(searchSummary);
|
||||
Search.output = out.appendChild(searchList);
|
||||
|
||||
const searchProgress = document.getElementById("search-progress");
|
||||
// Some themes don't use the search progress node
|
||||
if (searchProgress) {
|
||||
searchProgress.innerText = _("Preparing search...");
|
||||
}
|
||||
Search.startPulse();
|
||||
|
||||
// index already loaded, the browser was quick!
|
||||
if (Search.hasIndex()) Search.query(query);
|
||||
else Search.deferQuery(query);
|
||||
},
|
||||
|
||||
/**
|
||||
* execute search (requires search index to be loaded)
|
||||
*/
|
||||
query: (query) => {
|
||||
const filenames = Search._index.filenames;
|
||||
const docNames = Search._index.docnames;
|
||||
const titles = Search._index.titles;
|
||||
const allTitles = Search._index.alltitles;
|
||||
const indexEntries = Search._index.indexentries;
|
||||
|
||||
// stem the search terms and add them to the correct list
|
||||
const stemmer = new Stemmer();
|
||||
const searchTerms = new Set();
|
||||
const excludedTerms = new Set();
|
||||
const highlightTerms = new Set();
|
||||
const objectTerms = new Set(splitQuery(query.toLowerCase().trim()));
|
||||
splitQuery(query.trim()).forEach((queryTerm) => {
|
||||
const queryTermLower = queryTerm.toLowerCase();
|
||||
|
||||
// maybe skip this "word"
|
||||
// stopwords array is from language_data.js
|
||||
if (
|
||||
stopwords.indexOf(queryTermLower) !== -1 ||
|
||||
queryTerm.match(/^\d+$/)
|
||||
)
|
||||
return;
|
||||
|
||||
// stem the word
|
||||
let word = stemmer.stemWord(queryTermLower);
|
||||
// select the correct list
|
||||
if (word[0] === "-") excludedTerms.add(word.substr(1));
|
||||
else {
|
||||
searchTerms.add(word);
|
||||
highlightTerms.add(queryTermLower);
|
||||
}
|
||||
});
|
||||
|
||||
if (SPHINX_HIGHLIGHT_ENABLED) { // set in sphinx_highlight.js
|
||||
localStorage.setItem("sphinx_highlight_terms", [...highlightTerms].join(" "))
|
||||
}
|
||||
|
||||
// console.debug("SEARCH: searching for:");
|
||||
// console.info("required: ", [...searchTerms]);
|
||||
// console.info("excluded: ", [...excludedTerms]);
|
||||
|
||||
// array of [docname, title, anchor, descr, score, filename]
|
||||
let results = [];
|
||||
_removeChildren(document.getElementById("search-progress"));
|
||||
|
||||
const queryLower = query.toLowerCase();
|
||||
for (const [title, foundTitles] of Object.entries(allTitles)) {
|
||||
if (title.toLowerCase().includes(queryLower) && (queryLower.length >= title.length/2)) {
|
||||
for (const [file, id] of foundTitles) {
|
||||
let score = Math.round(100 * queryLower.length / title.length)
|
||||
results.push([
|
||||
docNames[file],
|
||||
titles[file] !== title ? `${titles[file]} > ${title}` : title,
|
||||
id !== null ? "#" + id : "",
|
||||
null,
|
||||
score,
|
||||
filenames[file],
|
||||
]);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// search for explicit entries in index directives
|
||||
for (const [entry, foundEntries] of Object.entries(indexEntries)) {
|
||||
if (entry.includes(queryLower) && (queryLower.length >= entry.length/2)) {
|
||||
for (const [file, id] of foundEntries) {
|
||||
let score = Math.round(100 * queryLower.length / entry.length)
|
||||
results.push([
|
||||
docNames[file],
|
||||
titles[file],
|
||||
id ? "#" + id : "",
|
||||
null,
|
||||
score,
|
||||
filenames[file],
|
||||
]);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// lookup as object
|
||||
objectTerms.forEach((term) =>
|
||||
results.push(...Search.performObjectSearch(term, objectTerms))
|
||||
);
|
||||
|
||||
// lookup as search terms in fulltext
|
||||
results.push(...Search.performTermsSearch(searchTerms, excludedTerms));
|
||||
|
||||
// let the scorer override scores with a custom scoring function
|
||||
if (Scorer.score) results.forEach((item) => (item[4] = Scorer.score(item)));
|
||||
|
||||
// now sort the results by score (in opposite order of appearance, since the
|
||||
// display function below uses pop() to retrieve items) and then
|
||||
// alphabetically
|
||||
results.sort((a, b) => {
|
||||
const leftScore = a[4];
|
||||
const rightScore = b[4];
|
||||
if (leftScore === rightScore) {
|
||||
// same score: sort alphabetically
|
||||
const leftTitle = a[1].toLowerCase();
|
||||
const rightTitle = b[1].toLowerCase();
|
||||
if (leftTitle === rightTitle) return 0;
|
||||
return leftTitle > rightTitle ? -1 : 1; // inverted is intentional
|
||||
}
|
||||
return leftScore > rightScore ? 1 : -1;
|
||||
});
|
||||
|
||||
// remove duplicate search results
|
||||
// note the reversing of results, so that in the case of duplicates, the highest-scoring entry is kept
|
||||
let seen = new Set();
|
||||
results = results.reverse().reduce((acc, result) => {
|
||||
let resultStr = result.slice(0, 4).concat([result[5]]).map(v => String(v)).join(',');
|
||||
if (!seen.has(resultStr)) {
|
||||
acc.push(result);
|
||||
seen.add(resultStr);
|
||||
}
|
||||
return acc;
|
||||
}, []);
|
||||
|
||||
results = results.reverse();
|
||||
|
||||
// for debugging
|
||||
//Search.lastresults = results.slice(); // a copy
|
||||
// console.info("search results:", Search.lastresults);
|
||||
|
||||
// print the results
|
||||
_displayNextItem(results, results.length, searchTerms);
|
||||
},
|
||||
|
||||
/**
|
||||
* search for object names
|
||||
*/
|
||||
performObjectSearch: (object, objectTerms) => {
|
||||
const filenames = Search._index.filenames;
|
||||
const docNames = Search._index.docnames;
|
||||
const objects = Search._index.objects;
|
||||
const objNames = Search._index.objnames;
|
||||
const titles = Search._index.titles;
|
||||
|
||||
const results = [];
|
||||
|
||||
const objectSearchCallback = (prefix, match) => {
|
||||
const name = match[4]
|
||||
const fullname = (prefix ? prefix + "." : "") + name;
|
||||
const fullnameLower = fullname.toLowerCase();
|
||||
if (fullnameLower.indexOf(object) < 0) return;
|
||||
|
||||
let score = 0;
|
||||
const parts = fullnameLower.split(".");
|
||||
|
||||
// check for different match types: exact matches of full name or
|
||||
// "last name" (i.e. last dotted part)
|
||||
if (fullnameLower === object || parts.slice(-1)[0] === object)
|
||||
score += Scorer.objNameMatch;
|
||||
else if (parts.slice(-1)[0].indexOf(object) > -1)
|
||||
score += Scorer.objPartialMatch; // matches in last name
|
||||
|
||||
const objName = objNames[match[1]][2];
|
||||
const title = titles[match[0]];
|
||||
|
||||
// If more than one term searched for, we require other words to be
|
||||
// found in the name/title/description
|
||||
const otherTerms = new Set(objectTerms);
|
||||
otherTerms.delete(object);
|
||||
if (otherTerms.size > 0) {
|
||||
const haystack = `${prefix} ${name} ${objName} ${title}`.toLowerCase();
|
||||
if (
|
||||
[...otherTerms].some((otherTerm) => haystack.indexOf(otherTerm) < 0)
|
||||
)
|
||||
return;
|
||||
}
|
||||
|
||||
let anchor = match[3];
|
||||
if (anchor === "") anchor = fullname;
|
||||
else if (anchor === "-") anchor = objNames[match[1]][1] + "-" + fullname;
|
||||
|
||||
const descr = objName + _(", in ") + title;
|
||||
|
||||
// add custom score for some objects according to scorer
|
||||
if (Scorer.objPrio.hasOwnProperty(match[2]))
|
||||
score += Scorer.objPrio[match[2]];
|
||||
else score += Scorer.objPrioDefault;
|
||||
|
||||
results.push([
|
||||
docNames[match[0]],
|
||||
fullname,
|
||||
"#" + anchor,
|
||||
descr,
|
||||
score,
|
||||
filenames[match[0]],
|
||||
]);
|
||||
};
|
||||
Object.keys(objects).forEach((prefix) =>
|
||||
objects[prefix].forEach((array) =>
|
||||
objectSearchCallback(prefix, array)
|
||||
)
|
||||
);
|
||||
return results;
|
||||
},
|
||||
|
||||
/**
|
||||
* search for full-text terms in the index
|
||||
*/
|
||||
performTermsSearch: (searchTerms, excludedTerms) => {
|
||||
// prepare search
|
||||
const terms = Search._index.terms;
|
||||
const titleTerms = Search._index.titleterms;
|
||||
const filenames = Search._index.filenames;
|
||||
const docNames = Search._index.docnames;
|
||||
const titles = Search._index.titles;
|
||||
|
||||
const scoreMap = new Map();
|
||||
const fileMap = new Map();
|
||||
|
||||
// perform the search on the required terms
|
||||
searchTerms.forEach((word) => {
|
||||
const files = [];
|
||||
const arr = [
|
||||
{ files: terms[word], score: Scorer.term },
|
||||
{ files: titleTerms[word], score: Scorer.title },
|
||||
];
|
||||
// add support for partial matches
|
||||
if (word.length > 2) {
|
||||
const escapedWord = _escapeRegExp(word);
|
||||
Object.keys(terms).forEach((term) => {
|
||||
if (term.match(escapedWord) && !terms[word])
|
||||
arr.push({ files: terms[term], score: Scorer.partialTerm });
|
||||
});
|
||||
Object.keys(titleTerms).forEach((term) => {
|
||||
if (term.match(escapedWord) && !titleTerms[word])
|
||||
arr.push({ files: titleTerms[term], score: Scorer.partialTitle });
|
||||
});
|
||||
}
|
||||
|
||||
// no match but word was a required one
|
||||
if (arr.every((record) => record.files === undefined)) return;
|
||||
|
||||
// found search word in contents
|
||||
arr.forEach((record) => {
|
||||
if (record.files === undefined) return;
|
||||
|
||||
let recordFiles = record.files;
|
||||
if (recordFiles.length === undefined) recordFiles = [recordFiles];
|
||||
files.push(...recordFiles);
|
||||
|
||||
// set score for the word in each file
|
||||
recordFiles.forEach((file) => {
|
||||
if (!scoreMap.has(file)) scoreMap.set(file, {});
|
||||
scoreMap.get(file)[word] = record.score;
|
||||
});
|
||||
});
|
||||
|
||||
// create the mapping
|
||||
files.forEach((file) => {
|
||||
if (fileMap.has(file) && fileMap.get(file).indexOf(word) === -1)
|
||||
fileMap.get(file).push(word);
|
||||
else fileMap.set(file, [word]);
|
||||
});
|
||||
});
|
||||
|
||||
// now check if the files don't contain excluded terms
|
||||
const results = [];
|
||||
for (const [file, wordList] of fileMap) {
|
||||
// check if all requirements are matched
|
||||
|
||||
// as search terms with length < 3 are discarded
|
||||
const filteredTermCount = [...searchTerms].filter(
|
||||
(term) => term.length > 2
|
||||
).length;
|
||||
if (
|
||||
wordList.length !== searchTerms.size &&
|
||||
wordList.length !== filteredTermCount
|
||||
)
|
||||
continue;
|
||||
|
||||
// ensure that none of the excluded terms is in the search result
|
||||
if (
|
||||
[...excludedTerms].some(
|
||||
(term) =>
|
||||
terms[term] === file ||
|
||||
titleTerms[term] === file ||
|
||||
(terms[term] || []).includes(file) ||
|
||||
(titleTerms[term] || []).includes(file)
|
||||
)
|
||||
)
|
||||
continue;
|
||||
|
||||
// select one (max) score for the file.
|
||||
const score = Math.max(...wordList.map((w) => scoreMap.get(file)[w]));
|
||||
// add result to the result list
|
||||
results.push([
|
||||
docNames[file],
|
||||
titles[file],
|
||||
"",
|
||||
null,
|
||||
score,
|
||||
filenames[file],
|
||||
]);
|
||||
}
|
||||
return results;
|
||||
},
|
||||
|
||||
/**
|
||||
* helper function to return a node containing the
|
||||
* search summary for a given text. keywords is a list
|
||||
* of stemmed words.
|
||||
*/
|
||||
makeSearchSummary: (htmlText, keywords) => {
|
||||
const text = Search.htmlToText(htmlText);
|
||||
if (text === "") return null;
|
||||
|
||||
const textLower = text.toLowerCase();
|
||||
const actualStartPosition = [...keywords]
|
||||
.map((k) => textLower.indexOf(k.toLowerCase()))
|
||||
.filter((i) => i > -1)
|
||||
.slice(-1)[0];
|
||||
const startWithContext = Math.max(actualStartPosition - 120, 0);
|
||||
|
||||
const top = startWithContext === 0 ? "" : "...";
|
||||
const tail = startWithContext + 240 < text.length ? "..." : "";
|
||||
|
||||
let summary = document.createElement("p");
|
||||
summary.classList.add("context");
|
||||
summary.textContent = top + text.substr(startWithContext, 240).trim() + tail;
|
||||
|
||||
return summary;
|
||||
},
|
||||
};
|
||||
|
||||
_ready(Search.init);
|
||||
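The ordering logic in `Search.query` above is easy to misread: results are sorted in ascending score order (and reverse alphabetically within a tie) because the display routine consumes them with `pop()`, and duplicates are dropped while walking from the best score downward. The following is a minimal standalone sketch of that idea, not the vendored implementation. The helper names `sortForPop` and `dedupeKeepingBest` and the sample rows are hypothetical; only the row layout `[docname, title, anchor, descr, score, filename]` comes from the file above.

```javascript
// Hypothetical helpers illustrating the sort-then-dedupe step.
// Row layout: [docname, title, anchor, descr, score, filename]
const sortForPop = (results) =>
  [...results].sort((a, b) => {
    if (a[4] === b[4]) {
      const left = a[1].toLowerCase();
      const right = b[1].toLowerCase();
      if (left === right) return 0;
      return left > right ? -1 : 1; // reverse alphabetical within a tie
    }
    return a[4] > b[4] ? 1 : -1; // ascending score
  });

const dedupeKeepingBest = (sorted) => {
  const seen = new Set();
  const kept = [];
  // Walk from highest score to lowest so the first copy seen is the best one.
  for (const row of [...sorted].reverse()) {
    // The score (index 4) is deliberately left out of the identity key.
    const key = [row[0], row[1], row[2], row[3], row[5]].map(String).join(",");
    if (seen.has(key)) continue;
    seen.add(key);
    kept.push(row);
  }
  return kept.reverse(); // back to ascending order, ready for pop()
};

// Made-up sample data: the third row duplicates the first with a higher score.
const rows = [
  ["a", "Alpha", "", null, 5, "a.html"],
  ["b", "Beta", "", null, 15, "b.html"],
  ["a", "Alpha", "", null, 10, "a.html"],
  ["c", "Gamma", "", null, 15, "c.html"],
];

const ordered = dedupeKeepingBest(sortForPop(rows));
const displayOrder = [];
while (ordered.length) displayOrder.push(ordered.pop()[1]);
console.log(displayOrder); // [ 'Beta', 'Gamma', 'Alpha' ]
```

Popping from the end therefore yields the highest score first, with equal scores shown alphabetically, which is why the comparator looks inverted at first glance.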
296
python/docs/_static/skeleton.css
vendored
Normal file
@ -0,0 +1,296 @@
|
||||
/* Some sane resets. */
|
||||
html {
|
||||
height: 100%;
|
||||
}
|
||||
|
||||
body {
|
||||
margin: 0;
|
||||
min-height: 100%;
|
||||
}
|
||||
|
||||
/* All the flexbox magic! */
|
||||
body,
|
||||
.sb-announcement,
|
||||
.sb-content,
|
||||
.sb-main,
|
||||
.sb-container,
|
||||
.sb-container__inner,
|
||||
.sb-article-container,
|
||||
.sb-footer-content,
|
||||
.sb-header,
|
||||
.sb-header-secondary,
|
||||
.sb-footer {
|
||||
display: flex;
|
||||
}
|
||||
|
||||
/* These order things vertically */
|
||||
body,
|
||||
.sb-main,
|
||||
.sb-article-container {
|
||||
flex-direction: column;
|
||||
}
|
||||
|
||||
/* Put elements in the center */
|
||||
.sb-header,
|
||||
.sb-header-secondary,
|
||||
.sb-container,
|
||||
.sb-content,
|
||||
.sb-footer,
|
||||
.sb-footer-content {
|
||||
justify-content: center;
|
||||
}
|
||||
/* Put elements at the ends */
|
||||
.sb-article-container {
|
||||
justify-content: space-between;
|
||||
}
|
||||
|
||||
/* These elements grow. */
|
||||
.sb-main,
|
||||
.sb-content,
|
||||
.sb-container,
|
||||
article {
|
||||
flex-grow: 1;
|
||||
}
|
||||
|
||||
/* Because padding making this wider is not fun */
|
||||
article {
|
||||
box-sizing: border-box;
|
||||
}
|
||||
|
||||
/* The announcements element should never be wider than the page. */
|
||||
.sb-announcement {
|
||||
max-width: 100%;
|
||||
}
|
||||
|
||||
.sb-sidebar-primary,
|
||||
.sb-sidebar-secondary {
|
||||
flex-shrink: 0;
|
||||
width: 17rem;
|
||||
}
|
||||
|
||||
.sb-announcement__inner {
|
||||
justify-content: center;
|
||||
|
||||
box-sizing: border-box;
|
||||
height: 3rem;
|
||||
|
||||
overflow-x: auto;
|
||||
white-space: nowrap;
|
||||
}
|
||||
|
||||
/* Sidebars, with checkbox-based toggle */
|
||||
.sb-sidebar-primary,
|
||||
.sb-sidebar-secondary {
|
||||
position: fixed;
|
||||
height: 100%;
|
||||
top: 0;
|
||||
}
|
||||
|
||||
.sb-sidebar-primary {
|
||||
left: -17rem;
|
||||
transition: left 250ms ease-in-out;
|
||||
}
|
||||
.sb-sidebar-secondary {
|
||||
right: -17rem;
|
||||
transition: right 250ms ease-in-out;
|
||||
}
|
||||
|
||||
.sb-sidebar-toggle {
|
||||
display: none;
|
||||
}
|
||||
.sb-sidebar-overlay {
|
||||
position: fixed;
|
||||
top: 0;
|
||||
width: 0;
|
||||
height: 0;
|
||||
|
||||
transition: width 0ms ease 250ms, height 0ms ease 250ms, opacity 250ms ease;
|
||||
|
||||
opacity: 0;
|
||||
background-color: rgba(0, 0, 0, 0.54);
|
||||
}
|
||||
|
||||
#sb-sidebar-toggle--primary:checked
|
||||
~ .sb-sidebar-overlay[for="sb-sidebar-toggle--primary"],
|
||||
#sb-sidebar-toggle--secondary:checked
|
||||
~ .sb-sidebar-overlay[for="sb-sidebar-toggle--secondary"] {
|
||||
width: 100%;
|
||||
height: 100%;
|
||||
opacity: 1;
|
||||
transition: width 0ms ease, height 0ms ease, opacity 250ms ease;
|
||||
}
|
||||
|
||||
#sb-sidebar-toggle--primary:checked ~ .sb-container .sb-sidebar-primary {
|
||||
left: 0;
|
||||
}
|
||||
#sb-sidebar-toggle--secondary:checked ~ .sb-container .sb-sidebar-secondary {
|
||||
right: 0;
|
||||
}
|
||||
|
||||
/* Full-width mode */
|
||||
.drop-secondary-sidebar-for-full-width-content
|
||||
.hide-when-secondary-sidebar-shown {
|
||||
display: none !important;
|
||||
}
|
||||
.drop-secondary-sidebar-for-full-width-content .sb-sidebar-secondary {
|
||||
display: none !important;
|
||||
}
|
||||
|
||||
/* Mobile views */
|
||||
.sb-page-width {
|
||||
width: 100%;
|
||||
}
|
||||
|
||||
.sb-article-container,
|
||||
.sb-footer-content__inner,
|
||||
.drop-secondary-sidebar-for-full-width-content .sb-article,
|
||||
.drop-secondary-sidebar-for-full-width-content .match-content-width {
|
||||
width: 100vw;
|
||||
}
|
||||
|
||||
.sb-article,
|
||||
.match-content-width {
|
||||
padding: 0 1rem;
|
||||
box-sizing: border-box;
|
||||
}
|
||||
|
||||
@media (min-width: 32rem) {
|
||||
.sb-article,
|
||||
.match-content-width {
|
||||
padding: 0 2rem;
|
||||
}
|
||||
}
|
||||
|
||||
/* Tablet views */
|
||||
@media (min-width: 42rem) {
|
||||
.sb-article-container {
|
||||
width: auto;
|
||||
}
|
||||
.sb-footer-content__inner,
|
||||
.drop-secondary-sidebar-for-full-width-content .sb-article,
|
||||
.drop-secondary-sidebar-for-full-width-content .match-content-width {
|
||||
width: 42rem;
|
||||
}
|
||||
.sb-article,
|
||||
.match-content-width {
|
||||
width: 42rem;
|
||||
}
|
||||
}
|
||||
@media (min-width: 46rem) {
|
||||
.sb-footer-content__inner,
|
||||
.drop-secondary-sidebar-for-full-width-content .sb-article,
|
||||
.drop-secondary-sidebar-for-full-width-content .match-content-width {
|
||||
width: 46rem;
|
||||
}
|
||||
.sb-article,
|
||||
.match-content-width {
|
||||
width: 46rem;
|
||||
}
|
||||
}
|
||||
@media (min-width: 50rem) {
|
||||
.sb-footer-content__inner,
|
||||
.drop-secondary-sidebar-for-full-width-content .sb-article,
|
||||
.drop-secondary-sidebar-for-full-width-content .match-content-width {
|
||||
width: 50rem;
|
||||
}
|
||||
.sb-article,
|
||||
.match-content-width {
|
||||
width: 50rem;
|
||||
}
|
||||
}
|
||||
|
||||
/* Tablet views */
|
||||
@media (min-width: 59rem) {
|
||||
.sb-sidebar-secondary {
|
||||
position: static;
|
||||
}
|
||||
.hide-when-secondary-sidebar-shown {
|
||||
display: none !important;
|
||||
}
|
||||
.sb-footer-content__inner,
|
||||
.drop-secondary-sidebar-for-full-width-content .sb-article,
|
||||
.drop-secondary-sidebar-for-full-width-content .match-content-width {
|
||||
width: 59rem;
|
||||
}
|
||||
.sb-article,
|
||||
.match-content-width {
|
||||
width: 42rem;
|
||||
}
|
||||
}
|
||||
@media (min-width: 63rem) {
|
||||
.sb-footer-content__inner,
|
||||
.drop-secondary-sidebar-for-full-width-content .sb-article,
|
||||
.drop-secondary-sidebar-for-full-width-content .match-content-width {
|
||||
width: 63rem;
|
||||
}
|
||||
.sb-article,
|
||||
.match-content-width {
|
||||
width: 46rem;
|
||||
}
|
||||
}
|
||||
@media (min-width: 67rem) {
|
||||
.sb-footer-content__inner,
|
||||
.drop-secondary-sidebar-for-full-width-content .sb-article,
|
||||
.drop-secondary-sidebar-for-full-width-content .match-content-width {
|
||||
width: 67rem;
|
||||
}
|
||||
.sb-article,
|
||||
.match-content-width {
|
||||
width: 50rem;
|
||||
}
|
||||
}
|
||||
|
||||
/* Desktop views */
|
||||
@media (min-width: 76rem) {
|
||||
.sb-sidebar-primary {
|
||||
position: static;
|
||||
}
|
||||
.hide-when-primary-sidebar-shown {
|
||||
display: none !important;
|
||||
}
|
||||
.sb-footer-content__inner,
|
||||
.drop-secondary-sidebar-for-full-width-content .sb-article,
|
||||
.drop-secondary-sidebar-for-full-width-content .match-content-width {
|
||||
width: 59rem;
|
||||
}
|
||||
.sb-article,
|
||||
.match-content-width {
|
||||
width: 42rem;
|
||||
}
|
||||
}
|
||||
|
||||
/* Full desktop views */
|
||||
@media (min-width: 80rem) {
|
||||
.sb-article,
|
||||
.match-content-width {
|
||||
width: 46rem;
|
||||
}
|
||||
.sb-footer-content__inner,
|
||||
.drop-secondary-sidebar-for-full-width-content .sb-article,
|
||||
.drop-secondary-sidebar-for-full-width-content .match-content-width {
|
||||
width: 63rem;
|
||||
}
|
||||
}
|
||||
|
||||
@media (min-width: 84rem) {
|
||||
.sb-article,
|
||||
.match-content-width {
|
||||
width: 50rem;
|
||||
}
|
||||
.sb-footer-content__inner,
|
||||
.drop-secondary-sidebar-for-full-width-content .sb-article,
|
||||
.drop-secondary-sidebar-for-full-width-content .match-content-width {
|
||||
width: 67rem;
|
||||
}
|
||||
}
|
||||
|
||||
@media (min-width: 88rem) {
|
||||
.sb-footer-content__inner,
|
||||
.drop-secondary-sidebar-for-full-width-content .sb-article,
|
||||
.drop-secondary-sidebar-for-full-width-content .match-content-width {
|
||||
width: 67rem;
|
||||
}
|
||||
.sb-page-width {
|
||||
width: 88rem;
|
||||
}
|
||||
}
|
||||
144
python/docs/_static/sphinx_highlight.js
vendored
Normal file
@ -0,0 +1,144 @@
|
||||
/* Highlighting utilities for Sphinx HTML documentation. */
|
||||
"use strict";
|
||||
|
||||
const SPHINX_HIGHLIGHT_ENABLED = true
|
||||
|
||||
/**
|
||||
* highlight a given string on a node by wrapping it in
|
||||
* span elements with the given class name.
|
||||
*/
|
||||
const _highlight = (node, addItems, text, className) => {
|
||||
if (node.nodeType === Node.TEXT_NODE) {
|
||||
const val = node.nodeValue;
|
||||
const parent = node.parentNode;
|
||||
const pos = val.toLowerCase().indexOf(text);
|
||||
if (
|
||||
pos >= 0 &&
|
||||
!parent.classList.contains(className) &&
|
||||
!parent.classList.contains("nohighlight")
|
||||
) {
|
||||
let span;
|
||||
|
||||
const closestNode = parent.closest("body, svg, foreignObject");
|
||||
const isInSVG = closestNode && closestNode.matches("svg");
|
||||
if (isInSVG) {
|
||||
span = document.createElementNS("http://www.w3.org/2000/svg", "tspan");
|
||||
} else {
|
||||
span = document.createElement("span");
|
||||
span.classList.add(className);
|
||||
}
|
||||
|
||||
span.appendChild(document.createTextNode(val.substr(pos, text.length)));
|
||||
parent.insertBefore(
|
||||
span,
|
||||
parent.insertBefore(
|
||||
document.createTextNode(val.substr(pos + text.length)),
|
||||
node.nextSibling
|
||||
)
|
||||
);
|
||||
node.nodeValue = val.substr(0, pos);
|
||||
|
||||
if (isInSVG) {
|
||||
const rect = document.createElementNS(
|
||||
"http://www.w3.org/2000/svg",
|
||||
"rect"
|
||||
);
|
||||
const bbox = parent.getBBox();
|
||||
rect.x.baseVal.value = bbox.x;
|
||||
rect.y.baseVal.value = bbox.y;
|
||||
rect.width.baseVal.value = bbox.width;
|
||||
rect.height.baseVal.value = bbox.height;
|
||||
rect.setAttribute("class", className);
|
||||
addItems.push({ parent: parent, target: rect });
|
||||
}
|
||||
}
|
||||
} else if (node.matches && !node.matches("button, select, textarea")) {
|
||||
node.childNodes.forEach((el) => _highlight(el, addItems, text, className));
|
||||
}
|
||||
};
|
||||
const _highlightText = (thisNode, text, className) => {
|
||||
let addItems = [];
|
||||
_highlight(thisNode, addItems, text, className);
|
||||
addItems.forEach((obj) =>
|
||||
obj.parent.insertAdjacentElement("beforebegin", obj.target)
|
||||
);
|
||||
};
|
||||
|
||||
/**
|
||||
* Small JavaScript module for the documentation.
|
||||
*/
|
||||
const SphinxHighlight = {
|
||||
|
||||
/**
|
||||
* highlight the search words provided in localstorage in the text
|
||||
*/
|
||||
highlightSearchWords: () => {
|
||||
if (!SPHINX_HIGHLIGHT_ENABLED) return; // bail if no highlight
|
||||
|
||||
// get and clear terms from localstorage
|
||||
const url = new URL(window.location);
|
||||
const highlight =
|
||||
localStorage.getItem("sphinx_highlight_terms")
|
||||
|| url.searchParams.get("highlight")
|
||||
|| "";
|
||||
localStorage.removeItem("sphinx_highlight_terms")
|
||||
url.searchParams.delete("highlight");
|
||||
window.history.replaceState({}, "", url);
|
||||
|
||||
// get individual terms from highlight string
|
||||
const terms = highlight.toLowerCase().split(/\s+/).filter(x => x);
|
||||
if (terms.length === 0) return; // nothing to do
|
||||
|
||||
// There should never be more than one element matching "div.body"
|
||||
const divBody = document.querySelectorAll("div.body");
|
||||
const body = divBody.length ? divBody[0] : document.querySelector("body");
|
||||
window.setTimeout(() => {
|
||||
terms.forEach((term) => _highlightText(body, term, "highlighted"));
|
||||
}, 10);
|
||||
|
||||
const searchBox = document.getElementById("searchbox");
|
||||
if (searchBox === null) return;
|
||||
searchBox.appendChild(
|
||||
document
|
||||
.createRange()
|
||||
.createContextualFragment(
|
||||
'<p class="highlight-link">' +
|
||||
'<a href="javascript:SphinxHighlight.hideSearchWords()">' +
|
||||
_("Hide Search Matches") +
|
||||
"</a></p>"
|
||||
)
|
||||
);
|
||||
},
|
||||
|
||||
/**
|
||||
* helper function to hide the search marks again
|
||||
*/
|
||||
hideSearchWords: () => {
|
||||
document
|
||||
.querySelectorAll("#searchbox .highlight-link")
|
||||
.forEach((el) => el.remove());
|
||||
document
|
||||
.querySelectorAll("span.highlighted")
|
||||
.forEach((el) => el.classList.remove("highlighted"));
|
||||
localStorage.removeItem("sphinx_highlight_terms")
|
||||
},
|
||||
|
||||
initEscapeListener: () => {
|
||||
// only install a listener if it is really needed
|
||||
if (!DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS) return;
|
||||
|
||||
document.addEventListener("keydown", (event) => {
|
||||
// bail for input elements
|
||||
if (BLACKLISTED_KEY_CONTROL_ELEMENTS.has(document.activeElement.tagName)) return;
|
||||
// bail with special keys
|
||||
if (event.shiftKey || event.altKey || event.ctrlKey || event.metaKey) return;
|
||||
if (DOCUMENTATION_OPTIONS.ENABLE_SEARCH_SHORTCUTS && (event.key === "Escape")) {
|
||||
SphinxHighlight.hideSearchWords();
|
||||
event.preventDefault();
|
||||
}
|
||||
});
|
||||
},
|
||||
};
|
||||
|
||||
_ready(SphinxHighlight.highlightSearchWords);
|
||||
_ready(SphinxHighlight.initEscapeListener);
|
||||
2
python/docs/_static/styles/furo-extensions.css
vendored
Normal file
1
python/docs/_static/styles/furo-extensions.css.map
vendored
Normal file
2
python/docs/_static/styles/furo.css
vendored
Normal file
1
python/docs/_static/styles/furo.css.map
vendored
Normal file
110
python/docs/_static/tabs.css
vendored
Normal file
@ -0,0 +1,110 @@
|
||||
/* body[data-theme] { */
|
||||
:root {
|
||||
--tabs--label-text: #4b5563;
|
||||
--tabs--label-text--hover: #4b5563;
|
||||
--tabs--label-text--active: #0ea5e9;
|
||||
--tabs--label-text--active--hover: #0ea5e9;
|
||||
--tabs--label-background: transparent;
|
||||
--tabs--label-background--hover: transparent;
|
||||
--tabs--label-background--active: transparent;
|
||||
--tabs--label-background--active--hover: transparent;
|
||||
--tabs--label-border: transparent;
|
||||
--tabs--label-border--hover: #d1d5db;
|
||||
--tabs--label-border--active: #0ea5e9;
|
||||
--tabs--label-border--active--hover: #0ea5e9;
|
||||
--tabs--padding-x: 1.25em;
|
||||
--tabs--margin-x: 0;
|
||||
--tabs--border: #e6e6e6;
|
||||
}
|
||||
|
||||
/* Hide radio buttons */
|
||||
.tab-set > input {
|
||||
position: absolute;
|
||||
opacity: 0;
|
||||
}
|
||||
|
||||
/* Tab set container */
|
||||
.tab-set {
|
||||
border-radius: 2px;
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
margin: 0.75em 0;
|
||||
position: relative;
|
||||
}
|
||||
|
||||
/* Tab label */
|
||||
.tab-set > label {
|
||||
z-index: 1;
|
||||
|
||||
width: auto;
|
||||
border-bottom: 2px solid var(--tabs--label-border);
|
||||
padding: 1em var(--tabs--padding-x) 0.5em;
|
||||
margin-left: var(--tabs--margin-x);
|
||||
|
||||
color: var(--tabs--label-text);
|
||||
background: var(--tabs--label-background);
|
||||
|
||||
transition: color 250ms;
|
||||
|
||||
cursor: pointer;
|
||||
|
||||
font-size: 0.875em;
|
||||
font-weight: 700;
|
||||
}
|
||||
.tab-set > label:nth-child(2) {
|
||||
margin-left: 0;
|
||||
}
|
||||
|
||||
/* Hovered label */
|
||||
.tab-set > label:hover {
|
||||
color: var(--tabs--label-text--hover);
|
||||
background: var(--tabs--label-background--hover);
|
||||
border-color: var(--tabs--label-border--hover);
|
||||
}
|
||||
|
||||
/* Active tab label */
|
||||
.tab-set > input:checked + label {
|
||||
color: var(--tabs--label-text--active);
|
||||
background: var(--tabs--label-background--active);
|
||||
border-color: var(--tabs--label-border--active);
|
||||
}
|
||||
.tab-set > input:checked + label:hover {
|
||||
color: var(--tabs--label-text--active--hover);
|
||||
background: var(--tabs--label-background--active--hover);
|
||||
border-color: var(--tabs--label-border--active--hover);
|
||||
}
|
||||
|
||||
/* Tab content */
|
||||
.tab-content {
|
||||
order: 99;
|
||||
display: none;
|
||||
width: 100%;
|
||||
box-shadow: 0 -0.0625rem var(--tabs--border);
|
||||
}
|
||||
|
||||
/* Show content, when input is checked. */
|
||||
.tab-set > input:checked + label + .tab-content {
|
||||
display: block;
|
||||
}
|
||||
.tab-content > p:first-child {
|
||||
margin-top: 0.75rem;
|
||||
}
|
||||
/* Remove the top border on first code block */
|
||||
.tab-content > [class^="highlight-"]:first-child .highlight {
|
||||
border-top: none;
|
||||
border-top-left-radius: 0;
|
||||
border-top-right-radius: 0;
|
||||
}
|
||||
|
||||
/* Remove margins on children */
|
||||
.tab-content > *:first-child {
|
||||
margin-top: 0;
|
||||
}
|
||||
.tab-content > *:last-child {
|
||||
margin-bottom: 0;
|
||||
}
|
||||
|
||||
/* Remove margins on nested tabs */
|
||||
.tab-content > .tab-set {
|
||||
margin: 0;
|
||||
}
|
||||
21
python/docs/_static/tabs.js
vendored
Normal file
@ -0,0 +1,21 @@
|
||||
var labels_by_text = {};
|
||||
|
||||
function ready() {
|
||||
var li = document.getElementsByClassName("tab-label");
|
||||
for (const label of li) {
|
||||
label.onclick = onLabelClick;
|
||||
const text = label.textContent;
|
||||
if (!labels_by_text[text]) {
|
||||
labels_by_text[text] = [];
|
||||
}
|
||||
labels_by_text[text].push(label);
|
||||
}
|
||||
}
|
||||
|
||||
function onLabelClick() {
|
||||
// Activate other labels with the same text.
|
||||
for (const label of labels_by_text[this.textContent]) {
|
||||
label.previousSibling.checked = true;
|
||||
}
|
||||
}
|
||||
document.addEventListener("DOMContentLoaded", ready, false);
|
||||
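The grouping that `tabs.js` performs can be illustrated without a browser. The sketch below is a hedged approximation, assuming plain objects in place of DOM nodes: `makeLabel`, `groupLabelsByText`, and `activate` are hypothetical names, and the `input` property stands in for the radio button that the real script reaches through `previousSibling`. It does not model the browser behavior where checking one radio button unchecks the others in its group.

```javascript
// Hypothetical stand-in for a <label> and its paired radio <input>.
const makeLabel = (text) => ({ textContent: text, input: { checked: false } });

// Bucket labels by their visible text, as ready() does with labels_by_text.
const groupLabelsByText = (labels) => {
  const groups = new Map();
  for (const label of labels) {
    if (!groups.has(label.textContent)) groups.set(label.textContent, []);
    groups.get(label.textContent).push(label);
  }
  return groups;
};

// Check every input whose label shares the clicked label's text.
const activate = (groups, clicked) => {
  for (const label of groups.get(clicked.textContent) ?? []) {
    label.input.checked = true;
  }
};

const labels = [makeLabel("Python"), makeLabel("C++"), makeLabel("Python")];
const groups = groupLabelsByText(labels);
activate(groups, labels[0]);
console.log(labels.map((l) => l.input.checked)); // [ true, false, true ]
```

Keying on the label text is what lets a click in one tab set switch every other tab set on the page to the same-named tab.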
320
python/docs/contribute.html
Normal file
@ -0,0 +1,320 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
<link rel="index" title="Index" href="genindex.html" /><link rel="search" title="Search" href="search.html" /><link rel="next" title="CUTLASS Python API" href="modules.html" /><link rel="prev" title="Basic example of using the CUTLASS Python interface" href="externals/00_basic_gemm.html" />
|
||||
<link rel="canonical" href="docs/contribute.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>Contributing - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul class="current">
|
||||
<li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1 current current-page"><a class="current reference internal" href="#">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="modules.html">CUTLASS Python API</a><input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<section id="contributing">
|
||||
<h1>Contributing<a class="headerlink" href="#contributing" title="Permalink to this heading">#</a></h1>
|
||||
<p>Thank you for your interest in contributing to the CUTLASS Python interface. Contributions generally fall into one of two categories:</p>
|
||||
<ol class="arabic simple">
|
||||
<li><p>You want to report a bug, request a feature, or flag a documentation issue</p>
|
||||
<ul class="simple">
|
||||
<li><p>File an <a class="reference external" href="https://github.com/NVIDIA/cutlass/issues/new/choose">issue</a> describing what you encountered or what you want to see changed.</p></li>
|
||||
<li><p>The CUTLASS team will evaluate the issues and triage them, scheduling them for a release. If you believe the issue needs priority attention, comment on the issue to notify the team.</p></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><p>You want to implement a feature or bug fix</p>
|
||||
<ul class="simple">
|
||||
<li><p>We welcome contributions from the community. We recommend that you contribute via a <a class="reference external" href="https://github.com/NVIDIA/cutlass/pulls">pull request</a>. If you have questions about CUTLASS, consider asking a question via the <a class="reference external" href="https://github.com/NVIDIA/cutlass/discussions">Discussions</a> tab. Please be sure to search through both existing issues and discussions to see whether your question has already been answered.</p></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ol>
|
||||
</section>
|
||||
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
<a class="next-page" href="modules.html">
|
||||
<div class="page-info">
|
||||
<div class="context">
|
||||
<span>Next</span>
|
||||
</div>
|
||||
<div class="title">CUTLASS Python API</div>
|
||||
</div>
|
||||
<svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
|
||||
</a>
|
||||
<a class="prev-page" href="externals/00_basic_gemm.html">
|
||||
<svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
|
||||
<div class="page-info">
|
||||
<div class="context">
|
||||
<span>Previous</span>
|
||||
</div>
|
||||
|
||||
<div class="title">Basic example of using the CUTLASS Python interface</div>
|
||||
|
||||
</div>
|
||||
</a>
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer no-toc">
|
||||
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/scripts/furo.js"></script>
|
||||
<script src="_static/clipboard.min.js"></script>
|
||||
<script src="_static/copybutton.js"></script>
|
||||
<script src="_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
394
python/docs/cutlass.emit.html
Normal file
@ -0,0 +1,394 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
<link rel="index" title="Index" href="genindex.html" /><link rel="search" title="Search" href="search.html" /><link rel="next" title="Operations" href="cutlass.op.html" /><link rel="prev" title="CUTLASS" href="cutlass.html" />
|
||||
<link rel="canonical" href="docs/cutlass.emit.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>Emitters - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul class="current">
|
||||
<li class="toctree-l1 current has-children"><a class="reference internal" href="modules.html">CUTLASS Python API</a><input checked="" class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul class="current">
|
||||
<li class="toctree-l2 current has-children"><a class="reference internal" href="cutlass.html">CUTLASS</a><input checked="" class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul class="current">
|
||||
<li class="toctree-l3 current current-page"><a class="current reference internal" href="#">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<section id="emitters">
|
||||
<h1>Emitters<a class="headerlink" href="#emitters" title="Permalink to this heading">#</a></h1>
|
||||
<section id="module-cutlass.emit.common">
|
||||
<span id="common"></span><h2>Common<a class="headerlink" href="#module-cutlass.emit.common" title="Permalink to this heading">#</a></h2>
|
||||
<p>Common utilities for emitting CUTLASS kernels</p>
|
||||
</section>
|
||||
<section id="module-cutlass.emit.pytorch">
|
||||
<span id="pytorch"></span><h2>PyTorch<a class="headerlink" href="#module-cutlass.emit.pytorch" title="Permalink to this heading">#</a></h2>
|
||||
<p>Utilities for generating source for building a PyTorch CUDA extension that uses a CUTLASS kernel.
|
||||
If requested, the extension can be JIT-compiled via PyTorch’s <code class="docutils literal notranslate"><span class="pre">cpp_extension.load</span></code> function.</p>
|
||||
<p>Example usage with JIT compilation:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">torch</span>
<span class="kn">import</span> <span class="nn">cutlass</span>

<span class="n">plan</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">Gemm</span><span class="p">(</span><span class="n">element</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">)</span>
|
||||
<span class="n">op</span> <span class="o">=</span> <span class="n">plan</span><span class="o">.</span><span class="n">construct</span><span class="p">()</span>
|
||||
<span class="n">mod</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">emit</span><span class="o">.</span><span class="n">pytorch</span><span class="p">(</span><span class="n">op</span><span class="p">,</span> <span class="s1">'cutlass_gemm'</span><span class="p">,</span> <span class="mi">80</span><span class="p">,</span> <span class="n">jit</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># Generate inputs for the GEMM</span>
|
||||
<span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span> <span class="o">=</span> <span class="p">[</span><span class="n">torch</span><span class="o">.</span><span class="n">ones</span><span class="p">((</span><span class="mi">512</span><span class="p">,</span> <span class="mi">512</span><span class="p">))</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="s1">'cuda'</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">)]</span>
|
||||
|
||||
<span class="c1"># Run the module</span>
|
||||
<span class="n">D</span> <span class="o">=</span> <span class="n">mod</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
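<p>As a sanity check on the example above, the result can be compared against a plain reference GEMM. The sketch below is not part of CUTLASS: <code class="docutils literal notranslate"><span class="pre">reference_gemm</span></code> is a hypothetical helper, and the <code class="docutils literal notranslate"><span class="pre">alpha=1</span></code>, <code class="docutils literal notranslate"><span class="pre">beta=0</span></code> scaling is an assumption about the emitted module's defaults that should be verified against the generated source. Under that assumption, multiplying two N-by-N matrices of ones yields N in every entry of D.</p>

```python
def reference_gemm(A, B, C, alpha=1.0, beta=0.0):
    """Compute D = alpha * (A @ B) + beta * C on row-major nested lists."""
    m, k, n = len(A), len(B), len(B[0])
    return [
        [
            alpha * sum(A[i][p] * B[p][j] for p in range(k)) + beta * C[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


# Small stand-in for the 512 x 512 all-ones inputs used above.
N = 4
A, B, C = [[[1.0] * N for _ in range(N)] for _ in range(3)]
D = reference_gemm(A, B, C)  # every entry equals N when alpha=1, beta=0
```

<p>With PyTorch tensors, the same comparison could be made with <code class="docutils literal notranslate"><span class="pre">torch.allclose</span></code> against <code class="docutils literal notranslate"><span class="pre">A</span> <span class="pre">@</span> <span class="pre">B</span></code>, again assuming the default scaling described above.</p>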
|
||||
<p>Example usage without JIT compilation:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">torch</span>
<span class="kn">import</span> <span class="nn">cutlass</span>

<span class="n">plan</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">Gemm</span><span class="p">(</span><span class="n">element</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">)</span>
|
||||
<span class="n">op</span> <span class="o">=</span> <span class="n">plan</span><span class="o">.</span><span class="n">construct</span><span class="p">()</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">emit</span><span class="o">.</span><span class="n">pytorch</span><span class="p">(</span><span class="n">op</span><span class="p">,</span> <span class="s1">'cutlass_gemm'</span><span class="p">,</span> <span class="mi">80</span><span class="p">,</span> <span class="n">jit</span><span class="o">=</span><span class="kc">False</span><span class="p">,</span> <span class="n">sourcedir</span><span class="o">=</span><span class="s1">'output'</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>After this call, the directory <code class="docutils literal notranslate"><span class="pre">output</span></code> contains <code class="docutils literal notranslate"><span class="pre">setup.py</span></code>,
|
||||
<code class="docutils literal notranslate"><span class="pre">cutlass_gemm.cpp</span></code>, and <code class="docutils literal notranslate"><span class="pre">cutlass_gemm_kernel.cu</span></code>. The module can be built from
|
||||
within <code class="docutils literal notranslate"><span class="pre">output</span></code> by running: <code class="docutils literal notranslate"><span class="pre">TORCH_CUDA_ARCH_LIST="8.0"</span> <span class="pre">python</span> <span class="pre">setup.py</span> <span class="pre">develop</span> <span class="pre">--user</span></code>.</p>
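<p>The integer <code class="docutils literal notranslate"><span class="pre">cc</span></code> passed to the emitter (80 above) and the dotted value given to <code class="docutils literal notranslate"><span class="pre">TORCH_CUDA_ARCH_LIST</span></code> ("8.0" above) name the same compute capability. A minimal sketch of that conversion follows; the helper names <code class="docutils literal notranslate"><span class="pre">arch_list_from_cc</span></code> and <code class="docutils literal notranslate"><span class="pre">build_command</span></code> are hypothetical and are not part of the CUTLASS Python interface.</p>

```python
def arch_list_from_cc(cc: int) -> str:
    """Convert an integer compute capability (e.g. 80) to dotted form (e.g. "8.0")."""
    # The last decimal digit is the minor version; the rest is the major version.
    major, minor = divmod(cc, 10)
    return f"{major}.{minor}"


def build_command(cc: int) -> str:
    """Return the shell command that builds the emitted sources for ``cc``.

    Hypothetical helper: it only formats the command shown in the text above
    and is meant to be run from inside the emitted source directory.
    """
    return f'TORCH_CUDA_ARCH_LIST="{arch_list_from_cc(cc)}" python setup.py develop --user'


if __name__ == "__main__":
    print(build_command(80))
```

<p>Deriving the string from the same integer that was passed to the emitter may help keep the two values from drifting apart when targeting a different device.</p>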
|
||||
<p>The module can later be used in Python via:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">torch</span>
|
||||
<span class="kn">import</span> <span class="nn">cutlass_gemm</span>
|
||||
|
||||
<span class="c1"># Generate inputs for the GEMM</span>
|
||||
<span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span> <span class="o">=</span> <span class="p">[</span><span class="n">torch</span><span class="o">.</span><span class="n">ones</span><span class="p">((</span><span class="mi">512</span><span class="p">,</span> <span class="mi">512</span><span class="p">))</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="s1">'cuda'</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">)]</span>
|
||||
|
||||
<span class="c1"># Run the module</span>
|
||||
<span class="n">D</span> <span class="o">=</span> <span class="n">cutlass_gemm</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.emit.pytorch.pytorch">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.emit.pytorch.</span></span><span class="sig-name descname"><span class="pre">pytorch</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">op</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">name</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">cc</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">jit</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">sourcedir</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">''</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/emit/pytorch.html#pytorch"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.emit.pytorch.pytorch" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Generates source for building a PyTorch CUDA module that leverages the CUTLASS kernel
|
||||
specified by <code class="docutils literal notranslate"><span class="pre">op</span></code>. If the <code class="docutils literal notranslate"><span class="pre">jit</span></code> parameter is set to true, the module is just-in-time
|
||||
compiled, loaded, and returned.</p>
|
||||
<p>This method writes files to <code class="docutils literal notranslate"><span class="pre">sourcedir</span></code> that can be used to build
|
||||
a PyTorch module.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>op</strong> – operation to emit in the module</p></li>
|
||||
<li><p><strong>name</strong> (<em>str</em>) – name of the module to generate</p></li>
|
||||
<li><p><strong>cc</strong> (<em>int</em>) – compute capability of the device the module should target</p></li>
|
||||
<li><p><strong>jit</strong> (<em>bool</em>) – whether the module should be just-in-time compiled</p></li>
|
||||
<li><p><strong>sourcedir</strong> (<em>str</em>) – directory to which generated source files should be written</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>loaded PyTorch module (if <code class="docutils literal notranslate"><span class="pre">jit=True</span></code>) or None</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
</section>
|
||||
</section>
|
||||
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
<a class="next-page" href="cutlass.op.html">
|
||||
<div class="page-info">
|
||||
<div class="context">
|
||||
<span>Next</span>
|
||||
</div>
|
||||
<div class="title">Operations</div>
|
||||
</div>
|
||||
<svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
|
||||
</a>
|
||||
<a class="prev-page" href="cutlass.html">
|
||||
<svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
|
||||
<div class="page-info">
|
||||
<div class="context">
|
||||
<span>Previous</span>
|
||||
</div>
|
||||
|
||||
<div class="title">CUTLASS</div>
|
||||
|
||||
</div>
|
||||
</a>
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer">
|
||||
|
||||
|
||||
<div class="toc-sticky toc-scroll">
|
||||
<div class="toc-title-container">
|
||||
<span class="toc-title">
|
||||
On this page
|
||||
</span>
|
||||
</div>
|
||||
<div class="toc-tree-container">
|
||||
<div class="toc-tree">
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">Emitters</a><ul>
|
||||
<li><a class="reference internal" href="#module-cutlass.emit.common">Common</a></li>
|
||||
<li><a class="reference internal" href="#module-cutlass.emit.pytorch">PyTorch</a><ul>
|
||||
<li><a class="reference internal" href="#cutlass.emit.pytorch.pytorch"><code class="docutils literal notranslate"><span class="pre">pytorch()</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/scripts/furo.js"></script>
|
||||
<script src="_static/clipboard.min.js"></script>
|
||||
<script src="_static/copybutton.js"></script>
|
||||
<script src="_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
628
python/docs/cutlass.html
Normal file
@ -0,0 +1,628 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
<link rel="index" title="Index" href="genindex.html" /><link rel="search" title="Search" href="search.html" /><link rel="next" title="Emitters" href="cutlass.emit.html" /><link rel="prev" title="CUTLASS Python API" href="modules.html" />
|
||||
<link rel="canonical" href="docs/cutlass.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>CUTLASS - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul class="current">
|
||||
<li class="toctree-l1 current has-children"><a class="reference internal" href="modules.html">CUTLASS Python API</a><input checked="" class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul class="current">
|
||||
<li class="toctree-l2 current has-children current-page"><a class="current reference internal" href="#">CUTLASS</a><input checked="" class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<section id="cutlass">
|
||||
<h1>CUTLASS<a class="headerlink" href="#cutlass" title="Permalink to this heading">#</a></h1>
|
||||
<section id="subpackages">
|
||||
<h2>Subpackages<a class="headerlink" href="#subpackages" title="Permalink to this heading">#</a></h2>
|
||||
<div class="toctree-wrapper compound">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</div>
|
||||
</section>
|
||||
<section id="module-cutlass.epilogue">
|
||||
<span id="epilogue"></span><h2>Epilogue<a class="headerlink" href="#module-cutlass.epilogue" title="Permalink to this heading">#</a></h2>
|
||||
<p>Registry of elementwise epilogues</p>
|
||||
<p>Elementwise epilogues can be added to many CUTLASS kernels in the CUTLASS Python interface via
|
||||
code like the following for GEMM:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">plan</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">Gemm</span><span class="p">(</span><span class="n">element</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">)</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">activation</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">epilogue</span><span class="o">.</span><span class="n">relu</span>
|
||||
</pre></div>
|
||||
</div>
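<p>To make the effect of an elementwise epilogue concrete, the following pure-Python reference computes what a GEMM with a ReLU activation is expected to produce. It is a sketch, not the CUTLASS implementation: it assumes the usual linear-combination form <code class="docutils literal notranslate"><span class="pre">D</span> <span class="pre">=</span> <span class="pre">activation(alpha</span> <span class="pre">*</span> <span class="pre">(A</span> <span class="pre">@</span> <span class="pre">B)</span> <span class="pre">+</span> <span class="pre">beta</span> <span class="pre">*</span> <span class="pre">C)</span></code>, and the function names and the default values of <code class="docutils literal notranslate"><span class="pre">alpha</span></code> and <code class="docutils literal notranslate"><span class="pre">beta</span></code> are chosen for illustration only.</p>

```python
def relu(x):
    """Elementwise ReLU: negative values are clamped to zero."""
    return x if x > 0.0 else 0.0


def gemm_with_epilogue(A, B, C, activation=relu, alpha=1.0, beta=0.0):
    """Reference D = activation(alpha * (A @ B) + beta * C) on nested lists.

    Assumed form of a linear-combination epilogue followed by an elementwise
    activation; A is m x k, B is k x n, and C is m x n.
    """
    m, k, n = len(A), len(B), len(B[0])
    D = []
    for i in range(m):
        row = []
        for j in range(n):
            # Accumulate the dot product of row i of A and column j of B.
            acc = sum(A[i][p] * B[p][j] for p in range(k))
            # The epilogue is applied once per output element.
            row.append(activation(alpha * acc + beta * C[i][j]))
        D.append(row)
    return D


if __name__ == "__main__":
    A = [[1.0, -2.0], [3.0, 4.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    print(gemm_with_epilogue(A, identity, zeros))
```

<p>A reference of this kind can be handy for checking the output of a kernel on small inputs after changing the activation.</p>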
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.epilogue.get_activation_epilogue">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.epilogue.</span></span><span class="sig-name descname"><span class="pre">get_activation_epilogue</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">activation</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">element_output</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">elements_per_access</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">element_accumulator</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">element_compute</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/epilogue.html#get_activation_epilogue"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.epilogue.get_activation_epilogue" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Return an epilogue corresponding to the activation function, data types, and alignment
|
||||
used in the kernel</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>activation</strong> – elementwise activation function to use</p></li>
|
||||
<li><p><strong>element_output</strong> – data type of the output</p></li>
|
||||
<li><p><strong>elements_per_access</strong> (<em>int</em>) – alignment of operand C of the kernel</p></li>
|
||||
<li><p><strong>element_accumulator</strong> – data type of the accumulator</p></li>
|
||||
<li><p><strong>element_compute</strong> – data type in which compute operations should be performed</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>epilogue functor</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.epilogue.get_activations">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.epilogue.</span></span><span class="sig-name descname"><span class="pre">get_activations</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/epilogue.html#get_activations"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.epilogue.get_activations" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Returns a list of available activation functions</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>list of available activation functions</p>
|
||||
</dd>
|
||||
<dt class="field-even">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>list</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
</section>
|
||||
<section id="module-cutlass.library_defaults">
|
||||
<span id="library-defaults"></span><h2>Library Defaults<a class="headerlink" href="#module-cutlass.library_defaults" title="Permalink to this heading">#</a></h2>
|
||||
<p>Classes containing valid operations for a given compute capability and data types.</p>
|
||||
<dl class="py class">
|
||||
<dt class="sig sig-object py" id="cutlass.library_defaults.ArchOptions">
|
||||
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">cutlass.library_defaults.</span></span><span class="sig-name descname"><span class="pre">ArchOptions</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="pre">target_cc,</span> <span class="pre">kernel_cc,</span> <span class="pre">operation_kind,</span> <span class="pre">gemm_kinds,</span> <span class="pre">allowed_math_operations=[<MathOperation.multiply_add:</span> <span class="pre">1>,</span> <span class="pre"><MathOperation.multiply_add_saturate:</span> <span class="pre">2>]</span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/library_defaults.html#ArchOptions"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.library_defaults.ArchOptions" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">object</span></code></p>
|
||||
<p>Structure for keeping track of kernels available on a given compute capability</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>target_cc</strong> (<em>int</em>) – compute capability of the device on which kernels will be run</p></li>
|
||||
<li><p><strong>kernel_cc</strong> (<em>int</em>) – compute capability of the kernels to generate</p></li>
|
||||
<li><p><strong>operation_kind</strong> (<em>cutlass.OperationKind</em>) – type of operation to register</p></li>
|
||||
<li><p><strong>gemm_kinds</strong> (<em>list</em>) – types of GEMM operations that can be included</p></li>
|
||||
<li><p><strong>allowed_math_operations</strong> (<em>list</em>) – types of primitive math operations allowed</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
</dl>
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="cutlass.library_defaults.ArchOptions.opclass_supports_combination">
|
||||
<span class="sig-name descname"><span class="pre">opclass_supports_combination</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">op_class</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">datatype_comb</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">layout_comb</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/library_defaults.html#ArchOptions.opclass_supports_combination"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.library_defaults.ArchOptions.opclass_supports_combination" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Returns whether the provided operation class supports the provided data type and layout combination</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>op_class</strong> (<em>cutlass.OpcodeClass</em>) – operation class to consider</p></li>
|
||||
<li><p><strong>datatype_comb</strong> (<em>tuple</em><em>[</em><em>cutlass.DataType</em><em>]</em>) – tuple of data types for (element_A, element_B, element_accumulator)</p></li>
|
||||
<li><p><strong>layout_comb</strong> (<em>tuple</em><em>[</em><em>cutlass.LayoutType</em><em>]</em>) – tuple of layouts for (layout_A, layout_B)</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>whether the provided operation class supports the provided data type and layout combination</p>
|
||||
</dd>
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>bool</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="cutlass.library_defaults.ArchOptions.operations">
|
||||
<span class="sig-name descname"><span class="pre">operations</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">op_class</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">element_a</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">element_b</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">element_accumulator</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">layout_a</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">layout_b</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/library_defaults.html#ArchOptions.operations"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.library_defaults.ArchOptions.operations" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Returns the kernels of the provided operation class that support the provided data type and layout combination</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>op_class</strong> (<em>cutlass.OpcodeClass</em>) – operation class to consider</p></li>
|
||||
<li><p><strong>element_a</strong> (<em>cutlass.DataType</em>) – data type of operand A</p></li>
|
||||
<li><p><strong>element_b</strong> (<em>cutlass.DataType</em>) – data type of operand B</p></li>
|
||||
<li><p><strong>element_accumulator</strong> (<em>cutlass.DataType</em>) – data type of accumulator</p></li>
|
||||
<li><p><strong>layout_a</strong> (<em>cutlass.LayoutType</em>) – layout of operand A</p></li>
|
||||
<li><p><strong>layout_b</strong> (<em>cutlass.LayoutType</em>) – layout of operand B</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>container of kernels by alignment supported by the provided combination of parameters</p>
|
||||
</dd>
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><a class="reference internal" href="#cutlass.library_defaults.KernelsForDataType" title="cutlass.library_defaults.KernelsForDataType">KernelsForDataType</a></p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="cutlass.library_defaults.ArchOptions.supporting_opclasses">
|
||||
<span class="sig-name descname"><span class="pre">supporting_opclasses</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">element_a</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">element_b</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">element_accumulator</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">layout_a</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">layout_b</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/library_defaults.html#ArchOptions.supporting_opclasses"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.library_defaults.ArchOptions.supporting_opclasses" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Returns a set of operation classes that support the provided data type and layout combination</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>element_a</strong> (<em>cutlass.DataType</em>) – data type of operand A</p></li>
|
||||
<li><p><strong>element_b</strong> (<em>cutlass.DataType</em>) – data type of operand B</p></li>
|
||||
<li><p><strong>element_accumulator</strong> (<em>cutlass.DataType</em>) – data type of accumulator</p></li>
|
||||
<li><p><strong>layout_a</strong> (<em>cutlass.LayoutType</em>) – layout of operand A</p></li>
|
||||
<li><p><strong>layout_b</strong> (<em>cutlass.LayoutType</em>) – layout of operand B</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>set of operation classes that support the provided data type and layout combination</p>
|
||||
</dd>
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>set</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
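<p>A minimal sketch of how such a lookup could work, assuming a hypothetical table that maps each operation class to the set of (data type combination, layout combination) pairs it supports. The names <code>support_table</code> and <code>find_supporting_opclasses</code>, and the plain strings standing in for enum values, are illustrative assumptions and not part of the CUTLASS API.</p>

```python
# Hypothetical support table: operation class name to the set of
# (datatype_comb, layout_comb) pairs it supports. Strings stand in for
# cutlass.OpcodeClass, cutlass.DataType and cutlass.LayoutType values.
support_table = {
    "TensorOp": {
        (("f16", "f16", "f32"), ("RowMajor", "ColumnMajor")),
    },
    "Simt": {
        (("f16", "f16", "f32"), ("RowMajor", "ColumnMajor")),
        (("f64", "f64", "f64"), ("RowMajor", "RowMajor")),
    },
}


def find_supporting_opclasses(table, datatype_comb, layout_comb):
    """Return the set of operation classes whose entry contains the combination."""
    wanted = (datatype_comb, layout_comb)
    return {op_class for op_class, combos in table.items() if wanted in combos}


half_ops = find_supporting_opclasses(
    support_table, ("f16", "f16", "f32"), ("RowMajor", "ColumnMajor")
)
double_ops = find_supporting_opclasses(
    support_table, ("f64", "f64", "f64"), ("RowMajor", "RowMajor")
)
```

<p>In this sketch an unsupported combination simply yields an empty set, so callers can test the result for truthiness before choosing an operation class.</p>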
|
||||
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py class">
|
||||
<dt class="sig sig-object py" id="cutlass.library_defaults.KernelsForDataType">
|
||||
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">cutlass.library_defaults.</span></span><span class="sig-name descname"><span class="pre">KernelsForDataType</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">datatype_comb</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">layout_comb</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/library_defaults.html#KernelsForDataType"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.library_defaults.KernelsForDataType" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">object</span></code></p>
|
||||
<p>Container class for keeping track of kernels that correspond to a particular combination
|
||||
of data types for operands A, B, and accumulator</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>datatype_comb</strong> (<em>tuple</em>) – tuple of data types for (element_A, element_B, element_accumulator)</p></li>
|
||||
<li><p><strong>layout_comb</strong> (<em>tuple</em>) – tuple of layouts for (layout_A, layout_B)</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
</dl>
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="cutlass.library_defaults.KernelsForDataType.add">
|
||||
<span class="sig-name descname"><span class="pre">add</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">operation</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/library_defaults.html#KernelsForDataType.add"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.library_defaults.KernelsForDataType.add" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Add an operation to the list of supported kernels</p>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py property">
|
||||
<dt class="sig sig-object py" id="cutlass.library_defaults.KernelsForDataType.alignments">
|
||||
<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">alignments</span></span><a class="headerlink" href="#cutlass.library_defaults.KernelsForDataType.alignments" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Returns an unsorted list of alignments supported by this data type combination</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>unsorted list of alignments supported by this data type combination</p>
|
||||
</dd>
|
||||
<dt class="field-even">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>list</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py property">
|
||||
<dt class="sig sig-object py" id="cutlass.library_defaults.KernelsForDataType.all_operations">
|
||||
<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">all_operations</span></span><a class="headerlink" href="#cutlass.library_defaults.KernelsForDataType.all_operations" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Returns a list of all operations supported by this data type combination</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>list of all operations supported by this data type combination</p>
|
||||
</dd>
|
||||
<dt class="field-even">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>list</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="cutlass.library_defaults.KernelsForDataType.find_alignment">
|
||||
<span class="sig-name descname"><span class="pre">find_alignment</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">shape</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">layout</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/library_defaults.html#KernelsForDataType.find_alignment"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.library_defaults.KernelsForDataType.find_alignment" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Returns the most preferable alignment for a given shape and layout</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>shape</strong> (<em>tuple</em>) – extent of each dimension of the tensor</p></li>
|
||||
<li><p><strong>layout</strong> (<em>cutlass.LayoutType</em>) – layout of the tensor</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>maximum alignment supported by the data type combination and tensor size</p>
|
||||
</dd>
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>int</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
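<p>The selection described above can be sketched as follows. This is a standalone illustration rather than the library's implementation: the helper name <code>pick_alignment</code>, the <code>row_major</code> flag, and the fallback to an alignment of 1 are assumptions made for the example. It assumes that the contiguous dimension is the last one for a row-major tensor and the first one for a column-major tensor.</p>

```python
def pick_alignment(supported_alignments, shape, row_major=True):
    """Return the largest supported alignment that evenly divides the contiguous extent.

    Hypothetical helper for illustration only; not part of the CUTLASS API.
    """
    # Assumption: row-major tensors are contiguous along the last dimension,
    # column-major tensors along the first dimension.
    contiguous_extent = shape[-1] if row_major else shape[0]

    # Try the largest alignments first so the most preferable one wins.
    for alignment in sorted(supported_alignments, reverse=True):
        if contiguous_extent % alignment == 0:
            return alignment

    # Assumed fallback when no supported alignment divides the extent.
    return 1


wide = pick_alignment([8, 4, 2, 1], (128, 72))
odd = pick_alignment([8, 4, 2], (128, 66))
column = pick_alignment([8, 4, 2, 1], (12, 7), row_major=False)
```

<p>Here <code>wide</code> selects 8 because 72 is divisible by 8, while <code>odd</code> drops to 2 because 66 is divisible by neither 8 nor 4.</p>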
|
||||
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="cutlass.library_defaults.KernelsForDataType.operations">
|
||||
<span class="sig-name descname"><span class="pre">operations</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">alignment</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/library_defaults.html#KernelsForDataType.operations"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.library_defaults.KernelsForDataType.operations" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Returns operations satisfying the alignment constraint indicated by <cite>alignment</cite></p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>alignment</strong> (<em>int</em>) – alignment constraint of operations to return</p>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>list of operations</p>
|
||||
</dd>
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>list</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="cutlass.library_defaults.KernelsForDataType.sort">
|
||||
<span class="sig-name descname"><span class="pre">sort</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/library_defaults.html#KernelsForDataType.sort"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.library_defaults.KernelsForDataType.sort" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Sorts each list of kernels in <cite>kernels_by_alignment</cite> in descending order of threadblock shape</p>
|
||||
</dd></dl>
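<p>To make the container's behaviour concrete, here is a minimal sketch of a mapping from alignment to kernels with <code>add</code>, <code>operations</code> and <code>sort</code> methods. The <code>Kernel</code> tuple and the <code>KernelsByAlignment</code> class are hypothetical stand-ins, and ordering threadblock shapes by plain tuple comparison is an assumption. The actual ordering criterion used by the library may differ.</p>

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a kernel description; not a CUTLASS type.
Kernel = namedtuple("Kernel", ["name", "alignment", "threadblock_shape"])


class KernelsByAlignment:
    """Illustrative container that groups kernels by their alignment."""

    def __init__(self):
        self.kernels_by_alignment = defaultdict(list)

    def add(self, kernel):
        # File the kernel under the alignment it requires.
        self.kernels_by_alignment[kernel.alignment].append(kernel)

    @property
    def alignments(self):
        # Unsorted list of the alignments seen so far.
        return list(self.kernels_by_alignment.keys())

    def operations(self, alignment):
        # Membership test avoids creating an empty entry in the defaultdict.
        if alignment not in self.kernels_by_alignment:
            raise KeyError(f"No kernels registered for alignment {alignment}")
        return self.kernels_by_alignment[alignment]

    def sort(self):
        # Assumed ordering: descending tuple comparison of threadblock shapes.
        for kernels in self.kernels_by_alignment.values():
            kernels.sort(key=lambda k: k.threadblock_shape, reverse=True)


registry = KernelsByAlignment()
registry.add(Kernel("small", 8, (64, 64, 32)))
registry.add(Kernel("large", 8, (128, 256, 32)))
registry.add(Kernel("scalar", 1, (128, 128, 8)))
registry.sort()
names_for_8 = [k.name for k in registry.operations(8)]
```

<p>After <code>sort()</code>, the first kernel in each list is the one with the largest threadblock shape under the assumed ordering, which makes it a natural default choice.</p>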
|
||||
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py class">
|
||||
<dt class="sig sig-object py" id="cutlass.library_defaults.OptionRegistry">
|
||||
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">cutlass.library_defaults.</span></span><span class="sig-name descname"><span class="pre">OptionRegistry</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">target_cc</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/library_defaults.html#OptionRegistry"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.library_defaults.OptionRegistry" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">object</span></code></p>
|
||||
<p>Container of all architecture-specific options</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>target_cc</strong> (<em>int</em>) – compute capability of the device on which operations will be run</p>
|
||||
</dd>
|
||||
</dl>
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="cutlass.library_defaults.OptionRegistry.options_for_cc">
|
||||
<span class="sig-name descname"><span class="pre">options_for_cc</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">cc</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/library_defaults.html#OptionRegistry.options_for_cc"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.library_defaults.OptionRegistry.options_for_cc" title="Permalink to this definition">#</a></dt>
|
||||
<dd><dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>cc</strong> (<em>int</em>) – compute capability for which to return options</p>
|
||||
</dd>
|
||||
<dt class="field-even">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p><a class="reference internal" href="#cutlass.library_defaults.ArchOptions" title="cutlass.library_defaults.ArchOptions"><em>ArchOptions</em></a></p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
</dd></dl>
|
||||
|
||||
</section>
|
||||
<section id="module-cutlass.swizzle">
|
||||
<span id="swizzle"></span><h2>Swizzle<a class="headerlink" href="#module-cutlass.swizzle" title="Permalink to this heading">#</a></h2>
|
||||
<p>Registry of swizzling functions</p>
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.swizzle.get_swizzling_functors">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.swizzle.</span></span><span class="sig-name descname"><span class="pre">get_swizzling_functors</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/swizzle.html#get_swizzling_functors"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.swizzle.get_swizzling_functors" title="Permalink to this definition">#</a></dt>
|
||||
<dd></dd></dl>
|
||||
|
||||
</section>
|
||||
</section>
|
||||
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
<a class="next-page" href="cutlass.emit.html">
|
||||
<div class="page-info">
|
||||
<div class="context">
|
||||
<span>Next</span>
|
||||
</div>
|
||||
<div class="title">Emitters</div>
|
||||
</div>
|
||||
<svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
|
||||
</a>
|
||||
<a class="prev-page" href="modules.html">
|
||||
<svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
|
||||
<div class="page-info">
|
||||
<div class="context">
|
||||
<span>Previous</span>
|
||||
</div>
|
||||
|
||||
<div class="title">CUTLASS Python API</div>
|
||||
|
||||
</div>
|
||||
</a>
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer">
|
||||
|
||||
|
||||
<div class="toc-sticky toc-scroll">
|
||||
<div class="toc-title-container">
|
||||
<span class="toc-title">
|
||||
On this page
|
||||
</span>
|
||||
</div>
|
||||
<div class="toc-tree-container">
|
||||
<div class="toc-tree">
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">CUTLASS</a><ul>
|
||||
<li><a class="reference internal" href="#subpackages">Subpackages</a></li>
|
||||
<li><a class="reference internal" href="#module-cutlass.epilogue">Epilogue</a><ul>
|
||||
<li><a class="reference internal" href="#cutlass.epilogue.get_activation_epilogue"><code class="docutils literal notranslate"><span class="pre">get_activation_epilogue()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.epilogue.get_activations"><code class="docutils literal notranslate"><span class="pre">get_activations()</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a class="reference internal" href="#module-cutlass.library_defaults">Library Defaults</a><ul>
|
||||
<li><a class="reference internal" href="#cutlass.library_defaults.ArchOptions"><code class="docutils literal notranslate"><span class="pre">ArchOptions</span></code></a><ul>
|
||||
<li><a class="reference internal" href="#cutlass.library_defaults.ArchOptions.opclass_supports_combination"><code class="docutils literal notranslate"><span class="pre">ArchOptions.opclass_supports_combination()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.library_defaults.ArchOptions.operations"><code class="docutils literal notranslate"><span class="pre">ArchOptions.operations()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.library_defaults.ArchOptions.supporting_opclasses"><code class="docutils literal notranslate"><span class="pre">ArchOptions.supporting_opclasses()</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a class="reference internal" href="#cutlass.library_defaults.KernelsForDataType"><code class="docutils literal notranslate"><span class="pre">KernelsForDataType</span></code></a><ul>
|
||||
<li><a class="reference internal" href="#cutlass.library_defaults.KernelsForDataType.add"><code class="docutils literal notranslate"><span class="pre">KernelsForDataType.add()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.library_defaults.KernelsForDataType.alignments"><code class="docutils literal notranslate"><span class="pre">KernelsForDataType.alignments</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.library_defaults.KernelsForDataType.all_operations"><code class="docutils literal notranslate"><span class="pre">KernelsForDataType.all_operations</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.library_defaults.KernelsForDataType.find_alignment"><code class="docutils literal notranslate"><span class="pre">KernelsForDataType.find_alignment()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.library_defaults.KernelsForDataType.operations"><code class="docutils literal notranslate"><span class="pre">KernelsForDataType.operations()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.library_defaults.KernelsForDataType.sort"><code class="docutils literal notranslate"><span class="pre">KernelsForDataType.sort()</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a class="reference internal" href="#cutlass.library_defaults.OptionRegistry"><code class="docutils literal notranslate"><span class="pre">OptionRegistry</span></code></a><ul>
|
||||
<li><a class="reference internal" href="#cutlass.library_defaults.OptionRegistry.options_for_cc"><code class="docutils literal notranslate"><span class="pre">OptionRegistry.options_for_cc()</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a class="reference internal" href="#module-cutlass.swizzle">Swizzle</a><ul>
|
||||
<li><a class="reference internal" href="#cutlass.swizzle.get_swizzling_functors"><code class="docutils literal notranslate"><span class="pre">get_swizzling_functors()</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/scripts/furo.js"></script>
|
||||
<script src="_static/clipboard.min.js"></script>
|
||||
<script src="_static/copybutton.js"></script>
|
||||
<script src="_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
784
python/docs/cutlass.op.html
Normal file
@ -0,0 +1,784 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
<link rel="index" title="Index" href="genindex.html" /><link rel="search" title="Search" href="search.html" /><link rel="next" title="Utilities" href="cutlass.utils.html" /><link rel="prev" title="Emitters" href="cutlass.emit.html" />
|
||||
<link rel="canonical" href="docs/cutlass.op.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>Operations - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul class="current">
|
||||
<li class="toctree-l1 current has-children"><a class="reference internal" href="modules.html">CUTLASS Python API</a><input checked="" class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul class="current">
|
||||
<li class="toctree-l2 current has-children"><a class="reference internal" href="cutlass.html">CUTLASS</a><input checked="" class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul class="current">
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3 current current-page"><a class="current reference internal" href="#">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<section id="operations">
|
||||
<h1>Operations<a class="headerlink" href="#operations" title="Permalink to this heading">#</a></h1>
|
||||
<section id="module-cutlass.op.gemm">
|
||||
<span id="gemm"></span><h2>GEMM<a class="headerlink" href="#module-cutlass.op.gemm" title="Permalink to this heading">#</a></h2>
|
||||
<p>Ease-of-use interface for constructing, compiling, and running GEMMs.</p>
|
||||
<p>The <code class="docutils literal notranslate"><span class="pre">Gemm</span></code> interface is meant to allow one to easily instantiate, compile, and run
|
||||
GEMM operations in CUTLASS via Python, without specifying many configuration parameters.
|
||||
Under the hood, the interface selects sensible defaults for the many template
|
||||
parameters of CUTLASS GEMMs.</p>
|
||||
<p>Note: optimal performance is not to be expected from this interface. To achieve optimal
|
||||
performance, one should specify and tune each configuration parameter.</p>
|
||||
<p>The simplest example of using this interface is the following:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># A, B, C, and D are torch/numpy/cupy tensor objects</span>
|
||||
<span class="n">plan</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">Gemm</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">)</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">run</span><span class="p">()</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>One can also use the interface by specifying data types of operands at construction
|
||||
and using different tensor objects with these data types at runtime:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># The following is shorthand for:</span>
|
||||
<span class="c1"># cutlass.op.Gemm(element_A=torch.float32, element_B=torch.float32,</span>
|
||||
<span class="c1"># element_C=torch.float32, element_D=torch.float32,</span>
|
||||
<span class="c1"># element_accumulator=torch.float32,</span>
|
||||
<span class="c1"># layout=cutlass.LayoutType.RowMajor)</span>
|
||||
<span class="n">plan</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">Gemm</span><span class="p">(</span><span class="n">element</span><span class="o">=</span><span class="n">torch</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">)</span>
|
||||
|
||||
<span class="n">A0</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">rand</span><span class="p">((</span><span class="mi">128</span><span class="p">,</span> <span class="mi">256</span><span class="p">),</span> <span class="n">device</span><span class="o">=</span><span class="s1">'cuda'</span><span class="p">)</span>
|
||||
<span class="n">B0</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">rand</span><span class="p">((</span><span class="mi">256</span><span class="p">,</span> <span class="mi">64</span><span class="p">),</span> <span class="n">device</span><span class="o">=</span><span class="s1">'cuda'</span><span class="p">)</span>
|
||||
<span class="n">C0</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="mi">128</span><span class="p">,</span> <span class="mi">64</span><span class="p">),</span> <span class="n">device</span><span class="o">=</span><span class="s1">'cuda'</span><span class="p">)</span>
|
||||
<span class="n">D0</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="mi">128</span><span class="p">,</span> <span class="mi">64</span><span class="p">),</span> <span class="n">device</span><span class="o">=</span><span class="s1">'cuda'</span><span class="p">)</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">A0</span><span class="p">,</span> <span class="n">B0</span><span class="p">,</span> <span class="n">C0</span><span class="p">,</span> <span class="n">D0</span><span class="p">)</span>
|
||||
|
||||
<span class="n">A1</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">rand</span><span class="p">((</span><span class="mi">32</span><span class="p">,</span> <span class="mi">128</span><span class="p">),</span> <span class="n">device</span><span class="o">=</span><span class="s1">'cuda'</span><span class="p">)</span>
|
||||
<span class="n">B1</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">rand</span><span class="p">((</span><span class="mi">128</span><span class="p">,</span> <span class="mi">256</span><span class="p">),</span> <span class="n">device</span><span class="o">=</span><span class="s1">'cuda'</span><span class="p">)</span>
|
||||
<span class="n">C1</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="mi">32</span><span class="p">,</span> <span class="mi">256</span><span class="p">),</span> <span class="n">device</span><span class="o">=</span><span class="s1">'cuda'</span><span class="p">)</span>
|
||||
<span class="n">D1</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="mi">32</span><span class="p">,</span> <span class="mi">256</span><span class="p">),</span> <span class="n">device</span><span class="o">=</span><span class="s1">'cuda'</span><span class="p">)</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">A1</span><span class="p">,</span> <span class="n">B1</span><span class="p">,</span> <span class="n">C1</span><span class="p">,</span> <span class="n">D1</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
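<p>The two sets of tensors in the example above have different problem sizes but the same data type and layout, which is why one plan can run both. As a rough illustration of the shape rule they follow, the sketch below checks that an (M, K) operand A and a (K, N) operand B agree on K and reports the (M, N) shape expected for C and D. The helper <code class="docutils literal notranslate"><span class="pre">gemm_output_shape</span></code> is hypothetical and is not part of the CUTLASS Python interface.</p>

```python
def gemm_output_shape(shape_a, shape_b):
    """Return (M, N) for an (M, K) x (K, N) product.

    Hypothetical helper for illustration only; not a CUTLASS function.
    """
    m, k_a = shape_a
    k_b, n = shape_b
    if k_a != k_b:
        raise ValueError(f"inner dimensions differ: {k_a} != {k_b}")
    return (m, n)


# Problem sizes taken from the two runs in the example above
first = gemm_output_shape((128, 256), (256, 64))
second = gemm_output_shape((32, 128), (128, 256))
print(first, second)
```

<p>Both calls succeed because each pair of operands shares its inner dimension; a mismatch raises <code class="docutils literal notranslate"><span class="pre">ValueError</span></code> in this sketch.</p>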
|
||||
<p>The interface additionally enables one to decouple the compilation of the underlying CUTLASS
|
||||
kernel from its execution:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">plan</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">Gemm</span><span class="p">(</span><span class="n">element</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">)</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">compile</span><span class="p">()</span>
|
||||
|
||||
<span class="c1"># Do other work...</span>
|
||||
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">A0</span><span class="p">,</span> <span class="n">B0</span><span class="p">,</span> <span class="n">C0</span><span class="p">,</span> <span class="n">D0</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># Do other work...</span>
|
||||
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">A1</span><span class="p">,</span> <span class="n">B1</span><span class="p">,</span> <span class="n">C1</span><span class="p">,</span> <span class="n">D1</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Elementwise activation functions are easily fused to the GEMM via the interface:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">plan</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">Gemm</span><span class="p">(</span><span class="n">element</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">)</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">activation</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">epilogue</span><span class="o">.</span><span class="n">relu</span>
|
||||
</pre></div>
|
||||
</div>
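<p>To make the effect of a fused activation concrete, here is a minimal pure-Python sketch of an elementwise epilogue. It assumes the activation is applied to <code class="docutils literal notranslate"><span class="pre">alpha</span> <span class="pre">*</span> <span class="pre">accumulator</span> <span class="pre">+</span> <span class="pre">beta</span> <span class="pre">*</span> <span class="pre">C</span></code>, which is the usual linear-combination form; the function names are hypothetical and do not mirror the CUTLASS implementation.</p>

```python
def relu(x):
    """Elementwise ReLU: negative values are clamped to zero."""
    return max(0.0, x)


def fused_epilogue(accumulators, source, alpha=1.0, beta=0.0, activation=relu):
    """Apply activation(alpha * acc + beta * c) to each element.

    Hypothetical reference for illustration; assumes the activation wraps
    the linear combination of the accumulator and the source operand C.
    """
    return [activation(alpha * acc + beta * c)
            for acc, c in zip(accumulators, source)]


out = fused_epilogue([1.5, -2.0, 0.25], [0.0, 1.0, -1.0], alpha=2.0, beta=1.0)
print(out)
```

<p>Fusing the activation this way avoids a separate pass over the output after the GEMM completes.</p>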
|
||||
<p>Operations can also be run asynchronously:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="n">plan</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">Gemm</span><span class="p">(</span><span class="n">element</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">)</span>
|
||||
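<span class="n">args</span> <span class="o">=</span> <span class="n">plan</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">A0</span><span class="p">,</span> <span class="n">B0</span><span class="p">,</span> <span class="n">C0</span><span class="p">,</span> <span class="n">D0</span><span class="p">,</span> <span class="n">sync</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span>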
|
||||
|
||||
<span class="c1"># Do other work...</span>
|
||||
|
||||
<span class="n">args</span><span class="o">.</span><span class="n">sync</span><span class="p">()</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<dl class="py class">
|
||||
<dt class="sig sig-object py" id="cutlass.op.gemm.Gemm">
|
||||
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">cutlass.op.gemm.</span></span><span class="sig-name descname"><span class="pre">Gemm</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">A</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">B</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">C</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">D</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">alpha</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1.0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">beta</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">element_accumulator</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">element</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">layout</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span 
class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">element_A</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">element_B</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">element_C</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">element_D</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">layout_A</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">layout_B</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">layout_C</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">cc</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_cc</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/op/gemm.html#Gemm"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a 
class="headerlink" href="#cutlass.op.gemm.Gemm" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Bases: <a class="reference internal" href="#cutlass.op.op.OperationBase" title="cutlass.op.op.OperationBase"><code class="xref py py-class docutils literal notranslate"><span class="pre">OperationBase</span></code></a></p>
|
||||
<p>Constructs a <code class="docutils literal notranslate"><span class="pre">Gemm</span></code> object.</p>
|
||||
<p>The data types and layouts of operands A, B, and C, along with the data type of output D
|
||||
and that used for accumulation, are bound to the <code class="docutils literal notranslate"><span class="pre">Gemm</span></code> object throughout its lifetime –
|
||||
these are not to be changed after a <code class="docutils literal notranslate"><span class="pre">Gemm</span></code> has been constructed.</p>
|
||||
<p>The constructor accepts optional arguments for flexibly setting these parameters. The following
|
||||
constructors are equivalent:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># Use F32 for A, B, C, D, and accumulation. All operands are row major.</span>
|
||||
|
||||
<span class="c1"># Use the generic ``element`` and ``layout`` parameters to concisely set all data types and layouts</span>
|
||||
<span class="c1"># for operands to the same values.</span>
|
||||
<span class="n">Gemm</span><span class="p">(</span><span class="n">element</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># Explicitly specify the data types to use for A, B, C, and D. Use the generic ``layout``.</span>
|
||||
<span class="n">Gemm</span><span class="p">(</span><span class="n">element_A</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">,</span> <span class="n">element_B</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">,</span> <span class="n">element_C</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">,</span>
|
||||
<span class="n">element_D</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># Set the data types and layouts from existing tensors. Note that one can use different tensors when</span>
|
||||
<span class="c1"># executing GEMM via the ``run()`` method than passed in here (though those passed in to ``run()`` must</span>
|
||||
<span class="c1"># have the same data type and layout as those passed in here).</span>
|
||||
<span class="c1"># A, B, C, and D are row-major torch.Tensor objects of type torch.float32</span>
|
||||
<span class="n">Gemm</span><span class="p">(</span><span class="n">A</span><span class="o">=</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="o">=</span><span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="o">=</span><span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="o">=</span><span class="n">D</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># Use the generic ``element`` and explicitly specify the layouts to use for A, B, and C (layout of D is</span>
|
||||
<span class="c1"># the same as that for C, at present)</span>
|
||||
<span class="n">Gemm</span><span class="p">(</span><span class="n">element</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">,</span> <span class="n">layout_A</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">,</span>
|
||||
<span class="n">layout_B</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">,</span> <span class="n">layout_C</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># Explicitly specify the data type and layout for only some of A, B, C, and D. Unspecified data types</span>
|
||||
<span class="c1"># and layouts will inherit those passed in via the generic ``element`` and ``layout``</span>
|
||||
<span class="n">Gemm</span><span class="p">(</span><span class="n">element_A</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">,</span> <span class="n">layout_B</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">,</span>
|
||||
<span class="n">element</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f32</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<dl class="simple">
|
||||
<dt>The order of precedence for the setting of the data type and layout for a given operand/output is as follows:</dt><dd><ol class="arabic simple">
|
||||
<li><p>If the tensor is specified (e.g., <code class="docutils literal notranslate"><span class="pre">A</span></code>), use the data type and layout inferred from this tensor</p></li>
|
||||
<li><p>Otherwise, if the data type/layout (e.g., <code class="docutils literal notranslate"><span class="pre">element_A</span></code>, <code class="docutils literal notranslate"><span class="pre">layout_A</span></code>) is specified, use those</p></li>
|
||||
<li><p>Otherwise, use the generic values (e.g., <code class="docutils literal notranslate"><span class="pre">element</span></code>, <code class="docutils literal notranslate"><span class="pre">layout</span></code>)</p></li>
|
||||
</ol>
|
||||
</dd>
|
||||
</dl>
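<p>The precedence order above can be sketched as a small fallback chain. This is an illustration of the documented rule only: <code class="docutils literal notranslate"><span class="pre">resolve_setting</span></code> and its argument names are hypothetical, and the strings stand in for real data type or layout values.</p>

```python
def resolve_setting(from_tensor=None, specific=None, generic=None):
    """Pick a data type or layout for one operand.

    Hypothetical helper mirroring the documented precedence:
    tensor-inferred value, then operand-specific value, then generic value.
    """
    if from_tensor is not None:
        return from_tensor
    if specific is not None:
        return specific
    if generic is not None:
        return generic
    raise ValueError("no tensor, operand-specific, or generic value given")


# Operand A: a tensor was passed, so its inferred type wins
elem_a = resolve_setting(from_tensor="f16", specific="f32", generic="f64")
# Operand B: no tensor, so the operand-specific value wins
elem_b = resolve_setting(specific="f32", generic="f64")
# Operand C: only the generic value is available
elem_c = resolve_setting(generic="f64")
print(elem_a, elem_b, elem_c)
```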
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>cc</strong> (<em>int</em>) – compute capability of device for which kernels should be compiled. For example, if running on H100, this should be set to 90</p></li>
|
||||
<li><p><strong>kernel_cc</strong> (<em>int</em>) – compute capability of kernels to generate. For example, if running on SM90, but desiring to use a CUTLASS 2.x-style Ampere kernel, this should be set to 80</p></li>
|
||||
<li><p><strong>A</strong> – tensor representing data type and layout of operand A</p></li>
|
||||
<li><p><strong>B</strong> – tensor representing data type and layout of operand B</p></li>
|
||||
<li><p><strong>C</strong> – tensor representing data type and layout of operand C</p></li>
|
||||
<li><p><strong>D</strong> – tensor representing data type and layout of operand D</p></li>
|
||||
<li><p><strong>alpha</strong> – scalar parameter alpha from GEMM computation that scales the product of operands A and B</p></li>
|
||||
<li><p><strong>beta</strong> – scalar parameter beta from GEMM operation that scales operand C</p></li>
|
||||
<li><p><strong>element_accumulator</strong> (<em>cutlass.DataType</em>) – data type to be used in accumulation of the product of operands A and B</p></li>
|
||||
<li><p><strong>element</strong> (<em>cutlass.DataType</em>) – generic data type to be used for operands A, B, C, D, as well as the accumulation data type</p></li>
|
||||
<li><p><strong>layout</strong> (<em>cutlass.LayoutType</em>) – generic layout type to be used for operands A, B, C, and D</p></li>
|
||||
<li><p><strong>element_A</strong> (<em>cutlass.DataType</em>) – data type to be used for operand A</p></li>
|
||||
<li><p><strong>element_B</strong> (<em>cutlass.DataType</em>) – data type to be used for operand B</p></li>
|
||||
<li><p><strong>element_C</strong> (<em>cutlass.DataType</em>) – data type to be used for operand C</p></li>
|
||||
<li><p><strong>element_D</strong> (<em>cutlass.DataType</em>) – data type to be used for operand D</p></li>
|
||||
<li><p><strong>layout_A</strong> (<em>cutlass.LayoutType</em>) – layout of operand A</p></li>
|
||||
<li><p><strong>layout_B</strong> (<em>cutlass.LayoutType</em>) – layout of operand B</p></li>
|
||||
<li><p><strong>layout_C</strong> (<em>cutlass.LayoutType</em>) – layout of operand C</p></li>
|
||||
<li><p><strong>layout_D</strong> (<em>cutlass.LayoutType</em>) – layout of operand D</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
</dl>
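<p>The roles of <code class="docutils literal notranslate"><span class="pre">alpha</span></code> and <code class="docutils literal notranslate"><span class="pre">beta</span></code> described above correspond to the standard GEMM formula D = alpha * (A @ B) + beta * C. The following pure-Python reference is a sketch for checking small cases by hand; <code class="docutils literal notranslate"><span class="pre">gemm_reference</span></code> is a hypothetical name and the code says nothing about how CUTLASS computes the result.</p>

```python
def gemm_reference(A, B, C, alpha=1.0, beta=0.0):
    """Compute D = alpha * (A @ B) + beta * C on nested lists.

    Hypothetical reference for illustration; A is M x K, B is K x N,
    and C is M x N. Defaults match the constructor defaults shown above.
    """
    m, k, n = len(A), len(B), len(B[0])
    D = []
    for i in range(m):
        row = []
        for j in range(n):
            # Accumulate the dot product of row i of A and column j of B
            acc = sum(A[i][p] * B[p][j] for p in range(k))
            row.append(alpha * acc + beta * C[i][j])
        D.append(row)
    return D


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[1.0, 1.0], [1.0, 1.0]]

plain = gemm_reference(A, B, C)                         # alpha=1, beta=0
scaled = gemm_reference(A, B, C, alpha=2.0, beta=3.0)
print(plain)
print(scaled)
```

<p>With the default <code class="docutils literal notranslate"><span class="pre">beta=0.0</span></code>, operand C does not contribute to the result.</p>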
|
||||
<dl class="py property">
|
||||
<dt class="sig sig-object py" id="cutlass.op.gemm.Gemm.activation">
|
||||
<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">activation</span></span><a class="headerlink" href="#cutlass.op.gemm.Gemm.activation" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Returns the type of the current activation function used</p>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="cutlass.op.gemm.Gemm.compile">
|
||||
<span class="sig-name descname"><span class="pre">compile</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">tile_description</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">alignment_A</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">alignment_B</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">alignment_C</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">print_module</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/op/gemm.html#Gemm.compile"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.op.gemm.Gemm.compile" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Emits and compiles the kernel currently specified. If <code class="docutils literal notranslate"><span class="pre">tile_description</span></code> and any
|
||||
of the <code class="docutils literal notranslate"><span class="pre">alignment</span></code> parameters are set, the kernel will be chosen using this
|
||||
tile description and alignments. Otherwise, a default tile description and alignment
|
||||
will be used.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>tile_description</strong> (<em>cutlass.backend.TileDescription</em>) – tile description specifying shapes and operand types to use in the kernel</p></li>
|
||||
<li><p><strong>alignment_A</strong> (<em>int</em>) – alignment of operand A</p></li>
|
||||
<li><p><strong>alignment_B</strong> (<em>int</em>) – alignment of operand B</p></li>
|
||||
<li><p><strong>alignment_C</strong> (<em>int</em>) – alignment of operand C</p></li>
|
||||
<li><p><strong>print_module</strong> (<em>bool</em>) – whether to print the emitted C++ code</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>operation that was compiled</p>
|
||||
</dd>
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>cutlass.backend.GemmOperationUniversal</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
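<p>As a rough guide to choosing values for the <code class="docutils literal notranslate"><span class="pre">alignment</span></code> parameters, the sketch below assumes that alignment is counted in elements, that a single vectorized access is at most 128 bits wide, and that the contiguous extent of the operand must be divisible by the alignment. These are assumptions made for illustration, and <code class="docutils literal notranslate"><span class="pre">max_alignment</span></code> is a hypothetical helper, not a CUTLASS function.</p>

```python
def max_alignment(contiguous_extent, element_bits, access_bits=128):
    """Largest power-of-two element count that divides the contiguous extent.

    Hypothetical helper; assumes alignment is measured in elements and is
    capped by access_bits // element_bits.
    """
    alignment = max(1, access_bits // element_bits)
    while alignment > 1 and contiguous_extent % alignment != 0:
        alignment //= 2
    return alignment


a_fp16 = max_alignment(4096, element_bits=16)   # full 128-bit access
a_fp32 = max_alignment(100, element_bits=32)
a_odd = max_alignment(30, element_bits=16)      # falls back to a smaller value
print(a_fp16, a_fp32, a_odd)
```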
|
||||
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="cutlass.op.gemm.Gemm.construct">
|
||||
<span class="sig-name descname"><span class="pre">construct</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">tile_description</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">alignment_A</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">alignment_B</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">alignment_C</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/op/gemm.html#Gemm.construct"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.op.gemm.Gemm.construct" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Constructs a <code class="docutils literal notranslate"><span class="pre">cutlass.backend.GemmOperationUniversal</span></code> based on the input parameters and current
|
||||
kernel specification of the <code class="docutils literal notranslate"><span class="pre">Gemm</span></code> object.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>tile_description</strong> (<em>cutlass.backend.TileDescription</em>) – tile description specifying shapes and operand types to use in the kernel</p></li>
|
||||
<li><p><strong>alignment_A</strong> (<em>int</em>) – alignment of operand A</p></li>
|
||||
<li><p><strong>alignment_B</strong> (<em>int</em>) – alignment of operand B</p></li>
|
||||
<li><p><strong>alignment_C</strong> (<em>int</em>) – alignment of operand C</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>operation that was constructed</p>
|
||||
</dd>
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>cutlass.backend.GemmOperationUniversal</p>
|
||||
</dd>
|
||||
</dl>
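<p>The alignment parameters above are documented only as integers. The following is a minimal sketch of how a suitable value might be chosen, assuming (this is not stated on this page) that alignment is counted in elements, that a single memory access is at most 128 bits wide, and that the contiguous extent of the operand must be divisible by the alignment. The helper <code class="docutils literal notranslate"><span class="pre">max_alignment</span></code> is hypothetical and is not part of the CUTLASS API.</p>

```python
def max_alignment(contiguous_extent, element_bits, max_access_bits=128):
    """Return the largest power-of-two element count that fits in one
    memory access and evenly divides the contiguous extent.

    Hypothetical helper for illustration only; the 128-bit access width
    and the divisibility rule are assumptions, not taken from this page.
    """
    if contiguous_extent <= 0 or element_bits <= 0:
        raise ValueError("extent and element width must be positive")
    # Start from the widest access the element type allows (at least 1).
    alignment = max(max_access_bits // element_bits, 1)
    # Halve until the extent is divisible by the alignment.
    while alignment > 1 and contiguous_extent % alignment != 0:
        alignment //= 2
    return alignment


# 16-bit elements: a 4096-wide operand allows 8 elements per access,
# while a 100-wide operand only allows 4.
align_f16_4096 = max_alignment(4096, element_bits=16)
align_f16_100 = max_alignment(100, element_bits=16)
align_f16_7 = max_alignment(7, element_bits=16)
align_f32_4096 = max_alignment(4096, element_bits=32)
```

<p>A value computed this way could then be passed as <code class="docutils literal notranslate"><span class="pre">alignment_A</span></code>, <code class="docutils literal notranslate"><span class="pre">alignment_B</span></code>, or <code class="docutils literal notranslate"><span class="pre">alignment_C</span></code>; leaving these as <code class="docutils literal notranslate"><span class="pre">None</span></code> keeps the defaults chosen by the interface.</p>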
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py property">
|
||||
<dt class="sig sig-object py" id="cutlass.op.gemm.Gemm.opclass">
|
||||
<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">opclass</span></span><em class="property"><span class="p"><span class="pre">:</span></span><span class="w"> </span><span class="pre">OpcodeClass</span></em><a class="headerlink" href="#cutlass.op.gemm.Gemm.opclass" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Returns the opcode class currently in use by the GEMM</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>opcode class currently in use</p>
|
||||
</dd>
|
||||
<dt class="field-even">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>cutlass.OpcodeClass</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="cutlass.op.gemm.Gemm.run">
|
||||
<span class="sig-name descname"><span class="pre">run</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">A</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">B</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">C</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">D</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">alpha</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">beta</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">batch_count</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">sync</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">print_module</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/op/gemm.html#Gemm.run"><span class="viewcode-link"><span 
class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.op.gemm.Gemm.run" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Runs the kernel currently specified. If the kernel has not yet been compiled, it is emitted and
|
||||
compiled. Tensors holding operands and outputs of the kernel are sourced either from the
|
||||
<code class="docutils literal notranslate"><span class="pre">A</span></code>, <code class="docutils literal notranslate"><span class="pre">B</span></code>, <code class="docutils literal notranslate"><span class="pre">C</span></code>, <code class="docutils literal notranslate"><span class="pre">D</span></code>, <code class="docutils literal notranslate"><span class="pre">alpha</span></code>, and <code class="docutils literal notranslate"><span class="pre">beta</span></code>
|
||||
parameters provided in this call, or from those
|
||||
passed in on the construction of this object – one of the two must be specified.</p>
|
||||
<p>By default, this call returns only once the kernel has completed. To launch the kernel
|
||||
and immediately return, set <code class="docutils literal notranslate"><span class="pre">sync=False</span></code>. In this case, it is the responsibility of the
|
||||
caller to synchronize the results of the kernel before attempting to access outputs
|
||||
by calling <code class="docutils literal notranslate"><span class="pre">sync()</span></code> on the arguments returned from this call.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>A</strong> – tensor representing data type and layout of operand A</p></li>
|
||||
<li><p><strong>B</strong> – tensor representing data type and layout of operand B</p></li>
|
||||
<li><p><strong>C</strong> – tensor representing data type and layout of operand C</p></li>
|
||||
<li><p><strong>D</strong> – tensor representing data type and layout of operand D</p></li>
|
||||
<li><p><strong>alpha</strong> – scalar parameter alpha from GEMM computation that scales the product of operands A and B</p></li>
|
||||
<li><p><strong>beta</strong> – scalar parameter beta from GEMM operation that scales operand C</p></li>
|
||||
<li><p><strong>batch_count</strong> (<em>int</em>) – number of GEMMs in the batch</p></li>
|
||||
<li><p><strong>sync</strong> (<em>bool</em>) – whether the call should wait for the kernel to complete before returning</p></li>
|
||||
<li><p><strong>print_module</strong> (<em>bool</em>) – whether to print the emitted C++ code</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>arguments passed in to the kernel</p>
|
||||
</dd>
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>cutlass.backend.GemmArguments</p>
|
||||
</dd>
|
||||
</dl>
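<p>To make the roles of <code class="docutils literal notranslate"><span class="pre">alpha</span></code> and <code class="docutils literal notranslate"><span class="pre">beta</span></code> concrete, here is a pure-Python reference of the computation the parameters describe, <code class="docutils literal notranslate"><span class="pre">D</span> <span class="pre">=</span> <span class="pre">alpha</span> <span class="pre">*</span> <span class="pre">(A</span> <span class="pre">@</span> <span class="pre">B)</span> <span class="pre">+</span> <span class="pre">beta</span> <span class="pre">*</span> <span class="pre">C</span></code>. It is only a sketch for checking results on small inputs: it does not call CUTLASS, and the function name <code class="docutils literal notranslate"><span class="pre">gemm_reference</span></code> is hypothetical.</p>

```python
def gemm_reference(A, B, C, alpha=1.0, beta=0.0):
    """Return D = alpha * (A @ B) + beta * C for row-major nested lists.

    Hypothetical reference helper; not part of the CUTLASS Python interface.
    """
    m, k, n = len(A), len(B), len(B[0])
    if any(len(row) != k for row in A):
        raise ValueError("inner dimensions of A and B do not match")
    if len(C) != m or any(len(row) != n for row in C):
        raise ValueError("C must have shape (M, N)")

    D = []
    for i in range(m):
        out_row = []
        for j in range(n):
            acc = 0.0  # plays the role of the accumulator
            for p in range(k):
                acc += A[i][p] * B[p][j]
            # alpha scales the product of A and B; beta scales C.
            out_row.append(alpha * acc + beta * C[i][j])
        D.append(out_row)
    return D


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [[1.0, 1.0], [1.0, 1.0]]
D = gemm_reference(A, B, C, alpha=2.0, beta=0.5)
```

<p>With <code class="docutils literal notranslate"><span class="pre">beta=0.0</span></code> the contents of <code class="docutils literal notranslate"><span class="pre">C</span></code> do not contribute to the result, which is why a plain matrix product needs only <code class="docutils literal notranslate"><span class="pre">alpha=1.0</span></code>.</p>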
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py property">
|
||||
<dt class="sig sig-object py" id="cutlass.op.gemm.Gemm.swizzling_functor">
|
||||
<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">swizzling_functor</span></span><a class="headerlink" href="#cutlass.op.gemm.Gemm.swizzling_functor" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Returns the type of the swizzling functor currently being used by the GEMM</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>swizzling functor type</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="cutlass.op.gemm.Gemm.tile_descriptions">
|
||||
<span class="sig-name descname"><span class="pre">tile_descriptions</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/op/gemm.html#Gemm.tile_descriptions"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.op.gemm.Gemm.tile_descriptions" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Returns a list of valid tile descriptions for the operations</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>list of valid tile descriptions for the operations</p>
|
||||
</dd>
|
||||
<dt class="field-even">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>list</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
</dd></dl>
|
||||
|
||||
</section>
|
||||
<section id="module-cutlass.op.gemm_grouped">
|
||||
<span id="grouped-gemm"></span><h2>Grouped GEMM<a class="headerlink" href="#module-cutlass.op.gemm_grouped" title="Permalink to this heading">#</a></h2>
|
||||
<p>Ease-of-use interface for constructing, compiling, and running grouped GEMMs.</p>
|
||||
<p>The <code class="docutils literal notranslate"><span class="pre">GroupedGemm</span></code> interface is meant to allow one to easily instantiate, compile, and run
|
||||
grouped GEMM operations in CUTLASS via Python, without specifying many configuration parameters.
|
||||
Under the hood, the interface will select sensible default parameters for the many template
|
||||
parameters for CUTLASS grouped GEMMs.</p>
|
||||
<p>Note: optimal performance is not to be expected from this interface. To achieve optimal
|
||||
performance, one should specify and tune each configuration parameter.</p>
|
||||
<p>The simplest example of using this interface is the following:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="c1"># A0, A1, B0, B1, C0, C1, D0, and D1 are torch/numpy/cupy tensor objects</span>
|
||||
<span class="n">plan</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">GroupedGemm</span><span class="p">(</span><span class="n">element</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">DataType</span><span class="o">.</span><span class="n">f16</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">)</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">run</span><span class="p">([</span><span class="n">A0</span><span class="p">,</span> <span class="n">A1</span><span class="p">],</span> <span class="p">[</span><span class="n">B0</span><span class="p">,</span> <span class="n">B1</span><span class="p">],</span> <span class="p">[</span><span class="n">C0</span><span class="p">,</span> <span class="n">C1</span><span class="p">],</span> <span class="p">[</span><span class="n">D0</span><span class="p">,</span> <span class="n">D1</span><span class="p">])</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<dl class="py class">
|
||||
<dt class="sig sig-object py" id="cutlass.op.gemm_grouped.GroupedGemm">
|
||||
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">cutlass.op.gemm_grouped.</span></span><span class="sig-name descname"><span class="pre">GroupedGemm</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">A</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">B</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">C</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">D</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">alpha</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">1.0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">beta</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">0.0</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">element_accumulator</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">element</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">layout</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span 
class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">element_A</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">element_B</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">element_C</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">element_D</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">layout_A</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">layout_B</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">layout_C</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">cc</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/op/gemm_grouped.html#GroupedGemm"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.op.gemm_grouped.GroupedGemm" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Bases: <a class="reference internal" href="#cutlass.op.gemm.Gemm" title="cutlass.op.gemm.Gemm"><code class="xref py py-class docutils literal notranslate"><span class="pre">Gemm</span></code></a></p>
|
||||
<p>Constructs a <code class="docutils literal notranslate"><span class="pre">GroupedGemm</span></code> object.</p>
|
||||
<p>The data types and layouts of operands A, B, and C, along with the data type of output D
|
||||
and that used for accumulation, are bound to the <code class="docutils literal notranslate"><span class="pre">GroupedGemm</span></code> object throughout its lifetime –
|
||||
these are not to be changed after a <code class="docutils literal notranslate"><span class="pre">GroupedGemm</span></code> has been constructed.</p>
|
||||
<p>The constructor has optional parameters for flexibly setting these parameters. Please see the constructor
|
||||
for <code class="docutils literal notranslate"><span class="pre">Gemm</span></code> for examples of these.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>cc</strong> (<em>int</em>) – compute capability of device to generate kernels for</p></li>
|
||||
<li><p><strong>A</strong> – tensor representing data type and layout of operands A</p></li>
|
||||
<li><p><strong>B</strong> – tensor representing data type and layout of operands B</p></li>
|
||||
<li><p><strong>C</strong> – tensor representing data type and layout of operands C</p></li>
|
||||
<li><p><strong>D</strong> – tensor representing data type and layout of operands D</p></li>
|
||||
<li><p><strong>alpha</strong> – scalar parameter alpha from GEMM computation that scales the product of operands A and B</p></li>
|
||||
<li><p><strong>beta</strong> – scalar parameter beta from GEMM operation that scales operand C</p></li>
|
||||
<li><p><strong>element_accumulator</strong> (<em>cutlass.DataType</em>) – data type to be used in accumulation of the product of operands A and B</p></li>
|
||||
<li><p><strong>element</strong> (<em>cutlass.DataType</em>) – generic data type to be used for operands A, B, C, D, as well as the accumulation data type</p></li>
|
||||
<li><p><strong>layout</strong> (<em>cutlass.LayoutType</em>) – generic layout type to be used for operands A, B, C, and D</p></li>
|
||||
<li><p><strong>element_A</strong> (<em>cutlass.DataType</em>) – data type to be used for operand A</p></li>
|
||||
<li><p><strong>element_B</strong> (<em>cutlass.DataType</em>) – data type to be used for operand B</p></li>
|
||||
<li><p><strong>element_C</strong> (<em>cutlass.DataType</em>) – data type to be used for operand C</p></li>
|
||||
<li><p><strong>element_D</strong> (<em>cutlass.DataType</em>) – data type to be used for operand D</p></li>
|
||||
<li><p><strong>layout_A</strong> (<em>cutlass.LayoutType</em>) – layout of operand A</p></li>
|
||||
<li><p><strong>layout_B</strong> (<em>cutlass.LayoutType</em>) – layout of operand B</p></li>
|
||||
<li><p><strong>layout_C</strong> (<em>cutlass.LayoutType</em>) – layout of operand C</p></li>
|
||||
<li><p><strong>layout_D</strong> (<em>cutlass.LayoutType</em>) – layout of operand D</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
</dl>
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="cutlass.op.gemm_grouped.GroupedGemm.construct">
|
||||
<span class="sig-name descname"><span class="pre">construct</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">tile_description</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">alignment_A</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">alignment_B</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">alignment_C</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/op/gemm_grouped.html#GroupedGemm.construct"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.op.gemm_grouped.GroupedGemm.construct" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Constructs a <code class="docutils literal notranslate"><span class="pre">cutlass.backend.GemmOperationGrouped</span></code> based on the input parameters and current
|
||||
kernel specification of the <code class="docutils literal notranslate"><span class="pre">GroupedGemm</span></code> object.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>tile_description</strong> (<em>cutlass.backend.TileDescription</em>) – tile description specifying shapes and operand types to use in the kernel</p></li>
|
||||
<li><p><strong>alignment_A</strong> (<em>int</em>) – alignment of operand A</p></li>
|
||||
<li><p><strong>alignment_B</strong> (<em>int</em>) – alignment of operand B</p></li>
|
||||
<li><p><strong>alignment_C</strong> (<em>int</em>) – alignment of operand C</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>operation that was constructed</p>
|
||||
</dd>
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>cutlass.backend.GemmOperationGrouped</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="cutlass.op.gemm_grouped.GroupedGemm.run">
|
||||
<span class="sig-name descname"><span class="pre">run</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">A</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">B</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">C</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">D</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">alpha</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">beta</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">sync</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">True</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">print_module</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/op/gemm_grouped.html#GroupedGemm.run"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.op.gemm_grouped.GroupedGemm.run" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Runs the kernel currently specified.</p>
|
||||
<p>By default, this call returns only once the kernel has completed. To launch the kernel
|
||||
and immediately return, set <code class="docutils literal notranslate"><span class="pre">sync=False</span></code>. In this case, it is the responsibility of the
|
||||
caller to synchronize the results of the kernel before attempting to access outputs
|
||||
by calling <code class="docutils literal notranslate"><span class="pre">sync()</span></code> on the arguments returned from this call.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>A</strong> (<em>list</em>) – list of tensors representing data type and layout of operand A</p></li>
|
||||
<li><p><strong>B</strong> (<em>list</em>) – list of tensors representing data type and layout of operand B</p></li>
|
||||
<li><p><strong>C</strong> (<em>list</em>) – list of tensors representing data type and layout of operand C</p></li>
|
||||
<li><p><strong>D</strong> (<em>list</em>) – list of tensors representing data type and layout of operand D</p></li>
|
||||
<li><p><strong>alpha</strong> – scalar parameter alpha from GEMM computation that scales the product of operands A and B</p></li>
|
||||
<li><p><strong>beta</strong> – scalar parameter beta from GEMM operation that scales operand C</p></li>
|
||||
<li><p><strong>sync</strong> (<em>bool</em>) – whether the call should wait for the kernel to complete before returning</p></li>
|
||||
<li><p><strong>print_module</strong> (<em>bool</em>) – whether to print the emitted C++ code</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>arguments passed in to the kernel</p>
|
||||
</dd>
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>cutlass.backend.GemmGroupedArguments</p>
|
||||
</dd>
|
||||
</dl>
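<p>Each position in the <code class="docutils literal notranslate"><span class="pre">A</span></code>, <code class="docutils literal notranslate"><span class="pre">B</span></code>, <code class="docutils literal notranslate"><span class="pre">C</span></code>, and <code class="docutils literal notranslate"><span class="pre">D</span></code> lists describes one independent GEMM problem. The following pure-Python sketch shows the computation a grouped GEMM is expected to produce, assuming the same <code class="docutils literal notranslate"><span class="pre">alpha</span></code> and <code class="docutils literal notranslate"><span class="pre">beta</span></code> apply to every problem and that problems may differ in size. It does not call CUTLASS, and the names <code class="docutils literal notranslate"><span class="pre">matmul</span></code> and <code class="docutils literal notranslate"><span class="pre">grouped_gemm_reference</span></code> are hypothetical.</p>

```python
def matmul(A, B):
    """Multiply two row-major nested-list matrices."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("inner dimensions of A and B do not match")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def grouped_gemm_reference(As, Bs, Cs, alpha=1.0, beta=0.0):
    """Return [alpha * (A_i @ B_i) + beta * C_i for each problem i].

    Hypothetical reference helper; not part of the CUTLASS Python interface.
    """
    if not (len(As) == len(Bs) == len(Cs)):
        raise ValueError("As, Bs, and Cs must list the same number of problems")
    Ds = []
    for A, B, C in zip(As, Bs, Cs):
        AB = matmul(A, B)
        Ds.append([[alpha * ab + beta * c for ab, c in zip(ab_row, c_row)]
                   for ab_row, c_row in zip(AB, C)])
    return Ds


# Two problems of different shapes: (1x2 @ 2x1) and (2x1 @ 1x2).
A0, B0, C0 = [[1.0, 2.0]], [[3.0], [4.0]], [[10.0]]
A1, B1, C1 = [[1.0], [2.0]], [[3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]
Ds = grouped_gemm_reference([A0, A1], [B0, B1], [C0, C1], alpha=1.0, beta=1.0)
```

<p>A reference like this can be compared against the <code class="docutils literal notranslate"><span class="pre">D</span></code> tensors on small inputs once the call has completed, or after <code class="docutils literal notranslate"><span class="pre">sync()</span></code> has been called on the returned arguments when <code class="docutils literal notranslate"><span class="pre">sync=False</span></code> is used.</p>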
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py property">
|
||||
<dt class="sig sig-object py" id="cutlass.op.gemm_grouped.GroupedGemm.swizzling_functor">
|
||||
<em class="property"><span class="pre">property</span><span class="w"> </span></em><span class="sig-name descname"><span class="pre">swizzling_functor</span></span><a class="headerlink" href="#cutlass.op.gemm_grouped.GroupedGemm.swizzling_functor" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Returns the type of the swizzling functor currently being used by the GEMM</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>swizzling functor type</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
</dd></dl>
|
||||
|
||||
</section>
|
||||
<section id="module-cutlass.op.op">
|
||||
<span id="operation"></span><h2>Operation<a class="headerlink" href="#module-cutlass.op.op" title="Permalink to this heading">#</a></h2>
|
||||
<p>Base operation used for defining high-level CUTLASS operations (e.g., GEMM, Conv2d)</p>
|
||||
<dl class="py class">
|
||||
<dt class="sig sig-object py" id="cutlass.op.op.OperationBase">
|
||||
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">cutlass.op.op.</span></span><span class="sig-name descname"><span class="pre">OperationBase</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">cc</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_cc</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">None</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/op/op.html#OperationBase"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.op.op.OperationBase" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Bases: <code class="xref py py-class docutils literal notranslate"><span class="pre">object</span></code></p>
|
||||
<p>Base operation used for defining high-level CUTLASS operations (e.g., GEMM, Conv2d)</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>cc</strong> (<em>int</em>) – compute capability of device to generate kernels for</p></li>
|
||||
<li><p><strong>kernel_cc</strong> (<em>int</em>) – compute capability of kernels to generate</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
</dl>
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="cutlass.op.op.OperationBase.activations">
|
||||
<span class="sig-name descname"><span class="pre">activations</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/op/op.html#OperationBase.activations"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.op.op.OperationBase.activations" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Returns possible activation functions that can be used</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>list of activation functions that can be used</p>
|
||||
</dd>
|
||||
<dt class="field-even">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>list</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py method">
|
||||
<dt class="sig sig-object py" id="cutlass.op.op.OperationBase.swizzling_functors">
|
||||
<span class="sig-name descname"><span class="pre">swizzling_functors</span></span><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/op/op.html#OperationBase.swizzling_functors"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.op.op.OperationBase.swizzling_functors" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Returns possible swizzling functions that can be used</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>list of swizzling functions that can be used</p>
|
||||
</dd>
|
||||
<dt class="field-even">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>list</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
</dd></dl>
|
||||
|
||||
</section>
|
||||
</section>
|
||||
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
<a class="next-page" href="cutlass.utils.html">
|
||||
<div class="page-info">
|
||||
<div class="context">
|
||||
<span>Next</span>
|
||||
</div>
|
||||
<div class="title">Utilities</div>
|
||||
</div>
|
||||
<svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
|
||||
</a>
|
||||
<a class="prev-page" href="cutlass.emit.html">
|
||||
<svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
|
||||
<div class="page-info">
|
||||
<div class="context">
|
||||
<span>Previous</span>
|
||||
</div>
|
||||
|
||||
<div class="title">Emitters</div>
|
||||
|
||||
</div>
|
||||
</a>
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer">
|
||||
|
||||
|
||||
<div class="toc-sticky toc-scroll">
|
||||
<div class="toc-title-container">
|
||||
<span class="toc-title">
|
||||
On this page
|
||||
</span>
|
||||
</div>
|
||||
<div class="toc-tree-container">
|
||||
<div class="toc-tree">
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">Operations</a><ul>
|
||||
<li><a class="reference internal" href="#module-cutlass.op.gemm">GEMM</a><ul>
|
||||
<li><a class="reference internal" href="#cutlass.op.gemm.Gemm"><code class="docutils literal notranslate"><span class="pre">Gemm</span></code></a><ul>
|
||||
<li><a class="reference internal" href="#cutlass.op.gemm.Gemm.activation"><code class="docutils literal notranslate"><span class="pre">Gemm.activation</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.op.gemm.Gemm.compile"><code class="docutils literal notranslate"><span class="pre">Gemm.compile()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.op.gemm.Gemm.construct"><code class="docutils literal notranslate"><span class="pre">Gemm.construct()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.op.gemm.Gemm.opclass"><code class="docutils literal notranslate"><span class="pre">Gemm.opclass</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.op.gemm.Gemm.run"><code class="docutils literal notranslate"><span class="pre">Gemm.run()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.op.gemm.Gemm.swizzling_functor"><code class="docutils literal notranslate"><span class="pre">Gemm.swizzling_functor</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.op.gemm.Gemm.tile_descriptions"><code class="docutils literal notranslate"><span class="pre">Gemm.tile_descriptions()</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a class="reference internal" href="#module-cutlass.op.gemm_grouped">Grouped GEMM</a><ul>
|
||||
<li><a class="reference internal" href="#cutlass.op.gemm_grouped.GroupedGemm"><code class="docutils literal notranslate"><span class="pre">GroupedGemm</span></code></a><ul>
|
||||
<li><a class="reference internal" href="#cutlass.op.gemm_grouped.GroupedGemm.construct"><code class="docutils literal notranslate"><span class="pre">GroupedGemm.construct()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.op.gemm_grouped.GroupedGemm.run"><code class="docutils literal notranslate"><span class="pre">GroupedGemm.run()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.op.gemm_grouped.GroupedGemm.swizzling_functor"><code class="docutils literal notranslate"><span class="pre">GroupedGemm.swizzling_functor</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a class="reference internal" href="#module-cutlass.op.op">Operation</a><ul>
|
||||
<li><a class="reference internal" href="#cutlass.op.op.OperationBase"><code class="docutils literal notranslate"><span class="pre">OperationBase</span></code></a><ul>
|
||||
<li><a class="reference internal" href="#cutlass.op.op.OperationBase.activations"><code class="docutils literal notranslate"><span class="pre">OperationBase.activations()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.op.op.OperationBase.swizzling_functors"><code class="docutils literal notranslate"><span class="pre">OperationBase.swizzling_functors()</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/scripts/furo.js"></script>
|
||||
<script src="_static/clipboard.min.js"></script>
|
||||
<script src="_static/copybutton.js"></script>
|
||||
<script src="_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
663
python/docs/cutlass.utils.html
Normal file
@ -0,0 +1,663 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
<link rel="index" title="Index" href="genindex.html" /><link rel="search" title="Search" href="search.html" /><link rel="next" title="Examples" href="examples.html" /><link rel="prev" title="Operations" href="cutlass.op.html" />
|
||||
<link rel="canonical" href="docs/cutlass.utils.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>Utilities - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul class="current">
|
||||
<li class="toctree-l1 current has-children"><a class="reference internal" href="modules.html">CUTLASS Python API</a><input checked="" class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul class="current">
|
||||
<li class="toctree-l2 current has-children"><a class="reference internal" href="cutlass.html">CUTLASS</a><input checked="" class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul class="current">
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3 current current-page"><a class="current reference internal" href="#">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<section id="utilities">
|
||||
<h1>Utilities<a class="headerlink" href="#utilities" title="Permalink to this heading">#</a></h1>
|
||||
<section id="module-cutlass.utils.check">
|
||||
<span id="checks"></span><h2>Checks<a class="headerlink" href="#module-cutlass.utils.check" title="Permalink to this heading">#</a></h2>
|
||||
<p>Utility functions for checking constraints on kernels and calculating kernel attributes.</p>
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.check.alignment_or_default">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.check.</span></span><span class="sig-name descname"><span class="pre">alignment_or_default</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">alignment_provided</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">default_alignment</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/check.html#alignment_or_default"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.check.alignment_or_default" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Returns <cite>alignment_provided</cite> if it is set and <cite>default_alignment</cite> otherwise, after checking
|
||||
that <cite>alignment_provided</cite> does not exceed <cite>default_alignment</cite>.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>alignment_provided</strong> (<em>int</em>) – alignment preference specified. Can be None.</p></li>
|
||||
<li><p><strong>default_alignment</strong> (<em>int</em>) – alignment to use if <cite>alignment_provided</cite> is None</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>alignment to use</p>
|
||||
</dd>
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>int</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
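<p>As a rough illustration of the contract described for <code class="docutils literal notranslate"><span class="pre">alignment_or_default</span></code>, the standalone sketch below mirrors the documented behavior without importing CUTLASS. The name <code class="docutils literal notranslate"><span class="pre">alignment_or_default_sketch</span></code> and the use of <code class="docutils literal notranslate"><span class="pre">ValueError</span></code> are assumptions made for this example; the library function may report the error differently.</p>

```python
def alignment_or_default_sketch(alignment_provided, default_alignment):
    """Hypothetical stand-in for the documented behavior, not the library code."""
    if alignment_provided is None:
        # No preference given: fall back to the default alignment.
        return default_alignment
    if alignment_provided > default_alignment:
        # Assumption: the exception type is chosen for this sketch only.
        raise ValueError(
            f"Alignment {alignment_provided} exceeds the maximum of {default_alignment}."
        )
    return alignment_provided


print(alignment_or_default_sketch(None, 8))  # falls back to the default
print(alignment_or_default_sketch(4, 8))     # keeps the provided value
```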
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.check.calculate_smem_usage">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.check.</span></span><span class="sig-name descname"><span class="pre">calculate_smem_usage</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">operation</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/check.html#calculate_smem_usage"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.check.calculate_smem_usage" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Returns the amount of shared memory in bytes consumed by a kernel.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>number of bytes of shared memory consumed by the operation</p>
|
||||
</dd>
|
||||
<dt class="field-even">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>int</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.check.calculate_smem_usage_per_stage">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.check.</span></span><span class="sig-name descname"><span class="pre">calculate_smem_usage_per_stage</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">tile_description</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">operation_kind</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/check.html#calculate_smem_usage_per_stage"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.check.calculate_smem_usage_per_stage" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Returns the amount of shared memory in bytes consumed in a single stage of a kernel.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>number of bytes of shared memory consumed by a single stage</p>
|
||||
</dd>
|
||||
<dt class="field-even">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>int</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
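<p>To give a sense of the quantity reported here, the sketch below estimates per-stage shared memory for a GEMM tile by summing the bytes of one A operand tile and one B operand tile. The formula, the function name, and its parameters are assumptions for illustration only; the library's calculation may account for additional storage.</p>

```python
def smem_bytes_per_stage_sketch(tile_m, tile_n, tile_k, bits_a, bits_b):
    """Hypothetical estimate: bytes for one A tile (M x K) plus one B tile (K x N)."""
    bytes_a = tile_m * tile_k * bits_a // 8
    bytes_b = tile_k * tile_n * bits_b // 8
    return bytes_a + bytes_b


# A 128x128x32 threadblock tile with 16-bit A and B operands.
per_stage = smem_bytes_per_stage_sketch(128, 128, 32, bits_a=16, bits_b=16)
print(per_stage)      # 16384
print(per_stage * 3)  # 49152 bytes for a three-stage pipeline
```

<p>Under these assumptions, each additional stage adds another 16384 bytes, which is why the stage count and the tile shape have to be chosen together.</p>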
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.check.valid_cluster_shape">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.check.</span></span><span class="sig-name descname"><span class="pre">valid_cluster_shape</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">cc</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">cluster_shape</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/check.html#valid_cluster_shape"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.check.valid_cluster_shape" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Checks whether a device with <cite>cc</cite> supports a thread block cluster of shape <cite>cluster_shape</cite>.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>cc</strong> (<em>int</em>) – compute capability of device in question</p></li>
|
||||
<li><p><strong>cluster_shape</strong> (<em>list</em>) – dimensions of thread block cluster shape to check</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>tuple with the first element indicating whether the provided cluster shape is
|
||||
valid for the provided device and the second element being an error message</p>
|
||||
</dd>
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>tuple</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
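<p>The <code class="docutils literal notranslate"><span class="pre">(valid,</span> <span class="pre">message)</span></code> return convention can be sketched as follows. This is a hypothetical, self-contained check and not the library implementation: the function name and the specific rules (clusters only on compute capability 90 and above, at most 8 thread blocks per cluster) are assumptions chosen to make the example concrete.</p>

```python
from math import prod


def valid_cluster_shape_sketch(cc, cluster_shape):
    """Hypothetical check returning (is_valid, error_message)."""
    if len(cluster_shape) != 3:
        return (False, f"Cluster shape must have three dimensions, got {cluster_shape}.")
    if cc < 90:
        # Thread block clusters were introduced with compute capability 9.0,
        # so only the trivial shape is accepted on older devices.
        if list(cluster_shape) != [1, 1, 1]:
            return (False, f"SM{cc} does not support cluster shape {cluster_shape}.")
        return (True, "")
    # Assumption: cap the cluster at 8 thread blocks, the portable maximum
    # described in the CUDA programming guide.
    if prod(cluster_shape) > 8:
        return (False, f"Cluster shape {cluster_shape} exceeds 8 thread blocks.")
    return (True, "")


# Callers unpack the tuple and surface the message only on failure.
ok, msg = valid_cluster_shape_sketch(80, [2, 1, 1])
if not ok:
    print(msg)
```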
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.check.valid_kernel_schedule">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.check.</span></span><span class="sig-name descname"><span class="pre">valid_kernel_schedule</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">cc</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_schedule</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/check.html#valid_kernel_schedule"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.check.valid_kernel_schedule" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Checks whether a device with <code class="docutils literal notranslate"><span class="pre">cc</span></code> supports <code class="docutils literal notranslate"><span class="pre">kernel_schedule</span></code>.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>cc</strong> (<em>int</em>) – compute capability of device in question</p></li>
|
||||
<li><p><strong>kernel_schedule</strong> (<em>KernelScheduleType</em>) – kernel schedule type</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>tuple with the first element indicating whether the provided kernel schedule is
|
||||
valid for the provided device and the second element being an error message</p>
|
||||
</dd>
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>tuple</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.check.valid_stage_count">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.check.</span></span><span class="sig-name descname"><span class="pre">valid_stage_count</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">cc</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">td</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/check.html#valid_stage_count"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.check.valid_stage_count" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Checks whether a device with <cite>cc</cite> supports the number of stages in the tile description <cite>td</cite>, both
|
||||
based on raw limits on the number of stages and based on shared memory capacity.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>cc</strong> (<em>int</em>) – compute capability of device in question</p></li>
|
||||
<li><p><strong>td</strong> (<em>TileDescription</em>) – tile description to check</p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>tuple with the first element indicating whether the provided tile description is
|
||||
valid for the provided device and the second element being an error message</p>
|
||||
</dd>
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>tuple</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
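<p>The two-part check described above (a raw limit on the stage count, then a shared memory capacity limit) might look like the following sketch. The function name, its parameters, and every number used are hypothetical and are supplied by the caller here; the library derives the corresponding limits from the compute capability and the tile description.</p>

```python
def valid_stage_count_sketch(stages, smem_per_stage, smem_capacity, max_stages):
    """Hypothetical two-part check returning (is_valid, error_message)."""
    # Part 1: raw limit on the number of stages.
    if stages < 1 or stages > max_stages:
        return (False, f"Stage count {stages} is outside the range [1, {max_stages}].")
    # Part 2: total shared memory must fit within the device capacity.
    required = stages * smem_per_stage
    if required > smem_capacity:
        return (
            False,
            f"{stages} stages need {required} bytes of shared memory, "
            f"but only {smem_capacity} bytes are available.",
        )
    return (True, "")


# All numbers below are illustrative, not taken from any particular device.
print(valid_stage_count_sketch(3, 16384, 65536, max_stages=8))
print(valid_stage_count_sketch(5, 16384, 65536, max_stages=8))
```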
|
||||
|
||||
</section>
|
||||
<section id="module-cutlass.utils.datatypes">
|
||||
<span id="data-types"></span><h2>Data Types<a class="headerlink" href="#module-cutlass.utils.datatypes" title="Permalink to this heading">#</a></h2>
|
||||
<p>Utility functions for converting between frontend datatypes and CUTLASS datatypes.</p>
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.backend_math_operation">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">backend_math_operation</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">math_op</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#backend_math_operation"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.backend_math_operation" title="Permalink to this definition">#</a></dt>
|
||||
<dd><dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>math_op</strong> (<em>MathOperation</em>) – </p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.bfloat16_library_type">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">bfloat16_library_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">inp</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#bfloat16_library_type"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.bfloat16_library_type" title="Permalink to this definition">#</a></dt>
|
||||
<dd><dl class="field-list simple">
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><em>DataType</em></p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.bfloat16_type">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">bfloat16_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">inp</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#bfloat16_type"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.bfloat16_type" title="Permalink to this definition">#</a></dt>
|
||||
<dd><dl class="field-list simple">
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>bfloat16</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.binding_layout">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">binding_layout</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">layout</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#binding_layout"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.binding_layout" title="Permalink to this definition">#</a></dt>
|
||||
<dd></dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.binding_library_type">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">binding_library_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">inp</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#binding_library_type"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.binding_library_type" title="Permalink to this definition">#</a></dt>
|
||||
<dd></dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.binding_opclass">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">binding_opclass</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">opclass</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#binding_opclass"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.binding_opclass" title="Permalink to this definition">#</a></dt>
|
||||
<dd><dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>opclass</strong> (<em>OpcodeClass</em>) – </p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.binding_type">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">binding_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">inp</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#binding_type"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.binding_type" title="Permalink to this definition">#</a></dt>
|
||||
<dd></dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.construct_backend_td">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">construct_backend_td</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">td</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">kernel_schedule</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#construct_backend_td"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.construct_backend_td" title="Permalink to this definition">#</a></dt>
|
||||
<dd><dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><ul class="simple">
|
||||
<li><p><strong>td</strong> (<em>TileDescription</em>) – </p></li>
|
||||
<li><p><strong>kernel_schedule</strong> (<em>KernelScheduleType</em>) – </p></li>
|
||||
</ul>
|
||||
</dd>
|
||||
<dt class="field-even">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p><em>TileDescription</em></p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.cupy_library_type">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">cupy_library_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">inp</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#cupy_library_type"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.cupy_library_type" title="Permalink to this definition">#</a></dt>
|
||||
<dd><dl class="field-list simple">
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><em>DataType</em></p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.cupy_type">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">cupy_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">inp</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#cupy_type"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.cupy_type" title="Permalink to this definition">#</a></dt>
|
||||
<dd></dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.get_datatype_and_layout">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">get_datatype_and_layout</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">tensor</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#get_datatype_and_layout"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.get_datatype_and_layout" title="Permalink to this definition">#</a></dt>
|
||||
<dd></dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.has_binding_type">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">has_binding_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">inp</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#has_binding_type"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.has_binding_type" title="Permalink to this definition">#</a></dt>
|
||||
<dd><dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>inp</strong> (<em>DataType</em>) – </p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.library_layout">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">library_layout</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">layout</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#library_layout"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.library_layout" title="Permalink to this definition">#</a></dt>
|
||||
<dd></dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.library_to_binding">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">library_to_binding</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">inp</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#library_to_binding"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.library_to_binding" title="Permalink to this definition">#</a></dt>
|
||||
<dd><dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>inp</strong> (<em>DataType</em>) – </p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.library_type">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">library_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">inp</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#library_type"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.library_type" title="Permalink to this definition">#</a></dt>
|
||||
<dd></dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.numpy_library_type">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">numpy_library_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">inp</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#numpy_library_type"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.numpy_library_type" title="Permalink to this definition">#</a></dt>
|
||||
<dd><dl class="field-list simple">
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><em>DataType</em></p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.numpy_type">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">numpy_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">inp</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#numpy_type"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.numpy_type" title="Permalink to this definition">#</a></dt>
|
||||
<dd></dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.td_from_profiler_op">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">td_from_profiler_op</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">op</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#td_from_profiler_op"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.td_from_profiler_op" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Converts the profiler’s TileDescription in <code class="docutils literal notranslate"><span class="pre">op</span></code> into the backend TileDescription.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>op</strong> – profiler Operation</p>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>backend TileDescription</p>
|
||||
</dd>
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>cutlass.backend.TileDescription</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.td_from_profiler_td">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">td_from_profiler_td</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">td</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#td_from_profiler_td"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.td_from_profiler_td" title="Permalink to this definition">#</a></dt>
|
||||
<dd><p>Converts the profiler’s TileDescription into the backend TileDescription.</p>
|
||||
<dl class="field-list simple">
|
||||
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><strong>td</strong> (<em>cutlass.TileDescription</em>) – profiler TileDescription</p>
|
||||
</dd>
|
||||
<dt class="field-even">Returns<span class="colon">:</span></dt>
|
||||
<dd class="field-even"><p>backend TileDescription</p>
|
||||
</dd>
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p>cutlass.backend.TileDescription</p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.torch_library_type">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">torch_library_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">inp</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#torch_library_type"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.torch_library_type" title="Permalink to this definition">#</a></dt>
|
||||
<dd><dl class="field-list simple">
|
||||
<dt class="field-odd">Return type<span class="colon">:</span></dt>
|
||||
<dd class="field-odd"><p><em>DataType</em></p>
|
||||
</dd>
|
||||
</dl>
|
||||
</dd></dl>
|
||||
|
||||
<dl class="py function">
|
||||
<dt class="sig sig-object py" id="cutlass.utils.datatypes.torch_type">
|
||||
<span class="sig-prename descclassname"><span class="pre">cutlass.utils.datatypes.</span></span><span class="sig-name descname"><span class="pre">torch_type</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">inp</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="_modules/cutlass/utils/datatypes.html#torch_type"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#cutlass.utils.datatypes.torch_type" title="Permalink to this definition">#</a></dt>
|
||||
<dd></dd></dl>
|
||||
|
||||
</section>
|
||||
</section>
|
||||
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
<a class="next-page" href="examples.html">
|
||||
<div class="page-info">
|
||||
<div class="context">
|
||||
<span>Next</span>
|
||||
</div>
|
||||
<div class="title">Examples</div>
|
||||
</div>
|
||||
<svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
|
||||
</a>
|
||||
<a class="prev-page" href="cutlass.op.html">
|
||||
<svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
|
||||
<div class="page-info">
|
||||
<div class="context">
|
||||
<span>Previous</span>
|
||||
</div>
|
||||
|
||||
<div class="title">Operations</div>
|
||||
|
||||
</div>
|
||||
</a>
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer">
|
||||
|
||||
|
||||
<div class="toc-sticky toc-scroll">
|
||||
<div class="toc-title-container">
|
||||
<span class="toc-title">
|
||||
On this page
|
||||
</span>
|
||||
</div>
|
||||
<div class="toc-tree-container">
|
||||
<div class="toc-tree">
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">Utilities</a><ul>
|
||||
<li><a class="reference internal" href="#module-cutlass.utils.check">Checks</a><ul>
|
||||
<li><a class="reference internal" href="#cutlass.utils.check.alignment_or_default"><code class="docutils literal notranslate"><span class="pre">alignment_or_default()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.check.calculate_smem_usage"><code class="docutils literal notranslate"><span class="pre">calculate_smem_usage()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.check.calculate_smem_usage_per_stage"><code class="docutils literal notranslate"><span class="pre">calculate_smem_usage_per_stage()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.check.valid_cluster_shape"><code class="docutils literal notranslate"><span class="pre">valid_cluster_shape()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.check.valid_kernel_schedule"><code class="docutils literal notranslate"><span class="pre">valid_kernel_schedule()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.check.valid_stage_count"><code class="docutils literal notranslate"><span class="pre">valid_stage_count()</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a class="reference internal" href="#module-cutlass.utils.datatypes">Data Types</a><ul>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.backend_math_operation"><code class="docutils literal notranslate"><span class="pre">backend_math_operation()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.bfloat16_library_type"><code class="docutils literal notranslate"><span class="pre">bfloat16_library_type()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.bfloat16_type"><code class="docutils literal notranslate"><span class="pre">bfloat16_type()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.binding_layout"><code class="docutils literal notranslate"><span class="pre">binding_layout()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.binding_library_type"><code class="docutils literal notranslate"><span class="pre">binding_library_type()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.binding_opclass"><code class="docutils literal notranslate"><span class="pre">binding_opclass()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.binding_type"><code class="docutils literal notranslate"><span class="pre">binding_type()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.construct_backend_td"><code class="docutils literal notranslate"><span class="pre">construct_backend_td()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.cupy_library_type"><code class="docutils literal notranslate"><span class="pre">cupy_library_type()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.cupy_type"><code class="docutils literal notranslate"><span class="pre">cupy_type()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.get_datatype_and_layout"><code class="docutils literal notranslate"><span class="pre">get_datatype_and_layout()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.has_binding_type"><code class="docutils literal notranslate"><span class="pre">has_binding_type()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.library_layout"><code class="docutils literal notranslate"><span class="pre">library_layout()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.library_to_binding"><code class="docutils literal notranslate"><span class="pre">library_to_binding()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.library_type"><code class="docutils literal notranslate"><span class="pre">library_type()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.numpy_library_type"><code class="docutils literal notranslate"><span class="pre">numpy_library_type()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.numpy_type"><code class="docutils literal notranslate"><span class="pre">numpy_type()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.td_from_profiler_op"><code class="docutils literal notranslate"><span class="pre">td_from_profiler_op()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.td_from_profiler_td"><code class="docutils literal notranslate"><span class="pre">td_from_profiler_td()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.torch_library_type"><code class="docutils literal notranslate"><span class="pre">torch_library_type()</span></code></a></li>
|
||||
<li><a class="reference internal" href="#cutlass.utils.datatypes.torch_type"><code class="docutils literal notranslate"><span class="pre">torch_type()</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/scripts/furo.js"></script>
|
||||
<script src="_static/clipboard.min.js"></script>
|
||||
<script src="_static/copybutton.js"></script>
|
||||
<script src="_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
330
python/docs/examples.html
Normal file
@ -0,0 +1,330 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
<link rel="index" title="Index" href="genindex.html" /><link rel="search" title="Search" href="search.html" /><link rel="next" title="Example of using elementwise activation functions in the CUTLASS Python interface" href="externals/01_epilogue.html" /><link rel="prev" title="Utilities" href="cutlass.utils.html" />
|
||||
<link rel="canonical" href="docs/examples.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>Examples - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="modules.html">CUTLASS Python API</a><input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul class="current">
|
||||
<li class="toctree-l1 current has-children current-page"><a class="current reference internal" href="#">Examples</a><input checked="" class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<section id="examples">
|
||||
<h1>Examples<a class="headerlink" href="#examples" title="Permalink to this heading">#</a></h1>
|
||||
<div class="toctree-wrapper compound">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="externals/00_basic_gemm.html">Basic GEMM</a><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/00_basic_gemm.html#Declaring-and-running-a-GEMM">Declaring and running a GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/00_basic_gemm.html#Changing-operation-modes">Changing operation modes</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/00_basic_gemm.html#Running-cached-kernels">Running cached kernels</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/00_basic_gemm.html#Running-non-default-GEMMs">Running non-default GEMMs</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/00_basic_gemm.html#Handling-errors">Handling errors</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="externals/01_epilogue.html">Epilogue</a><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/01_epilogue.html#Run-a-GEMM-with-an-identity-activation-function">Run a GEMM with an identity activation function</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/01_epilogue.html#Run-a-GEMM-with-a-ReLU-element-wise-activation-function">Run a GEMM with a ReLU element-wise activation function</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/01_epilogue.html#Other-element-wise-activation-functions">Other element-wise activation functions</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/02_pytorch_extension_grouped_gemm.html#Background-on-grouped-GEMM">Background on grouped GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/02_pytorch_extension_grouped_gemm.html#Declaring-a-grouped-GEMM-via-the-CUTLASS-Python-interface">Declaring a grouped GEMM via the CUTLASS Python interface</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/02_pytorch_extension_grouped_gemm.html#Exporting-the-CUTLASS-kernel-to-a-PyTorch-CUDA-extension">Exporting the CUTLASS kernel to a PyTorch CUDA extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</div>
|
||||
</section>
|
||||
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
<a class="next-page" href="externals/01_epilogue.html">
|
||||
<div class="page-info">
|
||||
<div class="context">
|
||||
<span>Next</span>
|
||||
</div>
|
||||
<div class="title">Example of using elementwise activation functions in the CUTLASS Python interface</div>
|
||||
</div>
|
||||
<svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
|
||||
</a>
|
||||
<a class="prev-page" href="cutlass.utils.html">
|
||||
<svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
|
||||
<div class="page-info">
|
||||
<div class="context">
|
||||
<span>Previous</span>
|
||||
</div>
|
||||
|
||||
<div class="title">Utilities</div>
|
||||
|
||||
</div>
|
||||
</a>
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer no-toc">
|
||||
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/scripts/furo.js"></script>
|
||||
<script src="_static/clipboard.min.js"></script>
|
||||
<script src="_static/copybutton.js"></script>
|
||||
<script src="_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
821
python/docs/externals/00_basic_gemm.html
vendored
Normal file
@ -0,0 +1,821 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
<link rel="index" title="Index" href="../genindex.html" /><link rel="search" title="Search" href="../search.html" /><link rel="next" title="Contributing" href="../contribute.html" /><link rel="prev" title="Installation" href="../install.html" />
|
||||
<link rel="canonical" href="docs/externals/00_basic_gemm.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>Basic example of using the CUTLASS Python interface - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/nbsphinx-code-cells.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="../index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="../index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="../_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="../_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="../search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul class="current">
|
||||
<li class="toctree-l1"><a class="reference internal" href="../install.html">Installation</a></li>
|
||||
<li class="toctree-l1 current"><a class="current reference internal" href="#">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../modules.html">CUTLASS Python API</a><input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="../cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul class="current">
|
||||
<li class="toctree-l1 current has-children"><a class="reference internal" href="../examples.html">Examples</a><input checked="" class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul class="current">
|
||||
<li class="toctree-l2 current current-page"><a class="current reference internal" href="#">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<section id="Basic-example-of-using-the-CUTLASS-Python-interface">
|
||||
<h1>Basic example of using the CUTLASS Python interface<a class="headerlink" href="#Basic-example-of-using-the-CUTLASS-Python-interface" title="Permalink to this heading">#</a></h1>
|
||||
<p>This notebook walks through a basic example of using the CUTLASS Python interface to declare, compile, and run GEMMs.</p>
|
||||
<p><a class="reference external" href="https://colab.research.google.com/github/NVIDIA/cutlass/tree/master/examples/00_basic_gemm.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a></p>
|
||||
<p>We first import the packages needed for the example and construct the input and output tensors.</p>
|
||||
<div class="nbinput docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[1]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
<span class="kn">import</span> <span class="nn">random</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">cutlass</span>
|
||||
|
||||
<span class="c1"># This controls whether the C++ GEMM declaration will be printed at each step. Set to `False` to</span>
|
||||
<span class="c1"># omit this information.</span>
|
||||
<span class="n">print_module</span> <span class="o">=</span> <span class="kc">True</span>
|
||||
|
||||
<span class="n">m</span> <span class="o">=</span> <span class="mi">128</span>
|
||||
<span class="n">n</span> <span class="o">=</span> <span class="n">m</span>
|
||||
<span class="n">k</span> <span class="o">=</span> <span class="n">m</span>
|
||||
|
||||
<span class="n">dtype</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">float16</span>
|
||||
<span class="n">type_A</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">float16</span>
|
||||
<span class="n">type_B</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">float16</span>
|
||||
<span class="n">type_C</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">float16</span>
|
||||
<span class="n">type_D</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">float16</span>
|
||||
|
||||
<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">1234</span><span class="p">)</span>
|
||||
<span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">1234</span><span class="p">)</span>
|
||||
<span class="n">scope_min</span> <span class="o">=</span> <span class="o">-</span><span class="mi">4</span>
|
||||
<span class="n">scope_max</span> <span class="o">=</span> <span class="mi">4</span>
|
||||
<span class="n">tensor_A</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">ceil</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">low</span><span class="o">=</span><span class="n">scope_min</span><span class="p">,</span> <span class="n">high</span><span class="o">=</span><span class="n">scope_max</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">k</span><span class="p">))</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">type_A</span><span class="p">))</span>
|
||||
<span class="n">tensor_B</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">ceil</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">low</span><span class="o">=</span><span class="n">scope_min</span><span class="p">,</span> <span class="n">high</span><span class="o">=</span><span class="n">scope_max</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="n">n</span><span class="p">))</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">type_B</span><span class="p">))</span>
|
||||
<span class="n">tensor_C</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">ceil</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">low</span><span class="o">=</span><span class="n">scope_min</span><span class="p">,</span> <span class="n">high</span><span class="o">=</span><span class="n">scope_max</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">))</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">type_C</span><span class="p">))</span>
|
||||
|
||||
<span class="n">alpha</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">float16</span><span class="p">(</span><span class="mf">1.</span><span class="p">)</span>
|
||||
<span class="n">beta</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">float16</span><span class="p">(</span><span class="mf">0.</span><span class="p">)</span>
|
||||
|
||||
<span class="n">tensor_D</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">tensor_C</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">type_D</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="nboutput nblast docutils container">
|
||||
<div class="prompt empty docutils container">
|
||||
</div>
|
||||
<div class="output_area stderr docutils container">
|
||||
<div class="highlight"><pre>
|
||||
/usr/local/lib/python3.8/dist-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
|
||||
from .autonotebook import tqdm as notebook_tqdm
|
||||
</pre></div></div>
|
||||
</div>
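<p>For orientation, the operation that the tensors, <code class="docutils literal notranslate"><span class="pre">alpha</span></code>, and <code class="docutils literal notranslate"><span class="pre">beta</span></code> above feed into is the standard GEMM, <code class="docutils literal notranslate"><span class="pre">D = alpha * (A @ B) + beta * C</span></code>. The following is a minimal pure-Python sketch of that computation on small integer-valued matrices, not the CUTLASS kernel itself. The helper names <code class="docutils literal notranslate"><span class="pre">gemm_reference</span></code> and <code class="docutils literal notranslate"><span class="pre">random_int_matrix</span></code> are hypothetical and exist only for this illustration.</p>

```python
import random


def gemm_reference(A, B, C, alpha=1.0, beta=0.0):
    """Return D = alpha * (A @ B) + beta * C for row-major nested lists.

    Hypothetical helper for illustration only; not part of CUTLASS.
    """
    m, k, n = len(A), len(B), len(B[0])
    D = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = 0.0  # running dot product of row i of A and column j of B
            for p in range(k):
                acc += A[i][p] * B[p][j]
            row.append(alpha * acc + beta * C[i][j])
        D.append(row)
    return D


def random_int_matrix(rows, cols, rng, low=-4, high=4):
    """Build a rows x cols matrix of integer-valued floats in [low, high]."""
    return [[float(rng.randint(low, high)) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(1234)
m, n, k = 4, 3, 5  # deliberately tiny; the notebook uses 128 for all three
A = random_int_matrix(m, k, rng)
B = random_int_matrix(k, n, rng)
C = random_int_matrix(m, n, rng)

# With alpha = 1 and beta = 0, as in the notebook, D is simply A @ B.
D = gemm_reference(A, B, C, alpha=1.0, beta=0.0)
```

<p>With <code class="docutils literal notranslate"><span class="pre">beta</span></code> set to zero, <code class="docutils literal notranslate"><span class="pre">tensor_C</span></code> does not contribute to the result, so it only matters once a non-zero <code class="docutils literal notranslate"><span class="pre">beta</span></code> is used.</p>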
|
||||
<section id="Declaring-and-running-a-GEMM">
|
||||
<h2>Declaring and running a GEMM<a class="headerlink" href="#Declaring-and-running-a-GEMM" title="Permalink to this heading">#</a></h2>
|
||||
<p>To get started, one only needs to provide the tensors declared above to the <code class="docutils literal notranslate"><span class="pre">cutlass.op.Gemm</span></code> call. This sets up a default GEMM operation for the device on which you are running.</p>
|
||||
<p>Assuming that we are running on SM80, this defaults to using a GEMM that leverages FP16 Tensor Core operations.</p>
|
||||
<p>Calling <code class="docutils literal notranslate"><span class="pre">plan.run()</span></code> will generate the CUTLASS C++ kernel in question, compile it, and run it on the tensors we previously passed in. Setting <code class="docutils literal notranslate"><span class="pre">print_module</span></code> to <code class="docutils literal notranslate"><span class="pre">True</span></code> prints the emitted C++ code.</p>
|
||||
<div class="nbinput docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[2]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="c1"># We specify `element_accumulator` here so as to match the kernel run by NumPy below. However,</span>
|
||||
<span class="c1"># specifying `element_accumulator` is not required if it is the same as `element`</span>
|
||||
<span class="n">plan</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">Gemm</span><span class="p">(</span><span class="n">element</span><span class="o">=</span><span class="n">dtype</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">,</span> <span class="n">element_accumulator</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">float32</span><span class="p">)</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">tensor_A</span><span class="p">,</span> <span class="n">tensor_B</span><span class="p">,</span> <span class="n">tensor_C</span><span class="p">,</span> <span class="n">tensor_D</span><span class="p">,</span> <span class="n">print_module</span><span class="o">=</span><span class="n">print_module</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="nboutput docutils container">
|
||||
<div class="prompt empty docutils container">
|
||||
</div>
|
||||
<div class="output_area docutils container">
|
||||
<div class="highlight"><pre>
|
||||
|
||||
// Gemm operator cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_256x128_64x3_tt_align8
|
||||
using cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_256x128_64x3_tt_align8_base =
|
||||
typename cutlass::gemm::kernel::DefaultGemmUniversal<
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor,
|
||||
float,
|
||||
cutlass::arch::OpClassTensorOp,
|
||||
cutlass::arch::Sm80,
|
||||
cutlass::gemm::GemmShape<256, 128, 64>,
|
||||
cutlass::gemm::GemmShape<64, 64, 64>,
|
||||
cutlass::gemm::GemmShape<16, 8, 16>,
|
||||
cutlass::epilogue::thread::LinearCombination&lt;cutlass::half_t, 8, float, float&gt;,
|
||||
cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,
|
||||
3,
|
||||
cutlass::arch::OpMultiplyAdd
|
||||
>::GemmKernel;
|
||||
|
||||
// Define named type
|
||||
struct cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_256x128_64x3_tt_align8_type :
|
||||
public cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_256x128_64x3_tt_align8_base { };
|
||||
|
||||
</pre></div></div>
|
||||
</div>
|
||||
<div class="nboutput nblast docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[2]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="output_area docutils container">
|
||||
<div class="highlight"><pre>
|
||||
&lt;cutlass.backend.gemm_operation.GemmArguments2x at 0x7f79cc556070&gt;
|
||||
</pre></div></div>
|
||||
</div>
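<p>The kernel name printed above packs several configuration values into one identifier. The following is a minimal sketch of pulling those values back out. The pattern is inferred only from the kernel names printed on this page and is an assumption, not a specification of the CUTLASS naming scheme; <code class="docutils literal notranslate"><span class="pre">parse_kernel_name</span></code> is a hypothetical helper and not part of the CUTLASS Python interface.</p>

```python
import re

# Pattern inferred from the kernel names printed in this notebook only.
# It is an illustration, not CUTLASS's official naming specification.
KERNEL_NAME = re.compile(
    r"cutlass_sm(?P<cc>\d+)_(?P<opclass>tensorop|simt)_"
    r".*?_(?P<cluster>\d+x\d+x\d+)_"
    r"(?P<tb_m>\d+)x(?P<tb_n>\d+)_(?P<tb_k>\d+)x(?P<stages>\d+)_"
    r"(?P<layouts>[nt]{2})_align(?P<align>\d+)$"
)


def parse_kernel_name(name):
    """Split a printed kernel name into the fields that can be read off it."""
    match = KERNEL_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized kernel name: {name}")
    fields = match.groupdict()
    return {
        "compute_capability": int(fields["cc"]),
        "opclass": fields["opclass"],
        "cluster_shape": [int(v) for v in fields["cluster"].split("x")],
        "threadblock_shape": [
            int(fields["tb_m"]),
            int(fields["tb_n"]),
            int(fields["tb_k"]),
        ],
        "stages": int(fields["stages"]),
        "layouts": fields["layouts"],
        "alignment": int(fields["align"]),
    }


info = parse_kernel_name(
    "cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_256x128_64x3_tt_align8"
)
for key, value in info.items():
    print(f"{key}: {value}")
```

<p>The parsed threadblock shape and stage count line up with the <code class="docutils literal notranslate"><span class="pre">GemmShape</span></code> and stage template arguments in the emitted C++ above.</p>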
|
||||
<p>There are many other ways to construct a plan from <code class="docutils literal notranslate"><span class="pre">cutlass.op.Gemm</span></code> (e.g., by specifying the types and layouts of each operand, or by providing representative tensors as inputs). For more details on these, see the documentation of the <code class="docutils literal notranslate"><span class="pre">cutlass.op.Gemm</span></code> constructor.</p>
|
||||
<p>We then compare the output to running the GEMM using NumPy.</p>
|
||||
<div class="nbinput nblast docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[3]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">tensor_D_numpy</span> <span class="o">=</span> <span class="p">(</span><span class="n">alpha</span> <span class="o">*</span> <span class="p">(</span><span class="n">tensor_A</span> <span class="o">@</span> <span class="n">tensor_B</span><span class="p">))</span> <span class="o">+</span> <span class="p">(</span><span class="n">beta</span> <span class="o">*</span> <span class="n">tensor_C</span><span class="p">)</span>
|
||||
<span class="n">np</span><span class="o">.</span><span class="n">testing</span><span class="o">.</span><span class="n">assert_array_equal</span><span class="p">(</span><span class="n">tensor_D</span><span class="p">,</span> <span class="n">tensor_D_numpy</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<p>Note that one could use the same kernel just declared for tensors provided by other frameworks beyond NumPy, such as PyTorch or CuPy.</p>
|
||||
</section>
|
||||
<section id="Changing-operation-modes">
|
||||
<h2>Changing operation modes<a class="headerlink" href="#Changing-operation-modes" title="Permalink to this heading">#</a></h2>
|
||||
<p>By default, the CUTLASS Python interface will try to use Tensor Core operations whenever possible. If the configuration provided to <code class="docutils literal notranslate"><span class="pre">cutlass.op.Gemm</span></code> is not supported on Tensor Cores, the interface will fall back to using a SIMT kernel.</p>
|
||||
<p>The operation mode currently in use is available via the <code class="docutils literal notranslate"><span class="pre">plan.opclass</span></code> property. In this case, it is Tensor Core operations.</p>
|
||||
<div class="nbinput docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[4]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="n">plan</span><span class="o">.</span><span class="n">opclass</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="nboutput nblast docutils container">
|
||||
<div class="prompt empty docutils container">
|
||||
</div>
|
||||
<div class="output_area docutils container">
|
||||
<div class="highlight"><pre>
|
||||
OpcodeClass.TensorOp
|
||||
</pre></div></div>
|
||||
</div>
|
||||
<p>Suppose that we don’t want to use Tensor Cores for this GEMM. One can change to using CUTLASS’s SIMT GEMMs by setting the plan’s <code class="docutils literal notranslate"><span class="pre">opclass</span></code> field.</p>
|
||||
<p>As is shown in the printed output, the emitted kernel uses template parameters that fit CUTLASS’s SIMT GEMMs.</p>
|
||||
<p>Also notice that, this time around, we provided tensor parameters to <code class="docutils literal notranslate"><span class="pre">plan.run()</span></code>. One is free to provide different parameters to <code class="docutils literal notranslate"><span class="pre">plan.run()</span></code> than were passed in at the initial call to <code class="docutils literal notranslate"><span class="pre">cutlass.op.Gemm</span></code>, provided that the passed-in tensors have the same data type and layout as those passed in at initialization.</p>
|
||||
<div class="nbinput docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[5]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">tensor_D_simt</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">tensor_C</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">type_D</span><span class="p">)</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">opclass</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="o">.</span><span class="n">Simt</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">tensor_A</span><span class="p">,</span> <span class="n">tensor_B</span><span class="p">,</span> <span class="n">tensor_C</span><span class="p">,</span> <span class="n">tensor_D_simt</span><span class="p">,</span> <span class="n">alpha</span><span class="p">,</span> <span class="n">beta</span><span class="p">,</span> <span class="n">print_module</span><span class="o">=</span><span class="n">print_module</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="nboutput docutils container">
|
||||
<div class="prompt empty docutils container">
|
||||
</div>
|
||||
<div class="output_area docutils container">
|
||||
<div class="highlight"><pre>
|
||||
|
||||
// Gemm operator cutlass_sm80_simt_f16_sgemm_f16_1x1x1_128x128_8x2_tt_align1
|
||||
using cutlass_sm80_simt_f16_sgemm_f16_1x1x1_128x128_8x2_tt_align1_base =
|
||||
typename cutlass::gemm::kernel::DefaultGemmUniversal<
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 1,
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 1,
|
||||
cutlass::half_t, cutlass::layout::RowMajor,
|
||||
float,
|
||||
cutlass::arch::OpClassSimt,
|
||||
cutlass::arch::Sm80,
|
||||
cutlass::gemm::GemmShape<128, 128, 8>,
|
||||
cutlass::gemm::GemmShape<32, 64, 8>,
|
||||
cutlass::gemm::GemmShape<1, 1, 1>,
|
||||
cutlass::epilogue::thread::LinearCombination&lt;cutlass::half_t, 1, float, float&gt;,
|
||||
cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,
|
||||
2,
|
||||
cutlass::arch::OpMultiplyAdd
|
||||
>::GemmKernel;
|
||||
|
||||
// Define named type
|
||||
struct cutlass_sm80_simt_f16_sgemm_f16_1x1x1_128x128_8x2_tt_align1_type :
|
||||
public cutlass_sm80_simt_f16_sgemm_f16_1x1x1_128x128_8x2_tt_align1_base { };
|
||||
|
||||
</pre></div></div>
|
||||
</div>
|
||||
<div class="nboutput nblast docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[5]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="output_area docutils container">
|
||||
<div class="highlight"><pre>
|
||||
&lt;cutlass.backend.gemm_operation.GemmArguments2x at 0x7f7b3075abe0&gt;
|
||||
</pre></div></div>
|
||||
</div>
|
||||
<p>If we compare the outputs of the Tensor Core and SIMT GEMMs that we just ran, we see that they are equal.</p>
|
||||
<div class="nbinput nblast docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[6]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">np</span><span class="o">.</span><span class="n">testing</span><span class="o">.</span><span class="n">assert_array_equal</span><span class="p">(</span><span class="n">tensor_D</span><span class="p">,</span> <span class="n">tensor_D_simt</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
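<p>Two kernels that add up partial products in different orders can still agree bit for bit when the inputs are small integers and accumulation is done in FP32, because every partial sum is then exactly representable. The NumPy-only sketch below illustrates that property. It does not run CUTLASS, and the value range of -4 to 4 is an assumption chosen for illustration.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
k = 4096

# Small integer values stored as FP16 (the range [-4, 4] is an assumption).
a = np.ceil(rng.uniform(-4, 4, size=k)).astype(np.float16)
b = np.ceil(rng.uniform(-4, 4, size=k)).astype(np.float16)

# Element-wise products computed in FP32, as with an FP32 accumulator.
products = a.astype(np.float32) * b.astype(np.float32)


def accumulate(values, order):
    """Sum `values` in FP32, visiting indices in the given order."""
    total = np.float32(0.0)
    for i in order:
        total = np.float32(total + values[i])
    return total


forward = accumulate(products, range(k))
backward = accumulate(products, reversed(range(k)))

# Eight interleaved partial sums, combined at the end.
lanes = np.array(
    [accumulate(products, range(lane, k, 8)) for lane in range(8)],
    dtype=np.float32,
)
interleaved = accumulate(lanes, range(8))

# Every partial sum stays below 2**24 in magnitude, so FP32 holds it exactly.
assert np.abs(products).sum() < 2**24
print(forward, backward, interleaved)
```

<p>With non-integer inputs, the three orders would not be guaranteed to agree, which is why exact comparison is reasonable here but a tolerance-based check is the safer default in general.</p>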
|
||||
</section>
|
||||
<section id="Running-cached-kernels">
|
||||
<h2>Running cached kernels<a class="headerlink" href="#Running-cached-kernels" title="Permalink to this heading">#</a></h2>
|
||||
<p>You may have noticed that the <code class="docutils literal notranslate"><span class="pre">plan.run()</span></code> calls for the previous two kernels took some time to execute. This is because the kernel being emitted had not yet been compiled.</p>
|
||||
<p>CUTLASS caches compiled binaries so that recompilation isn’t necessary every time a kernel is run. For example, if we change modes back to using Tensor Cores and call <code class="docutils literal notranslate"><span class="pre">plan.run()</span></code> again (with a different set of tensor parameters), you’ll find that the call returns much faster.</p>
|
||||
<div class="nbinput docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[7]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">m</span> <span class="o">=</span> <span class="mi">2400</span>
|
||||
<span class="n">n</span> <span class="o">=</span> <span class="mi">3232</span>
|
||||
<span class="n">k</span> <span class="o">=</span> <span class="mi">4096</span>
|
||||
|
||||
<span class="n">tensor_A</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">ceil</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">low</span><span class="o">=</span><span class="n">scope_min</span><span class="p">,</span> <span class="n">high</span><span class="o">=</span><span class="n">scope_max</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">k</span><span class="p">))</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">type_A</span><span class="p">))</span>
|
||||
<span class="n">tensor_B</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">ceil</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">low</span><span class="o">=</span><span class="n">scope_min</span><span class="p">,</span> <span class="n">high</span><span class="o">=</span><span class="n">scope_max</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="n">n</span><span class="p">))</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">type_B</span><span class="p">))</span>
|
||||
<span class="n">tensor_C</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">ceil</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">low</span><span class="o">=</span><span class="n">scope_min</span><span class="p">,</span> <span class="n">high</span><span class="o">=</span><span class="n">scope_max</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">))</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">type_C</span><span class="p">))</span>
|
||||
<span class="n">tensor_D</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">tensor_C</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">type_D</span><span class="p">)</span>
|
||||
|
||||
<span class="n">alpha</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">float16</span><span class="p">(</span><span class="mf">1.</span><span class="p">)</span>
|
||||
<span class="n">beta</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">float16</span><span class="p">(</span><span class="mf">2.</span><span class="p">)</span>
|
||||
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">opclass</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">OpcodeClass</span><span class="o">.</span><span class="n">TensorOp</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">tensor_A</span><span class="p">,</span> <span class="n">tensor_B</span><span class="p">,</span> <span class="n">tensor_C</span><span class="p">,</span> <span class="n">tensor_D</span><span class="p">,</span> <span class="n">alpha</span><span class="p">,</span> <span class="n">beta</span><span class="p">,</span> <span class="n">print_module</span><span class="o">=</span><span class="n">print_module</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="nboutput docutils container">
|
||||
<div class="prompt empty docutils container">
|
||||
</div>
|
||||
<div class="output_area docutils container">
|
||||
<div class="highlight"><pre>
|
||||
|
||||
// Gemm operator cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_256x128_64x3_tt_align8
|
||||
using cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_256x128_64x3_tt_align8_base =
|
||||
typename cutlass::gemm::kernel::DefaultGemmUniversal<
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor,
|
||||
float,
|
||||
cutlass::arch::OpClassTensorOp,
|
||||
cutlass::arch::Sm80,
|
||||
cutlass::gemm::GemmShape<256, 128, 64>,
|
||||
cutlass::gemm::GemmShape<64, 64, 64>,
|
||||
cutlass::gemm::GemmShape<16, 8, 16>,
|
||||
cutlass::epilogue::thread::LinearCombination&lt;cutlass::half_t, 8, float, float&gt;,
|
||||
cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,
|
||||
3,
|
||||
cutlass::arch::OpMultiplyAdd
|
||||
>::GemmKernel;
|
||||
|
||||
// Define named type
|
||||
struct cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_256x128_64x3_tt_align8_type :
|
||||
public cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_256x128_64x3_tt_align8_base { };
|
||||
|
||||
</pre></div></div>
|
||||
</div>
|
||||
<div class="nboutput nblast docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[7]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="output_area docutils container">
|
||||
<div class="highlight"><pre>
|
||||
&lt;cutlass.backend.gemm_operation.GemmArguments2x at 0x7f7b30fb9880&gt;
|
||||
</pre></div></div>
|
||||
</div>
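<p>The caching behavior described in this section can be pictured as a lookup keyed by kernel configuration: compile on the first request, reuse afterwards. The sketch below is a toy model of that idea. <code class="docutils literal notranslate"><span class="pre">KernelCache</span></code> and <code class="docutils literal notranslate"><span class="pre">fake_compile</span></code> are hypothetical names, and this is not how CUTLASS implements its cache.</p>

```python
class KernelCache:
    """Toy compile cache keyed by kernel configuration (illustration only)."""

    def __init__(self, compile_fn):
        self._compile_fn = compile_fn
        self._compiled = {}
        self.compile_count = 0

    def get(self, key):
        # Compile only on a cache miss; later requests reuse the stored result.
        if key not in self._compiled:
            self._compiled[key] = self._compile_fn(key)
            self.compile_count += 1
        return self._compiled[key]


def fake_compile(key):
    # Stand-in for a slow compiler invocation.
    return f"binary for {key}"


cache = KernelCache(fake_compile)

# Keys mirror the two kernels emitted earlier: (opclass, threadblock shape, stages).
tensorop = ("tensorop", (256, 128, 64), 3)
simt = ("simt", (128, 128, 8), 2)

cache.get(tensorop)  # miss: compiles
cache.get(simt)      # miss: compiles
cache.get(tensorop)  # hit: reuses the first result

print(cache.compile_count)
```

<p>Switching back to a configuration that was already requested does not trigger another compile in this model, which matches the faster second call observed above.</p>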
|
||||
</section>
|
||||
<section id="Running-non-default-GEMMs">
|
||||
<h2>Running non-default GEMMs<a class="headerlink" href="#Running-non-default-GEMMs" title="Permalink to this heading">#</a></h2>
|
||||
<p>The previous examples showed how simple it is to get started running a default GEMM kernel in CUTLASS. But what do you do if you want a bit more control over the parameters to the GEMM?</p>
|
||||
<p>Under the hood, CUTLASS enumerates the different GEMM configuration parameters possible for this kernel from the CUTLASS profiler. The code below shows how one can access the tile descriptions for the kernels (e.g., cluster, threadblock, and warp shape).</p>
|
||||
<div class="nbinput docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[8]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">tiles</span> <span class="o">=</span> <span class="n">plan</span><span class="o">.</span><span class="n">tile_descriptions</span><span class="p">()</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'</span><span class="si">{}</span><span class="s1"> tile descriptions returned'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">tiles</span><span class="p">)))</span>
|
||||
<span class="n">num_print</span> <span class="o">=</span> <span class="mi">10</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'First </span><span class="si">{}</span><span class="s1"> tile descriptions are:'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">num_print</span><span class="p">))</span>
|
||||
<span class="k">for</span> <span class="n">td</span> <span class="ow">in</span> <span class="n">tiles</span><span class="p">[:</span><span class="n">num_print</span><span class="p">]:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">td</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="nboutput nblast docutils container">
|
||||
<div class="prompt empty docutils container">
|
||||
</div>
|
||||
<div class="output_area docutils container">
|
||||
<div class="highlight"><pre>
|
||||
132 tile descriptions returned
|
||||
First 10 tile descriptions are:
|
||||
|
||||
{
|
||||
ClusterShape: [1, 1, 1]
|
||||
ThreadblockShape: [256, 128, 64]
|
||||
WarpCount: [4, 2, 1]
|
||||
Stages: 3
|
||||
Kernel schedule: ScheduleAuto
|
||||
}
|
||||
|
||||
{
|
||||
ClusterShape: [1, 1, 1]
|
||||
ThreadblockShape: [128, 256, 64]
|
||||
WarpCount: [2, 4, 1]
|
||||
Stages: 3
|
||||
Kernel schedule: ScheduleAuto
|
||||
}
|
||||
|
||||
{
|
||||
ClusterShape: [1, 1, 1]
|
||||
ThreadblockShape: [256, 128, 64]
|
||||
WarpCount: [4, 2, 1]
|
||||
Stages: 3
|
||||
Kernel schedule: ScheduleAuto
|
||||
}
|
||||
|
||||
{
|
||||
ClusterShape: [1, 1, 1]
|
||||
ThreadblockShape: [128, 256, 64]
|
||||
WarpCount: [2, 4, 1]
|
||||
Stages: 3
|
||||
Kernel schedule: ScheduleAuto
|
||||
}
|
||||
|
||||
{
|
||||
ClusterShape: [1, 1, 1]
|
||||
ThreadblockShape: [256, 128, 32]
|
||||
WarpCount: [4, 2, 1]
|
||||
Stages: 3
|
||||
Kernel schedule: ScheduleAuto
|
||||
}
|
||||
|
||||
{
|
||||
ClusterShape: [1, 1, 1]
|
||||
ThreadblockShape: [128, 256, 32]
|
||||
WarpCount: [2, 4, 1]
|
||||
Stages: 3
|
||||
Kernel schedule: ScheduleAuto
|
||||
}
|
||||
|
||||
{
|
||||
ClusterShape: [1, 1, 1]
|
||||
ThreadblockShape: [256, 64, 64]
|
||||
WarpCount: [4, 1, 1]
|
||||
Stages: 4
|
||||
Kernel schedule: ScheduleAuto
|
||||
}
|
||||
|
||||
{
|
||||
ClusterShape: [1, 1, 1]
|
||||
ThreadblockShape: [64, 256, 64]
|
||||
WarpCount: [1, 4, 1]
|
||||
Stages: 4
|
||||
Kernel schedule: ScheduleAuto
|
||||
}
|
||||
|
||||
{
|
||||
ClusterShape: [1, 1, 1]
|
||||
ThreadblockShape: [128, 128, 64]
|
||||
WarpCount: [2, 2, 1]
|
||||
Stages: 4
|
||||
Kernel schedule: ScheduleAuto
|
||||
}
|
||||
|
||||
{
|
||||
ClusterShape: [1, 1, 1]
|
||||
ThreadblockShape: [256, 64, 64]
|
||||
WarpCount: [4, 1, 1]
|
||||
Stages: 3
|
||||
Kernel schedule: ScheduleAuto
|
||||
}
|
||||
</pre></div></div>
|
||||
</div>
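<p>The tile descriptions list a <code class="docutils literal notranslate"><span class="pre">WarpCount</span></code>, whereas the emitted C++ earlier on this page lists a per-warp tile shape as the second <code class="docutils literal notranslate"><span class="pre">GemmShape</span></code>. The two appear to be related by element-wise division of the threadblock shape by the warp count. The sketch below works through that arithmetic; <code class="docutils literal notranslate"><span class="pre">warp_shape</span></code> and <code class="docutils literal notranslate"><span class="pre">threads_per_block</span></code> are hypothetical helpers, not CUTLASS functions.</p>

```python
def warp_shape(threadblock_shape, warp_count):
    """Per-warp tile: the threadblock tile divided evenly among the warps."""
    for extent, count in zip(threadblock_shape, warp_count):
        if extent % count != 0:
            raise ValueError(
                f"threadblock extent {extent} is not divisible by warp count {count}"
            )
    return [extent // count for extent, count in zip(threadblock_shape, warp_count)]


def threads_per_block(warp_count, warp_size=32):
    """Threads launched per threadblock; a warp has 32 threads on NVIDIA GPUs."""
    warps_m, warps_n, warps_k = warp_count
    return warps_m * warps_n * warps_k * warp_size


# First tile description printed above.
print(warp_shape([256, 128, 64], [4, 2, 1]))
print(threads_per_block([4, 2, 1]))
```

<p>For the first tile description, the result of <code class="docutils literal notranslate"><span class="pre">warp_shape</span></code> matches the <code class="docutils literal notranslate"><span class="pre">GemmShape</span></code> of 64 by 64 by 64 that follows the 256 by 128 by 64 threadblock shape in the default Tensor Core kernel.</p>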
|
||||
<p>Next, we’ll pick one of these configurations at random and compile and run it.</p>
|
||||
<div class="nbinput docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[9]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">idx</span> <span class="o">=</span> <span class="n">random</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">tiles</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span>
|
||||
<span class="n">td</span> <span class="o">=</span> <span class="n">tiles</span><span class="p">[</span><span class="n">idx</span><span class="p">]</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'Tile description </span><span class="si">{}</span><span class="s1"> is: </span><span class="si">{}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">idx</span><span class="p">,</span> <span class="n">td</span><span class="p">))</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">compile</span><span class="p">(</span><span class="n">td</span><span class="p">)</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">tensor_A</span><span class="p">,</span> <span class="n">tensor_B</span><span class="p">,</span> <span class="n">tensor_C</span><span class="p">,</span> <span class="n">tensor_D</span><span class="p">,</span> <span class="n">alpha</span><span class="p">,</span> <span class="n">beta</span><span class="p">,</span> <span class="n">print_module</span><span class="o">=</span><span class="n">print_module</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="nboutput docutils container">
|
||||
<div class="prompt empty docutils container">
|
||||
</div>
|
||||
<div class="output_area docutils container">
|
||||
<div class="highlight"><pre>
|
||||
Tile description 112 is:
|
||||
{
|
||||
ClusterShape: [1, 1, 1]
|
||||
ThreadblockShape: [128, 128, 32]
|
||||
WarpCount: [2, 2, 1]
|
||||
Stages: 4
|
||||
Kernel schedule: ScheduleAuto
|
||||
}
|
||||
|
||||
// Gemm operator cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_128x128_32x4_tt_align8
|
||||
using cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_128x128_32x4_tt_align8_base =
|
||||
typename cutlass::gemm::kernel::DefaultGemmUniversal<
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor,
|
||||
float,
|
||||
cutlass::arch::OpClassTensorOp,
|
||||
cutlass::arch::Sm80,
|
||||
cutlass::gemm::GemmShape<128, 128, 32>,
|
||||
cutlass::gemm::GemmShape<64, 64, 32>,
|
||||
cutlass::gemm::GemmShape<16, 8, 16>,
|
||||
cutlass::epilogue::thread::LinearCombination&lt;cutlass::half_t, 8, float, float&gt;,
|
||||
cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,
|
||||
4,
|
||||
cutlass::arch::OpMultiplyAdd
|
||||
>::GemmKernel;
|
||||
|
||||
// Define named type
|
||||
struct cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_128x128_32x4_tt_align8_type :
|
||||
public cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_128x128_32x4_tt_align8_base { };
|
||||
|
||||
</pre></div></div>
|
||||
</div>
|
||||
<div class="nboutput nblast docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[9]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="output_area docutils container">
|
||||
<div class="highlight"><pre>
|
||||
&lt;cutlass.backend.gemm_operation.GemmArguments2x at 0x7f79cc58de20&gt;
|
||||
</pre></div></div>
|
||||
</div>
|
||||
<p>One can also change the swizzling function used by the kernel. For example, one can modify the kernel to use the Stream K feature of CUTLASS via:</p>
|
||||
<div class="nbinput docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[10]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="c1"># Stream K is only supported pre-SM90 (at least when this example was written)</span>
|
||||
<span class="k">if</span> <span class="n">plan</span><span class="o">.</span><span class="n">cc</span> <span class="o">!=</span> <span class="mi">90</span><span class="p">:</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">swizzling_functor</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">swizzle</span><span class="o">.</span><span class="n">ThreadblockSwizzleStreamK</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">tensor_A</span><span class="p">,</span> <span class="n">tensor_B</span><span class="p">,</span> <span class="n">tensor_C</span><span class="p">,</span> <span class="n">tensor_D</span><span class="p">,</span> <span class="n">alpha</span><span class="p">,</span> <span class="n">beta</span><span class="p">,</span> <span class="n">print_module</span><span class="o">=</span><span class="n">print_module</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="nboutput nblast docutils container">
|
||||
<div class="prompt empty docutils container">
|
||||
</div>
|
||||
<div class="output_area docutils container">
|
||||
<div class="highlight"><pre>
|
||||
|
||||
// Gemm operator cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_128x128_32x4_tt_align8
|
||||
using cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_128x128_32x4_tt_align8_base =
|
||||
typename cutlass::gemm::kernel::DefaultGemmUniversal<
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor,
|
||||
float,
|
||||
cutlass::arch::OpClassTensorOp,
|
||||
cutlass::arch::Sm80,
|
||||
cutlass::gemm::GemmShape<128, 128, 32>,
|
||||
cutlass::gemm::GemmShape<64, 64, 32>,
|
||||
cutlass::gemm::GemmShape<16, 8, 16>,
|
||||
cutlass::epilogue::thread::LinearCombination&lt;cutlass::half_t, 8, float, float&gt;,
|
||||
cutlass::gemm::threadblock::ThreadblockSwizzleStreamK,
|
||||
4,
|
||||
cutlass::arch::OpMultiplyAdd
|
||||
>::GemmKernel;
|
||||
|
||||
// Define named type
|
||||
struct cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_128x128_32x4_tt_align8_type :
|
||||
public cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_128x128_32x4_tt_align8_base { };
|
||||
|
||||
</pre></div></div>
|
||||
</div>
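<p>One motivation for Stream K is load imbalance when the number of output tiles is not a multiple of the number of SMs. The sketch below estimates that imbalance for the problem size used above (m = 2400, n = 3232) with a 128 by 128 threadblock tile. The SM count of 108 and the assumption of one resident threadblock per SM are illustrative choices, and the code only quantifies the imbalance; it does not model Stream K itself.</p>

```python
import math


def output_tiles(m, n, tile_m, tile_n):
    """Number of threadblock tiles needed to cover an m-by-n output."""
    return math.ceil(m / tile_m) * math.ceil(n / tile_n)


def wave_stats(num_tiles, num_sms):
    """Waves needed, tiles in the partial last wave, and average SM utilization.

    Assumes one threadblock per SM at a time (an illustrative simplification).
    """
    full_waves, remainder = divmod(num_tiles, num_sms)
    waves = full_waves + (1 if remainder else 0)
    utilization = num_tiles / (waves * num_sms)
    return waves, remainder, utilization


num_tiles = output_tiles(2400, 3232, 128, 128)
waves, last_wave_tiles, utilization = wave_stats(num_tiles, num_sms=108)

print(f"{num_tiles} tiles, {waves} waves, {last_wave_tiles} tiles in the last wave")
print(f"average utilization: {utilization:.1%}")
```

<p>Under these assumptions the final wave leaves a sizable share of the SMs idle. Stream K aims to reduce that idle time by dividing the work along the K dimension across SMs rather than assigning whole output tiles.</p>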
|
||||
</section>
|
||||
<section id="Handling-errors">
|
||||
<h2>Handling errors<a class="headerlink" href="#Handling-errors" title="Permalink to this heading">#</a></h2>
|
||||
<p>The CUTLASS Python interface attempts to catch runtime and compilation errors in Python so as to provide more understandable error messages.</p>
|
||||
<p>Here’s an example in which we try to use too many stages for a given GEMM kernel. Normally, this would result in a runtime error due to the GPU having insufficient shared memory to launch the kernel with 8 stages. The CUTLASS Python interface is able to detect this issue before compiling the kernel, and reports it back to the user.</p>
|
||||
<div class="nbinput nblast docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[11]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="c1"># td = tiles[0]</span>
|
||||
<span class="c1"># td.stages = 8</span>
|
||||
<span class="c1"># plan.compile(td)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
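<p>A back-of-the-envelope estimate shows why 8 stages is too many for the first tile description. The sketch below assumes that each pipeline stage holds one A tile and one B tile of FP16 data in shared memory, and compares the total against the 163 KB per-threadblock limit documented for compute capability 8.0. The formula is an approximation made for illustration; the check that the CUTLASS Python interface actually performs may differ.</p>

```python
# Documented maximum shared memory per threadblock on compute capability 8.0.
SM80_MAX_SMEM_PER_BLOCK = 163 * 1024


def mainloop_smem_bytes(threadblock_shape, stages, element_bytes=2):
    """Approximate shared memory for a multistage mainloop.

    Each stage is assumed to hold one (M x K) tile of A and one (K x N) tile
    of B. Epilogue storage is ignored, so this is a rough lower bound.
    """
    tb_m, tb_n, tb_k = threadblock_shape
    per_stage = (tb_m * tb_k + tb_k * tb_n) * element_bytes
    return stages * per_stage


tile = [256, 128, 64]  # ThreadblockShape of the first tile description above

for stages in (3, 8):
    needed = mainloop_smem_bytes(tile, stages)
    fits = needed <= SM80_MAX_SMEM_PER_BLOCK
    print(f"{stages} stages: {needed} bytes, fits: {fits}")
```

<p>With the default of 3 stages the estimate fits within the limit, while 8 stages exceeds it by more than a factor of two, consistent with the error described above.</p>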
|
||||
</section>
|
||||
</section>
|
||||
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer">
|
||||
|
||||
|
||||
<div class="toc-sticky toc-scroll">
|
||||
<div class="toc-title-container">
|
||||
<span class="toc-title">
|
||||
On this page
|
||||
</span>
|
||||
</div>
|
||||
<div class="toc-tree-container">
|
||||
<div class="toc-tree">
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">Basic example of using the CUTLASS Python interface</a><ul>
|
||||
<li><a class="reference internal" href="#Declaring-and-running-a-GEMM">Declaring and running a GEMM</a></li>
|
||||
<li><a class="reference internal" href="#Changing-operation-modes">Changing operation modes</a></li>
|
||||
<li><a class="reference internal" href="#Running-cached-kernels">Running cached kernels</a></li>
|
||||
<li><a class="reference internal" href="#Running-non-default-GEMMs">Running non-default GEMMs</a></li>
|
||||
<li><a class="reference internal" href="#Handling-errors">Handling errors</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
|
||||
<script src="../_static/doctools.js"></script>
|
||||
<script src="../_static/sphinx_highlight.js"></script>
|
||||
<script src="../_static/scripts/furo.js"></script>
|
||||
<script src="../_static/clipboard.min.js"></script>
|
||||
<script src="../_static/copybutton.js"></script>
|
||||
<script src="../_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
<script>window.MathJax = {"tex": {"inlineMath": [["$", "$"], ["\\(", "\\)"]], "processEscapes": true}, "options": {"ignoreHtmlClass": "tex2jax_ignore|mathjax_ignore|document", "processHtmlClass": "tex2jax_process|mathjax_process|math|output_area"}}</script>
|
||||
<script defer="defer" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
727
python/docs/externals/00_basic_gemm.ipynb
vendored
Normal file
@ -0,0 +1,727 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1ef96b3f",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Basic example of using the CUTLASS Python interface\n",
|
||||
"This notebook walks through a basic example of using the CUTLASS Python interface to declare, compile, and run GEMMs.\n",
|
||||
"\n",
|
||||
"[Open In Colab](https://colab.research.google.com/github/NVIDIA/cutlass/tree/master/examples/00_basic_gemm.ipynb)\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "962324fd",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We first import the packages needed for the example and construct the input and output tensors that it uses.\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "0e324219",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T17:59:39.749457Z",
|
||||
"iopub.status.busy": "2023-04-18T17:59:39.748884Z",
|
||||
"iopub.status.idle": "2023-04-18T17:59:43.907956Z",
|
||||
"shell.execute_reply": "2023-04-18T17:59:43.907069Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/usr/local/lib/python3.8/dist-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
|
||||
" from .autonotebook import tqdm as notebook_tqdm\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"import random\n",
|
||||
"\n",
|
||||
"import cutlass\n",
|
||||
"\n",
|
||||
"# This controls whether the C++ GEMM declaration will be printed at each step. Set to `False` to\n",
|
||||
"# omit this information.\n",
|
||||
"print_module = True\n",
|
||||
"\n",
|
||||
"m = 128\n",
|
||||
"n = m\n",
|
||||
"k = m\n",
|
||||
"\n",
|
||||
"dtype = np.float16\n",
|
||||
"type_A = np.float16\n",
|
||||
"type_B = np.float16\n",
|
||||
"type_C = np.float16\n",
|
||||
"type_D = np.float16\n",
|
||||
"\n",
|
||||
"np.random.seed(1234)\n",
|
||||
"random.seed(1234)\n",
|
||||
"scope_min = -4\n",
|
||||
"scope_max = 4\n",
|
||||
"tensor_A = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(m, k)).astype(type_A))\n",
|
||||
"tensor_B = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(k, n)).astype(type_B))\n",
|
||||
"tensor_C = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(m, n)).astype(type_C))\n",
|
||||
"\n",
|
||||
"alpha = np.float16(1.)\n",
|
||||
"beta = np.float16(0.)\n",
|
||||
"\n",
|
||||
"tensor_D = np.zeros(tensor_C.shape).astype(type_D)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "f2c7bf48",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Declaring and running a GEMM\n",
|
||||
"To get started, one only needs to provide the tensors declared above to the `cutlass.op.Gemm` call.\n",
|
||||
"This sets up a default GEMM operation for the given device on which you are running.\n",
|
||||
"\n",
|
||||
"Assuming that we are running on SM80, this defaults to using a GEMM that leverages FP16 Tensor Core operations.\n",
|
||||
"\n",
|
||||
"Calling `plan.run()` will generate the CUTLASS C++ kernel in question, compile it, and run it on the tensors we previously passed in. When `print_module` is set to `True`, the emitted C++ code is printed."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "0dfd8975",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T17:59:43.911740Z",
|
||||
"iopub.status.busy": "2023-04-18T17:59:43.911512Z",
|
||||
"iopub.status.idle": "2023-04-18T17:59:49.103941Z",
|
||||
"shell.execute_reply": "2023-04-18T17:59:49.103231Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"// Gemm operator cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_256x128_64x3_tt_align8\n",
|
||||
"using cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_256x128_64x3_tt_align8_base =\n",
|
||||
" typename cutlass::gemm::kernel::DefaultGemmUniversal<\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor,\n",
|
||||
" float,\n",
|
||||
" cutlass::arch::OpClassTensorOp,\n",
|
||||
" cutlass::arch::Sm80,\n",
|
||||
" cutlass::gemm::GemmShape<256, 128, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<64, 64, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<16, 8, 16>,\n",
|
||||
" cutlass::epilogue::thread::LinearCombination<cutlass::half_t, 8, float, float>,\n",
|
||||
" cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,\n",
|
||||
" 3,\n",
|
||||
" cutlass::arch::OpMultiplyAdd\n",
|
||||
">::GemmKernel;\n",
|
||||
"\n",
|
||||
"// Define named type\n",
|
||||
"struct cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_256x128_64x3_tt_align8_type : \n",
|
||||
" public cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_256x128_64x3_tt_align8_base { };\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"<cutlass.backend.gemm_operation.GemmArguments2x at 0x7f79cc556070>"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# We specify `element_accumulator` here so as to match the kernel run by NumPy below. However,\n",
|
||||
"# specifying `element_accumulator` is not required if it is the same as `element`\n",
|
||||
"plan = cutlass.Gemm(element=dtype, layout=cutlass.LayoutType.RowMajor, element_accumulator=np.float32)\n",
|
||||
"plan.run(tensor_A, tensor_B, tensor_C, tensor_D, print_module=print_module)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "4a5856de",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"There are many other ways to construct a plan from `cutlass.op.Gemm` (e.g., by specifying the types and layouts of each operand, or by providing representative tensors as inputs). For more details on these, see the documentation in the `cutlass.op.Gemm` constructor."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "945478ef",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We then compare the output to running the GEMM using NumPy."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "6b669de6",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T17:59:49.107492Z",
|
||||
"iopub.status.busy": "2023-04-18T17:59:49.107284Z",
|
||||
"iopub.status.idle": "2023-04-18T17:59:49.138511Z",
|
||||
"shell.execute_reply": "2023-04-18T17:59:49.137837Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"tensor_D_numpy = (alpha * (tensor_A @ tensor_B)) + (beta * tensor_C)\n",
|
||||
"np.testing.assert_array_equal(tensor_D, tensor_D_numpy)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "ee5cbbbe",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Note that one could use the same kernel just declared for tensors provided by other frameworks beyond NumPy, such as PyTorch or CuPy."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "b6c86493",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Changing operation modes\n",
|
||||
"By default, the CUTLASS Python interface will try to use Tensor Core operations whenever possible. If the configuration provided to `cutlass.op.Gemm` is not supported on Tensor Cores, the interface will fall back to using a SIMT kernel.\n",
|
||||
"\n",
|
||||
"The operation mode currently in use can be returned via the `plan.opclass` property. In this case, it is Tensor Core operations."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "529fda93",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T17:59:49.141458Z",
|
||||
"iopub.status.busy": "2023-04-18T17:59:49.141305Z",
|
||||
"iopub.status.idle": "2023-04-18T17:59:49.145005Z",
|
||||
"shell.execute_reply": "2023-04-18T17:59:49.144332Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"OpcodeClass.TensorOp\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"print(plan.opclass)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "6d27c575",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Suppose that we don't want to use Tensor Cores for this GEMM. One can change to using CUTLASS's SIMT GEMMs by setting the plan's `opclass` field.\n",
|
||||
"\n",
|
||||
"As is shown in the printed output, the emitted kernel uses template parameters that fit CUTLASS's SIMT GEMMs.\n",
|
||||
"\n",
|
||||
"Also notice that, this time around, we provided tensor parameters to `plan.run()`. One is free to provide different parameters to `plan.run()` than were passed in at the initial call to `cutlass.op.Gemm`, provided that the passed-in tensors have the same data type and layout as those passed in on initialization."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "6a44d35b",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T17:59:49.148548Z",
|
||||
"iopub.status.busy": "2023-04-18T17:59:49.148042Z",
|
||||
"iopub.status.idle": "2023-04-18T17:59:54.365792Z",
|
||||
"shell.execute_reply": "2023-04-18T17:59:54.364734Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"// Gemm operator cutlass_sm80_simt_f16_sgemm_f16_1x1x1_128x128_8x2_tt_align1\n",
|
||||
"using cutlass_sm80_simt_f16_sgemm_f16_1x1x1_128x128_8x2_tt_align1_base =\n",
|
||||
" typename cutlass::gemm::kernel::DefaultGemmUniversal<\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 1,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 1,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor,\n",
|
||||
" float,\n",
|
||||
" cutlass::arch::OpClassSimt,\n",
|
||||
" cutlass::arch::Sm80,\n",
|
||||
" cutlass::gemm::GemmShape<128, 128, 8>,\n",
|
||||
" cutlass::gemm::GemmShape<32, 64, 8>,\n",
|
||||
" cutlass::gemm::GemmShape<1, 1, 1>,\n",
|
||||
" cutlass::epilogue::thread::LinearCombination<cutlass::half_t, 1, float, float>,\n",
|
||||
" cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,\n",
|
||||
" 2,\n",
|
||||
" cutlass::arch::OpMultiplyAdd\n",
|
||||
">::GemmKernel;\n",
|
||||
"\n",
|
||||
"// Define named type\n",
|
||||
"struct cutlass_sm80_simt_f16_sgemm_f16_1x1x1_128x128_8x2_tt_align1_type : \n",
|
||||
" public cutlass_sm80_simt_f16_sgemm_f16_1x1x1_128x128_8x2_tt_align1_base { };\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"<cutlass.backend.gemm_operation.GemmArguments2x at 0x7f7b3075abe0>"
|
||||
]
|
||||
},
|
||||
"execution_count": 5,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"tensor_D_simt = np.zeros(tensor_C.shape).astype(type_D)\n",
|
||||
"plan.opclass = cutlass.OpcodeClass.Simt\n",
|
||||
"plan.run(tensor_A, tensor_B, tensor_C, tensor_D_simt, alpha, beta, print_module=print_module)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "639dcb59",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"If we compare the outputs of the Tensor Core and SIMT GEMMs we just ran, we see that they are equal."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "9b480853",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T17:59:54.369977Z",
|
||||
"iopub.status.busy": "2023-04-18T17:59:54.369302Z",
|
||||
"iopub.status.idle": "2023-04-18T17:59:54.375239Z",
|
||||
"shell.execute_reply": "2023-04-18T17:59:54.374405Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"np.testing.assert_array_equal(tensor_D, tensor_D_simt)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0cce1eae",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Running cached kernels\n",
|
||||
"You may have noticed that the `plan.run()` calls for the previous two kernels took some time to execute. This is because the kernel being emitted had not yet been compiled.\n",
|
||||
"\n",
|
||||
"CUTLASS caches compiled binaries so that recompilation isn't necessary every time a kernel is run. For example, if we change modes back to using Tensor Cores and call `plan.run()` again (with a different set of tensor parameters), you'll find that the call returns much faster."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 7,
|
||||
"id": "f8051e5e",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T17:59:54.378373Z",
|
||||
"iopub.status.busy": "2023-04-18T17:59:54.378060Z",
|
||||
"iopub.status.idle": "2023-04-18T17:59:55.220086Z",
|
||||
"shell.execute_reply": "2023-04-18T17:59:55.219198Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"// Gemm operator cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_256x128_64x3_tt_align8\n",
|
||||
"using cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_256x128_64x3_tt_align8_base =\n",
|
||||
" typename cutlass::gemm::kernel::DefaultGemmUniversal<\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor,\n",
|
||||
" float,\n",
|
||||
" cutlass::arch::OpClassTensorOp,\n",
|
||||
" cutlass::arch::Sm80,\n",
|
||||
" cutlass::gemm::GemmShape<256, 128, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<64, 64, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<16, 8, 16>,\n",
|
||||
" cutlass::epilogue::thread::LinearCombination<cutlass::half_t, 8, float, float>,\n",
|
||||
" cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,\n",
|
||||
" 3,\n",
|
||||
" cutlass::arch::OpMultiplyAdd\n",
|
||||
">::GemmKernel;\n",
|
||||
"\n",
|
||||
"// Define named type\n",
|
||||
"struct cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_256x128_64x3_tt_align8_type : \n",
|
||||
" public cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_256x128_64x3_tt_align8_base { };\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"<cutlass.backend.gemm_operation.GemmArguments2x at 0x7f7b30fb9880>"
|
||||
]
|
||||
},
|
||||
"execution_count": 7,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"m = 2400\n",
|
||||
"n = 3232\n",
|
||||
"k = 4096\n",
|
||||
"\n",
|
||||
"tensor_A = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(m, k)).astype(type_A))\n",
|
||||
"tensor_B = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(k, n)).astype(type_B))\n",
|
||||
"tensor_C = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(m, n)).astype(type_C))\n",
|
||||
"tensor_D = np.zeros(tensor_C.shape).astype(type_D)\n",
|
||||
"\n",
|
||||
"alpha = np.float16(1.)\n",
|
||||
"beta = np.float16(2.)\n",
|
||||
"\n",
|
||||
"plan.opclass = cutlass.OpcodeClass.TensorOp\n",
|
||||
"plan.run(tensor_A, tensor_B, tensor_C, tensor_D, alpha, beta, print_module=print_module)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "52a4e318",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Running non-default GEMMs\n",
|
||||
"The previous examples showed how simple it is to get started running a default GEMM kernel in CUTLASS. But what do you do if you want a bit more control over the parameters to the GEMM?\n",
|
||||
"\n",
|
||||
"Under the hood, CUTLASS enumerates the different GEMM configuration parameters possible for this kernel from the CUTLASS profiler. The code below shows how one can access the tile descriptions for the kernels (e.g., cluster, threadblock, and warp shape)."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 8,
|
||||
"id": "1c593be1",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T17:59:55.223812Z",
|
||||
"iopub.status.busy": "2023-04-18T17:59:55.223651Z",
|
||||
"iopub.status.idle": "2023-04-18T17:59:55.228769Z",
|
||||
"shell.execute_reply": "2023-04-18T17:59:55.228101Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"132 tile descriptions returned\n",
|
||||
"First 10 tile descriptions are:\n",
|
||||
"\n",
|
||||
"{\n",
|
||||
" ClusterShape: [1, 1, 1]\n",
|
||||
" ThreadblockShape: [256, 128, 64]\n",
|
||||
" WarpCount: [4, 2, 1]\n",
|
||||
" Stages: 3\n",
|
||||
" Kernel schedule: ScheduleAuto\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"{\n",
|
||||
" ClusterShape: [1, 1, 1]\n",
|
||||
" ThreadblockShape: [128, 256, 64]\n",
|
||||
" WarpCount: [2, 4, 1]\n",
|
||||
" Stages: 3\n",
|
||||
" Kernel schedule: ScheduleAuto\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"{\n",
|
||||
" ClusterShape: [1, 1, 1]\n",
|
||||
" ThreadblockShape: [256, 128, 64]\n",
|
||||
" WarpCount: [4, 2, 1]\n",
|
||||
" Stages: 3\n",
|
||||
" Kernel schedule: ScheduleAuto\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"{\n",
|
||||
" ClusterShape: [1, 1, 1]\n",
|
||||
" ThreadblockShape: [128, 256, 64]\n",
|
||||
" WarpCount: [2, 4, 1]\n",
|
||||
" Stages: 3\n",
|
||||
" Kernel schedule: ScheduleAuto\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"{\n",
|
||||
" ClusterShape: [1, 1, 1]\n",
|
||||
" ThreadblockShape: [256, 128, 32]\n",
|
||||
" WarpCount: [4, 2, 1]\n",
|
||||
" Stages: 3\n",
|
||||
" Kernel schedule: ScheduleAuto\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"{\n",
|
||||
" ClusterShape: [1, 1, 1]\n",
|
||||
" ThreadblockShape: [128, 256, 32]\n",
|
||||
" WarpCount: [2, 4, 1]\n",
|
||||
" Stages: 3\n",
|
||||
" Kernel schedule: ScheduleAuto\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"{\n",
|
||||
" ClusterShape: [1, 1, 1]\n",
|
||||
" ThreadblockShape: [256, 64, 64]\n",
|
||||
" WarpCount: [4, 1, 1]\n",
|
||||
" Stages: 4\n",
|
||||
" Kernel schedule: ScheduleAuto\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"{\n",
|
||||
" ClusterShape: [1, 1, 1]\n",
|
||||
" ThreadblockShape: [64, 256, 64]\n",
|
||||
" WarpCount: [1, 4, 1]\n",
|
||||
" Stages: 4\n",
|
||||
" Kernel schedule: ScheduleAuto\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"{\n",
|
||||
" ClusterShape: [1, 1, 1]\n",
|
||||
" ThreadblockShape: [128, 128, 64]\n",
|
||||
" WarpCount: [2, 2, 1]\n",
|
||||
" Stages: 4\n",
|
||||
" Kernel schedule: ScheduleAuto\n",
|
||||
"}\n",
|
||||
"\n",
|
||||
"{\n",
|
||||
" ClusterShape: [1, 1, 1]\n",
|
||||
" ThreadblockShape: [256, 64, 64]\n",
|
||||
" WarpCount: [4, 1, 1]\n",
|
||||
" Stages: 3\n",
|
||||
" Kernel schedule: ScheduleAuto\n",
|
||||
"}\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"tiles = plan.tile_descriptions()\n",
|
||||
"print('{} tile descriptions returned'.format(len(tiles)))\n",
|
||||
"num_print = 10\n",
|
||||
"print('First {} tile descriptions are:'.format(num_print))\n",
|
||||
"for td in tiles[:num_print]:\n",
|
||||
" print(td)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "dc3ad875",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Next, we'll pick one of these configurations at random and compile and run it."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 9,
|
||||
"id": "a8dc5287",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T17:59:55.231498Z",
|
||||
"iopub.status.busy": "2023-04-18T17:59:55.230924Z",
|
||||
"iopub.status.idle": "2023-04-18T18:00:00.340161Z",
|
||||
"shell.execute_reply": "2023-04-18T18:00:00.339603Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Tile description 112 is: \n",
|
||||
"{\n",
|
||||
" ClusterShape: [1, 1, 1]\n",
|
||||
" ThreadblockShape: [128, 128, 32]\n",
|
||||
" WarpCount: [2, 2, 1]\n",
|
||||
" Stages: 4\n",
|
||||
" Kernel schedule: ScheduleAuto\n",
|
||||
"}\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"// Gemm operator cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_128x128_32x4_tt_align8\n",
|
||||
"using cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_128x128_32x4_tt_align8_base =\n",
|
||||
" typename cutlass::gemm::kernel::DefaultGemmUniversal<\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor,\n",
|
||||
" float,\n",
|
||||
" cutlass::arch::OpClassTensorOp,\n",
|
||||
" cutlass::arch::Sm80,\n",
|
||||
" cutlass::gemm::GemmShape<128, 128, 32>,\n",
|
||||
" cutlass::gemm::GemmShape<64, 64, 32>,\n",
|
||||
" cutlass::gemm::GemmShape<16, 8, 16>,\n",
|
||||
" cutlass::epilogue::thread::LinearCombination<cutlass::half_t, 8, float, float>,\n",
|
||||
" cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,\n",
|
||||
" 4,\n",
|
||||
" cutlass::arch::OpMultiplyAdd\n",
|
||||
">::GemmKernel;\n",
|
||||
"\n",
|
||||
"// Define named type\n",
|
||||
"struct cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_128x128_32x4_tt_align8_type : \n",
|
||||
" public cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_128x128_32x4_tt_align8_base { };\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"<cutlass.backend.gemm_operation.GemmArguments2x at 0x7f79cc58de20>"
|
||||
]
|
||||
},
|
||||
"execution_count": 9,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"idx = random.randint(0, len(tiles)-1)\n",
|
||||
"td = tiles[idx]\n",
|
||||
"print('Tile description {} is: {}'.format(idx, td))\n",
|
||||
"plan.compile(td)\n",
|
||||
"plan.run(tensor_A, tensor_B, tensor_C, tensor_D, alpha, beta, print_module=print_module)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c5a8b534",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"One can also change the swizzling function used by the kernel. For example, one can modify the kernel to use the stream K feature of CUTLASS via:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 10,
|
||||
"id": "e5e88d17",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T18:00:00.343772Z",
|
||||
"iopub.status.busy": "2023-04-18T18:00:00.343582Z",
|
||||
"iopub.status.idle": "2023-04-18T18:00:06.192256Z",
|
||||
"shell.execute_reply": "2023-04-18T18:00:06.191286Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"// Gemm operator cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_128x128_32x4_tt_align8\n",
|
||||
"using cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_128x128_32x4_tt_align8_base =\n",
|
||||
" typename cutlass::gemm::kernel::DefaultGemmUniversal<\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor,\n",
|
||||
" float,\n",
|
||||
" cutlass::arch::OpClassTensorOp,\n",
|
||||
" cutlass::arch::Sm80,\n",
|
||||
" cutlass::gemm::GemmShape<128, 128, 32>,\n",
|
||||
" cutlass::gemm::GemmShape<64, 64, 32>,\n",
|
||||
" cutlass::gemm::GemmShape<16, 8, 16>,\n",
|
||||
" cutlass::epilogue::thread::LinearCombination<cutlass::half_t, 8, float, float>,\n",
|
||||
" cutlass::gemm::threadblock::ThreadblockSwizzleStreamK,\n",
|
||||
" 4,\n",
|
||||
" cutlass::arch::OpMultiplyAdd\n",
|
||||
">::GemmKernel;\n",
|
||||
"\n",
|
||||
"// Define named type\n",
|
||||
"struct cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_128x128_32x4_tt_align8_type : \n",
|
||||
" public cutlass_sm80_tensorop_f16_s16x8x16gemm_f16_1x1x1_128x128_32x4_tt_align8_base { };\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"# Stream K is only supported pre-SM90 (at least when this example was written)\n",
|
||||
"if plan.cc != 90:\n",
|
||||
" plan.swizzling_functor = cutlass.swizzle.ThreadblockSwizzleStreamK\n",
|
||||
" plan.run(tensor_A, tensor_B, tensor_C, tensor_D, alpha, beta, print_module=print_module)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "5a8ba2ba",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Handling errors\n",
|
||||
"The CUTLASS Python interface attempts to catch runtime and compilation errors in Python so as to provide more understandable error messages.\n",
|
||||
"\n",
|
||||
"Here's an example in which we try to use too many stages for a given GEMM kernel. Normally, this would result in a runtime error due to the GPU having insufficient shared memory to launch the kernel with 8 stages. The CUTLASS Python interface is able to detect this issue before compiling the kernel, and reports it back to the user."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 11,
|
||||
"id": "fe7d0e42",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T18:00:06.196345Z",
|
||||
"iopub.status.busy": "2023-04-18T18:00:06.195784Z",
|
||||
"iopub.status.idle": "2023-04-18T18:00:06.199248Z",
|
||||
"shell.execute_reply": "2023-04-18T18:00:06.198438Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"# td = tiles[0]\n",
|
||||
"# td.stages = 8\n",
|
||||
"# plan.compile(td)"
|
||||
]
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.10"
|
||||
},
|
||||
"vscode": {
|
||||
"interpreter": {
|
||||
"hash": "0466d96796c9cd8f7a1cad264ff326ececc950ba2420e0256d5105fc1a3c6e70"
|
||||
}
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
764
python/docs/externals/01_epilogue.html
vendored
Normal file
@ -0,0 +1,764 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
<link rel="index" title="Index" href="../genindex.html" /><link rel="search" title="Search" href="../search.html" /><link rel="next" title="Exporting a CUTLASS grouped GEMM kernel to a PyTorch CUDA extension" href="02_pytorch_extension_grouped_gemm.html" /><link rel="prev" title="Examples" href="../examples.html" />
|
||||
<link rel="canonical" href="docs/externals/01_epilogue.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>Example of using elementwise activation functions in the CUTLASS Python interface - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/nbsphinx-code-cells.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="../index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="../index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="../_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="../_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="../search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../modules.html">CUTLASS Python API</a><input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="../cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul class="current">
|
||||
<li class="toctree-l1 current has-children"><a class="reference internal" href="../examples.html">Examples</a><input checked="" class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul class="current">
|
||||
<li class="toctree-l2"><a class="reference internal" href="00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2 current current-page"><a class="current reference internal" href="#">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">Github</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<section id="Example-of-using-elementwise-activation-functions-in-the-CUTLASS-Python-interface">
|
||||
<h1>Example of using elementwise activation functions in the CUTLASS Python interface<a class="headerlink" href="#Example-of-using-elementwise-activation-functions-in-the-CUTLASS-Python-interface" title="Permalink to this heading">#</a></h1>
|
||||
<p>This notebook walks through a basic example of using the CUTLASS Python interface to declare, compile, and run GEMMs with different epilogues.</p>
|
||||
<p><a class="reference external" href="https://colab.research.google.com/github/NVIDIA/cutlass/tree/master/examples/00_basic_gemm.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a></p>
|
||||
<p>We first import various packages needed for the example and construct the input and output tensors that will be used in our example.</p>
|
||||
<div class="nbinput docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[1]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
|
||||
<span class="kn">import</span> <span class="nn">cutlass</span>
|
||||
|
||||
<span class="c1"># This controls whether the C++ GEMM declaration will be printed at each step. Set to `False` to</span>
|
||||
<span class="c1"># omit this information.</span>
|
||||
<span class="n">print_module</span> <span class="o">=</span> <span class="kc">True</span>
|
||||
|
||||
<span class="n">m</span> <span class="o">=</span> <span class="mi">256</span>
|
||||
<span class="n">n</span> <span class="o">=</span> <span class="n">m</span>
|
||||
<span class="n">k</span> <span class="o">=</span> <span class="n">m</span>
|
||||
|
||||
<span class="n">type_A</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">float16</span>
|
||||
<span class="n">type_B</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">float16</span>
|
||||
<span class="n">type_C</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">float16</span>
|
||||
<span class="n">type_D</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">float16</span>
|
||||
|
||||
<span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">1234</span><span class="p">)</span>
|
||||
<span class="n">scope_min</span> <span class="o">=</span> <span class="o">-</span><span class="mi">4</span>
|
||||
<span class="n">scope_max</span> <span class="o">=</span> <span class="mi">4</span>
|
||||
<span class="n">tensor_A</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">ceil</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">low</span><span class="o">=</span><span class="n">scope_min</span><span class="p">,</span> <span class="n">high</span><span class="o">=</span><span class="n">scope_max</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">k</span><span class="p">))</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">type_A</span><span class="p">))</span>
|
||||
<span class="n">tensor_B</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">ceil</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">low</span><span class="o">=</span><span class="n">scope_min</span><span class="p">,</span> <span class="n">high</span><span class="o">=</span><span class="n">scope_max</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">k</span><span class="p">,</span> <span class="n">n</span><span class="p">))</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">type_B</span><span class="p">))</span>
|
||||
<span class="n">tensor_C</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">ceil</span><span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">random</span><span class="o">.</span><span class="n">uniform</span><span class="p">(</span><span class="n">low</span><span class="o">=</span><span class="n">scope_min</span><span class="p">,</span> <span class="n">high</span><span class="o">=</span><span class="n">scope_max</span><span class="p">,</span> <span class="n">size</span><span class="o">=</span><span class="p">(</span><span class="n">m</span><span class="p">,</span> <span class="n">n</span><span class="p">))</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">type_C</span><span class="p">))</span>
|
||||
|
||||
<span class="n">alpha</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">float16</span><span class="p">(</span><span class="mf">1.</span><span class="p">)</span>
|
||||
<span class="n">beta</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">float16</span><span class="p">(</span><span class="mf">0.</span><span class="p">)</span>
|
||||
|
||||
<span class="n">tensor_D</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">tensor_C</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">type_D</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="nboutput nblast docutils container">
|
||||
<div class="prompt empty docutils container">
|
||||
</div>
|
||||
<div class="output_area stderr docutils container">
|
||||
<div class="highlight"><pre>
|
||||
/usr/local/lib/python3.8/dist-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
|
||||
from .autonotebook import tqdm as notebook_tqdm
|
||||
</pre></div></div>
|
||||
</div>
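<p>The inputs above are built by rounding uniform samples up to whole numbers. The sketch below repeats that construction on the host with NumPy only (no GPU or CUTLASS required) and checks that every entry of the product is an integer small enough for <code class="docutils literal notranslate"><span class="pre">float16</span></code> to store exactly. This is presumably what lets the results later in the notebook be compared exactly rather than with a tolerance. The helper name <code class="docutils literal notranslate"><span class="pre">random_small_ints</span></code> is hypothetical, and the check covers only the final values, not the accumulation order used inside a kernel.</p>

```python
import numpy as np

m = n = k = 256
np.random.seed(1234)


def random_small_ints(shape, low=-4, high=4, dtype=np.float16):
    # Hypothetical helper mirroring the setup cell: draw uniformly from
    # [low, high), cast to dtype, then round up to the next whole number.
    return np.ceil(np.random.uniform(low=low, high=high, size=shape).astype(dtype))


tensor_A = random_small_ints((m, k))
tensor_B = random_small_ints((k, n))

# Exact integer reference product, computed on the host.
exact = tensor_A.astype(np.int64) @ tensor_B.astype(np.int64)

# float16 stores every integer with magnitude up to 2048 exactly.
largest = int(np.abs(exact).max())
print("largest |A @ B| entry:", largest)
print("exactly representable in float16:", largest <= 2048)
```

<p>Because the entries are whole numbers in the range -4 to 4, the dot products stay far below the limit for typical random draws, so casting the exact product to <code class="docutils literal notranslate"><span class="pre">float16</span></code> loses nothing.</p>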
|
||||
<section id="Run-a-GEMM-with-an-identity-activation-function">
|
||||
<h2>Run a GEMM with an identity activation function<a class="headerlink" href="#Run-a-GEMM-with-an-identity-activation-function" title="Permalink to this heading">#</a></h2>
|
||||
<p>To begin, we run a default GEMM with an identity activation function, which performs the well-known operation <code class="docutils literal notranslate"><span class="pre">D</span> <span class="pre">=</span> <span class="pre">alpha</span> <span class="pre">*</span> <span class="pre">(A</span> <span class="pre">@</span> <span class="pre">B)</span> <span class="pre">+</span> <span class="pre">beta</span> <span class="pre">*</span> <span class="pre">C</span></code>. Identity is the default activation function, so it does not need to be specified.</p>
|
||||
<div class="nbinput docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[2]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">plan</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">Gemm</span><span class="p">(</span><span class="n">element</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">float16</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">)</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">tensor_A</span><span class="p">,</span> <span class="n">tensor_B</span><span class="p">,</span> <span class="n">tensor_C</span><span class="p">,</span> <span class="n">tensor_D</span><span class="p">,</span> <span class="n">print_module</span><span class="o">=</span><span class="n">print_module</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="nboutput docutils container">
|
||||
<div class="prompt empty docutils container">
|
||||
</div>
|
||||
<div class="output_area docutils container">
|
||||
<div class="highlight"><pre>
|
||||
|
||||
// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8
|
||||
using cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base =
|
||||
typename cutlass::gemm::kernel::DefaultGemmUniversal<
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor,
|
||||
cutlass::half_t,
|
||||
cutlass::arch::OpClassTensorOp,
|
||||
cutlass::arch::Sm80,
|
||||
cutlass::gemm::GemmShape<256, 128, 64>,
|
||||
cutlass::gemm::GemmShape<64, 64, 64>,
|
||||
cutlass::gemm::GemmShape<16, 8, 16>,
|
||||
cutlass::epilogue::thread::LinearCombination<cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,
|
||||
cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,
|
||||
3,
|
||||
cutlass::arch::OpMultiplyAdd
|
||||
>::GemmKernel;
|
||||
|
||||
// Define named type
|
||||
struct cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_type :
|
||||
public cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base { };
|
||||
|
||||
</pre></div></div>
|
||||
</div>
|
||||
<div class="nboutput nblast docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[2]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="output_area docutils container">
|
||||
<div class="highlight"><pre>
|
||||
<cutlass.backend.gemm_operation.GemmArguments2x at 0x7fed907287c0>
|
||||
</pre></div></div>
|
||||
</div>
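<p>For comparison, the linear combination <code class="docutils literal notranslate"><span class="pre">D</span> <span class="pre">=</span> <span class="pre">alpha</span> <span class="pre">*</span> <span class="pre">(A</span> <span class="pre">@</span> <span class="pre">B)</span> <span class="pre">+</span> <span class="pre">beta</span> <span class="pre">*</span> <span class="pre">C</span></code> can be written as a small host-side reference in NumPy. This is a minimal sketch, not the CUTLASS implementation: the function name <code class="docutils literal notranslate"><span class="pre">gemm_reference</span></code> and the choice to accumulate in <code class="docutils literal notranslate"><span class="pre">float32</span></code> are assumptions made for illustration.</p>

```python
import numpy as np


def gemm_reference(A, B, C, alpha=1.0, beta=0.0, out_dtype=np.float16):
    """Host-side reference for D = alpha * (A @ B) + beta * C.

    Hypothetical helper: accumulates in float32 and casts the result to
    out_dtype at the end. A GPU kernel may accumulate differently.
    """
    acc = A.astype(np.float32) @ B.astype(np.float32)
    D = float(alpha) * acc + float(beta) * C.astype(np.float32)
    return D.astype(out_dtype)


# Tiny worked example with hand-checkable values.
A = np.array([[1, 2], [3, 4]], dtype=np.float16)
B = np.eye(2, dtype=np.float16)
C = np.ones((2, 2), dtype=np.float16)

print(gemm_reference(A, B, C, alpha=2.0, beta=3.0))  # 2 * A + 3
print(gemm_reference(A, B, C))                       # alpha=1, beta=0 gives A @ B
```

<p>With <code class="docutils literal notranslate"><span class="pre">alpha</span> <span class="pre">=</span> <span class="pre">1</span></code> and <code class="docutils literal notranslate"><span class="pre">beta</span> <span class="pre">=</span> <span class="pre">0</span></code>, the values defined in the setup cell, the <code class="docutils literal notranslate"><span class="pre">C</span></code> term drops out and the result is just <code class="docutils literal notranslate"><span class="pre">A</span> <span class="pre">@</span> <span class="pre">B</span></code>.</p>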
|
||||
</section>
|
||||
<section id="Run-a-GEMM-with-a-ReLU-element-wise-activation-function">
|
||||
<h2>Run a GEMM with a ReLU element-wise activation function<a class="headerlink" href="#Run-a-GEMM-with-a-ReLU-element-wise-activation-function" title="Permalink to this heading">#</a></h2>
|
||||
<p>CUTLASS makes it easy to support other element-wise activation functions. This results in performing an element-wise operation after the generic linear combination performed in a GEMM. If we call such an activation function <code class="docutils literal notranslate"><span class="pre">act</span></code>, the resulting formulation is:</p>
|
||||
<div class="highlight-none notranslate"><div class="highlight"><pre><span></span>D = alpha * (A @ B) + beta * C
|
||||
D = act(D)
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Here, we will add a ReLU activation function. Given an input <code class="docutils literal notranslate"><span class="pre">x</span></code>, ReLU returns <code class="docutils literal notranslate"><span class="pre">max(x,</span> <span class="pre">0)</span></code>.</p>
|
||||
<p>This is easy to do in CUTLASS. One only needs to set the plan’s <code class="docutils literal notranslate"><span class="pre">activation</span></code> field.</p>
|
||||
<div class="nbinput docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[3]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">tensor_D_relu</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">tensor_C</span><span class="o">.</span><span class="n">shape</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">type_D</span><span class="p">)</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">activation</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">epilogue</span><span class="o">.</span><span class="n">relu</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">tensor_A</span><span class="p">,</span> <span class="n">tensor_B</span><span class="p">,</span> <span class="n">tensor_C</span><span class="p">,</span> <span class="n">tensor_D_relu</span><span class="p">,</span> <span class="n">print_module</span><span class="o">=</span><span class="n">print_module</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="nboutput docutils container">
|
||||
<div class="prompt empty docutils container">
|
||||
</div>
|
||||
<div class="output_area docutils container">
|
||||
<div class="highlight"><pre>
|
||||
|
||||
// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8
|
||||
using cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base =
|
||||
typename cutlass::gemm::kernel::DefaultGemmUniversal<
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor,
|
||||
cutlass::half_t,
|
||||
cutlass::arch::OpClassTensorOp,
|
||||
cutlass::arch::Sm80,
|
||||
cutlass::gemm::GemmShape<256, 128, 64>,
|
||||
cutlass::gemm::GemmShape<64, 64, 64>,
|
||||
cutlass::gemm::GemmShape<16, 8, 16>,
|
||||
cutlass::epilogue::thread::LinearCombinationGeneric<cutlass::epilogue::thread::ReLu, cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,
|
||||
cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,
|
||||
3,
|
||||
cutlass::arch::OpMultiplyAdd
|
||||
>::GemmKernel;
|
||||
|
||||
// Define named type
|
||||
struct cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_type :
|
||||
public cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base { };
|
||||
|
||||
</pre></div></div>
|
||||
</div>
|
||||
<div class="nboutput nblast docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[3]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="output_area docutils container">
|
||||
<div class="highlight"><pre>
|
||||
<cutlass.backend.gemm_operation.GemmArguments2x at 0x7fed906f2460>
|
||||
</pre></div></div>
|
||||
</div>
|
||||
<p>We can now verify the result of the GEMM that used a ReLU activation function:</p>
|
||||
<div class="nbinput nblast docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[4]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">relu_ref</span> <span class="o">=</span> <span class="p">(</span><span class="n">tensor_D</span> <span class="o">>=</span> <span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">astype</span><span class="p">(</span><span class="n">type_D</span><span class="p">)</span> <span class="o">*</span> <span class="n">tensor_D</span>
|
||||
<span class="n">np</span><span class="o">.</span><span class="n">testing</span><span class="o">.</span><span class="n">assert_array_equal</span><span class="p">(</span><span class="n">relu_ref</span><span class="p">,</span> <span class="n">tensor_D_relu</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
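<p>The reference above builds ReLU by multiplying with a 0/1 mask. The NumPy-only sketch below, which runs without a GPU, shows that this mask form agrees with the more familiar <code class="docutils literal notranslate"><span class="pre">max(x,</span> <span class="pre">0)</span></code> form. The function names <code class="docutils literal notranslate"><span class="pre">relu_mask</span></code> and <code class="docutils literal notranslate"><span class="pre">relu_max</span></code> are hypothetical and exist only for this illustration.</p>

```python
import numpy as np


def relu_mask(x):
    # Same construction as the verification cell: zero out negative entries
    # by multiplying with a 0/1 mask of the same dtype.
    return (x >= 0).astype(x.dtype) * x


def relu_max(x):
    # Direct definition of ReLU: max(x, 0), kept in the input dtype.
    return np.maximum(x, 0).astype(x.dtype)


x = np.array([-3.0, -0.5, 0.0, 2.0, 4.0], dtype=np.float16)

# Both formulations produce the same values.
np.testing.assert_array_equal(relu_mask(x), relu_max(x))
print(relu_max(x))
```

<p>One small difference is that the mask form yields a negative zero for negative inputs, since <code class="docutils literal notranslate"><span class="pre">0</span> <span class="pre">*</span> <span class="pre">(-3)</span></code> is <code class="docutils literal notranslate"><span class="pre">-0.0</span></code> in floating point. Negative and positive zero compare equal, so the exact comparison still passes.</p>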
|
||||
</section>
|
||||
<section id="Other-element-wise-activation-functions">
|
||||
<h2>Other element-wise activation functions<a class="headerlink" href="#Other-element-wise-activation-functions" title="Permalink to this heading">#</a></h2>
|
||||
<p>CUTLASS supports a variety of widely used element-wise activation functions. We can obtain a list of these functions via the plan’s <code class="docutils literal notranslate"><span class="pre">activations()</span></code> method.</p>
|
||||
<div class="nbinput docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[5]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">activations</span> <span class="o">=</span> <span class="n">plan</span><span class="o">.</span><span class="n">activations</span><span class="p">()</span>
|
||||
<span class="k">for</span> <span class="n">activation</span> <span class="ow">in</span> <span class="n">activations</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="n">activation</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="nboutput nblast docutils container">
|
||||
<div class="prompt empty docutils container">
|
||||
</div>
|
||||
<div class="output_area docutils container">
|
||||
<div class="highlight"><pre>
|
||||
<class 'cutlass.backend.epilogue.gelu'>
|
||||
<class 'cutlass.backend.epilogue.hardswish'>
|
||||
<class 'cutlass.backend.epilogue.identity'>
|
||||
<class 'cutlass.backend.epilogue.leaky_relu'>
|
||||
<class 'cutlass.backend.epilogue.relu'>
|
||||
<class 'cutlass.backend.epilogue.sigmoid'>
|
||||
<class 'cutlass.backend.epilogue.silu'>
|
||||
<class 'cutlass.backend.epilogue.tanh'>
|
||||
</pre></div></div>
|
||||
</div>
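<p>As a host-side aid for checking results, the sketch below gives textbook NumPy definitions for the eight activation names listed above. These are assumptions for illustration, not the CUTLASS implementations: the kernels may use different numerics or approximations, and the <code class="docutils literal notranslate"><span class="pre">negative_slope</span></code> default for leaky ReLU is a hypothetical value chosen here.</p>

```python
import math

import numpy as np

# math.erf works on scalars; vectorize it so it accepts NumPy arrays.
_erf = np.vectorize(math.erf, otypes=[np.float64])


def identity(x):
    return x


def relu(x):
    return np.maximum(x, 0.0)


def leaky_relu(x, negative_slope=0.01):
    # negative_slope is a hypothetical default chosen for this sketch.
    return np.where(x >= 0, x, negative_slope * x)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def silu(x):
    return x * sigmoid(x)


def tanh(x):
    return np.tanh(x)


def gelu(x):
    # Exact (erf-based) GELU: x times the standard normal CDF of x.
    return 0.5 * x * (1.0 + _erf(x / math.sqrt(2.0)))


def hardswish(x):
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0


REFERENCE_ACTIVATIONS = {
    "gelu": gelu,
    "hardswish": hardswish,
    "identity": identity,
    "leaky_relu": leaky_relu,
    "relu": relu,
    "sigmoid": sigmoid,
    "silu": silu,
    "tanh": tanh,
}

x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
for name, fn in REFERENCE_ACTIVATIONS.items():
    print(f"{name:>10}: {np.round(fn(x), 4)}")
```

<p>Because functions such as GELU, sigmoid, and tanh do not produce whole numbers, a comparison against a half-precision kernel result would normally use a tolerance (for example <code class="docutils literal notranslate"><span class="pre">np.testing.assert_allclose</span></code>) rather than exact equality.</p>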
|
||||
<p>We can then compile and run a GEMM with each of these activation functions:</p>
|
||||
<div class="nbinput docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[6]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="k">for</span> <span class="n">activation</span> <span class="ow">in</span> <span class="n">activations</span><span class="p">:</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'============================================================================================='</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s1">'Compiling and running activation </span><span class="si">{</span><span class="n">activation</span><span class="si">}</span><span class="s1">'</span><span class="p">)</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'============================================================================================='</span><span class="p">)</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">activation</span> <span class="o">=</span> <span class="n">activation</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">tensor_A</span><span class="p">,</span> <span class="n">tensor_B</span><span class="p">,</span> <span class="n">tensor_C</span><span class="p">,</span> <span class="n">tensor_D</span><span class="p">,</span> <span class="n">print_module</span><span class="o">=</span><span class="n">print_module</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="nboutput nblast docutils container">
|
||||
<div class="prompt empty docutils container">
|
||||
</div>
|
||||
<div class="output_area docutils container">
|
||||
<div class="highlight"><pre>
|
||||
=============================================================================================
|
||||
Compiling and running activation <class 'cutlass.backend.epilogue.gelu'>
|
||||
=============================================================================================
|
||||
|
||||
// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8
|
||||
using cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base =
|
||||
typename cutlass::gemm::kernel::DefaultGemmUniversal<
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor,
|
||||
cutlass::half_t,
|
||||
cutlass::arch::OpClassTensorOp,
|
||||
cutlass::arch::Sm80,
|
||||
cutlass::gemm::GemmShape<256, 128, 64>,
|
||||
cutlass::gemm::GemmShape<64, 64, 64>,
|
||||
cutlass::gemm::GemmShape<16, 8, 16>,
|
||||
cutlass::epilogue::thread::LinearCombinationGeneric<cutlass::epilogue::thread::GELU, cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,
|
||||
cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,
|
||||
3,
|
||||
cutlass::arch::OpMultiplyAdd
|
||||
>::GemmKernel;
|
||||
|
||||
// Define named type
|
||||
struct cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_type :
|
||||
public cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base { };
|
||||
|
||||
=============================================================================================
|
||||
Compiling and running activation <class 'cutlass.backend.epilogue.hardswish'>
|
||||
=============================================================================================
|
||||
|
||||
// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8
|
||||
using cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base =
|
||||
typename cutlass::gemm::kernel::DefaultGemmUniversal<
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor,
|
||||
cutlass::half_t,
|
||||
cutlass::arch::OpClassTensorOp,
|
||||
cutlass::arch::Sm80,
|
||||
cutlass::gemm::GemmShape<256, 128, 64>,
|
||||
cutlass::gemm::GemmShape<64, 64, 64>,
|
||||
cutlass::gemm::GemmShape<16, 8, 16>,
|
||||
cutlass::epilogue::thread::LinearCombinationGeneric<cutlass::epilogue::thread::HardSwish, cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,
|
||||
cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,
|
||||
3,
|
||||
cutlass::arch::OpMultiplyAdd
|
||||
>::GemmKernel;
|
||||
|
||||
// Define named type
|
||||
struct cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_type :
|
||||
public cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base { };
|
||||
|
||||
=============================================================================================
|
||||
Compiling and running activation <class 'cutlass.backend.epilogue.identity'>
|
||||
=============================================================================================
|
||||
|
||||
// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8
|
||||
using cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base =
|
||||
typename cutlass::gemm::kernel::DefaultGemmUniversal<
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor,
|
||||
cutlass::half_t,
|
||||
cutlass::arch::OpClassTensorOp,
|
||||
cutlass::arch::Sm80,
|
||||
cutlass::gemm::GemmShape<256, 128, 64>,
|
||||
cutlass::gemm::GemmShape<64, 64, 64>,
|
||||
cutlass::gemm::GemmShape<16, 8, 16>,
|
||||
cutlass::epilogue::thread::LinearCombination<cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,
|
||||
cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,
|
||||
3,
|
||||
cutlass::arch::OpMultiplyAdd
|
||||
>::GemmKernel;
|
||||
|
||||
// Define named type
|
||||
struct cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_type :
|
||||
public cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base { };
|
||||
|
||||
=============================================================================================
|
||||
Compiling and running activation <class 'cutlass.backend.epilogue.leaky_relu'>
|
||||
=============================================================================================
|
||||
|
||||
// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8
|
||||
using cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base =
|
||||
typename cutlass::gemm::kernel::DefaultGemmUniversal<
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor,
|
||||
cutlass::half_t,
|
||||
cutlass::arch::OpClassTensorOp,
|
||||
cutlass::arch::Sm80,
|
||||
cutlass::gemm::GemmShape<256, 128, 64>,
|
||||
cutlass::gemm::GemmShape<64, 64, 64>,
|
||||
cutlass::gemm::GemmShape<16, 8, 16>,
|
||||
cutlass::epilogue::thread::LinearCombinationGeneric<cutlass::epilogue::thread::LeakyReLU, cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,
|
||||
cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,
|
||||
3,
|
||||
cutlass::arch::OpMultiplyAdd
|
||||
>::GemmKernel;
|
||||
|
||||
// Define named type
|
||||
struct cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_type :
|
||||
public cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base { };
|
||||
|
||||
=============================================================================================
|
||||
Compiling and running activation <class 'cutlass.backend.epilogue.relu'>
|
||||
=============================================================================================
|
||||
|
||||
// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8
|
||||
using cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base =
|
||||
typename cutlass::gemm::kernel::DefaultGemmUniversal<
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor,
|
||||
cutlass::half_t,
|
||||
cutlass::arch::OpClassTensorOp,
|
||||
cutlass::arch::Sm80,
|
||||
cutlass::gemm::GemmShape<256, 128, 64>,
|
||||
cutlass::gemm::GemmShape<64, 64, 64>,
|
||||
cutlass::gemm::GemmShape<16, 8, 16>,
|
||||
cutlass::epilogue::thread::LinearCombinationGeneric<cutlass::epilogue::thread::ReLu, cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,
|
||||
cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,
|
||||
3,
|
||||
cutlass::arch::OpMultiplyAdd
|
||||
>::GemmKernel;
|
||||
|
||||
// Define named type
|
||||
struct cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_type :
|
||||
public cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base { };
|
||||
|
||||
=============================================================================================
|
||||
Compiling and running activation <class 'cutlass.backend.epilogue.sigmoid'>
|
||||
=============================================================================================
|
||||
|
||||
// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8
|
||||
using cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base =
|
||||
typename cutlass::gemm::kernel::DefaultGemmUniversal<
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor,
|
||||
cutlass::half_t,
|
||||
cutlass::arch::OpClassTensorOp,
|
||||
cutlass::arch::Sm80,
|
||||
cutlass::gemm::GemmShape<256, 128, 64>,
|
||||
cutlass::gemm::GemmShape<64, 64, 64>,
|
||||
cutlass::gemm::GemmShape<16, 8, 16>,
|
||||
cutlass::epilogue::thread::LinearCombinationGeneric<cutlass::epilogue::thread::Sigmoid, cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,
|
||||
cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,
|
||||
3,
|
||||
cutlass::arch::OpMultiplyAdd
|
||||
>::GemmKernel;
|
||||
|
||||
// Define named type
|
||||
struct cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_type :
|
||||
public cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base { };
|
||||
|
||||
=============================================================================================
|
||||
Compiling and running activation <class 'cutlass.backend.epilogue.silu'>
|
||||
=============================================================================================
|
||||
|
||||
// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8
|
||||
using cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base =
|
||||
typename cutlass::gemm::kernel::DefaultGemmUniversal<
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor,
|
||||
cutlass::half_t,
|
||||
cutlass::arch::OpClassTensorOp,
|
||||
cutlass::arch::Sm80,
|
||||
cutlass::gemm::GemmShape<256, 128, 64>,
|
||||
cutlass::gemm::GemmShape<64, 64, 64>,
|
||||
cutlass::gemm::GemmShape<16, 8, 16>,
|
||||
cutlass::epilogue::thread::LinearCombinationGeneric<cutlass::epilogue::thread::SiLu, cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,
|
||||
cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,
|
||||
3,
|
||||
cutlass::arch::OpMultiplyAdd
|
||||
>::GemmKernel;
|
||||
|
||||
// Define named type
|
||||
struct cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_type :
|
||||
public cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base { };
|
||||
|
||||
=============================================================================================
|
||||
Compiling and running activation <class 'cutlass.backend.epilogue.tanh'>
|
||||
=============================================================================================
|
||||
|
||||
// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8
|
||||
using cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base =
|
||||
typename cutlass::gemm::kernel::DefaultGemmUniversal<
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor,
|
||||
cutlass::half_t,
|
||||
cutlass::arch::OpClassTensorOp,
|
||||
cutlass::arch::Sm80,
|
||||
cutlass::gemm::GemmShape<256, 128, 64>,
|
||||
cutlass::gemm::GemmShape<64, 64, 64>,
|
||||
cutlass::gemm::GemmShape<16, 8, 16>,
|
||||
cutlass::epilogue::thread::LinearCombinationGeneric<cutlass::epilogue::thread::Tanh, cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,
|
||||
cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,
|
||||
3,
|
||||
cutlass::arch::OpMultiplyAdd
|
||||
>::GemmKernel;
|
||||
|
||||
// Define named type
|
||||
struct cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_type :
|
||||
public cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base { };
|
||||
|
||||
</pre></div></div>
|
||||
</div>
|
||||
<div class="nbinput nblast docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[ ]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
<a class="next-page" href="02_pytorch_extension_grouped_gemm.html">
|
||||
<div class="page-info">
|
||||
<div class="context">
|
||||
<span>Next</span>
|
||||
</div>
|
||||
<div class="title">Exporting a CUTLASS grouped GEMM kernel to a PyTorch CUDA extension</div>
|
||||
</div>
|
||||
<svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
|
||||
</a>
|
||||
<a class="prev-page" href="../examples.html">
|
||||
<svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
|
||||
<div class="page-info">
|
||||
<div class="context">
|
||||
<span>Previous</span>
|
||||
</div>
|
||||
|
||||
<div class="title">Examples</div>
|
||||
|
||||
</div>
|
||||
</a>
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer">
|
||||
|
||||
|
||||
<div class="toc-sticky toc-scroll">
|
||||
<div class="toc-title-container">
|
||||
<span class="toc-title">
|
||||
On this page
|
||||
</span>
|
||||
</div>
|
||||
<div class="toc-tree-container">
|
||||
<div class="toc-tree">
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">Example of using elementwise activation functions in the CUTLASS Python interface</a><ul>
|
||||
<li><a class="reference internal" href="#Run-a-GEMM-with-an-identity-activation-function">Run a GEMM with an identity activation function</a></li>
|
||||
<li><a class="reference internal" href="#Run-a-GEMM-with-a-ReLU-element-wise-activation-function">Run a GEMM with a ReLU element-wise activation function</a></li>
|
||||
<li><a class="reference internal" href="#Other-element-wise-activation-functions">Other element-wise activation functions</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
|
||||
<script src="../_static/doctools.js"></script>
|
||||
<script src="../_static/sphinx_highlight.js"></script>
|
||||
<script src="../_static/scripts/furo.js"></script>
|
||||
<script src="../_static/clipboard.min.js"></script>
|
||||
<script src="../_static/copybutton.js"></script>
|
||||
<script src="../_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
<script>window.MathJax = {"tex": {"inlineMath": [["$", "$"], ["\\(", "\\)"]], "processEscapes": true}, "options": {"ignoreHtmlClass": "tex2jax_ignore|mathjax_ignore|document", "processHtmlClass": "tex2jax_process|mathjax_process|math|output_area"}}</script>
|
||||
<script defer="defer" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
593
python/docs/externals/01_epilogue.ipynb
vendored
Normal file
@ -0,0 +1,593 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "5d24a692",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Example of using elementwise activation functions in the CUTLASS Python interface\n",
|
||||
"This notebook walks through a basic example of using the CUTLASS Python interface to declare, compile, and run GEMMs with different epilogues.\n",
|
||||
"\n",
|
||||
"[](https://colab.research.google.com/github/NVIDIA/cutlass/tree/master/examples/00_basic_gemm.ipynb)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "3ca993fe",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We first import the packages needed for the example and construct the input and output tensors used throughout."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "63a70a3c",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T18:00:09.148380Z",
|
||||
"iopub.status.busy": "2023-04-18T18:00:09.148011Z",
|
||||
"iopub.status.idle": "2023-04-18T18:00:13.281937Z",
|
||||
"shell.execute_reply": "2023-04-18T18:00:13.281256Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/usr/local/lib/python3.8/dist-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
|
||||
" from .autonotebook import tqdm as notebook_tqdm\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import numpy as np\n",
|
||||
"\n",
|
||||
"import cutlass\n",
|
||||
"\n",
|
||||
"# This controls whether the C++ GEMM declaration will be printed at each step. Set to `False` to\n",
|
||||
"# omit this information.\n",
|
||||
"print_module = True\n",
|
||||
"\n",
|
||||
"m = 256\n",
|
||||
"n = m\n",
|
||||
"k = m\n",
|
||||
"\n",
|
||||
"type_A = np.float16\n",
|
||||
"type_B = np.float16\n",
|
||||
"type_C = np.float16\n",
|
||||
"type_D = np.float16\n",
|
||||
"\n",
|
||||
"np.random.seed(1234)\n",
|
||||
"scope_min = -4\n",
|
||||
"scope_max = 4\n",
|
||||
"tensor_A = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(m, k)).astype(type_A))\n",
|
||||
"tensor_B = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(k, n)).astype(type_B))\n",
|
||||
"tensor_C = np.ceil(np.random.uniform(low=scope_min, high=scope_max, size=(m, n)).astype(type_C))\n",
|
||||
"\n",
|
||||
"alpha = np.float16(1.)\n",
|
||||
"beta = np.float16(0.)\n",
|
||||
"\n",
|
||||
"tensor_D = np.zeros(tensor_C.shape).astype(type_D)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "1eb0d95b",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Run a GEMM with an identity activation function\n",
|
||||
"To begin, we simply run a default GEMM with an identity activation function. This performs the well-known operation `D = alpha * (A @ B) + beta * C`. This is the default activation function used, and does not need to be specified."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "8d257833",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T18:00:13.284650Z",
|
||||
"iopub.status.busy": "2023-04-18T18:00:13.284425Z",
|
||||
"iopub.status.idle": "2023-04-18T18:00:18.333867Z",
|
||||
"shell.execute_reply": "2023-04-18T18:00:18.333187Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8\n",
|
||||
"using cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base =\n",
|
||||
" typename cutlass::gemm::kernel::DefaultGemmUniversal<\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor,\n",
|
||||
" cutlass::half_t,\n",
|
||||
" cutlass::arch::OpClassTensorOp,\n",
|
||||
" cutlass::arch::Sm80,\n",
|
||||
" cutlass::gemm::GemmShape<256, 128, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<64, 64, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<16, 8, 16>,\n",
|
||||
" cutlass::epilogue::thread::LinearCombination<cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,\n",
|
||||
" cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,\n",
|
||||
" 3,\n",
|
||||
" cutlass::arch::OpMultiplyAdd\n",
|
||||
">::GemmKernel;\n",
|
||||
"\n",
|
||||
"// Define named type\n",
|
||||
"struct cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_type : \n",
|
||||
" public cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base { };\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"<cutlass.backend.gemm_operation.GemmArguments2x at 0x7fed907287c0>"
|
||||
]
|
||||
},
|
||||
"execution_count": 2,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"plan = cutlass.op.Gemm(element=np.float16, layout=cutlass.LayoutType.RowMajor)\n",
|
||||
"plan.run(tensor_A, tensor_B, tensor_C, tensor_D, print_module=print_module)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "54961694",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Run a GEMM with a ReLU element-wise activation function\n",
|
||||
"CUTLASS makes it easy to support other element-wise activation functions. The activation is applied as an element-wise operation after the generic linear combination performed in a GEMM. If we call such an activation function `act`, the resulting formulation is:\n",
|
||||
"```\n",
|
||||
"D = alpha * (A @ B) + beta * C\n",
|
||||
"D = act(D)\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"Here, we will add a ReLU activation function. Given an input `x`, ReLU returns `max(x, 0)`.\n",
|
||||
"\n",
|
||||
"This is easy to do in CUTLASS. One only needs to set the plan's `activation` field."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "5fe49443",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T18:00:18.337036Z",
|
||||
"iopub.status.busy": "2023-04-18T18:00:18.336833Z",
|
||||
"iopub.status.idle": "2023-04-18T18:00:23.482072Z",
|
||||
"shell.execute_reply": "2023-04-18T18:00:23.481125Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8\n",
|
||||
"using cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base =\n",
|
||||
" typename cutlass::gemm::kernel::DefaultGemmUniversal<\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor,\n",
|
||||
" cutlass::half_t,\n",
|
||||
" cutlass::arch::OpClassTensorOp,\n",
|
||||
" cutlass::arch::Sm80,\n",
|
||||
" cutlass::gemm::GemmShape<256, 128, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<64, 64, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<16, 8, 16>,\n",
|
||||
" cutlass::epilogue::thread::LinearCombinationGeneric<cutlass::epilogue::thread::ReLu, cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,\n",
|
||||
" cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,\n",
|
||||
" 3,\n",
|
||||
" cutlass::arch::OpMultiplyAdd\n",
|
||||
">::GemmKernel;\n",
|
||||
"\n",
|
||||
"// Define named type\n",
|
||||
"struct cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_type : \n",
|
||||
" public cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base { };\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"data": {
|
||||
"text/plain": [
|
||||
"<cutlass.backend.gemm_operation.GemmArguments2x at 0x7fed906f2460>"
|
||||
]
|
||||
},
|
||||
"execution_count": 3,
|
||||
"metadata": {},
|
||||
"output_type": "execute_result"
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"tensor_D_relu = np.zeros(tensor_C.shape).astype(type_D)\n",
|
||||
"plan.activation = cutlass.epilogue.relu\n",
|
||||
"plan.run(tensor_A, tensor_B, tensor_C, tensor_D_relu, print_module=print_module)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "455d0a37",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can now verify the result of the GEMM that used a ReLU activation function against a NumPy reference computed from the identity-activation output:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "e32e7798",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T18:00:23.486042Z",
|
||||
"iopub.status.busy": "2023-04-18T18:00:23.485342Z",
|
||||
"iopub.status.idle": "2023-04-18T18:00:23.497444Z",
|
||||
"shell.execute_reply": "2023-04-18T18:00:23.496668Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"relu_ref = (tensor_D >= 0).astype(type_D) * tensor_D\n",
|
||||
"np.testing.assert_array_equal(relu_ref, tensor_D_relu)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "cf959171",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Other element-wise activation functions\n",
|
||||
"CUTLASS supports a variety of widely used element-wise activation functions. We can obtain a list of these functions via the plan's `activations()` method."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "9e17d730",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T18:00:23.500102Z",
|
||||
"iopub.status.busy": "2023-04-18T18:00:23.499944Z",
|
||||
"iopub.status.idle": "2023-04-18T18:00:23.504562Z",
|
||||
"shell.execute_reply": "2023-04-18T18:00:23.503793Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"<class 'cutlass.backend.epilogue.gelu'>\n",
|
||||
"<class 'cutlass.backend.epilogue.hardswish'>\n",
|
||||
"<class 'cutlass.backend.epilogue.identity'>\n",
|
||||
"<class 'cutlass.backend.epilogue.leaky_relu'>\n",
|
||||
"<class 'cutlass.backend.epilogue.relu'>\n",
|
||||
"<class 'cutlass.backend.epilogue.sigmoid'>\n",
|
||||
"<class 'cutlass.backend.epilogue.silu'>\n",
|
||||
"<class 'cutlass.backend.epilogue.tanh'>\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"activations = plan.activations()\n",
|
||||
"for activation in activations:\n",
|
||||
" print(activation)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "0e4599fa",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can then run each of them:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "9c3598c9",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T18:00:23.507538Z",
|
||||
"iopub.status.busy": "2023-04-18T18:00:23.507257Z",
|
||||
"iopub.status.idle": "2023-04-18T18:00:59.414765Z",
|
||||
"shell.execute_reply": "2023-04-18T18:00:59.414116Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"=============================================================================================\n",
|
||||
"Compiling and running activation <class 'cutlass.backend.epilogue.gelu'>\n",
|
||||
"=============================================================================================\n",
|
||||
"\n",
|
||||
"// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8\n",
|
||||
"using cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base =\n",
|
||||
" typename cutlass::gemm::kernel::DefaultGemmUniversal<\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor,\n",
|
||||
" cutlass::half_t,\n",
|
||||
" cutlass::arch::OpClassTensorOp,\n",
|
||||
" cutlass::arch::Sm80,\n",
|
||||
" cutlass::gemm::GemmShape<256, 128, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<64, 64, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<16, 8, 16>,\n",
|
||||
" cutlass::epilogue::thread::LinearCombinationGeneric<cutlass::epilogue::thread::GELU, cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,\n",
|
||||
" cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,\n",
|
||||
" 3,\n",
|
||||
" cutlass::arch::OpMultiplyAdd\n",
|
||||
">::GemmKernel;\n",
|
||||
"\n",
|
||||
"// Define named type\n",
|
||||
"struct cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_type : \n",
|
||||
" public cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base { };\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"=============================================================================================\n",
|
||||
"Compiling and running activation <class 'cutlass.backend.epilogue.hardswish'>\n",
|
||||
"=============================================================================================\n",
|
||||
"\n",
|
||||
"// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8\n",
|
||||
"using cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base =\n",
|
||||
" typename cutlass::gemm::kernel::DefaultGemmUniversal<\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor,\n",
|
||||
" cutlass::half_t,\n",
|
||||
" cutlass::arch::OpClassTensorOp,\n",
|
||||
" cutlass::arch::Sm80,\n",
|
||||
" cutlass::gemm::GemmShape<256, 128, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<64, 64, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<16, 8, 16>,\n",
|
||||
" cutlass::epilogue::thread::LinearCombinationGeneric<cutlass::epilogue::thread::HardSwish, cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,\n",
|
||||
" cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,\n",
|
||||
" 3,\n",
|
||||
" cutlass::arch::OpMultiplyAdd\n",
|
||||
">::GemmKernel;\n",
|
||||
"\n",
|
||||
"// Define named type\n",
|
||||
"struct cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_type : \n",
|
||||
" public cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base { };\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"=============================================================================================\n",
|
||||
"Compiling and running activation <class 'cutlass.backend.epilogue.identity'>\n",
|
||||
"=============================================================================================\n",
|
||||
"\n",
|
||||
"// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8\n",
|
||||
"using cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base =\n",
|
||||
" typename cutlass::gemm::kernel::DefaultGemmUniversal<\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor,\n",
|
||||
" cutlass::half_t,\n",
|
||||
" cutlass::arch::OpClassTensorOp,\n",
|
||||
" cutlass::arch::Sm80,\n",
|
||||
" cutlass::gemm::GemmShape<256, 128, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<64, 64, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<16, 8, 16>,\n",
|
||||
" cutlass::epilogue::thread::LinearCombination<cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,\n",
|
||||
" cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,\n",
|
||||
" 3,\n",
|
||||
" cutlass::arch::OpMultiplyAdd\n",
|
||||
">::GemmKernel;\n",
|
||||
"\n",
|
||||
"// Define named type\n",
|
||||
"struct cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_type : \n",
|
||||
" public cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base { };\n",
|
||||
"\n",
|
||||
"=============================================================================================\n",
|
||||
"Compiling and running activation <class 'cutlass.backend.epilogue.leaky_relu'>\n",
|
||||
"=============================================================================================\n",
|
||||
"\n",
|
||||
"// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8\n",
|
||||
"using cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base =\n",
|
||||
" typename cutlass::gemm::kernel::DefaultGemmUniversal<\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor,\n",
|
||||
" cutlass::half_t,\n",
|
||||
" cutlass::arch::OpClassTensorOp,\n",
|
||||
" cutlass::arch::Sm80,\n",
|
||||
" cutlass::gemm::GemmShape<256, 128, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<64, 64, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<16, 8, 16>,\n",
|
||||
" cutlass::epilogue::thread::LinearCombinationGeneric<cutlass::epilogue::thread::LeakyReLU, cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,\n",
|
||||
" cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,\n",
|
||||
" 3,\n",
|
||||
" cutlass::arch::OpMultiplyAdd\n",
|
||||
">::GemmKernel;\n",
|
||||
"\n",
|
||||
"// Define named type\n",
|
||||
"struct cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_type : \n",
|
||||
" public cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base { };\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"=============================================================================================\n",
|
||||
"Compiling and running activation <class 'cutlass.backend.epilogue.relu'>\n",
|
||||
"=============================================================================================\n",
|
||||
"\n",
|
||||
"// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8\n",
|
||||
"using cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base =\n",
|
||||
" typename cutlass::gemm::kernel::DefaultGemmUniversal<\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor,\n",
|
||||
" cutlass::half_t,\n",
|
||||
" cutlass::arch::OpClassTensorOp,\n",
|
||||
" cutlass::arch::Sm80,\n",
|
||||
" cutlass::gemm::GemmShape<256, 128, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<64, 64, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<16, 8, 16>,\n",
|
||||
" cutlass::epilogue::thread::LinearCombinationGeneric<cutlass::epilogue::thread::ReLu, cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,\n",
|
||||
" cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,\n",
|
||||
" 3,\n",
|
||||
" cutlass::arch::OpMultiplyAdd\n",
|
||||
">::GemmKernel;\n",
|
||||
"\n",
|
||||
"// Define named type\n",
|
||||
"struct cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_type : \n",
|
||||
" public cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base { };\n",
|
||||
"\n",
|
||||
"=============================================================================================\n",
|
||||
"Compiling and running activation <class 'cutlass.backend.epilogue.sigmoid'>\n",
|
||||
"=============================================================================================\n",
|
||||
"\n",
|
||||
"// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8\n",
|
||||
"using cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base =\n",
|
||||
" typename cutlass::gemm::kernel::DefaultGemmUniversal<\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor,\n",
|
||||
" cutlass::half_t,\n",
|
||||
" cutlass::arch::OpClassTensorOp,\n",
|
||||
" cutlass::arch::Sm80,\n",
|
||||
" cutlass::gemm::GemmShape<256, 128, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<64, 64, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<16, 8, 16>,\n",
|
||||
" cutlass::epilogue::thread::LinearCombinationGeneric<cutlass::epilogue::thread::Sigmoid, cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,\n",
|
||||
" cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,\n",
|
||||
" 3,\n",
|
||||
" cutlass::arch::OpMultiplyAdd\n",
|
||||
">::GemmKernel;\n",
|
||||
"\n",
|
||||
"// Define named type\n",
|
||||
"struct cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_type : \n",
|
||||
" public cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base { };\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"=============================================================================================\n",
|
||||
"Compiling and running activation <class 'cutlass.backend.epilogue.silu'>\n",
|
||||
"=============================================================================================\n",
|
||||
"\n",
|
||||
"// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8\n",
|
||||
"using cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base =\n",
|
||||
" typename cutlass::gemm::kernel::DefaultGemmUniversal<\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor,\n",
|
||||
" cutlass::half_t,\n",
|
||||
" cutlass::arch::OpClassTensorOp,\n",
|
||||
" cutlass::arch::Sm80,\n",
|
||||
" cutlass::gemm::GemmShape<256, 128, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<64, 64, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<16, 8, 16>,\n",
|
||||
" cutlass::epilogue::thread::LinearCombinationGeneric<cutlass::epilogue::thread::SiLu, cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,\n",
|
||||
" cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,\n",
|
||||
" 3,\n",
|
||||
" cutlass::arch::OpMultiplyAdd\n",
|
||||
">::GemmKernel;\n",
|
||||
"\n",
|
||||
"// Define named type\n",
|
||||
"struct cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_type : \n",
|
||||
" public cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base { };\n",
|
||||
"\n"
|
||||
]
|
||||
},
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"=============================================================================================\n",
|
||||
"Compiling and running activation <class 'cutlass.backend.epilogue.tanh'>\n",
|
||||
"=============================================================================================\n",
|
||||
"\n",
|
||||
"// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8\n",
|
||||
"using cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base =\n",
|
||||
" typename cutlass::gemm::kernel::DefaultGemmUniversal<\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor,\n",
|
||||
" cutlass::half_t,\n",
|
||||
" cutlass::arch::OpClassTensorOp,\n",
|
||||
" cutlass::arch::Sm80,\n",
|
||||
" cutlass::gemm::GemmShape<256, 128, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<64, 64, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<16, 8, 16>,\n",
|
||||
" cutlass::epilogue::thread::LinearCombinationGeneric<cutlass::epilogue::thread::Tanh, cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,\n",
|
||||
" cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,\n",
|
||||
" 3,\n",
|
||||
" cutlass::arch::OpMultiplyAdd\n",
|
||||
">::GemmKernel;\n",
|
||||
"\n",
|
||||
"// Define named type\n",
|
||||
"struct cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_type : \n",
|
||||
" public cutlass_sm80_tensorop_h16x8x16gemm_1x1x1_256x128_64x3_tt_align8_base { };\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"for activation in activations:\n",
|
||||
" print('=============================================================================================')\n",
|
||||
" print(f'Compiling and running activation {activation}')\n",
|
||||
" print('=============================================================================================')\n",
|
||||
" plan.activation = activation\n",
|
||||
" plan.run(tensor_A, tensor_B, tensor_C, tensor_D, print_module=print_module)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "751f8d92",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.10"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
537
python/docs/externals/02_pytorch_extension_grouped_gemm.html
vendored
Normal file
@ -0,0 +1,537 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
<link rel="index" title="Index" href="../genindex.html" /><link rel="search" title="Search" href="../search.html" /><link rel="prev" title="Example of using elementwise activation functions in the CUTLASS Python interface" href="01_epilogue.html" />
|
||||
<link rel="canonical" href="docs/externals/02_pytorch_extension_grouped_gemm.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>Exporting a CUTLASS grouped GEMM kernel to a PyTorch CUDA extension - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="../_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/nbsphinx-code-cells.css" />
|
||||
<link rel="stylesheet" type="text/css" href="../_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="../index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="../index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="../_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="../_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="../search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="../contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="../modules.html">CUTLASS Python API</a><input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="../cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="../cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul class="current">
|
||||
<li class="toctree-l1 current has-children"><a class="reference internal" href="../examples.html">Examples</a><input checked="" class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul class="current">
|
||||
<li class="toctree-l2"><a class="reference internal" href="00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2 current current-page"><a class="current reference internal" href="#">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<section id="Exporting-a-CUTLASS-grouped-GEMM-kernel-to-a-PyTorch-CUDA-extension">
|
||||
<h1>Exporting a CUTLASS grouped GEMM kernel to a PyTorch CUDA extension<a class="headerlink" href="#Exporting-a-CUTLASS-grouped-GEMM-kernel-to-a-PyTorch-CUDA-extension" title="Permalink to this heading">#</a></h1>
|
||||
<p>This notebook walks through a basic example of using the CUTLASS Python interface to declare a grouped GEMM kernel and export it as a PyTorch CUDA extension.</p>
|
||||
<p><a class="reference external" href="https://colab.research.google.com/github/NVIDIA/cutlass/tree/master/examples/00_basic_gemm.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a></p>
|
||||
<section id="Background-on-grouped-GEMM">
|
||||
<h2>Background on grouped GEMM<a class="headerlink" href="#Background-on-grouped-GEMM" title="Permalink to this heading">#</a></h2>
|
||||
<p>Grouped GEMM enables one to execute a set of GEMMs (each with potentially different sizes and strides) in a single CUDA kernel. It can be thought of as a generalized version of a pointer-array GEMM, without the requirement that the sizes and strides of each GEMM be the same.</p>
|
||||
<p>For example, if one has <code class="docutils literal notranslate"><span class="pre">p</span></code> GEMMs with sizes:</p>
|
||||
<div class="highlight-text notranslate"><div class="highlight"><pre><span></span>M_1 x N_1 x K_1
|
||||
M_2 x N_2 x K_2
|
||||
...
|
||||
M_p x N_p x K_p
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>CUTLASS’s grouped GEMM will execute these in a single CUDA kernel.</p>
|
||||
<p>Grouped GEMM is particularly beneficial when a workload consists of many small problems: each would underutilize the device if launched in isolation, but together they can saturate the GPU.</p>
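<p>As a rough illustration of what a grouped GEMM computes, the sketch below is a CPU-only, pure-Python reference in which every problem in the group has its own shape. It assumes the usual linear-combination form <code class="docutils literal notranslate"><span class="pre">D = alpha * (A @ B) + beta * C</span></code> with <code class="docutils literal notranslate"><span class="pre">alpha=1</span></code> and <code class="docutils literal notranslate"><span class="pre">beta=0</span></code>. The helper names <code class="docutils literal notranslate"><span class="pre">gemm_reference</span></code> and <code class="docutils literal notranslate"><span class="pre">grouped_gemm_reference</span></code> are hypothetical and are not part of CUTLASS.</p>

```python
# Reference (CPU, pure Python) semantics of a grouped GEMM: a list of
# independent GEMMs, each free to have its own M, N, and K.
# Illustrative only: the helper names below are hypothetical, not CUTLASS APIs.

def gemm_reference(A, B, C, alpha=1.0, beta=0.0):
    """Return alpha * (A @ B) + beta * C for row-major nested lists."""
    M, K, N = len(A), len(B), len(B[0])
    return [
        [alpha * sum(A[i][k] * B[k][j] for k in range(K)) + beta * C[i][j]
         for j in range(N)]
        for i in range(M)
    ]

def grouped_gemm_reference(As, Bs, Cs, alpha=1.0, beta=0.0):
    """Apply gemm_reference to every problem in the group."""
    return [gemm_reference(A, B, C, alpha, beta) for A, B, C in zip(As, Bs, Cs)]

# Two problems with different shapes: 1x2x2 and 2x1x3 (M x N x K).
As = [[[1, 2]], [[1, 0, 2], [0, 1, 1]]]
Bs = [[[3, 4], [5, 6]], [[1], [2], [3]]]
Cs = [[[0, 0]], [[0], [0]]]
Ds = grouped_gemm_reference(As, Bs, Cs)
print(Ds)
```

<p>The reference loops over the problems one at a time. A grouped GEMM kernel produces the same per-problem results but schedules all of the problems within a single kernel launch.</p>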
|
||||
</section>
|
||||
<section id="Declaring-a-grouped-GEMM-via-the-CUTLASS-Python-interface">
|
||||
<h2>Declaring a grouped GEMM via the CUTLASS Python interface<a class="headerlink" href="#Declaring-a-grouped-GEMM-via-the-CUTLASS-Python-interface" title="Permalink to this heading">#</a></h2>
|
||||
<p>A grouped GEMM operation is declared similarly to a GEMM operation in the CUTLASS Python interface: one simply calls <code class="docutils literal notranslate"><span class="pre">cutlass.op.GroupedGemm</span></code>.</p>
|
||||
<div class="nbinput docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[1]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">cutlass</span>
|
||||
<span class="kn">import</span> <span class="nn">torch</span>
|
||||
|
||||
<span class="n">dtype</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">float16</span>
|
||||
<span class="n">plan</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">GroupedGemm</span><span class="p">(</span><span class="n">element</span><span class="o">=</span><span class="n">dtype</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
<p>We can then compile and run this operation on a group of GEMMs. We’ll first set up some utility functions to initialize GEMMs.</p>
|
||||
<div class="nbinput nblast docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[2]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">random</span>
|
||||
<span class="n">random</span><span class="o">.</span><span class="n">seed</span><span class="p">(</span><span class="mi">2023</span><span class="p">)</span>
|
||||
|
||||
<span class="c1"># Utility function to initialize A, B, C, and D matrices corresponding to dimensions M, N, and K</span>
|
||||
<span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">dtype</span><span class="p">,</span> <span class="n">M</span><span class="p">,</span> <span class="n">N</span><span class="p">,</span> <span class="n">K</span><span class="p">):</span>
|
||||
<span class="n">sizes</span> <span class="o">=</span> <span class="p">[(</span><span class="n">M</span><span class="p">,</span> <span class="n">K</span><span class="p">),</span> <span class="p">(</span><span class="n">K</span><span class="p">,</span> <span class="n">N</span><span class="p">),</span> <span class="p">(</span><span class="n">M</span><span class="p">,</span> <span class="n">N</span><span class="p">),</span> <span class="p">(</span><span class="n">M</span><span class="p">,</span> <span class="n">N</span><span class="p">)]</span>
|
||||
<span class="k">return</span> <span class="p">[</span><span class="n">torch</span><span class="o">.</span><span class="n">randint</span><span class="p">(</span><span class="o">-</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="n">size</span><span class="p">,</span> <span class="n">device</span><span class="o">=</span><span class="s1">'cuda'</span><span class="p">)</span><span class="o">.</span><span class="n">to</span><span class="p">(</span><span class="n">dtype</span><span class="p">)</span> <span class="k">for</span> <span class="n">size</span> <span class="ow">in</span> <span class="n">sizes</span><span class="p">]</span>
|
||||
|
||||
<span class="c1"># Utility function to generate `problems` GEMMs of random sizes</span>
|
||||
<span class="k">def</span> <span class="nf">generate_problems</span><span class="p">(</span><span class="n">problems</span><span class="p">):</span>
|
||||
<span class="n">valid_sizes</span> <span class="o">=</span> <span class="p">[</span><span class="mi">128</span><span class="p">,</span> <span class="mi">256</span><span class="p">,</span> <span class="mi">512</span><span class="p">,</span> <span class="mi">1024</span><span class="p">]</span>
|
||||
<span class="n">As</span><span class="p">,</span> <span class="n">Bs</span><span class="p">,</span> <span class="n">Cs</span><span class="p">,</span> <span class="n">Ds</span> <span class="o">=</span> <span class="p">[],</span> <span class="p">[],</span> <span class="p">[],</span> <span class="p">[]</span>
|
||||
<span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">problems</span><span class="p">):</span>
|
||||
<span class="n">M</span><span class="p">,</span> <span class="n">N</span><span class="p">,</span> <span class="n">K</span> <span class="o">=</span> <span class="p">[</span><span class="n">random</span><span class="o">.</span><span class="n">choice</span><span class="p">(</span><span class="n">valid_sizes</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">3</span><span class="p">)]</span>
|
||||
<span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span> <span class="o">=</span> <span class="n">initialize</span><span class="p">(</span><span class="n">dtype</span><span class="p">,</span> <span class="n">M</span><span class="p">,</span> <span class="n">N</span><span class="p">,</span> <span class="n">K</span><span class="p">)</span>
|
||||
<span class="n">As</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">A</span><span class="p">)</span>
|
||||
<span class="n">Bs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">B</span><span class="p">)</span>
|
||||
<span class="n">Cs</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">C</span><span class="p">)</span>
|
||||
<span class="n">Ds</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">D</span><span class="p">)</span>
|
||||
<span class="k">return</span> <span class="n">As</span><span class="p">,</span> <span class="n">Bs</span><span class="p">,</span> <span class="n">Cs</span><span class="p">,</span> <span class="n">Ds</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
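<p>To see why problems of these sizes benefit from grouping, the following sketch counts how many threadblock output tiles each problem needs. It takes the 256x128 tile shape from the kernel declaration printed in this notebook. The SM count of 108 is only an assumption for illustration, and treating one tile as one unit of work per SM is a simplification, since real occupancy depends on the kernel's resource usage. The helper <code class="docutils literal notranslate"><span class="pre">tiles_for</span></code> is hypothetical.</p>

```python
import math
import random

# Threadblock tile shape (M, N), taken from the GemmShape 256x128x64
# threadblock shape in the kernel declaration printed in this notebook.
TILE_M, TILE_N = 256, 128
# Assumption for illustration only: number of SMs on the target GPU.
NUM_SMS = 108

def tiles_for(M, N):
    """Number of threadblock output tiles needed to cover an M x N result."""
    return math.ceil(M / TILE_M) * math.ceil(N / TILE_N)

random.seed(2023)
valid_sizes = [128, 256, 512, 1024]
# Draw (M, N, K) the same way generate_problems does, but keep only the sizes.
problem_sizes = [tuple(random.choice(valid_sizes) for _ in range(3)) for _ in range(50)]

per_problem = [tiles_for(M, N) for M, N, _ in problem_sizes]
print("largest single problem:", max(per_problem), "tiles")
print("whole group:", sum(per_problem), "tiles for", NUM_SMS, "SMs (assumed)")
```

<p>Under these assumptions, even the largest possible problem (1024 x 1024) covers only 32 tiles, well short of the assumed SM count, whereas the group as a whole provides far more tiles to schedule.</p>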
|
||||
<p>We’ll next run a group of 50 GEMMs via the CUTLASS Python interface and via PyTorch, and check that the two sets of results match.</p>
|
||||
<div class="nbinput docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[3]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">As</span><span class="p">,</span> <span class="n">Bs</span><span class="p">,</span> <span class="n">Cs</span><span class="p">,</span> <span class="n">Ds</span> <span class="o">=</span> <span class="n">generate_problems</span><span class="p">(</span><span class="mi">50</span><span class="p">)</span>
|
||||
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">As</span><span class="p">,</span> <span class="n">Bs</span><span class="p">,</span> <span class="n">Cs</span><span class="p">,</span> <span class="n">Ds</span><span class="p">,</span> <span class="n">print_module</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
<span class="n">Ds_torch</span> <span class="o">=</span> <span class="p">[</span><span class="n">a</span> <span class="o">@</span> <span class="n">b</span> <span class="k">for</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">As</span><span class="p">,</span> <span class="n">Bs</span><span class="p">)]</span>
|
||||
|
||||
<span class="k">for</span> <span class="n">d</span><span class="p">,</span> <span class="n">d_torch</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">Ds</span><span class="p">,</span> <span class="n">Ds_torch</span><span class="p">):</span>
|
||||
<span class="k">assert</span> <span class="n">torch</span><span class="o">.</span><span class="n">allclose</span><span class="p">(</span><span class="n">d</span><span class="p">,</span> <span class="n">d_torch</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="nboutput nblast docutils container">
|
||||
<div class="prompt empty docutils container">
|
||||
</div>
|
||||
<div class="output_area docutils container">
|
||||
<div class="highlight"><pre>
|
||||
|
||||
// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_grouped_1x1x1_256x128_64x3_tt_align8
|
||||
using cutlass_sm80_tensorop_h16x8x16gemm_grouped_1x1x1_256x128_64x3_tt_align8_base =
|
||||
typename cutlass::gemm::kernel::DefaultGemmGrouped<
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,
|
||||
cutlass::half_t, cutlass::layout::RowMajor,
|
||||
cutlass::half_t,
|
||||
cutlass::arch::OpClassTensorOp,
|
||||
cutlass::arch::Sm80,
|
||||
cutlass::gemm::GemmShape<256, 128, 64>,
|
||||
cutlass::gemm::GemmShape<64, 64, 64>,
|
||||
cutlass::gemm::GemmShape<16, 8, 16>,
|
||||
cutlass::epilogue::thread::LinearCombination<cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,
|
||||
cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,
|
||||
3,
|
||||
cutlass::gemm::kernel::GroupScheduleMode::kDeviceOnly,
|
||||
cutlass::arch::OpMultiplyAdd
|
||||
>::GemmKernel;
|
||||
|
||||
// Define named type
|
||||
struct cutlass_sm80_tensorop_h16x8x16gemm_grouped_1x1x1_256x128_64x3_tt_align8_type :
|
||||
public cutlass_sm80_tensorop_h16x8x16gemm_grouped_1x1x1_256x128_64x3_tt_align8_base { };
|
||||
|
||||
</pre></div></div>
|
||||
</div>
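<p>The emitted kernel name encodes most of the template arguments printed above. The sketch below splits that one name into fields. The field meanings are an assumption, inferred only by matching the name against the printed template arguments (<code>Sm80</code>, <code>OpClassTensorOp</code>, the <code>16, 8, 16</code> instruction shape, the <code>256, 128, 64</code> threadblock shape, 3 stages, alignment 8). The <code>1x1x1</code> field is not explained by this output and is left uninterpreted, and <code>describe_kernel</code> is a hypothetical helper, not a CUTLASS API.</p>

```python
import re

# Pattern written for the single grouped-GEMM name shown above; other CUTLASS
# kernel names may not follow exactly this layout.
_KERNEL_NAME = re.compile(
    r"cutlass_sm(?P<cc>\d+)_(?P<opclass>[a-z]+)_(?P<dtype>[a-z]+)"
    r"(?P<inst_m>\d+)x(?P<inst_n>\d+)x(?P<inst_k>\d+)gemm_grouped_"
    r"(?P<unparsed>\d+x\d+x\d+)_"
    r"(?P<tb_m>\d+)x(?P<tb_n>\d+)_(?P<tb_k>\d+)x(?P<stages>\d+)_"
    r"(?P<layouts>[a-z]+)_align(?P<alignment>\d+)"
)


def describe_kernel(name):
    """Split a kernel name of the form shown above into a dict of fields."""
    match = _KERNEL_NAME.fullmatch(name)
    if match is None:
        raise ValueError("unrecognized kernel name: " + name)
    f = match.groupdict()
    return {
        "compute_capability": int(f["cc"]),
        "opclass": f["opclass"],
        "dtype_code": f["dtype"],
        "instruction_shape": (int(f["inst_m"]), int(f["inst_n"]), int(f["inst_k"])),
        "threadblock_shape": (int(f["tb_m"]), int(f["tb_n"]), int(f["tb_k"])),
        "stages": int(f["stages"]),
        "layout_code": f["layouts"],
        "alignment": int(f["alignment"]),
        "uninterpreted": f["unparsed"],
    }


info = describe_kernel(
    "cutlass_sm80_tensorop_h16x8x16gemm_grouped_1x1x1_256x128_64x3_tt_align8"
)
print(info)
```

<p>Reading the name this way is a quick check that the kernel selected by the plan matches the data type, layout, and target architecture one intended.</p>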
|
||||
</section>
|
||||
<section id="Exporting-the-CUTLASS-kernel-to-a-PyTorch-CUDA-extension">
|
||||
<h2>Exporting the CUTLASS kernel to a PyTorch CUDA extension<a class="headerlink" href="#Exporting-the-CUTLASS-kernel-to-a-PyTorch-CUDA-extension" title="Permalink to this heading">#</a></h2>
|
||||
<p>The procedure above allows one to quickly experiment with CUTLASS kernels. However, one might prefer to use the CUTLASS kernel via a <a class="reference external" href="https://pytorch.org/tutorials/advanced/cpp_extension.html">PyTorch CUDA extension</a>. This avoids any runtime overhead associated with the Python portions of the CUTLASS Python interface.</p>
|
||||
<p>The CUTLASS Python interface provides simple solutions for creating PyTorch CUDA extensions for a CUTLASS kernel. These extensions can either be written out for a later “ahead-of-time” compilation, or be just-in-time compiled and returned to the user.</p>
|
||||
<p>To create a JIT-compiled module from the CUTLASS kernel we defined above, simply call the following:</p>
|
||||
<div class="nbinput nblast docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[4]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">op</span> <span class="o">=</span> <span class="n">plan</span><span class="o">.</span><span class="n">construct</span><span class="p">()</span>
|
||||
<span class="n">grouped_gemm</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">emit</span><span class="o">.</span><span class="n">pytorch</span><span class="p">(</span><span class="n">op</span><span class="p">,</span> <span class="n">name</span><span class="o">=</span><span class="s1">'grouped_gemm'</span><span class="p">,</span> <span class="n">cc</span><span class="o">=</span><span class="n">plan</span><span class="o">.</span><span class="n">cc</span><span class="p">,</span> <span class="n">sourcedir</span><span class="o">=</span><span class="s1">'out'</span><span class="p">,</span> <span class="n">jit</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<p>The <code class="docutils literal notranslate"><span class="pre">cutlass.emit.pytorch</span></code> function emits:</p>
<ul class="simple">
<li><p><code class="docutils literal notranslate"><span class="pre">out/grouped_gemm_kernel.cu</span></code>: This file contains the declaration of the CUTLASS kernel and a method to call it from PyTorch tensors</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">out/grouped_gemm.cpp</span></code>: This file contains a C++ wrapper around the aforementioned CUTLASS kernel</p></li>
<li><p><code class="docutils literal notranslate"><span class="pre">setup.py</span></code>: This file contains the <code class="docutils literal notranslate"><span class="pre">setuptools</span></code> script for building and installing the generated extension</p></li>
</ul>
|
||||
<p>The extension can be built from within the generated source directory (<code class="docutils literal notranslate"><span class="pre">out</span></code> in this example) by running:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nv">TORCH_CUDA_ARCH_LIST</span><span class="o">=</span><span class="s2">"8.0"</span><span class="w"> </span>python<span class="w"> </span>setup.py<span class="w"> </span>install
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Here, <code class="docutils literal notranslate"><span class="pre">TORCH_CUDA_ARCH_LIST</span></code> is set to the compute capability of the device on which the kernel will be run.</p>
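<p>The value passed in <code>TORCH_CUDA_ARCH_LIST</code> is the compute capability written as <code>major.minor</code>. A minimal sketch of deriving that string and an environment for the build follows. The helper names <code>arch_list_entry</code> and <code>build_env</code> are hypothetical and introduced only for illustration.</p>

```python
import os


def arch_list_entry(capability):
    """Format a (major, minor) compute capability, e.g. (8, 0), as '8.0'."""
    major, minor = capability
    return f"{major}.{minor}"


def build_env(capability, base_env=None):
    """Return a copy of the environment with TORCH_CUDA_ARCH_LIST set."""
    env = dict(os.environ if base_env is None else base_env)
    env["TORCH_CUDA_ARCH_LIST"] = arch_list_entry(capability)
    return env


# SM80 corresponds to compute capability (8, 0).
print(arch_list_entry((8, 0)))
```

<p>On a machine with PyTorch and a CUDA device, <code>torch.cuda.get_device_capability()</code> returns such a <code>(major, minor)</code> tuple for the current device, so its result could be passed directly to <code>arch_list_entry</code>.</p>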
|
||||
<p>See the PyTorch <a class="reference external" href="https://pytorch.org/tutorials/advanced/cpp_extension.html">“Custom C++ and CUDA Extensions”</a> tutorial for more details on this.</p>
|
||||
<p>The PyTorch CUDA extension could be built for this module by running:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span><span class="nb">cd</span><span class="w"> </span>out
|
||||
<span class="nv">TORCH_CUDA_ARCH_LIST</span><span class="o">=</span><span class="s2">"8.0"</span><span class="w"> </span>python<span class="w"> </span>setup.py<span class="w"> </span>install
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>(assuming that one is building for SM80)</p>
|
||||
<p>One could then use the kernel in a later PyTorch module by running:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">torch</span>
|
||||
<span class="kn">import</span> <span class="nn">grouped_gemm</span>
|
||||
|
||||
<span class="n">grouped_gemm</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">As</span><span class="p">,</span> <span class="n">Bs</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>In this case, however, we set <code class="docutils literal notranslate"><span class="pre">jit=True</span></code>, which specifies that we would like to compile and load the PyTorch CUDA extension on the fly. Under the hood, this leverages the <a class="reference external" href="https://pytorch.org/tutorials/advanced/cpp_extension.html">torch.utils.cpp_extension.load</a> method and returns the loaded extension.</p>
|
||||
<p>We can then use the extension and compare its results to running the GEMMs via vanilla PyTorch GEMMs:</p>
|
||||
<div class="nbinput nblast docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[5]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">Ds</span> <span class="o">=</span> <span class="n">grouped_gemm</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">As</span><span class="p">,</span> <span class="n">Bs</span><span class="p">)</span>
|
||||
<span class="n">Ds_torch</span> <span class="o">=</span> <span class="p">[</span><span class="n">a</span> <span class="o">@</span> <span class="n">b</span> <span class="k">for</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">As</span><span class="p">,</span> <span class="n">Bs</span><span class="p">)]</span>
|
||||
<span class="k">for</span> <span class="n">d</span><span class="p">,</span> <span class="n">d_torch</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">Ds</span><span class="p">,</span> <span class="n">Ds_torch</span><span class="p">):</span>
|
||||
<span class="k">assert</span> <span class="n">torch</span><span class="o">.</span><span class="n">allclose</span><span class="p">(</span><span class="n">d</span><span class="p">,</span> <span class="n">d_torch</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<p>Finally, we can profile our grouped GEMM extension:</p>
|
||||
<div class="nbinput docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[6]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span><span class="n">num_warmup</span> <span class="o">=</span> <span class="mi">20</span>
|
||||
<span class="n">num_profile</span> <span class="o">=</span> <span class="mi">100</span>
|
||||
|
||||
<span class="c1"># Warmup iterations</span>
|
||||
<span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_warmup</span><span class="p">):</span>
|
||||
<span class="n">Ds</span> <span class="o">=</span> <span class="n">grouped_gemm</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">As</span><span class="p">,</span> <span class="n">Bs</span><span class="p">)</span>
|
||||
<span class="n">Ds_torch</span> <span class="o">=</span> <span class="p">[</span><span class="n">a</span> <span class="o">@</span> <span class="n">b</span> <span class="k">for</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">As</span><span class="p">,</span> <span class="n">Bs</span><span class="p">)]</span>
|
||||
<span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">synchronize</span><span class="p">()</span>
|
||||
|
||||
<span class="c1"># Timing iterations</span>
|
||||
<span class="kn">import</span> <span class="nn">time</span>
|
||||
<span class="n">grouped</span> <span class="o">=</span> <span class="mi">0</span>
|
||||
<span class="n">nongrouped</span> <span class="o">=</span> <span class="mi">0</span>
|
||||
<span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_profile</span><span class="p">):</span>
|
||||
<span class="n">start</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
|
||||
<span class="n">Ds</span> <span class="o">=</span> <span class="n">grouped_gemm</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">As</span><span class="p">,</span> <span class="n">Bs</span><span class="p">)</span>
|
||||
<span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">synchronize</span><span class="p">()</span>
|
||||
<span class="n">grouped</span> <span class="o">+=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">start</span>
|
||||
|
||||
<span class="n">start</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span>
|
||||
<span class="n">Ds_torch</span> <span class="o">=</span> <span class="p">[</span><span class="n">a</span> <span class="o">@</span> <span class="n">b</span> <span class="k">for</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">As</span><span class="p">,</span> <span class="n">Bs</span><span class="p">)]</span>
|
||||
<span class="n">torch</span><span class="o">.</span><span class="n">cuda</span><span class="o">.</span><span class="n">synchronize</span><span class="p">()</span>
|
||||
<span class="n">nongrouped</span> <span class="o">+=</span> <span class="n">time</span><span class="o">.</span><span class="n">time</span><span class="p">()</span> <span class="o">-</span> <span class="n">start</span>
|
||||
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'Grouped: </span><span class="si">{:.3f}</span><span class="s1"> us'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">grouped</span> <span class="o">*</span> <span class="mf">1e6</span><span class="o">/</span><span class="n">num_profile</span><span class="p">))</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'Non-Grouped: </span><span class="si">{:.3f}</span><span class="s1"> us'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">nongrouped</span> <span class="o">*</span> <span class="mf">1e6</span><span class="o">/</span><span class="n">num_profile</span><span class="p">))</span>
|
||||
<span class="nb">print</span><span class="p">(</span><span class="s1">'Speedup: </span><span class="si">{:.3f}</span><span class="s1">'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">nongrouped</span> <span class="o">/</span> <span class="n">grouped</span><span class="p">))</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
<div class="nboutput nblast docutils container">
|
||||
<div class="prompt empty docutils container">
|
||||
</div>
|
||||
<div class="output_area docutils container">
|
||||
<div class="highlight"><pre>
|
||||
Grouped: 400.696 us
|
||||
Non-Grouped: 646.670 us
|
||||
Speedup: 1.614
|
||||
</pre></div></div>
|
||||
</div>
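<p>The warmup-then-time pattern used above can be factored into a small reusable helper. The sketch below is one possible form, under stated assumptions: <code>average_time_us</code> and <code>speedup</code> are hypothetical names, <code>time.perf_counter</code> is swapped in for <code>time.time</code> because it is a monotonic, higher-resolution clock, and the <code>sync</code> hook defaults to a no-op so that the example runs without a GPU.</p>

```python
import time


def average_time_us(fn, num_warmup=20, num_profile=100, sync=lambda: None):
    """Return the mean wall-clock time of fn() in microseconds.

    `sync` is called after every invocation. For GPU work it should block
    until queued kernels have finished (for example torch.cuda.synchronize).
    """
    # Warmup iterations are run but not timed.
    for _ in range(num_warmup):
        fn()
        sync()
    total = 0.0
    for _ in range(num_profile):
        start = time.perf_counter()
        fn()
        sync()
        total += time.perf_counter() - start
    return total * 1e6 / num_profile


def speedup(baseline_us, candidate_us):
    """Ratio of baseline time to candidate time; values above 1 favor the candidate."""
    return baseline_us / candidate_us


# CPU-only demonstration with a trivial callable.
calls = []
mean_us = average_time_us(lambda: calls.append(1), num_warmup=2, num_profile=5)
print(len(calls), mean_us >= 0.0)

# Reproduce the speedup reported above from the two printed averages.
print(round(speedup(646.670, 400.696), 3))
```

<p>With the extension from this notebook, one would call something like <code>average_time_us(lambda: grouped_gemm.run(As, Bs), sync=torch.cuda.synchronize)</code>. Unlike the cell above, this times each variant in its own loop rather than interleaving them.</p>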
|
||||
<div class="nbinput nblast docutils container">
|
||||
<div class="prompt highlight-none notranslate"><div class="highlight"><pre><span></span>[ ]:
|
||||
</pre></div>
|
||||
</div>
|
||||
<div class="input_area highlight-ipython3 notranslate"><div class="highlight"><pre><span></span>
|
||||
</pre></div>
|
||||
</div>
|
||||
</div>
|
||||
</section>
|
||||
</section>
|
||||
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
|
||||
<a class="prev-page" href="01_epilogue.html">
|
||||
<svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
|
||||
<div class="page-info">
|
||||
<div class="context">
|
||||
<span>Previous</span>
|
||||
</div>
|
||||
|
||||
<div class="title">Example of using elementwise activation functions in the CUTLASS Python interface</div>
|
||||
|
||||
</div>
|
||||
</a>
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer">
|
||||
|
||||
|
||||
<div class="toc-sticky toc-scroll">
|
||||
<div class="toc-title-container">
|
||||
<span class="toc-title">
|
||||
On this page
|
||||
</span>
|
||||
</div>
|
||||
<div class="toc-tree-container">
|
||||
<div class="toc-tree">
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">Exporting a CUTLASS grouped GEMM kernel to a PyTorch CUDA extension</a><ul>
|
||||
<li><a class="reference internal" href="#Background-on-grouped-GEMM">Background on grouped GEMM</a></li>
|
||||
<li><a class="reference internal" href="#Declaring-a-grouped-GEMM-via-the-CUTLASS-Python-interface">Declaring a grouped GEMM via the CUTLASS Python interface</a></li>
|
||||
<li><a class="reference internal" href="#Exporting-the-CUTLASS-kernel-to-a-PyTorch-CUDA-extension">Exporting the CUTLASS kernel to a PyTorch CUDA extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="../" id="documentation_options" src="../_static/documentation_options.js"></script>
|
||||
<script src="../_static/doctools.js"></script>
|
||||
<script src="../_static/sphinx_highlight.js"></script>
|
||||
<script src="../_static/scripts/furo.js"></script>
|
||||
<script src="../_static/clipboard.min.js"></script>
|
||||
<script src="../_static/copybutton.js"></script>
|
||||
<script src="../_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
<script>window.MathJax = {"tex": {"inlineMath": [["$", "$"], ["\\(", "\\)"]], "processEscapes": true}, "options": {"ignoreHtmlClass": "tex2jax_ignore|mathjax_ignore|document", "processHtmlClass": "tex2jax_process|mathjax_process|math|output_area"}}</script>
|
||||
<script defer="defer" src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
356
python/docs/externals/02_pytorch_extension_grouped_gemm.ipynb
vendored
Normal file
@ -0,0 +1,356 @@
|
||||
{
|
||||
"cells": [
|
||||
{
|
||||
"attachments": {},
|
||||
"cell_type": "markdown",
|
||||
"id": "6acbea5d",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"# Exporting a CUTLASS grouped GEMM kernel to a PyTorch CUDA extension\n",
|
||||
"This notebook walks through a basic example of using the CUTLASS Python interface to declare\n",
|
||||
"a grouped GEMM kernel and export it as a PyTorch CUDA extension.\n",
|
||||
"\n",
|
||||
"[](https://colab.research.google.com/github/NVIDIA/cutlass/tree/master/examples/00_basic_gemm.ipynb)\n",
|
||||
"\n",
|
||||
"## Background on grouped GEMM\n",
|
||||
"Grouped GEMM enables one to execute a set of GEMMs (each with potentially different sizes and strides)\n",
|
||||
"in a single CUDA kernel. It can be thought of as a generalized version of a pointer-array GEMM,\n",
|
||||
"without the requirement that the sizes and strides of each GEMM be the same.\n",
|
||||
"\n",
|
||||
"For example, if one has `p` GEMMs with sizes:\n",
|
||||
"```text\n",
|
||||
"M_1 x N_1 x K_1\n",
|
||||
"M_2 x N_2 x K_2\n",
|
||||
"...\n",
|
||||
"M_p x N_p x K_p\n",
|
||||
"```\n",
|
||||
"CUTLASS's grouped GEMM will execute these in a single CUDA kernel.\n",
|
||||
"\n",
|
||||
"Grouped GEMM is particularly beneficial for saturating the GPU with many small problems that would\n",
|
||||
"insufficiently utilize the device in isolation.\n",
|
||||
"\n",
|
||||
"## Declaring a grouped GEMM via the CUTLASS Python interface\n",
|
||||
"A grouped GEMM operation is declared similarly to a GEMM operation in the CUTLASS Python interface: one\n",
|
||||
"simply calls `cutlass.op.GroupedGemm`."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 1,
|
||||
"id": "fdcf21d8",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T18:01:01.888030Z",
|
||||
"iopub.status.busy": "2023-04-18T18:01:01.887634Z",
|
||||
"iopub.status.idle": "2023-04-18T18:01:06.069481Z",
|
||||
"shell.execute_reply": "2023-04-18T18:01:06.068513Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stderr",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"/usr/local/lib/python3.8/dist-packages/tqdm/auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
|
||||
" from .autonotebook import tqdm as notebook_tqdm\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"import cutlass\n",
|
||||
"import torch\n",
|
||||
"\n",
|
||||
"dtype = torch.float16\n",
|
||||
"plan = cutlass.op.GroupedGemm(element=dtype, layout=cutlass.LayoutType.RowMajor)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "514f40a4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We can then compile and run this operation on a group of GEMMs. We'll first set up some utility functions to initialize GEMMs."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 2,
|
||||
"id": "c2a7371e",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T18:01:06.073326Z",
|
||||
"iopub.status.busy": "2023-04-18T18:01:06.073092Z",
|
||||
"iopub.status.idle": "2023-04-18T18:01:06.080337Z",
|
||||
"shell.execute_reply": "2023-04-18T18:01:06.079517Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"import random\n",
|
||||
"random.seed(2023)\n",
|
||||
"\n",
|
||||
"# Utility function to initialize A, B, C, and D matrices corresponding to dimensions M, N, and K\n",
|
||||
"def initialize(dtype, M, N, K):\n",
|
||||
" sizes = [(M, K), (K, N), (M, N), (M, N)]\n",
|
||||
" return [torch.randint(-3, 3, size, device='cuda').to(dtype) for size in sizes]\n",
|
||||
"\n",
|
||||
"# Utility function to generate `problems` GEMMs of random sizes\n",
|
||||
"def generate_problems(problems):\n",
|
||||
" valid_sizes = [128, 256, 512, 1024]\n",
|
||||
" As, Bs, Cs, Ds = [], [], [], []\n",
|
||||
" for _ in range(problems):\n",
|
||||
" M, N, K = [random.choice(valid_sizes) for _ in range(3)]\n",
|
||||
" A, B, C, D = initialize(dtype, M, N, K)\n",
|
||||
" As.append(A)\n",
|
||||
" Bs.append(B)\n",
|
||||
" Cs.append(C)\n",
|
||||
" Ds.append(D)\n",
|
||||
" return As, Bs, Cs, Ds"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "590a3bc5",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"We'll next run a group of 50 GEMMs via the CUTLASS Python interface and via PyTorch."
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 3,
|
||||
"id": "776c9233",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T18:01:06.083288Z",
|
||||
"iopub.status.busy": "2023-04-18T18:01:06.083082Z",
|
||||
"iopub.status.idle": "2023-04-18T18:01:10.783577Z",
|
||||
"shell.execute_reply": "2023-04-18T18:01:10.782798Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"\n",
|
||||
"// Gemm operator cutlass_sm80_tensorop_h16x8x16gemm_grouped_1x1x1_256x128_64x3_tt_align8\n",
|
||||
"using cutlass_sm80_tensorop_h16x8x16gemm_grouped_1x1x1_256x128_64x3_tt_align8_base =\n",
|
||||
" typename cutlass::gemm::kernel::DefaultGemmGrouped<\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor, cutlass::ComplexTransform::kNone, 8,\n",
|
||||
" cutlass::half_t, cutlass::layout::RowMajor,\n",
|
||||
" cutlass::half_t,\n",
|
||||
" cutlass::arch::OpClassTensorOp,\n",
|
||||
" cutlass::arch::Sm80,\n",
|
||||
" cutlass::gemm::GemmShape<256, 128, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<64, 64, 64>,\n",
|
||||
" cutlass::gemm::GemmShape<16, 8, 16>,\n",
|
||||
" cutlass::epilogue::thread::LinearCombination<cutlass::half_t, 8, cutlass::half_t, cutlass::half_t>,\n",
|
||||
" cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>,\n",
|
||||
" 3,\n",
|
||||
" cutlass::gemm::kernel::GroupScheduleMode::kDeviceOnly,\n",
|
||||
" cutlass::arch::OpMultiplyAdd\n",
|
||||
">::GemmKernel;\n",
|
||||
"\n",
|
||||
"// Define named type\n",
|
||||
"struct cutlass_sm80_tensorop_h16x8x16gemm_grouped_1x1x1_256x128_64x3_tt_align8_type :\n",
|
||||
" public cutlass_sm80_tensorop_h16x8x16gemm_grouped_1x1x1_256x128_64x3_tt_align8_base { };\n",
|
||||
"\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"As, Bs, Cs, Ds, = generate_problems(50)\n",
|
||||
"\n",
|
||||
"plan.run(As, Bs, Cs, Ds, print_module=True)\n",
|
||||
"Ds_torch = [a @ b for a, b in zip(As, Bs)]\n",
|
||||
"\n",
|
||||
"for d, d_torch in zip(Ds, Ds_torch):\n",
|
||||
" assert torch.allclose(d, d_torch)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "766e4f03",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"## Exporting the CUTLASS kernel to a PyTorch CUDA extension\n",
|
||||
"The procedure above allows one to quickly experiment with using a CUTLASS kernels However, one might prefer to use the CUTLASS kernel via a [PyTorch CUDA extension](https://pytorch.org/tutorials/advanced/cpp_extension.html). This will avoids adding any runtime overheads associated with the Python portions of the CUTLASS Python interface.\n",
|
||||
"\n",
|
||||
"The CUTLASS Python interface provides simple solutions for creating PyTorch CUDA extensions for a CUTLASS kernel. These extensions can either be written out for a later \"ahead-of-time\" compilation, or be just-in-time compiled and returned to the user.\n",
|
||||
"\n",
|
||||
"To create a JIT-compiled module from the CUTLASS kernel we defined above, simply call the following:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 4,
|
||||
"id": "3a98dee6",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T18:01:10.787020Z",
|
||||
"iopub.status.busy": "2023-04-18T18:01:10.786862Z",
|
||||
"iopub.status.idle": "2023-04-18T18:02:08.445210Z",
|
||||
"shell.execute_reply": "2023-04-18T18:02:08.443997Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"op = plan.construct()\n",
|
||||
"grouped_gemm = cutlass.emit.pytorch(op, name='grouped_gemm', cc=plan.cc, sourcedir='out', jit=True)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "c8ca3991",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"The `cutlass.emit.pytorch` function emits:\n",
|
||||
"* `out/grouped_gemm_kernel.cu`: This file contains the declaration of the CUTLASS kernel and a method to call it from PyTorch tensors\n",
|
||||
"* `out/grouped_gemm.cpp`: This file contains a C++ wrapper around the aforementioned CUTLASS kernel\n",
|
||||
"* `setup.py`: This file contains the `setuptools` script for building and installing the generated extension\n",
|
||||
"\n",
|
||||
"The extension can be build from within the `module_output` directory by running:\n",
|
||||
"```bash\n",
|
||||
"TORCH_CUDA_ARCH_LIST=\"8.0\" python setup.py install\n",
|
||||
"```\n",
|
||||
"Where `TORCH_ARCH_LIST` is set to the compute capability of the device on which the kernel will be run.\n",
|
||||
"\n",
|
||||
"See the PyTorch [\"Custom C++ and CUDA Extensions\"](https://pytorch.org/tutorials/advanced/cpp_extension.html) tutorial for more details on this.\n",
|
||||
"\n",
|
||||
"The PyTorch CUDA extension could be built for this module by running:\n",
|
||||
"```bash\n",
|
||||
"cd out\n",
|
||||
"TORCH_CUDA_ARCH_LIST=\"8.0\" python setup.py\n",
|
||||
"```\n",
|
||||
"(assuming that one is building for SM80)\n",
|
||||
"\n",
|
||||
"One could then use the kernel in a later PyTorch module by running:\n",
|
||||
"\n",
|
||||
"```python\n",
|
||||
"import torch\n",
|
||||
"import grouped_gemm\n",
|
||||
"\n",
|
||||
"grouped_gemm.run(As, Bs)\n",
|
||||
"```\n",
|
||||
"\n",
|
||||
"In this case, however, we set `jit=True`, which specifies that we would like to compile and load the PyTorch CUDA extension on the fly.\n",
|
||||
"Under the hood, this leverages the [torch.utils.cpp_extension.load](https://pytorch.org/tutorials/advanced/cpp_extension.html) method\n",
|
||||
"and returns back the loaded extension.\n",
|
||||
"\n",
|
||||
"We can then use the extension and compare its results to running the GEMMs via vanilla PyTorch GEMMs:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 5,
|
||||
"id": "cecb26a4",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T18:02:08.449530Z",
|
||||
"iopub.status.busy": "2023-04-18T18:02:08.449077Z",
|
||||
"iopub.status.idle": "2023-04-18T18:02:08.464755Z",
|
||||
"shell.execute_reply": "2023-04-18T18:02:08.464200Z"
|
||||
}
|
||||
},
|
||||
"outputs": [],
|
||||
"source": [
|
||||
"Ds = grouped_gemm.run(As, Bs)\n",
|
||||
"Ds_torch = [a @ b for a, b in zip(As, Bs)]\n",
|
||||
"for d, d_torch in zip(Ds, Ds_torch):\n",
|
||||
" assert torch.allclose(d, d_torch)"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"id": "50db80e4",
|
||||
"metadata": {},
|
||||
"source": [
|
||||
"Finally, we can profile our grouped GEMM extension:"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": 6,
|
||||
"id": "b76805d3",
|
||||
"metadata": {
|
||||
"execution": {
|
||||
"iopub.execute_input": "2023-04-18T18:02:08.467087Z",
|
||||
"iopub.status.busy": "2023-04-18T18:02:08.466879Z",
|
||||
"iopub.status.idle": "2023-04-18T18:02:08.603689Z",
|
||||
"shell.execute_reply": "2023-04-18T18:02:08.603085Z"
|
||||
}
|
||||
},
|
||||
"outputs": [
|
||||
{
|
||||
"name": "stdout",
|
||||
"output_type": "stream",
|
||||
"text": [
|
||||
"Grouped: 400.696 us\n",
|
||||
"Non-Grouped: 646.670 us\n",
|
||||
"Speedup: 1.614\n"
|
||||
]
|
||||
}
|
||||
],
|
||||
"source": [
|
||||
"num_warmup = 20\n",
|
||||
"num_profile = 100\n",
|
||||
"\n",
|
||||
"# Warmup iterations\n",
|
||||
"for _ in range(num_warmup):\n",
|
||||
" Ds = grouped_gemm.run(As, Bs)\n",
|
||||
" Ds_torch = [a @ b for a, b in zip(As, Bs)]\n",
|
||||
" torch.cuda.synchronize()\n",
|
||||
"\n",
|
||||
"# Timing iterations\n",
|
||||
"import time\n",
|
||||
"grouped = 0\n",
|
||||
"nongrouped = 0\n",
|
||||
"for _ in range(num_profile):\n",
|
||||
" start = time.time()\n",
|
||||
" Ds = grouped_gemm.run(As, Bs)\n",
|
||||
" torch.cuda.synchronize()\n",
|
||||
" grouped += time.time() - start\n",
|
||||
"\n",
|
||||
" start = time.time()\n",
|
||||
" Ds_torch = [a @ b for a, b in zip(As, Bs)]\n",
|
||||
" torch.cuda.synchronize()\n",
|
||||
" nongrouped += time.time() - start\n",
|
||||
"\n",
|
||||
"print('Grouped: {:.3f} us'.format(grouped * 1e6/num_profile))\n",
|
||||
"print('Non-Grouped: {:.3f} us'.format(nongrouped * 1e6/num_profile))\n",
|
||||
"print('Speedup: {:.3f}'.format(nongrouped / grouped))"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"execution_count": null,
|
||||
"id": "f22fc696",
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": []
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"kernelspec": {
|
||||
"display_name": "Python 3 (ipykernel)",
|
||||
"language": "python",
|
||||
"name": "python3"
|
||||
},
|
||||
"language_info": {
|
||||
"codemirror_mode": {
|
||||
"name": "ipython",
|
||||
"version": 3
|
||||
},
|
||||
"file_extension": ".py",
|
||||
"mimetype": "text/x-python",
|
||||
"name": "python",
|
||||
"nbconvert_exporter": "python",
|
||||
"pygments_lexer": "ipython3",
|
||||
"version": "3.8.10"
|
||||
}
|
||||
},
|
||||
"nbformat": 4,
|
||||
"nbformat_minor": 5
|
||||
}
|
||||
656
python/docs/genindex.html
Normal file
@ -0,0 +1,656 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><link rel="index" title="Index" href="#" /><link rel="search" title="Search" href="search.html" />
|
||||
<link rel="canonical" href="docs/genindex.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 --><title>Index - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="modules.html">CUTLASS Python API</a><input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
|
||||
<section class="genindex-section">
|
||||
<h1 id="index">Index</h1>
|
||||
<div class="genindex-jumpbox"><a href="#A"><strong>A</strong></a> | <a href="#B"><strong>B</strong></a> | <a href="#C"><strong>C</strong></a> | <a href="#F"><strong>F</strong></a> | <a href="#G"><strong>G</strong></a> | <a href="#H"><strong>H</strong></a> | <a href="#K"><strong>K</strong></a> | <a href="#L"><strong>L</strong></a> | <a href="#M"><strong>M</strong></a> | <a href="#N"><strong>N</strong></a> | <a href="#O"><strong>O</strong></a> | <a href="#P"><strong>P</strong></a> | <a href="#R"><strong>R</strong></a> | <a href="#S"><strong>S</strong></a> | <a href="#T"><strong>T</strong></a> | <a href="#V"><strong>V</strong></a></div>
|
||||
</section>
|
||||
<section id="A" class="genindex-section">
|
||||
<h2>A</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.op.html#cutlass.op.gemm.Gemm.activation">activation (cutlass.op.gemm.Gemm property)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.op.html#cutlass.op.op.OperationBase.activations">activations() (cutlass.op.op.OperationBase method)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.html#cutlass.library_defaults.KernelsForDataType.add">add() (cutlass.library_defaults.KernelsForDataType method)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.check.alignment_or_default">alignment_or_default() (in module cutlass.utils.check)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.html#cutlass.library_defaults.KernelsForDataType.alignments">alignments (cutlass.library_defaults.KernelsForDataType property)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.html#cutlass.library_defaults.KernelsForDataType.all_operations">all_operations (cutlass.library_defaults.KernelsForDataType property)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.html#cutlass.library_defaults.ArchOptions">ArchOptions (class in cutlass.library_defaults)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
</section>
|
||||
|
||||
<section id="B" class="genindex-section">
|
||||
<h2>B</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.backend_math_operation">backend_math_operation() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.bfloat16_library_type">bfloat16_library_type() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.bfloat16_type">bfloat16_type() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.binding_layout">binding_layout() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.binding_library_type">binding_library_type() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.binding_opclass">binding_opclass() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.binding_type">binding_type() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
</section>
|
||||
|
||||
<section id="C" class="genindex-section">
|
||||
<h2>C</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.check.calculate_smem_usage">calculate_smem_usage() (in module cutlass.utils.check)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.check.calculate_smem_usage_per_stage">calculate_smem_usage_per_stage() (in module cutlass.utils.check)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.op.html#cutlass.op.gemm.Gemm.compile">compile() (cutlass.op.gemm.Gemm method)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.op.html#cutlass.op.gemm.Gemm.construct">construct() (cutlass.op.gemm.Gemm method)</a>
|
||||
|
||||
<ul>
|
||||
<li><a href="cutlass.op.html#cutlass.op.gemm_grouped.GroupedGemm.construct">(cutlass.op.gemm_grouped.GroupedGemm method)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.construct_backend_td">construct_backend_td() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.cupy_library_type">cupy_library_type() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.cupy_type">cupy_type() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
<li>
|
||||
cutlass.emit.common
|
||||
|
||||
<ul>
|
||||
<li><a href="cutlass.emit.html#module-cutlass.emit.common">module</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li>
|
||||
cutlass.emit.pytorch
|
||||
|
||||
<ul>
|
||||
<li><a href="cutlass.emit.html#module-cutlass.emit.pytorch">module</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li>
|
||||
cutlass.epilogue
|
||||
|
||||
<ul>
|
||||
<li><a href="cutlass.html#module-cutlass.epilogue">module</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li>
|
||||
cutlass.library_defaults
|
||||
|
||||
<ul>
|
||||
<li><a href="cutlass.html#module-cutlass.library_defaults">module</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li>
|
||||
cutlass.op.gemm
|
||||
|
||||
<ul>
|
||||
<li><a href="cutlass.op.html#module-cutlass.op.gemm">module</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li>
|
||||
cutlass.op.gemm_grouped
|
||||
|
||||
<ul>
|
||||
<li><a href="cutlass.op.html#module-cutlass.op.gemm_grouped">module</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li>
|
||||
cutlass.op.op
|
||||
|
||||
<ul>
|
||||
<li><a href="cutlass.op.html#module-cutlass.op.op">module</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li>
|
||||
cutlass.swizzle
|
||||
|
||||
<ul>
|
||||
<li><a href="cutlass.html#module-cutlass.swizzle">module</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li>
|
||||
cutlass.utils.check
|
||||
|
||||
<ul>
|
||||
<li><a href="cutlass.utils.html#module-cutlass.utils.check">module</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li>
|
||||
cutlass.utils.datatypes
|
||||
|
||||
<ul>
|
||||
<li><a href="cutlass.utils.html#module-cutlass.utils.datatypes">module</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
</section>
|
||||
|
||||
<section id="F" class="genindex-section">
|
||||
<h2>F</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.html#cutlass.library_defaults.KernelsForDataType.find_alignment">find_alignment() (cutlass.library_defaults.KernelsForDataType method)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
</section>
|
||||
|
||||
<section id="G" class="genindex-section">
|
||||
<h2>G</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.op.html#cutlass.op.gemm.Gemm">Gemm (class in cutlass.op.gemm)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.html#cutlass.epilogue.get_activation_epilogue">get_activation_epilogue() (in module cutlass.epilogue)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.html#cutlass.epilogue.get_activations">get_activations() (in module cutlass.epilogue)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.get_datatype_and_layout">get_datatype_and_layout() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.html#cutlass.swizzle.get_swizzling_functors">get_swizzling_functors() (in module cutlass.swizzle)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.op.html#cutlass.op.gemm_grouped.GroupedGemm">GroupedGemm (class in cutlass.op.gemm_grouped)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
</section>
|
||||
|
||||
<section id="H" class="genindex-section">
|
||||
<h2>H</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.has_binding_type">has_binding_type() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
</section>
|
||||
|
||||
<section id="K" class="genindex-section">
|
||||
<h2>K</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.html#cutlass.library_defaults.KernelsForDataType">KernelsForDataType (class in cutlass.library_defaults)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
</section>
|
||||
|
||||
<section id="L" class="genindex-section">
|
||||
<h2>L</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.library_layout">library_layout() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.library_to_binding">library_to_binding() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.library_type">library_type() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
</section>
|
||||
|
||||
<section id="M" class="genindex-section">
|
||||
<h2>M</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li>
|
||||
module
|
||||
|
||||
<ul>
|
||||
<li><a href="cutlass.emit.html#module-cutlass.emit.common">cutlass.emit.common</a>
|
||||
</li>
|
||||
<li><a href="cutlass.emit.html#module-cutlass.emit.pytorch">cutlass.emit.pytorch</a>
|
||||
</li>
|
||||
<li><a href="cutlass.html#module-cutlass.epilogue">cutlass.epilogue</a>
|
||||
</li>
|
||||
<li><a href="cutlass.html#module-cutlass.library_defaults">cutlass.library_defaults</a>
|
||||
</li>
|
||||
<li><a href="cutlass.op.html#module-cutlass.op.gemm">cutlass.op.gemm</a>
|
||||
</li>
|
||||
<li><a href="cutlass.op.html#module-cutlass.op.gemm_grouped">cutlass.op.gemm_grouped</a>
|
||||
</li>
|
||||
<li><a href="cutlass.op.html#module-cutlass.op.op">cutlass.op.op</a>
|
||||
</li>
|
||||
<li><a href="cutlass.html#module-cutlass.swizzle">cutlass.swizzle</a>
|
||||
</li>
|
||||
<li><a href="cutlass.utils.html#module-cutlass.utils.check">cutlass.utils.check</a>
|
||||
</li>
|
||||
<li><a href="cutlass.utils.html#module-cutlass.utils.datatypes">cutlass.utils.datatypes</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
</section>
|
||||
|
||||
<section id="N" class="genindex-section">
|
||||
<h2>N</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.numpy_library_type">numpy_library_type() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.numpy_type">numpy_type() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
</section>
|
||||
|
||||
<section id="O" class="genindex-section">
|
||||
<h2>O</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.op.html#cutlass.op.gemm.Gemm.opclass">opclass (cutlass.op.gemm.Gemm property)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.html#cutlass.library_defaults.ArchOptions.opclass_supports_combination">opclass_supports_combination() (cutlass.library_defaults.ArchOptions method)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.op.html#cutlass.op.op.OperationBase">OperationBase (class in cutlass.op.op)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.html#cutlass.library_defaults.ArchOptions.operations">operations() (cutlass.library_defaults.ArchOptions method)</a>
|
||||
|
||||
<ul>
|
||||
<li><a href="cutlass.html#cutlass.library_defaults.KernelsForDataType.operations">(cutlass.library_defaults.KernelsForDataType method)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li><a href="cutlass.html#cutlass.library_defaults.OptionRegistry">OptionRegistry (class in cutlass.library_defaults)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.html#cutlass.library_defaults.OptionRegistry.options_for_cc">options_for_cc() (cutlass.library_defaults.OptionRegistry method)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
</section>
|
||||
|
||||
<section id="P" class="genindex-section">
|
||||
<h2>P</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.emit.html#cutlass.emit.pytorch.pytorch">pytorch() (in module cutlass.emit.pytorch)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
</section>
|
||||
|
||||
<section id="R" class="genindex-section">
|
||||
<h2>R</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.op.html#cutlass.op.gemm.Gemm.run">run() (cutlass.op.gemm.Gemm method)</a>
|
||||
|
||||
<ul>
|
||||
<li><a href="cutlass.op.html#cutlass.op.gemm_grouped.GroupedGemm.run">(cutlass.op.gemm_grouped.GroupedGemm method)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
</section>
|
||||
|
||||
<section id="S" class="genindex-section">
|
||||
<h2>S</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.html#cutlass.library_defaults.KernelsForDataType.sort">sort() (cutlass.library_defaults.KernelsForDataType method)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.html#cutlass.library_defaults.ArchOptions.supporting_opclasses">supporting_opclasses() (cutlass.library_defaults.ArchOptions method)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.op.html#cutlass.op.gemm.Gemm.swizzling_functor">swizzling_functor (cutlass.op.gemm.Gemm property)</a>
|
||||
|
||||
<ul>
|
||||
<li><a href="cutlass.op.html#cutlass.op.gemm_grouped.GroupedGemm.swizzling_functor">(cutlass.op.gemm_grouped.GroupedGemm property)</a>
|
||||
</li>
|
||||
</ul></li>
|
||||
<li><a href="cutlass.op.html#cutlass.op.op.OperationBase.swizzling_functors">swizzling_functors() (cutlass.op.op.OperationBase method)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
</section>
|
||||
|
||||
<section id="T" class="genindex-section">
|
||||
<h2>T</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.td_from_profiler_op">td_from_profiler_op() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.td_from_profiler_td">td_from_profiler_td() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.op.html#cutlass.op.gemm.Gemm.tile_descriptions">tile_descriptions() (cutlass.op.gemm.Gemm method)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.torch_library_type">torch_library_type() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.datatypes.torch_type">torch_type() (in module cutlass.utils.datatypes)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
</section>
|
||||
|
||||
<section id="V" class="genindex-section">
|
||||
<h2>V</h2>
|
||||
<table style="width: 100%" class="indextable genindextable"><tr>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.check.valid_cluster_shape">valid_cluster_shape() (in module cutlass.utils.check)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
<td style="width: 33%; vertical-align: top;"><ul>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.check.valid_kernel_schedule">valid_kernel_schedule() (in module cutlass.utils.check)</a>
|
||||
</li>
|
||||
<li><a href="cutlass.utils.html#cutlass.utils.check.valid_stage_count">valid_stage_count() (in module cutlass.utils.check)</a>
|
||||
</li>
|
||||
</ul></td>
|
||||
</tr></table>
|
||||
</section>
|
||||
|
||||
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
|
||||
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer no-toc">
|
||||
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/scripts/furo.js"></script>
|
||||
<script src="_static/clipboard.min.js"></script>
|
||||
<script src="_static/copybutton.js"></script>
|
||||
<script src="_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
529
python/docs/index.html
Normal file
@ -0,0 +1,529 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
<link rel="index" title="Index" href="genindex.html" /><link rel="search" title="Search" href="search.html" /><link rel="next" title="Installation" href="install.html" />
|
||||
<link rel="canonical" href="docs/index.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="#"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="#">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul class="current">
|
||||
<li class="toctree-l1 current current-page"><a class="current reference internal" href="#">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="modules.html">CUTLASS Python API</a><input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<section id="cutlass-python-interface">
|
||||
<h1>CUTLASS Python Interface<a class="headerlink" href="#cutlass-python-interface" title="Permalink to this heading">#</a></h1>
|
||||
<p>The CUTLASS Python interface enables one to compile and run CUTLASS operations from within Python.</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">cutlass</span>
|
||||
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
|
||||
|
||||
<span class="n">plan</span> <span class="o">=</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">op</span><span class="o">.</span><span class="n">Gemm</span><span class="p">(</span><span class="n">element</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">float16</span><span class="p">,</span> <span class="n">layout</span><span class="o">=</span><span class="n">cutlass</span><span class="o">.</span><span class="n">LayoutType</span><span class="o">.</span><span class="n">RowMajor</span><span class="p">)</span>
|
||||
<span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span> <span class="o">=</span> <span class="p">[</span><span class="n">np</span><span class="o">.</span><span class="n">ones</span><span class="p">((</span><span class="mi">4096</span><span class="p">,</span> <span class="mi">4096</span><span class="p">),</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">float16</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">)]</span>
|
||||
<span class="n">plan</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="p">,</span> <span class="n">C</span><span class="p">,</span> <span class="n">D</span><span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
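<p>When experimenting with the example above, it can help to have a host-side reference to compare against. The following is a minimal sketch, assuming the conventional GEMM definition <code class="docutils literal notranslate"><span class="pre">D</span> <span class="pre">=</span> <span class="pre">alpha</span> <span class="pre">*</span> <span class="pre">(A</span> <span class="pre">@</span> <span class="pre">B)</span> <span class="pre">+</span> <span class="pre">beta</span> <span class="pre">*</span> <span class="pre">C</span></code>. The helper name <code class="docutils literal notranslate"><span class="pre">reference_gemm</span></code> and its <code class="docutils literal notranslate"><span class="pre">alpha</span></code> and <code class="docutils literal notranslate"><span class="pre">beta</span></code> defaults are illustrative assumptions and are not part of the CUTLASS API. It uses plain Python lists so that it has no dependencies.</p>

```python
def reference_gemm(A, B, C, alpha=1.0, beta=0.0):
    """Return alpha * (A @ B) + beta * C for matrices given as lists of rows.

    Hypothetical helper for illustration only; not part of CUTLASS.
    """
    M, K, N = len(A), len(B), len(B[0])
    return [
        [alpha * sum(A[i][k] * B[k][j] for k in range(K)) + beta * C[i][j]
         for j in range(N)]
        for i in range(M)
    ]


# Small all-ones operands, mirroring the np.ones inputs above at a tiny size.
M, N, K = 4, 3, 2
A = [[1.0] * K for _ in range(M)]
B = [[1.0] * N for _ in range(K)]
C = [[1.0] * N for _ in range(M)]

# With alpha=1 and beta=0, every entry of the result equals K.
D_ref = reference_gemm(A, B, C)
```

<p>The same formula applies to NumPy arrays; for all-ones inputs, each output entry equals the inner dimension <code class="docutils literal notranslate"><span class="pre">K</span></code> when <code class="docutils literal notranslate"><span class="pre">alpha</span> <span class="pre">=</span> <span class="pre">1</span></code> and <code class="docutils literal notranslate"><span class="pre">beta</span> <span class="pre">=</span> <span class="pre">0</span></code>.</p>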
|
||||
<p><strong>NOTE:</strong> The CUTLASS Python interface is currently an experimental release. The API may change in the future.
|
||||
We welcome feedback from the community.</p>
|
||||
<section id="overview">
|
||||
<h2>Overview<a class="headerlink" href="#overview" title="Permalink to this heading">#</a></h2>
|
||||
<p>The CUTLASS Python interface aims to provide an easy-to-use interface for using CUTLASS from Python. Toward this goal,
|
||||
the CUTLASS Python interface attempts to:</p>
|
||||
<ul class="simple">
|
||||
<li><p>Present high-level interfaces for operators that require only a few parameters</p></li>
|
||||
<li><p>Select sensible default configurations for an operator given the parameters that have been specified</p></li>
|
||||
<li><p>Enumerate configurations for users that are known to work in a given setting</p></li>
|
||||
<li><p>Reduce the occurrence of C++ compile-time errors in favor of descriptive Python exceptions</p></li>
|
||||
<li><p>Make it easy to export CUTLASS kernels to framework extensions (e.g., PyTorch CUDA extensions)</p></li>
|
||||
</ul>
|
||||
<section id="non-goals">
|
||||
<h3>Non-goals<a class="headerlink" href="#non-goals" title="Permalink to this heading">#</a></h3>
|
||||
<p>The CUTLASS Python interface is not intended to:</p>
|
||||
<p><strong>Select optimal kernel configurations.</strong>
|
||||
As an ease-of-use interface, the default selections for operator parameters made by the CUTLASS Python interface may
|
||||
not achieve the highest possible performance in all scenarios. Users wishing to achieve the highest performance possible
|
||||
should consider profiling different combinations of configuration parameters, or use a library such as <a class="reference external" href="https://developer.nvidia.com/cublas">cuBLAS</a>
|
||||
that contains heuristics for selecting kernels.</p>
|
||||
<p><strong>Act as a fast container for CUTLASS kernels.</strong>
|
||||
The CUTLASS Python interface does not strive to minimize overhead in its Python functions surrounding the running of a kernel.
|
||||
Those wishing to deploy a CUTLASS kernel should consider either using the C++ emitted by the Python interface directly, or using
|
||||
one of the CUTLASS emitters for automatically creating a framework extension for the kernel (e.g., a PyTorch CUDA extension).</p>
|
||||
<p><strong>Act as a Python-to-CUDA-kernel JIT compilation engine.</strong>
|
||||
The CUTLASS Python interface intends to enable one to use CUTLASS via Python. It can be used by frameworks for JIT compiling
|
||||
Python to CUDA kernels, but does not set out to be such a framework.</p>
|
||||
</section>
|
||||
<section id="comparison-to-pycutlass">
|
||||
<h3>Comparison to PyCUTLASS<a class="headerlink" href="#comparison-to-pycutlass" title="Permalink to this heading">#</a></h3>
|
||||
<p>The CUTLASS Python interface builds atop CUTLASS’s <a class="reference external" href="https://github.com/NVIDIA/cutlass/tree/v3.0.0/tools/library/scripts/pycutlass">PyCUTLASS</a> library. PyCUTLASS enables
|
||||
one to declare, compile, and run GEMMs, convolutions, and grouped GEMM operators with nearly the same configuration
|
||||
space as CUTLASS’s C++ interface. While this flexibility enables one to achieve similar levels of functionality
|
||||
as available in CUTLASS’s C++ interface, it comes with the burden of needing to specify many configuration parameters
|
||||
to operators – similar to what one must do in specifying template parameters to operations in CUTLASS’s C++ interface.</p>
|
||||
<p>In contrast, the CUTLASS Python interface aims to provide a higher-level API for declaring, emitting, and compiling
|
||||
kernels that does not require exhaustively defining template parameters.</p>
|
||||
<section id="transitioning-from-pycutlass">
|
||||
<h4>Transitioning from PyCUTLASS<a class="headerlink" href="#transitioning-from-pycutlass" title="Permalink to this heading">#</a></h4>
|
||||
<p>At present, existing PyCUTLASS functionality remains available via the CUTLASS Python interface. One can
|
||||
continue to use PyCUTLASS by replacing references to the PyCUTLASS <code class="docutils literal notranslate"><span class="pre">cutlass</span></code> module with <code class="docutils literal notranslate"><span class="pre">cutlass_bindings</span></code>
|
||||
and the PyCUTLASS <code class="docutils literal notranslate"><span class="pre">pycutlass</span></code> module with <code class="docutils literal notranslate"><span class="pre">cutlass.backend</span></code>.</p>
|
||||
<p>For example, the following code using PyCUTLASS:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">pycutlass</span>
|
||||
<span class="kn">import</span> <span class="nn">cutlass</span>
|
||||
|
||||
<span class="n">math_inst</span> <span class="o">=</span> <span class="n">pycutlass</span><span class="o">.</span><span class="n">MathInstruction</span><span class="p">(</span>
|
||||
<span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span> <span class="n">cutlass</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span>
|
||||
<span class="n">cutlass</span><span class="o">.</span><span class="n">OpClass</span><span class="o">.</span><span class="n">Simt</span><span class="p">,</span> <span class="n">pycutlass</span><span class="o">.</span><span class="n">MathOperation</span><span class="o">.</span><span class="n">multiply_add</span>
|
||||
<span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>can work with the Python interface via:</p>
|
||||
<div class="highlight-python notranslate"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">cutlass.backend</span> <span class="k">as</span> <span class="nn">pycutlass</span>
|
||||
<span class="kn">import</span> <span class="nn">cutlass_bindings</span>
|
||||
|
||||
<span class="n">math_inst</span> <span class="o">=</span> <span class="n">pycutlass</span><span class="o">.</span><span class="n">MathInstruction</span><span class="p">(</span>
|
||||
<span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span> <span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">float32</span><span class="p">,</span>
|
||||
<span class="n">cutlass_bindings</span><span class="o">.</span><span class="n">OpClass</span><span class="o">.</span><span class="n">Simt</span><span class="p">,</span> <span class="n">pycutlass</span><span class="o">.</span><span class="n">MathOperation</span><span class="o">.</span><span class="n">multiply_add</span>
|
||||
<span class="p">)</span>
|
||||
</pre></div>
|
||||
</div>
|
||||
<p><strong>NOTE:</strong> Backward compatibility of <code class="docutils literal notranslate"><span class="pre">cutlass.backend</span></code> with <code class="docutils literal notranslate"><span class="pre">pycutlass</span></code> will not be maintained moving forward.</p>
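<p>For code that must run against both the old PyCUTLASS layout and the renamed modules during a transition, one option is to try the new module name first and fall back to the old one. The following is a minimal sketch; <code class="docutils literal notranslate"><span class="pre">import_first_available</span></code> is a hypothetical helper and is not provided by CUTLASS, and it only smooths over the module renaming, not any later API differences.</p>

```python
import importlib


def import_first_available(*module_names):
    """Import and return the first module in module_names that can be imported.

    Hypothetical helper for illustration only; not part of CUTLASS.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError, including a missing parent package.
            continue
    raise ImportError(f"none of {module_names} could be imported")


# Intended use (requires a CUTLASS installation, so it is left commented out):
# pycutlass = import_first_available("cutlass.backend", "pycutlass")
# bindings = import_first_available("cutlass_bindings", "cutlass")
```

<p>Because compatibility between the two module sets is not guaranteed going forward, a shim like this is best treated as a short-term aid while migrating.</p>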
|
||||
</section>
|
||||
</section>
|
||||
</section>
|
||||
<section id="current-functionality">
|
||||
<h2>Current functionality<a class="headerlink" href="#current-functionality" title="Permalink to this heading">#</a></h2>
|
||||
<p>The CUTLASS Python interface currently supports the following operations:</p>
|
||||
<ul class="simple">
|
||||
<li><p>GEMMs</p></li>
|
||||
<li><p>GEMMs with fused elementwise epilogues (e.g., ReLU) (for pre-SM90 kernels)</p></li>
|
||||
<li><p>Stream K swizzling (for pre-SM90 kernels)</p></li>
|
||||
<li><p>Grouped GEMM (for pre-SM90 kernels)</p></li>
|
||||
</ul>
|
||||
</section>
|
||||
<section id="getting-started">
|
||||
<h2>Getting started<a class="headerlink" href="#getting-started" title="Permalink to this heading">#</a></h2>
|
||||
<p>We recommend using the CUTLASS Python interface via one of the Docker images located in the <a href="#id1"><span class="problematic" id="id2">docker</span></a> directory.</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>docker<span class="w"> </span>build<span class="w"> </span>-t<span class="w"> </span>cutlass-cuda12.1:latest<span class="w"> </span>-f<span class="w"> </span>docker/Dockerfile-cuda12.1-pytorch<span class="w"> </span>.
|
||||
docker<span class="w"> </span>run<span class="w"> </span>--gpus<span class="w"> </span>all<span class="w"> </span>-it<span class="w"> </span>--rm<span class="w"> </span>cutlass-cuda12.1:latest
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The CUTLASS Python interface has been tested with CUDA 11.8, 12.0, and 12.1 on Python 3.8.10 and 3.9.7.</p>
|
||||
<section id="optional-environment-variables">
|
||||
<h3>Optional environment variables<a class="headerlink" href="#optional-environment-variables" title="Permalink to this heading">#</a></h3>
|
||||
<p>Prior to installing the CUTLASS Python interface, one may optionally set the following environment variables:</p>
|
||||
<ul class="simple">
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">CUTLASS_PATH</span></code>: the path to the cloned CUTLASS repository</p></li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">CUDA_INSTALL_PATH</span></code>: the path to the installation of CUDA</p></li>
|
||||
</ul>
|
||||
<p>If these environment variables are not set, the installation process will infer them to be the following:</p>
|
||||
<ul class="simple">
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">CUTLASS_PATH</span></code>: one directory level above the current directory (i.e., <code class="docutils literal notranslate"><span class="pre">$(pwd)/..</span></code>)</p></li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">CUDA_INSTALL_PATH</span></code>: the directory holding <code class="docutils literal notranslate"><span class="pre">/bin/nvcc</span></code> for the first version of <code class="docutils literal notranslate"><span class="pre">nvcc</span></code> on <code class="docutils literal notranslate"><span class="pre">$PATH</span></code> (i.e., <code class="docutils literal notranslate"><span class="pre">which</span> <span class="pre">nvcc</span> <span class="pre">|</span> <span class="pre">awk</span> <span class="pre">-F'/bin/nvcc'</span> <span class="pre">'{print</span> <span class="pre">$1}'</span></code>)</p></li>
|
||||
</ul>
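<p>The inference rules above can be sketched in Python as follows. This is an illustrative approximation of the described behavior, not the installer's actual code; the function names are hypothetical, and the parent directory is normalized here rather than left as a literal <code class="docutils literal notranslate"><span class="pre">..</span></code> suffix.</p>

```python
import os
import shutil


def infer_cutlass_path(env, cwd):
    """Return CUTLASS_PATH if set, else one directory level above cwd."""
    return env.get("CUTLASS_PATH") or os.path.normpath(os.path.join(cwd, ".."))


def infer_cuda_install_path(env, nvcc_path):
    """Return CUDA_INSTALL_PATH if set, else the prefix before '/bin/nvcc'."""
    if env.get("CUDA_INSTALL_PATH"):
        return env["CUDA_INSTALL_PATH"]
    if nvcc_path is None:
        return None  # no nvcc found on PATH, so there is nothing to infer
    # Mirror awk -F'/bin/nvcc' '{print $1}': keep the text before the separator.
    return nvcc_path.split("/bin/nvcc")[0]


if __name__ == "__main__":
    # shutil.which returns the first nvcc found on PATH, or None.
    print(infer_cutlass_path(os.environ, os.getcwd()))
    print(infer_cuda_install_path(os.environ, shutil.which("nvcc")))
```

<p>Passing the environment and paths as arguments keeps the two functions easy to check without a CUDA installation.</p>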
|
||||
<p><strong>NOTE:</strong> The version of <code class="docutils literal notranslate"><span class="pre">cuda-python</span></code> installed must match the CUDA version in <code class="docutils literal notranslate"><span class="pre">CUDA_INSTALL_PATH</span></code>.</p>
|
||||
</section>
|
||||
<section id="installation">
|
||||
<h3>Installation<a class="headerlink" href="#installation" title="Permalink to this heading">#</a></h3>
|
||||
<p>The CUTLASS Python interface can currently be installed via:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python<span class="w"> </span>setup.py<span class="w"> </span>develop<span class="w"> </span>--user
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>This installs the package in development mode, so changes to the Python interface source take effect without reinstalling.</p>
|
||||
<p>We plan to add support for installing via <code class="docutils literal notranslate"><span class="pre">python</span> <span class="pre">setup.py</span> <span class="pre">install</span></code> in a future release.</p>
|
||||
</section>
|
||||
</section>
|
||||
<section id="examples">
|
||||
<h2>Examples<a class="headerlink" href="#examples" title="Permalink to this heading">#</a></h2>
|
||||
<p>Jupyter notebook examples of using the CUTLASS Python interface are located in <a href="#id3"><span class="problematic" id="id4">examples/python</span></a>.</p>
|
||||
<p>To launch these notebooks from this directory, run:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>jupyter-lab<span class="w"> </span>../examples/python
|
||||
</pre></div>
|
||||
</div>
|
||||
</section>
|
||||
<section id="building-documentation">
|
||||
<h2>Building documentation<a class="headerlink" href="#building-documentation" title="Permalink to this heading">#</a></h2>
|
||||
<p>The CUTLASS Python interface uses <a class="reference external" href="https://www.sphinx-doc.org/en/master/">Sphinx</a> for documentation.</p>
|
||||
<p>Building the documentation requires additional packages. These can be installed via:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>sudo<span class="w"> </span>apt-get<span class="w"> </span>install<span class="w"> </span>pandoc
|
||||
pip<span class="w"> </span>install<span class="w"> </span>--upgrade<span class="w"> </span>Sphinx<span class="w"> </span>furo<span class="w"> </span>pandoc<span class="w"> </span>myst-parser<span class="w"> </span>sphinx-copybutton<span class="w"> </span>nbsphinx<span class="w"> </span>nbsphinx-link<span class="w"> </span>sphinx-inline-tabs
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>To build documentation, you must first have installed the CUTLASS Python interface via the
|
||||
<a class="reference internal" href="install.html#installation"><span class="std std-ref">installation instructions</span></a>.</p>
|
||||
<p>Documentation can then be built via the following commands:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>sphinx-apidoc<span class="w"> </span>-o<span class="w"> </span>docs_src/source/<span class="w"> </span>cutlass/<span class="w"> </span>cutlass/backend*
|
||||
<span class="nb">cd</span><span class="w"> </span>docs_src
|
||||
make<span class="w"> </span>html
|
||||
mv<span class="w"> </span>_build/*<span class="w"> </span>../docs
|
||||
</pre></div>
|
||||
</div>
|
||||
</section>
|
||||
</section>
|
||||
<section id="copyright">
|
||||
<h1>Copyright<a class="headerlink" href="#copyright" title="Permalink to this heading">#</a></h1>
|
||||
<p>Copyright (c) 2023 - 2023 NVIDIA CORPORATION &amp; AFFILIATES. All rights reserved.
|
||||
SPDX-License-Identifier: BSD-3-Clause</p>
|
||||
<div class="highlight-default notranslate"><div class="highlight"><pre><span></span>Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
</pre></div>
|
||||
</div>
|
||||
</section>
|
||||
<div class="toctree-wrapper compound">
|
||||
</div>
|
||||
<div class="toctree-wrapper compound">
|
||||
</div>
|
||||
<div class="toctree-wrapper compound">
|
||||
</div>
|
||||
<div class="toctree-wrapper compound">
|
||||
</div>
|
||||
<div class="toctree-wrapper compound">
|
||||
</div>
|
||||
<div class="toctree-wrapper compound">
|
||||
</div>
|
||||
<div class="toctree-wrapper compound">
|
||||
</div>
|
||||
<section id="indices-and-tables">
|
||||
<h1>Indices and tables<a class="headerlink" href="#indices-and-tables" title="Permalink to this heading">#</a></h1>
|
||||
<ul class="simple">
|
||||
<li><p><a class="reference internal" href="genindex.html"><span class="std std-ref">Index</span></a></p></li>
|
||||
<li><p><a class="reference internal" href="py-modindex.html"><span class="std std-ref">Module Index</span></a></p></li>
|
||||
<li><p><a class="reference internal" href="search.html"><span class="std std-ref">Search Page</span></a></p></li>
|
||||
</ul>
|
||||
</section>
|
||||
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
<a class="next-page" href="install.html">
|
||||
<div class="page-info">
|
||||
<div class="context">
|
||||
<span>Next</span>
|
||||
</div>
|
||||
<div class="title">Installation</div>
|
||||
</div>
|
||||
<svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer">
|
||||
|
||||
|
||||
<div class="toc-sticky toc-scroll">
|
||||
<div class="toc-title-container">
|
||||
<span class="toc-title">
|
||||
On this page
|
||||
</span>
|
||||
</div>
|
||||
<div class="toc-tree-container">
|
||||
<div class="toc-tree">
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">CUTLASS Python Interface</a><ul>
|
||||
<li><a class="reference internal" href="#overview">Overview</a><ul>
|
||||
<li><a class="reference internal" href="#non-goals">Non-goals</a></li>
|
||||
<li><a class="reference internal" href="#comparison-to-pycutlass">Comparison to PyCUTLASS</a><ul>
|
||||
<li><a class="reference internal" href="#transitioning-from-pycutlass">Transitioning from PyCUTLASS</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a class="reference internal" href="#current-functionality">Current functionality</a></li>
|
||||
<li><a class="reference internal" href="#getting-started">Getting started</a><ul>
|
||||
<li><a class="reference internal" href="#optional-environment-variables">Optional environment variables</a></li>
|
||||
<li><a class="reference internal" href="#installation">Installation</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a class="reference internal" href="#examples">Examples</a></li>
|
||||
<li><a class="reference internal" href="#building-documentation">Building documentation</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a class="reference internal" href="#copyright">Copyright</a></li>
|
||||
<li><a class="reference internal" href="#indices-and-tables">Indices and tables</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/scripts/furo.js"></script>
|
||||
<script src="_static/clipboard.min.js"></script>
|
||||
<script src="_static/copybutton.js"></script>
|
||||
<script src="_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
364
python/docs/install.html
Normal file
@ -0,0 +1,364 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
<link rel="index" title="Index" href="genindex.html" /><link rel="search" title="Search" href="search.html" /><link rel="next" title="Basic example of using the CUTLASS Python interface" href="externals/00_basic_gemm.html" /><link rel="prev" title="CUTLASS Python Interface" href="index.html" />
|
||||
<link rel="canonical" href="docs/install.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>Installation - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul class="current">
|
||||
<li class="toctree-l1 current current-page"><a class="current reference internal" href="#">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="modules.html">CUTLASS Python API</a><input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<section id="installation">
|
||||
<h1>Installation<a class="headerlink" href="#installation" title="Permalink to this heading">#</a></h1>
|
||||
<section id="installing-from-source">
|
||||
<h2>Installing from source<a class="headerlink" href="#installing-from-source" title="Permalink to this heading">#</a></h2>
|
||||
<p>Installing from source requires the latest CUDA Toolkit release whose major.minor version matches that of the installed CUDA Python package.</p>
|
||||
<p>Before installing the CUTLASS Python interface, you may optionally set the following environment variables:</p>
|
||||
<ul class="simple">
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">CUTLASS_PATH</span></code>: the path to the cloned CUTLASS repository</p></li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">CUDA_INSTALL_PATH</span></code>: the path to the installation of CUDA</p></li>
|
||||
</ul>
|
||||
<p>If these environment variables are not set, the installation process will infer them to be the following:</p>
|
||||
<ul class="simple">
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">CUTLASS_PATH</span></code>: one directory level above the current directory (i.e., <code class="docutils literal notranslate"><span class="pre">$(pwd)/..</span></code>)</p></li>
|
||||
<li><p><code class="docutils literal notranslate"><span class="pre">CUDA_INSTALL_PATH</span></code>: the directory holding <code class="docutils literal notranslate"><span class="pre">/bin/nvcc</span></code> for the first version of <code class="docutils literal notranslate"><span class="pre">nvcc</span></code> on <code class="docutils literal notranslate"><span class="pre">$PATH</span></code> (i.e., <code class="docutils literal notranslate"><span class="pre">which</span> <span class="pre">nvcc</span> <span class="pre">|</span> <span class="pre">awk</span> <span class="pre">-F'/bin/nvcc'</span> <span class="pre">'{print</span> <span class="pre">$1}'</span></code>)</p></li>
|
||||
</ul>
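<p>The fallback rules above can be sketched in Python as follows. This is a minimal illustration, not the logic used by the actual installer: the helper name <code class="docutils literal notranslate"><span class="pre">infer_install_paths</span></code> and its parameters are hypothetical, and taking the grandparent directory of the <code class="docutils literal notranslate"><span class="pre">nvcc</span></code> binary is assumed to be equivalent to the <code class="docutils literal notranslate"><span class="pre">awk</span></code> command for conventional install layouts.</p>

```python
import os
import shutil
from pathlib import Path


def infer_install_paths(env=None, cwd=None, nvcc_path=None):
    """Return (cutlass_path, cuda_install_path) using the documented fallbacks.

    Hypothetical helper for illustration; it is not part of the CUTLASS package.
    """
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)

    # CUTLASS_PATH falls back to one directory level above the current directory.
    cutlass_path = env.get("CUTLASS_PATH") or str(cwd.parent)

    cuda_install_path = env.get("CUDA_INSTALL_PATH")
    if not cuda_install_path:
        # Fall back to the first nvcc on PATH and strip the trailing bin/nvcc.
        nvcc = nvcc_path or shutil.which("nvcc")
        if nvcc is None:
            raise RuntimeError("nvcc not found on PATH; set CUDA_INSTALL_PATH")
        cuda_install_path = str(Path(nvcc).parent.parent)

    return cutlass_path, cuda_install_path
```

<p>Calling the helper with no arguments reads the real environment; passing <code class="docutils literal notranslate"><span class="pre">env</span></code>, <code class="docutils literal notranslate"><span class="pre">cwd</span></code>, and <code class="docutils literal notranslate"><span class="pre">nvcc_path</span></code> explicitly makes it easy to check what would be inferred on another machine.</p>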
|
||||
<p><strong>NOTE:</strong> The version of <code class="docutils literal notranslate"><span class="pre">cuda-python</span></code> installed must match the CUDA version in <code class="docutils literal notranslate"><span class="pre">CUDA_INSTALL_PATH</span></code>.</p>
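<p>One way to check this requirement before installing is to compare the major.minor components of the two versions. The sketch below is an assumption-laden example rather than anything shipped with CUTLASS: the function names are hypothetical, and it assumes that the <code class="docutils literal notranslate"><span class="pre">nvcc</span> <span class="pre">--version</span></code> output contains a <code class="docutils literal notranslate"><span class="pre">release</span> <span class="pre">X.Y</span></code> token.</p>

```python
import re


def major_minor(version):
    """Extract (major, minor) from a version string such as '12.1.105'."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def parse_nvcc_release(nvcc_output):
    """Pull the 'release X.Y' token out of `nvcc --version` output."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' token found in nvcc output")
    return match.group(1)


def versions_match(cuda_python_version, toolkit_version):
    """True when both versions share the same major.minor pair."""
    return major_minor(cuda_python_version) == major_minor(toolkit_version)
```

<p>In practice, the first argument could come from <code class="docutils literal notranslate"><span class="pre">importlib.metadata.version("cuda-python")</span></code> and the second from running the <code class="docutils literal notranslate"><span class="pre">nvcc</span></code> binary under <code class="docutils literal notranslate"><span class="pre">CUDA_INSTALL_PATH</span></code> with <code class="docutils literal notranslate"><span class="pre">--version</span></code> and passing its output through <code class="docutils literal notranslate"><span class="pre">parse_nvcc_release</span></code>.</p>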
|
||||
<section id="installing-a-developer-mode-package">
|
||||
<h3>Installing a developer-mode package<a class="headerlink" href="#installing-a-developer-mode-package" title="Permalink to this heading">#</a></h3>
|
||||
<p>The CUTLASS Python interface can currently be installed via:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>python<span class="w"> </span>setup.py<span class="w"> </span>develop<span class="w"> </span>--user
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>Because this is a developer-mode install, changes to the Python interface source take effect without reinstalling the package.</p>
|
||||
<p>We plan to add support for installing via <code class="docutils literal notranslate"><span class="pre">python</span> <span class="pre">setup.py</span> <span class="pre">install</span></code> in a future release.</p>
|
||||
</section>
|
||||
</section>
|
||||
<section id="docker">
|
||||
<h2>Docker<a class="headerlink" href="#docker" title="Permalink to this heading">#</a></h2>
|
||||
<p>To ensure that you have all of the necessary Python modules for running the examples using the
|
||||
CUTLASS Python interface, we recommend using one of the Docker images located in the docker directory.</p>
|
||||
<p>For example, to build and launch a container that uses CUDA 12.1 via an NGC PyTorch container, run:</p>
|
||||
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>docker<span class="w"> </span>build<span class="w"> </span>-t<span class="w"> </span>cutlass-cuda12.1:latest<span class="w"> </span>-f<span class="w"> </span>docker/Dockerfile-cuda12.1-pytorch<span class="w"> </span>.
|
||||
docker<span class="w"> </span>run<span class="w"> </span>--gpus<span class="w"> </span>all<span class="w"> </span>-it<span class="w"> </span>--rm<span class="w"> </span>cutlass-cuda12.1:latest
|
||||
</pre></div>
|
||||
</div>
|
||||
<p>The CUTLASS Python interface has been tested with CUDA 11.8, 12.0, and 12.1 on Python 3.8.10 and 3.9.7.</p>
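<p>As a rough sketch, a script could warn when it runs outside the tested combinations listed above. The names below are hypothetical, and matching Python only on its major.minor version (3.8 and 3.9) is an assumption; the tested releases were specifically 3.8.10 and 3.9.7.</p>

```python
import sys

# Combinations taken from the statement above; matching on (major, minor) only
# is an assumption made for this illustration.
TESTED_CUDA = {(11, 8), (12, 0), (12, 1)}
TESTED_PYTHON = {(3, 8), (3, 9)}


def is_tested_configuration(cuda_version, python_version=None):
    """Return True if (major, minor) pairs fall within the tested sets.

    cuda_version and python_version are (major, minor) tuples; python_version
    defaults to the running interpreter.
    """
    if python_version is None:
        python_version = tuple(sys.version_info[:2])
    return tuple(cuda_version) in TESTED_CUDA and tuple(python_version) in TESTED_PYTHON
```

<p>An untested combination is not necessarily unsupported, so a check like this is better suited to emitting a warning than to blocking installation.</p>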
|
||||
</section>
|
||||
</section>
|
||||
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
<a class="next-page" href="externals/00_basic_gemm.html">
|
||||
<div class="page-info">
|
||||
<div class="context">
|
||||
<span>Next</span>
|
||||
</div>
|
||||
<div class="title">Basic example of using the CUTLASS Python interface</div>
|
||||
</div>
|
||||
<svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
|
||||
</a>
|
||||
<a class="prev-page" href="index.html">
|
||||
<svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
|
||||
<div class="page-info">
|
||||
<div class="context">
|
||||
<span>Previous</span>
|
||||
</div>
|
||||
|
||||
<div class="title">Home</div>
|
||||
|
||||
</div>
|
||||
</a>
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer">
|
||||
|
||||
|
||||
<div class="toc-sticky toc-scroll">
|
||||
<div class="toc-title-container">
|
||||
<span class="toc-title">
|
||||
On this page
|
||||
</span>
|
||||
</div>
|
||||
<div class="toc-tree-container">
|
||||
<div class="toc-tree">
|
||||
<ul>
|
||||
<li><a class="reference internal" href="#">Installation</a><ul>
|
||||
<li><a class="reference internal" href="#installing-from-source">Installing from source</a><ul>
|
||||
<li><a class="reference internal" href="#installing-a-developer-mode-package">Installing a developer-mode package</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a class="reference internal" href="#docker">Docker</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/scripts/furo.js"></script>
|
||||
<script src="_static/clipboard.min.js"></script>
|
||||
<script src="_static/copybutton.js"></script>
|
||||
<script src="_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
406
python/docs/modules.html
Normal file
@ -0,0 +1,406 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
|
||||
<link rel="index" title="Index" href="genindex.html" /><link rel="search" title="Search" href="search.html" /><link rel="next" title="CUTLASS" href="cutlass.html" /><link rel="prev" title="Contributing" href="contribute.html" />
|
||||
<link rel="canonical" href="docs/modules.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 -->
|
||||
<title>CUTLASS Python API - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul class="current">
|
||||
<li class="toctree-l1 current has-children current-page"><a class="current reference internal" href="#">CUTLASS Python API</a><input checked="" class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">GitHub</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
<section id="cutlass-python-api">
|
||||
<h1>CUTLASS Python API<a class="headerlink" href="#cutlass-python-api" title="Permalink to this heading">#</a></h1>
|
||||
<div class="toctree-wrapper compound">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="cutlass.html">CUTLASS</a><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="cutlass.html#subpackages">Subpackages</a><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.emit.html">Emitters</a><ul>
|
||||
<li class="toctree-l4"><a class="reference internal" href="cutlass.emit.html#module-cutlass.emit.common">Common</a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="cutlass.emit.html#module-cutlass.emit.pytorch">PyTorch</a><ul>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.emit.html#cutlass.emit.pytorch.pytorch"><code class="docutils literal notranslate"><span class="pre">pytorch()</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.op.html">Operations</a><ul>
|
||||
<li class="toctree-l4"><a class="reference internal" href="cutlass.op.html#module-cutlass.op.gemm">GEMM</a><ul>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.op.html#cutlass.op.gemm.Gemm"><code class="docutils literal notranslate"><span class="pre">Gemm</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="cutlass.op.html#module-cutlass.op.gemm_grouped">Grouped GEMM</a><ul>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.op.html#cutlass.op.gemm_grouped.GroupedGemm"><code class="docutils literal notranslate"><span class="pre">GroupedGemm</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="cutlass.op.html#module-cutlass.op.op">Operation</a><ul>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.op.html#cutlass.op.op.OperationBase"><code class="docutils literal notranslate"><span class="pre">OperationBase</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.utils.html">Utilities</a><ul>
|
||||
<li class="toctree-l4"><a class="reference internal" href="cutlass.utils.html#module-cutlass.utils.check">Checks</a><ul>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.check.alignment_or_default"><code class="docutils literal notranslate"><span class="pre">alignment_or_default()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.check.calculate_smem_usage"><code class="docutils literal notranslate"><span class="pre">calculate_smem_usage()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.check.calculate_smem_usage_per_stage"><code class="docutils literal notranslate"><span class="pre">calculate_smem_usage_per_stage()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.check.valid_cluster_shape"><code class="docutils literal notranslate"><span class="pre">valid_cluster_shape()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.check.valid_kernel_schedule"><code class="docutils literal notranslate"><span class="pre">valid_kernel_schedule()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.check.valid_stage_count"><code class="docutils literal notranslate"><span class="pre">valid_stage_count()</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="cutlass.utils.html#module-cutlass.utils.datatypes">Data Types</a><ul>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.backend_math_operation"><code class="docutils literal notranslate"><span class="pre">backend_math_operation()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.bfloat16_library_type"><code class="docutils literal notranslate"><span class="pre">bfloat16_library_type()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.bfloat16_type"><code class="docutils literal notranslate"><span class="pre">bfloat16_type()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.binding_layout"><code class="docutils literal notranslate"><span class="pre">binding_layout()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.binding_library_type"><code class="docutils literal notranslate"><span class="pre">binding_library_type()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.binding_opclass"><code class="docutils literal notranslate"><span class="pre">binding_opclass()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.binding_type"><code class="docutils literal notranslate"><span class="pre">binding_type()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.construct_backend_td"><code class="docutils literal notranslate"><span class="pre">construct_backend_td()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.cupy_library_type"><code class="docutils literal notranslate"><span class="pre">cupy_library_type()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.cupy_type"><code class="docutils literal notranslate"><span class="pre">cupy_type()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.get_datatype_and_layout"><code class="docutils literal notranslate"><span class="pre">get_datatype_and_layout()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.has_binding_type"><code class="docutils literal notranslate"><span class="pre">has_binding_type()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.library_layout"><code class="docutils literal notranslate"><span class="pre">library_layout()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.library_to_binding"><code class="docutils literal notranslate"><span class="pre">library_to_binding()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.library_type"><code class="docutils literal notranslate"><span class="pre">library_type()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.numpy_library_type"><code class="docutils literal notranslate"><span class="pre">numpy_library_type()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.numpy_type"><code class="docutils literal notranslate"><span class="pre">numpy_type()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.td_from_profiler_op"><code class="docutils literal notranslate"><span class="pre">td_from_profiler_op()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.td_from_profiler_td"><code class="docutils literal notranslate"><span class="pre">td_from_profiler_td()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.torch_library_type"><code class="docutils literal notranslate"><span class="pre">torch_library_type()</span></code></a></li>
|
||||
<li class="toctree-l5"><a class="reference internal" href="cutlass.utils.html#cutlass.utils.datatypes.torch_type"><code class="docutils literal notranslate"><span class="pre">torch_type()</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="cutlass.html#module-cutlass.epilogue">Epilogue</a><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.html#cutlass.epilogue.get_activation_epilogue"><code class="docutils literal notranslate"><span class="pre">get_activation_epilogue()</span></code></a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.html#cutlass.epilogue.get_activations"><code class="docutils literal notranslate"><span class="pre">get_activations()</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="cutlass.html#module-cutlass.library_defaults">Library Defaults</a><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.html#cutlass.library_defaults.ArchOptions"><code class="docutils literal notranslate"><span class="pre">ArchOptions</span></code></a><ul>
|
||||
<li class="toctree-l4"><a class="reference internal" href="cutlass.html#cutlass.library_defaults.ArchOptions.opclass_supports_combination"><code class="docutils literal notranslate"><span class="pre">ArchOptions.opclass_supports_combination()</span></code></a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="cutlass.html#cutlass.library_defaults.ArchOptions.operations"><code class="docutils literal notranslate"><span class="pre">ArchOptions.operations()</span></code></a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="cutlass.html#cutlass.library_defaults.ArchOptions.supporting_opclasses"><code class="docutils literal notranslate"><span class="pre">ArchOptions.supporting_opclasses()</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.html#cutlass.library_defaults.KernelsForDataType"><code class="docutils literal notranslate"><span class="pre">KernelsForDataType</span></code></a><ul>
|
||||
<li class="toctree-l4"><a class="reference internal" href="cutlass.html#cutlass.library_defaults.KernelsForDataType.add"><code class="docutils literal notranslate"><span class="pre">KernelsForDataType.add()</span></code></a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="cutlass.html#cutlass.library_defaults.KernelsForDataType.alignments"><code class="docutils literal notranslate"><span class="pre">KernelsForDataType.alignments</span></code></a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="cutlass.html#cutlass.library_defaults.KernelsForDataType.all_operations"><code class="docutils literal notranslate"><span class="pre">KernelsForDataType.all_operations</span></code></a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="cutlass.html#cutlass.library_defaults.KernelsForDataType.find_alignment"><code class="docutils literal notranslate"><span class="pre">KernelsForDataType.find_alignment()</span></code></a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="cutlass.html#cutlass.library_defaults.KernelsForDataType.operations"><code class="docutils literal notranslate"><span class="pre">KernelsForDataType.operations()</span></code></a></li>
|
||||
<li class="toctree-l4"><a class="reference internal" href="cutlass.html#cutlass.library_defaults.KernelsForDataType.sort"><code class="docutils literal notranslate"><span class="pre">KernelsForDataType.sort()</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.html#cutlass.library_defaults.OptionRegistry"><code class="docutils literal notranslate"><span class="pre">OptionRegistry</span></code></a><ul>
|
||||
<li class="toctree-l4"><a class="reference internal" href="cutlass.html#cutlass.library_defaults.OptionRegistry.options_for_cc"><code class="docutils literal notranslate"><span class="pre">OptionRegistry.options_for_cc()</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="cutlass.html#module-cutlass.swizzle">Swizzle</a><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.html#cutlass.swizzle.get_swizzling_functors"><code class="docutils literal notranslate"><span class="pre">get_swizzling_functors()</span></code></a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</div>
|
||||
</section>
|
||||
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
<a class="next-page" href="cutlass.html">
|
||||
<div class="page-info">
|
||||
<div class="context">
|
||||
<span>Next</span>
|
||||
</div>
|
||||
<div class="title">CUTLASS</div>
|
||||
</div>
|
||||
<svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
|
||||
</a>
|
||||
<a class="prev-page" href="contribute.html">
|
||||
<svg class="furo-related-icon"><use href="#svg-arrow-right"></use></svg>
|
||||
<div class="page-info">
|
||||
<div class="context">
|
||||
<span>Previous</span>
|
||||
</div>
|
||||
|
||||
<div class="title">Contributing</div>
|
||||
|
||||
</div>
|
||||
</a>
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer no-toc">
|
||||
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/scripts/furo.js"></script>
|
||||
<script src="_static/clipboard.min.js"></script>
|
||||
<script src="_static/copybutton.js"></script>
|
||||
<script src="_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
BIN
python/docs/objects.inv
Normal file
362
python/docs/py-modindex.html
Normal file
@ -0,0 +1,362 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><link rel="index" title="Index" href="genindex.html" /><link rel="search" title="Search" href="search.html" />
|
||||
<link rel="canonical" href="docs/py-modindex.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 --><title>Python Module Index - CUTLASS Python</title>
|
||||
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="search.html" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="modules.html">CUTLASS Python API</a><input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">Github</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
|
||||
<section class="domainindex-section">
|
||||
<h1>Python Module Index</h1>
|
||||
<div class="domainindex-jumpbox"><a href="#cap-c"><strong>c</strong></a></div>
|
||||
</section>
|
||||
<table class="domainindex-table">
|
||||
<tr class="pcap">
|
||||
<td></td><td> </td><td></td>
|
||||
</tr>
|
||||
<tr class="cap" id="cap-c">
|
||||
<td></td><td><strong>c</strong></td><td></td>
|
||||
</tr>
|
||||
<tr>
|
||||
<td><img src="_static/minus.png" class="toggler"
|
||||
id="toggle-1" style="display: none" alt="-" /></td>
|
||||
<td>
|
||||
<code class="xref">cutlass</code></td><td>
|
||||
<em></em></td>
|
||||
</tr>
|
||||
<tr class="cg-1">
|
||||
<td></td>
|
||||
<td>   
|
||||
<a href="cutlass.emit.html#module-cutlass.emit.common"><code class="xref">cutlass.emit.common</code></a></td><td>
|
||||
<em></em></td>
|
||||
</tr>
|
||||
<tr class="cg-1">
|
||||
<td></td>
|
||||
<td>   
|
||||
<a href="cutlass.emit.html#module-cutlass.emit.pytorch"><code class="xref">cutlass.emit.pytorch</code></a></td><td>
|
||||
<em></em></td>
|
||||
</tr>
|
||||
<tr class="cg-1">
|
||||
<td></td>
|
||||
<td>   
|
||||
<a href="cutlass.html#module-cutlass.epilogue"><code class="xref">cutlass.epilogue</code></a></td><td>
|
||||
<em></em></td>
|
||||
</tr>
|
||||
<tr class="cg-1">
|
||||
<td></td>
|
||||
<td>   
|
||||
<a href="cutlass.html#module-cutlass.library_defaults"><code class="xref">cutlass.library_defaults</code></a></td><td>
|
||||
<em></em></td>
|
||||
</tr>
|
||||
<tr class="cg-1">
|
||||
<td></td>
|
||||
<td>   
|
||||
<a href="cutlass.op.html#module-cutlass.op.gemm"><code class="xref">cutlass.op.gemm</code></a></td><td>
|
||||
<em></em></td>
|
||||
</tr>
|
||||
<tr class="cg-1">
|
||||
<td></td>
|
||||
<td>   
|
||||
<a href="cutlass.op.html#module-cutlass.op.gemm_grouped"><code class="xref">cutlass.op.gemm_grouped</code></a></td><td>
|
||||
<em></em></td>
|
||||
</tr>
|
||||
<tr class="cg-1">
|
||||
<td></td>
|
||||
<td>   
|
||||
<a href="cutlass.op.html#module-cutlass.op.op"><code class="xref">cutlass.op.op</code></a></td><td>
|
||||
<em></em></td>
|
||||
</tr>
|
||||
<tr class="cg-1">
|
||||
<td></td>
|
||||
<td>   
|
||||
<a href="cutlass.html#module-cutlass.swizzle"><code class="xref">cutlass.swizzle</code></a></td><td>
|
||||
<em></em></td>
|
||||
</tr>
|
||||
<tr class="cg-1">
|
||||
<td></td>
|
||||
<td>   
|
||||
<a href="cutlass.utils.html#module-cutlass.utils.check"><code class="xref">cutlass.utils.check</code></a></td><td>
|
||||
<em></em></td>
|
||||
</tr>
|
||||
<tr class="cg-1">
|
||||
<td></td>
|
||||
<td>   
|
||||
<a href="cutlass.utils.html#module-cutlass.utils.datatypes"><code class="xref">cutlass.utils.datatypes</code></a></td><td>
|
||||
<em></em></td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
|
||||
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer no-toc">
|
||||
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/scripts/furo.js"></script>
|
||||
<script src="_static/clipboard.min.js"></script>
|
||||
<script src="_static/copybutton.js"></script>
|
||||
<script src="_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
</body>
|
||||
</html>
|
||||
295
python/docs/search.html
Normal file
@ -0,0 +1,295 @@
|
||||
<!doctype html>
|
||||
<html class="no-js" lang="en">
|
||||
<head><meta charset="utf-8"/>
|
||||
<meta name="viewport" content="width=device-width,initial-scale=1"/>
|
||||
<meta name="color-scheme" content="light dark"><link rel="index" title="Index" href="genindex.html" /><link rel="search" title="Search" href="#" />
|
||||
<link rel="canonical" href="docs/search.html" />
|
||||
|
||||
<!-- Generated with Sphinx 6.1.3 and Furo 2023.03.27 --><title>Search - CUTLASS Python</title><link rel="stylesheet" type="text/css" href="_static/pygments.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo.css?digest=fad236701ea90a88636c2a8c73b44ae642ed2a53" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/copybutton.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/tabs.css" />
|
||||
<link rel="stylesheet" type="text/css" href="_static/styles/furo-extensions.css?digest=30d1aed668e5c3a91c3e3bf6a60b675221979f0e" />
|
||||
|
||||
|
||||
|
||||
|
||||
<style>
|
||||
body {
|
||||
--color-code-background: #eeffcc;
|
||||
--color-code-foreground: black;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media not print {
|
||||
body[data-theme="dark"] {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
@media (prefers-color-scheme: dark) {
|
||||
body:not([data-theme="light"]) {
|
||||
--color-code-background: #272822;
|
||||
--color-code-foreground: #f8f8f2;
|
||||
--color-brand-primary: #76B900;
|
||||
--color-brand-content: #76B900;
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
</style></head>
|
||||
<body>
|
||||
|
||||
<script>
|
||||
document.body.dataset.theme = localStorage.getItem("theme") || "auto";
|
||||
</script>
|
||||
|
||||
|
||||
<svg xmlns="http://www.w3.org/2000/svg" style="display: none;">
|
||||
<symbol id="svg-toc" viewBox="0 0 24 24">
|
||||
<title>Contents</title>
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 1024 1024">
|
||||
<path d="M408 442h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8zm-8 204c0 4.4 3.6 8 8 8h480c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8H408c-4.4 0-8 3.6-8 8v56zm504-486H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zm0 632H120c-4.4 0-8 3.6-8 8v56c0 4.4 3.6 8 8 8h784c4.4 0 8-3.6 8-8v-56c0-4.4-3.6-8-8-8zM115.4 518.9L271.7 642c5.8 4.6 14.4.5 14.4-6.9V388.9c0-7.4-8.5-11.5-14.4-6.9L115.4 505.1a8.74 8.74 0 0 0 0 13.8z"/>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-menu" viewBox="0 0 24 24">
|
||||
<title>Menu</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-menu">
|
||||
<line x1="3" y1="12" x2="21" y2="12"></line>
|
||||
<line x1="3" y1="6" x2="21" y2="6"></line>
|
||||
<line x1="3" y1="18" x2="21" y2="18"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-arrow-right" viewBox="0 0 24 24">
|
||||
<title>Expand</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="feather-chevron-right">
|
||||
<polyline points="9 18 15 12 9 6"></polyline>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun" viewBox="0 0 24 24">
|
||||
<title>Light mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="feather-sun">
|
||||
<circle cx="12" cy="12" r="5"></circle>
|
||||
<line x1="12" y1="1" x2="12" y2="3"></line>
|
||||
<line x1="12" y1="21" x2="12" y2="23"></line>
|
||||
<line x1="4.22" y1="4.22" x2="5.64" y2="5.64"></line>
|
||||
<line x1="18.36" y1="18.36" x2="19.78" y2="19.78"></line>
|
||||
<line x1="1" y1="12" x2="3" y2="12"></line>
|
||||
<line x1="21" y1="12" x2="23" y2="12"></line>
|
||||
<line x1="4.22" y1="19.78" x2="5.64" y2="18.36"></line>
|
||||
<line x1="18.36" y1="5.64" x2="19.78" y2="4.22"></line>
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-moon" viewBox="0 0 24 24">
|
||||
<title>Dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-moon">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none" />
|
||||
<path d="M12 3c.132 0 .263 0 .393 0a7.5 7.5 0 0 0 7.92 12.446a9 9 0 1 1 -8.313 -12.454z" />
|
||||
</svg>
|
||||
</symbol>
|
||||
<symbol id="svg-sun-half" viewBox="0 0 24 24">
|
||||
<title>Auto light/dark mode</title>
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24" fill="none" stroke="currentColor"
|
||||
stroke-width="1.5" stroke-linecap="round" stroke-linejoin="round" class="icon-tabler-shadow">
|
||||
<path stroke="none" d="M0 0h24v24H0z" fill="none"/>
|
||||
<circle cx="12" cy="12" r="9" />
|
||||
<path d="M13 12h5" />
|
||||
<path d="M13 15h4" />
|
||||
<path d="M13 18h1" />
|
||||
<path d="M13 9h4" />
|
||||
<path d="M13 6h1" />
|
||||
</svg>
|
||||
</symbol>
|
||||
</svg>
|
||||
|
||||
<input type="checkbox" class="sidebar-toggle" name="__navigation" id="__navigation">
|
||||
<input type="checkbox" class="sidebar-toggle" name="__toc" id="__toc">
|
||||
<label class="overlay sidebar-overlay" for="__navigation">
|
||||
<div class="visually-hidden">Hide navigation sidebar</div>
|
||||
</label>
|
||||
<label class="overlay toc-overlay" for="__toc">
|
||||
<div class="visually-hidden">Hide table of contents sidebar</div>
|
||||
</label>
|
||||
|
||||
|
||||
|
||||
<div class="page">
|
||||
<header class="mobile-header">
|
||||
<div class="header-left">
|
||||
<label class="nav-overlay-icon" for="__navigation">
|
||||
<div class="visually-hidden">Toggle site navigation sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-menu"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<div class="header-center">
|
||||
<a href="index.html"><div class="brand">CUTLASS Python</div></a>
|
||||
</div>
|
||||
<div class="header-right">
|
||||
<div class="theme-toggle-container theme-toggle-header">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-header-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
</header>
|
||||
<aside class="sidebar-drawer">
|
||||
<div class="sidebar-container">
|
||||
|
||||
<div class="sidebar-sticky"><a class="sidebar-brand" href="index.html">
|
||||
|
||||
<div class="sidebar-logo-container">
|
||||
<img class="sidebar-logo only-light" src="_static/cutlass-logo-small.png" alt="Light Logo"/>
|
||||
<img class="sidebar-logo only-dark" src="_static/cutlass-logo-small.png" alt="Dark Logo"/>
|
||||
</div>
|
||||
|
||||
<span class="sidebar-brand-text">CUTLASS Python</span>
|
||||
|
||||
</a><form class="sidebar-search-container" method="get" action="#" role="search">
|
||||
<input class="sidebar-search" placeholder="Search" name="q" aria-label="Search">
|
||||
<input type="hidden" name="check_keywords" value="yes">
|
||||
<input type="hidden" name="area" value="default">
|
||||
</form>
|
||||
<div id="searchbox"></div><div class="sidebar-scroll"><div class="sidebar-tree">
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="index.html">Home</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Getting Started:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference internal" href="install.html">Installation</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="externals/00_basic_gemm.html">Getting Started</a></li>
|
||||
<li class="toctree-l1"><a class="reference internal" href="contribute.html">Contributing</a></li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Python Documentation:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="modules.html">CUTLASS Python API</a><input class="toctree-checkbox" id="toctree-checkbox-1" name="toctree-checkbox-1" role="switch" type="checkbox"/><label for="toctree-checkbox-1"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2 has-children"><a class="reference internal" href="cutlass.html">CUTLASS</a><input class="toctree-checkbox" id="toctree-checkbox-2" name="toctree-checkbox-2" role="switch" type="checkbox"/><label for="toctree-checkbox-2"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.emit.html">Emitters</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.op.html">Operations</a></li>
|
||||
<li class="toctree-l3"><a class="reference internal" href="cutlass.utils.html">Utilities</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Examples and Tutorials:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1 has-children"><a class="reference internal" href="examples.html">Examples</a><input class="toctree-checkbox" id="toctree-checkbox-3" name="toctree-checkbox-3" role="switch" type="checkbox"/><label for="toctree-checkbox-3"><div class="visually-hidden">Toggle child pages in navigation</div><i class="icon"><svg><use href="#svg-arrow-right"></use></svg></i></label><ul>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/00_basic_gemm.html">Basic GEMM</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/01_epilogue.html">Epilogue</a></li>
|
||||
<li class="toctree-l2"><a class="reference internal" href="externals/02_pytorch_extension_grouped_gemm.html">PyTorch Extension</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
</ul>
|
||||
<p class="caption" role="heading"><span class="caption-text">Reference:</span></p>
|
||||
<ul>
|
||||
<li class="toctree-l1"><a class="reference external" href="https://github.com/NVIDIA/cutlass">Github</a></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</div>
|
||||
|
||||
</div>
|
||||
</aside>
|
||||
<div class="main">
|
||||
<div class="content">
|
||||
<div class="article-container">
|
||||
<a href="#" class="back-to-top muted-link">
|
||||
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 24 24">
|
||||
<path d="M13 20h-2V8l-5.5 5.5-1.42-1.42L12 4.16l7.92 7.92-1.42 1.42L13 8v12z"></path>
|
||||
</svg>
|
||||
<span>Back to top</span>
|
||||
</a>
|
||||
<div class="content-icon-container">
|
||||
<div class="theme-toggle-container theme-toggle-content">
|
||||
<button class="theme-toggle">
|
||||
<div class="visually-hidden">Toggle Light / Dark / Auto color theme</div>
|
||||
<svg class="theme-icon-when-auto"><use href="#svg-sun-half"></use></svg>
|
||||
<svg class="theme-icon-when-dark"><use href="#svg-moon"></use></svg>
|
||||
<svg class="theme-icon-when-light"><use href="#svg-sun"></use></svg>
|
||||
</button>
|
||||
</div>
|
||||
<label class="toc-overlay-icon toc-content-icon no-toc" for="__toc">
|
||||
<div class="visually-hidden">Toggle table of contents sidebar</div>
|
||||
<i class="icon"><svg><use href="#svg-toc"></use></svg></i>
|
||||
</label>
|
||||
</div>
|
||||
<article role="main">
|
||||
|
||||
<noscript>
|
||||
<div class="admonition error">
|
||||
<p class="admonition-title">Error</p>
|
||||
<p>
|
||||
Please activate JavaScript to enable the search functionality.
|
||||
</p>
|
||||
</div>
|
||||
</noscript>
|
||||
|
||||
<div id="search-results"></div>
|
||||
|
||||
</article>
|
||||
</div>
|
||||
<footer>
|
||||
|
||||
<div class="related-pages">
|
||||
|
||||
|
||||
</div>
|
||||
<div class="bottom-of-page">
|
||||
<div class="left-details">
|
||||
<div class="copyright">
|
||||
Copyright © 2023, NVIDIA
|
||||
</div>
|
||||
Made with <a href="https://www.sphinx-doc.org/">Sphinx</a> and <a class="muted-link" href="https://pradyunsg.me">@pradyunsg</a>'s
|
||||
|
||||
<a href="https://github.com/pradyunsg/furo">Furo</a>
|
||||
|
||||
</div>
|
||||
<div class="right-details">
|
||||
<div class="icons">
|
||||
<a class="muted-link " href="https://github.com/NVIDIA/cutlass" aria-label="GitHub">
|
||||
<svg stroke="currentColor" fill="currentColor" stroke-width="0" viewBox="0 0 16 16">
|
||||
<path fill-rule="evenodd" d="M8 0C3.58 0 0 3.58 0 8c0 3.54 2.29 6.53 5.47 7.59.4.07.55-.17.55-.38 0-.19-.01-.82-.01-1.49-2.01.37-2.53-.49-2.69-.94-.09-.23-.48-.94-.82-1.13-.28-.15-.68-.52-.01-.53.63-.01 1.08.58 1.23.82.72 1.21 1.87.87 2.33.66.07-.52.28-.87.51-1.07-1.78-.2-3.64-.89-3.64-3.95 0-.87.31-1.59.82-2.15-.08-.2-.36-1.02.08-2.12 0 0 .67-.21 2.2.82.64-.18 1.32-.27 2-.27.68 0 1.36.09 2 .27 1.53-1.04 2.2-.82 2.2-.82.44 1.1.16 1.92.08 2.12.51.56.82 1.27.82 2.15 0 3.07-1.87 3.75-3.65 3.95.29.25.54.73.54 1.48 0 1.07-.01 1.93-.01 2.2 0 .21.15.46.55.38A8.013 8.013 0 0 0 16 8c0-4.42-3.58-8-8-8z"></path>
|
||||
</svg>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</div>
|
||||
</div>
|
||||
|
||||
</footer>
|
||||
</div>
|
||||
<aside class="toc-drawer no-toc">
|
||||
|
||||
|
||||
|
||||
</aside>
|
||||
</div>
|
||||
</div><script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
|
||||
<script src="_static/doctools.js"></script>
|
||||
<script src="_static/sphinx_highlight.js"></script>
|
||||
<script src="_static/scripts/furo.js"></script>
|
||||
<script src="_static/clipboard.min.js"></script>
|
||||
<script src="_static/copybutton.js"></script>
|
||||
<script src="_static/tabs.js"></script>
|
||||
<script crossorigin="anonymous" integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA=" src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"></script>
|
||||
|
||||
<script src="_static/searchtools.js"></script>
|
||||
<script src="_static/language_data.js"></script>
|
||||
<script src="searchindex.js"></script></body>
|
||||
</html>
|
||||
1
python/docs/searchindex.js
Normal file
@ -25,13 +25,12 @@ We plan to add support for installing via `python setup.py install` in a future

## Docker
To ensure that you have all of the necessary Python modules for running the examples using the
CUTLASS Python interface, we recommend using one of the Docker images for CUDA [11.8](../../../python/docker/Dockerfile-cuda11.8-pytorch)
and [12.0](../../../python/docker/Dockerfile-cuda12.0-pytorch) are located in the docker directory.
CUTLASS Python interface, we recommend using one of the Docker images located in the docker directory.

For example, to build and launch a container that uses CUDA 12.0 via an NGC PyTorch container, run:
For example, to build and launch a container that uses CUDA 12.1 via an NGC PyTorch container, run:
```bash
docker build -t cutlass-cuda12.0:latest -f docker/Dockerfile-cuda12.0-pytorch .
docker run --gpus all -it --rm cutlass-cuda12.0:latest
docker build -t cutlass-cuda12.1:latest -f docker/Dockerfile-cuda12.1-pytorch .
docker run --gpus all -it --rm cutlass-cuda12.1:latest
```

The CUTLASS Python interface has been tested with CUDA 11.8, 12.0, and 12.1 on Python 3.8.10 and 3.9.7.
|
||||
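The tested configurations listed above can be compared against a local environment before running the examples. The following is a minimal sketch, not part of the CUTLASS Python interface: the helper names `is_tested_python` and `is_tested_cuda` are hypothetical, and the version sets are copied from the sentence above. Versions outside these sets may still work; they are simply not listed as tested.

```python
import sys

# Versions named in the documentation as tested (assumption: matched exactly for
# Python, and on major.minor for CUDA). Untested versions are not necessarily broken.
TESTED_PYTHON = {(3, 8, 10), (3, 9, 7)}
TESTED_CUDA = {"11.8", "12.0", "12.1"}


def is_tested_python(version_info=sys.version_info):
    """Return True if the interpreter matches one of the tested Python releases."""
    return tuple(version_info[:3]) in TESTED_PYTHON


def is_tested_cuda(cuda_version):
    """Return True if a 'major.minor[.patch]' CUDA version string is in the tested set."""
    major_minor = ".".join(cuda_version.split(".")[:2])
    return major_minor in TESTED_CUDA


if __name__ == "__main__":
    print("Running on a tested Python version:", is_tested_python())
```

Inside the container, the CUDA version string would have to come from whatever tool is available there (for example, the toolkit's own version report); how it is obtained is left open here.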