Release v4.0.0 (#2294)

Kihiro Bando
2025-05-13 15:55:29 -04:00
committed by GitHub
parent ad7b2f5e84
commit f115c3f854
299 changed files with 51495 additions and 4413 deletions


@@ -217,7 +217,7 @@ and `TensorRef` objects for each of the operands whose extents are implied as a
redundant storage of extent quantities, CUTLASS minimizes capacity utilization of precious resources such as constant memory.
This is consistent with BLAS conventions.
-# Summary:
+## Summary:
The design patterns described in this document form a hierarchy:
* `T *ptr;` is a pointer to a contiguous sequence of elements of type `T`
@@ -225,7 +225,7 @@ The design patterns described in this document form a hierarchy:
* `TensorRef<T, Layout> ref(ptr, layout);` is an object pointing to an _unbounded_ tensor containing elements of type `T` and a layout of type `Layout`
* `TensorView<T, Layout> view(ref, extent);` is an object pointing to a _bounded_ tensor containing elements of type `T` and a layout of type `Layout`
-# Appendix: Existing Layouts
+### Appendix: Existing Layouts
This section enumerates several existing Layout types defined in CUTLASS.
@@ -268,7 +268,7 @@ Permuted Shared Memory Layouts:
- `TensorOpCrosswise<ElementSize>`
-# Copyright
+### Copyright
Copyright (c) 2017 - 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: BSD-3-Clause