69e3709da4
Fixed typo
...
Fixed typo
2018-09-28 12:59:20 -07:00
1a7ac522f8
Clarification to README
2018-09-20 11:04:03 -07:00
206e38dac5
Updated copyright of CUTLASS.md
2018-09-19 21:31:12 -07:00
0826572c4c
Reduced range of random values to avoid bit-level inconsistencies for large matrices.
2018-09-19 21:11:48 -07:00
77d1e0ca81
Updated README and CHANGELOG.
2018-09-19 20:42:51 -07:00
d7137f9c0a
Updated doxygen
2018-09-19 14:02:08 -07:00
461f417b9d
Checkpointing CUTLASS 1.1 release.
2018-09-18 16:58:03 -07:00
cf0301e00f
Merge pull request #15 from NVIDIA/release_1.0.1_edits
...
Minor edits to README and changelog pursuant to the CUTLASS 1.0.1 patch.
v1.0.1
2018-06-26 13:59:01 -07:00
b9bb0d1a49
Edits to README and changelog pursuant to the CUTLASS 1.0.1 patch.
2018-06-26 13:57:39 -07:00
e1c4ba501b
Merge pull request #13 from NVIDIA/cutlass_v1.0.1
...
Cutlass v1.0.1
2018-06-12 08:25:56 -07:00
c566e83e6d
Updated changelog.
2018-06-11 14:54:07 -07:00
374882be53
Replaced GoogleTest copy with submodule. Added updates to support intra-threadblock reductions. Added tests for same.
2018-06-11 11:47:15 -07:00
2c496c3e9e
Replaced GoogleTest copy with Git submodule.
2018-06-11 11:32:41 -07:00
9fd55460c6
Merge pull request #10 from NVIDIA/cutlass_v1.0_rel
...
Minor updates to usage and README.
2018-05-18 12:27:31 -07:00
480732c2e8
Minor updates to usage and readme.
2018-05-17 15:10:55 -07:00
68aaee8773
Merge pull request #9 from NVIDIA/cutlass_v1.0_rel
...
Updated URL to Doxygen and modified usage statement
2018-05-17 11:12:37 -07:00
acb90e962a
Updated url to Doxygen and modified usage statement in performance test program.
2018-05-17 11:11:05 -07:00
96bc3f227f
Merge pull request #8 from NVIDIA/cutlass_v1.0_rel
...
Configured Github Pages
2018-05-16 15:26:55 -07:00
25ff282403
Moved Doxygen documents.
2018-05-16 15:25:24 -07:00
9d5726a568
Set theme jekyll-theme-minimal
2018-05-16 13:49:06 -07:00
6f0d271d8d
CUTLASS v1.0
...
CUTLASS v1.0 released.
v1.0.0
2018-05-16 13:47:13 -07:00
923dfb42ce
Updated README.md
2018-05-16 12:50:10 -07:00
6f6f269a0a
Updated README.md
2018-05-16 12:47:07 -07:00
2028ebe120
CUTLASS v1.0 release
2018-05-16 11:44:56 -07:00
84377249a1
Merge pull request #2 from Artem-B/clang-fixes
...
Merging "Clang fixes" into master.
v0.1.1
2018-01-04 15:52:53 -08:00
901287175f
Merge branch 'Artem-B-clang-fixes'
2018-01-04 15:46:08 -08:00
1c9b54df16
Whitespace fix.
2018-01-03 16:42:51 -08:00
39616514d0
Reworked the CUDA_LOG macro to print the location and the message with one printf.
...
This relies on the fact that clang allows using device-side features
from __host__/__device__ functions from __host__ ones as long as we
don't have to generate code for that. Wrapping thread/blockIdx in a
__host__ __device__ function allows using CUDA_LOG everywhere during
host and device compilation.
2018-01-03 16:36:50 -08:00
df4b4e4bb6
Added _cuda_ to the name of the executable to indicate that it's not clang's version.
2017-12-11 16:34:10 -08:00
81957b3a3d
Force inlining of a few functions that rely on it for performance.
...
Clang is less aggressive than nvcc, so a number of functions did not get
inlined into the kernel by default. That prevented SROA from eliminating
loads/stores to temporary buffers and resulted in abysmal performance.
Replaced inline with __forceinline__ to ensure that we do inline the
functions necessary for optimal performance.
2017-12-11 14:52:30 -08:00
ce2b3f695d
Fixed debug macros for clang.
...
Unlike nvcc, clang always sees both host and device-side code during
compilation. The CUDA_LOG macro is used in both host and device code, so when it
expanded to contain device-only code, that resulted in errors when it was used
from host-side functions.
In order to make CUDA_LOG work with clang it was split into two parts -- a pair
of target-attribute-based overloaded functions that perform host or device
specific parts of logging, and a printf which works on both sides.
2017-12-11 14:52:30 -08:00
e9e7cd4d44
Make cutlass compilable with clang.
...
E.g.:
PATH=/nvcc/path/bin:/clang/path/bin:$PATH make sm=35,60 compiler=clang all
2017-12-11 14:52:30 -08:00
95b0578d34
Update license info
2017-12-06 10:00:59 -05:00
f4b48c7669
Update README.md
2017-12-05 22:58:46 -05:00
6cb88d53eb
Update README.md
2017-12-05 22:58:12 -05:00
537a4bcedf
Update README.md
2017-12-05 22:54:49 -05:00
5bd3f09312
Update README.md
2017-12-05 22:53:11 -05:00
6f091f5620
Update README.md
2017-12-05 22:44:01 -05:00
0428c89fd5
Updating README with relative perf chart
2017-12-05 22:40:47 -05:00
e2bf51c3fe
Update README.md
2017-12-05 22:25:42 -05:00
57747e382e
Update README.md
2017-12-05 21:32:06 -05:00
dd4dd4cebf
Update README.md
2017-12-05 20:58:01 -05:00
6565b48747
Update README.md
2017-12-05 20:56:49 -05:00
73211bbb88
Update README.md
2017-12-05 20:55:54 -05:00
9dcb2b4c7d
Update README.md
2017-12-05 20:55:03 -05:00
f30abfc00a
Update README.md
2017-12-05 20:50:15 -05:00
8ebd6b06d0
Replace svg with png+text
2017-12-05 20:20:25 -05:00
04ffa156e8
Adding figure to readme.md
2017-12-05 20:15:33 -05:00
24d0ba65c5
Update code formatting
2017-12-05 15:51:01 -05:00
4276e46e61
Improved formatting of Makefile
v0.1.0
2017-12-05 12:45:06 -08:00