CITATION.cff
cff-version: 1.2.0
title: CUTLASS
message: >-
  If you use this software, please cite using the
  following metadata.
type: software
authors:
  - given-names: Andrew
    email: [email protected]
    family-names: Kerr
    affiliation: NVIDIA
  - given-names: Haicheng
    family-names: Wu
    affiliation: NVIDIA
    email: [email protected]
  - given-names: Manish
    family-names: Gupta
    affiliation: Google
    email: [email protected]
  - given-names: Dustyn
    family-names: Blasig
    email: [email protected]
    affiliation: NVIDIA
  - given-names: Pradeep
    family-names: Ramani
    email: [email protected]
    affiliation: NVIDIA
  - given-names: Duane
    family-names: Merrill
    email: [email protected]
    affiliation: NVIDIA
  - given-names: Aniket
    family-names: Shivam
    email: [email protected]
    affiliation: NVIDIA
  - given-names: Piotr
    family-names: Majcher
    email: [email protected]
    affiliation: NVIDIA
  - given-names: Paul
    family-names: Springer
    email: [email protected]
    affiliation: NVIDIA
  - given-names: Markus
    family-names: Hohnerbach
    affiliation: NVIDIA
    email: [email protected]
  - given-names: Jin
    family-names: Wang
    email: [email protected]
    affiliation: NVIDIA
  - given-names: Matt
    family-names: Nicely
    email: [email protected]
    affiliation: NVIDIA
repository-code: 'https://github.com/NVIDIA/cutlass'
abstract: >-
  CUTLASS is a collection of CUDA C++ template
  abstractions for implementing high-performance
  matrix-multiplication (GEMM) and related
  computations at all levels and scales within CUDA.
  It incorporates strategies for hierarchical
  decomposition and data movement similar to those
  used to implement cuBLAS and cuDNN. CUTLASS
  decomposes these "moving parts" into reusable,
  modular software components abstracted by C++
  template classes. These thread-wide, warp-wide,
  block-wide, and device-wide primitives can be
  specialized and tuned via custom tiling sizes, data
  types, and other algorithmic policy. The resulting
  flexibility simplifies their use as building blocks
  within custom kernels and applications.
keywords:
  - 'cutlass, tensor cores, cuda'
license: BSD-3-Clause
license-url: https://github.com/NVIDIA/cutlass/blob/v2.10.0/LICENSE.txt
version: '2.10.0'
date-released: '2022-09-15'
identifiers:
  - type: url
    value: "https://github.com/NVIDIA/cutlass/tree/v2.10.0"
    description: The GitHub release URL of tag 2.10.0
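
As a rough illustration of the device-wide GEMM primitive the abstract refers to, the sketch below uses the cutlass::gemm::device::Gemm template in the style of the CUTLASS 2.x examples; the wrapper name run_sgemm and the choice of single-precision, column-major operands are assumptions made for this example, and exact template defaults may vary between releases.

// Illustrative sketch only (not part of CITATION.cff): a single-precision GEMM
// built from the device-wide CUTLASS 2.x template; defaults may differ by version.
#include "cutlass/gemm/device/gemm.h"

cutlass::Status run_sgemm(int M, int N, int K,
                          float alpha, float const *A, int lda,
                          float const *B, int ldb,
                          float beta, float *C, int ldc) {
  // Device-wide GEMM primitive specialized for column-major float operands.
  using Gemm = cutlass::gemm::device::Gemm<
      float, cutlass::layout::ColumnMajor,   // A
      float, cutlass::layout::ColumnMajor,   // B
      float, cutlass::layout::ColumnMajor>;  // C / D

  // Arguments: problem size, tensor refs for A, B, C (source) and D (output),
  // and the linear-combination epilogue scalars (alpha, beta).
  Gemm::Arguments args({M, N, K}, {A, lda}, {B, ldb}, {C, ldc}, {C, ldc},
                       {alpha, beta});

  Gemm gemm_op;
  return gemm_op(args);  // launches the kernel on the default CUDA stream
}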