Note that this repository is under active development.
Section | Videos | Code |
---|---|---|
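01 | Episode 1: Introduction to CUDA and Installing the Windows Development Environment | / |
02 | Episode 2: Installing the CUDA Development Environment on Ubuntu | / |
03 | Episode 3: Running Your First CUDA Program on Windows and Ubuntu | course01_hello_cuda |
04 | Episode 4: Hello, CUDA! | course01_hello_cuda |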
- ...
Thanks to the following excellent public learning resources.
- codingonion/awesome-cuda-and-hpc : A collection of awesome public CUDA, cuBLAS, TensorRT and High Performance Computing (HPC) projects.
- NVIDIA CUDA Toolkit Documentation : CUDA Toolkit Documentation.
- NVIDIA CUDA C++ Programming Guide : CUDA C++ Programming Guide.
- NVIDIA CUDA C++ Best Practices Guide : CUDA C++ Best Practices Guide.
- NVIDIA/cuda-samples : Samples for CUDA developers that demonstrate features in the CUDA Toolkit.
- NVIDIA/CUDALibrarySamples : CUDA Library Samples.
- NVIDIA-developer-blog/code-samples : Source code examples from the Parallel Forall Blog.
- HeKun-NVIDIA/CUDA-Programming-Guide-in-Chinese : A Chinese translation of the CUDA C Programming Guide.
- cuda-mode/lectures : Material for cuda-mode lectures.
- cuda-mode/resource-stream : CUDA related news and material links.
- brucefan1983/CUDA-Programming : Sample codes for my CUDA programming book.
- YouQixiaowu/CUDA-Programming-with-Python : Python versions of the sample code from the book CUDA Programming, written with the pycuda module.
- QINZHAOYU/CudaSteps : A CUDA learning path based on the book 《cuda编程-基础与实践》 (CUDA Programming: Basics and Practice) by 樊哲勇 (Zheyong Fan).
- sangyc10/CUDA-code : Companion code for the bilibili video series "CUDA Programming Fundamentals for Beginners (continuously updated)".
- RussWong/CUDATutorial : A CUDA tutorial for learning CUDA programming from scratch.
- DefTruth/CUDA-Learn-Notes : 🎉 CUDA/C++ notes, hand-written CUDA kernels for large models, and technical blog posts, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
- BBuf/how-to-optim-algorithm-in-cuda : How to optimize some algorithms in CUDA.
- PaddleJitLab/CUDATutorial : A self-learning tutorial for CUDA high-performance programming, starting from scratch.
- leimao/CUDA-GEMM-Optimization : CUDA Matrix Multiplication Optimization. This repository contains the CUDA kernels for general matrix-matrix multiplication (GEMM) and the corresponding performance analysis.
- Liu-xiandong/How_to_optimize_in_GPU : A series of GPU optimization topics that explains in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.
- Bruce-Lee-LY/matrix_multiply : Several common methods of matrix multiplication implemented on CPU and NVIDIA GPU using C++11 and CUDA.
- Bruce-Lee-LY/cuda_hgemm : Several optimization methods of half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
- Bruce-Lee-LY/cuda_hgemv : Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA cores.
- enp1s0/ozIMMU : FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme. arxiv.org/abs/2306.11975
- Cjkkkk/CUDA_gemm : A simple high performance CUDA GEMM implementation.
- AyakaGEMM/Hands-on-GEMM : A GEMM tutorial.
- AyakaGEMM/Hands-on-MLIR : Hands-on-MLIR.
- zpzim/MSplitGEMM : Large matrix multiplication in CUDA.
- jundaf2/CUDA-INT8-GEMM : CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API.
- chanzhennan/cuda_gemm_benchmark : Based on gtest/benchmark; refers to https://github.com/Liu-xiandong/How_to_optimize_in_GPU.
- YuxueYang1204/CudaDemo : Implement custom operators in PyTorch with CUDA/C++.
- CoffeeBeforeArch/cuda_programming : Code from the "CUDA Crash Course" YouTube series by CoffeeBeforeArch.
- rbaygildin/learn-gpgpu : Algorithms implemented in CUDA + resources about GPGPU.
- godweiyang/NN-CUDA-Example : Several simple examples for popular neural network toolkits calling custom CUDA operators.
- yhwang-hub/Matrix_Multiplication_Performance_Optimization : Matrix Multiplication Performance Optimization.
- yao-jiashu/KernelCodeGen : GEMM/Conv2d CUDA/HIP kernel code generation using MLIR.
- caiwanxianhust/ClusteringByCUDA : A series of clustering algorithms implemented in CUDA C++.
- ulrichstern/cuda-convnet : Alex Krizhevsky's original code from Google Code. See the article "Found AlexNet's original source code: no framework, hand-written from scratch in CUDA/C++" from the WeChat official account 人工智能大讲堂.
- PacktPublishing/Learn-CUDA-Programming : Learn CUDA Programming, published by Packt.
- PacktPublishing/Hands-On-GPU-Programming-with-Python-and-CUDA : Hands-On GPU Programming with Python and CUDA, published by Packt.
- PacktPublishing/Hands-On-GPU-Accelerated-Computer-Vision-with-OpenCV-and-CUDA : Hands-On GPU Accelerated Computer Vision with OpenCV and CUDA, published by Packt.
- codingonion/cuda-beginner-course-cpp-version : Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (C++ Version)".
- codingonion/cuda-beginner-course-python-version : Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (Python Version)".
- codingonion/cuda-beginner-course-rust-version : Companion code for the bilibili video series "Introduction to CUDA 12.x Parallel Programming (Rust Version)".