Multicore GPU Programming

This repository covers a wide range of topics, each aimed at improving efficiency and performance in multicore and GPU programming. Here's a detailed look at what I learned. 🤓

Theory

| Theme | Post |
| --- | --- |
| Basic Parallel Architectures | An introduction to basic parallel architectures |
| Thread Programming | Thread programming explored through C++ |
| Thread Management | How can work be divided evenly between threads in a multithreaded program? |
| Matrix Multiplication (multi-threaded) | Techniques for speeding up matrix multiplication (matmul) across multiple threads |
| OpenMP | How to use OpenMP, which makes multithreading convenient |
| Graph Processing | Ways to store graph structures more efficiently |
| Prefix Sum | Prefix Sum: a guide to efficient computation |
| CUDA Programming Intro | CUDA Programming Basics |
| CPU-GPU communication and thread indexing | CPU-GPU Communication and Image Processing Techniques with CUDA |
| CUDA thread hierarchy, memory hierarchy, GPU cache structure | CUDA and Nvidia GPU Architecture: Understanding the Thread Hierarchy, Memory Hierarchy, and GPU Cache Structure |
| CUDA memories : registers, shared memory, global memory | CUDA Memories: Registers, Shared Memory, Global Memory |

Hands-on Assignments

| Assignment | Description | Link |
| --- | --- | --- |
| Assignment #1 | A Simple Filter on 1D Array | link |
| Assignment #2 | Hash table locking | link |
| Assignment #3 | Matrix Multiplication | link |
| Assignment #4 | Matrix Multiplication using CUDA | link |
| Assignment #5 | Sum Reduction | link |
| Assignment #6 | CUDA Application of DNN | link |

Content Breakdown

Basic Parallel Architectures

Thread Programming

Thread Management

Matrix Multiplication (multi-threaded)

OpenMP

Graph Processing

Prefix Sum

CUDA 101

  • Post : CUDA Programming Basics
  • Description : This section introduces CUDA programming for those new to GPU programming. The post covers the basics of CUDA, how to set up a development environment, and how to write and compile a first CUDA program (a minimal sketch of such a program is shown below).
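
As a quick illustration of the "write and compile your first CUDA program" step, here is a minimal sketch of a hello-world kernel; the file name and launch configuration are just examples and are not taken from the post.

```cuda
// hello.cu -- build with: nvcc hello.cu -o hello
#include <cstdio>

// Trivial kernel: every thread prints its block and thread index.
__global__ void hello_kernel() {
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
    hello_kernel<<<2, 4>>>();   // launch 2 blocks of 4 threads each
    cudaDeviceSynchronize();    // wait for the kernel so the output is flushed
    return 0;
}
```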

CPU-GPU communication and thread indexing

  • Post : CPU-GPU Communication and Image Processing Techniques with CUDA
  • Description : This section gives a detailed explanation of the hierarchical structure of CUDA threads: grids, blocks, and threads. The post covers how to compute a global thread index from these coordinates and includes example code for image processing (a sketch of the indexing pattern is shown below).
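
The following sketch shows the global-thread-index calculation applied to a simple image-processing task (brightening a grayscale image), together with the host-to-device copies that make up the CPU-GPU communication. The kernel and function names are hypothetical, not taken from the post itself.

```cuda
#include <cuda_runtime.h>

// Each thread derives its global index from the block/thread hierarchy
// and brightens one pixel of a flat grayscale image.
__global__ void brighten(unsigned char* img, int n, int delta) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n) {                                     // guard the tail of the array
        int v = img[i] + delta;
        img[i] = (unsigned char)(v > 255 ? 255 : v); // clamp to 8-bit range
    }
}

void brighten_on_gpu(unsigned char* h_img, int n, int delta) {
    unsigned char* d_img;
    cudaMalloc(&d_img, n);
    cudaMemcpy(d_img, h_img, n, cudaMemcpyHostToDevice);       // CPU -> GPU

    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // cover all n pixels
    brighten<<<blocks, threadsPerBlock>>>(d_img, n, delta);

    cudaMemcpy(h_img, d_img, n, cudaMemcpyDeviceToHost);       // GPU -> CPU
    cudaFree(d_img);
}
```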

CUDA thread hierarchy, memory hierarchy, GPU cache structure

CUDA memories : registers, shared memory, global memory

  • Post : CUDA Memories: Registers, Shared Memory, Global Memory
  • Description : This section explores the different types of memory in CUDA, focusing on registers, shared memory, and global memory. The post delves into the characteristics of each memory type and provides strategies for using them effectively to improve the efficiency of CUDA kernels (see the sketch after this list for how the three spaces appear in a kernel).
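
As a rough illustration, and assuming a block size of 256 threads (a detail not taken from the post), the sketch below shows how the three memory spaces typically appear in a block-wise sum reduction: a per-thread value in a register, a staging tile in shared memory, and the input/output arrays in global memory.

```cuda
#include <cuda_runtime.h>

// Block-wise sum reduction:
//   - `val` lives in a register (private to one thread),
//   - `tile` lives in shared memory (shared by the whole block),
//   - `in` and `out` live in global memory (visible to the whole grid).
__global__ void block_sum(const float* in, float* out, int n) {
    __shared__ float tile[256];               // assumes blockDim.x == 256
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    float val = (i < n) ? in[i] : 0.0f;       // load from global memory into a register
    tile[tid] = val;                          // stage the value in shared memory
    __syncthreads();

    // Tree reduction within the block, entirely in shared memory.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();
    }

    if (tid == 0) out[blockIdx.x] = tile[0];  // one partial sum per block back to global memory
}
```

Staging the data in shared memory lets every subsequent reduction step read from fast on-chip storage instead of going back to global memory, which is the kind of strategy the post discusses for using each memory type effectively.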