GitHub - taras-sereda/cuda-machine-learning: Machine Learning related CUDA code

HowTos

Compile a PyTorch CUDA extension for multiple compute capabilities as well as PTX (see https://pytorch.org/docs/stable/cpp_extension.html):

  env TORCH_CUDA_ARCH_LIST="6.1 7.5 8.6+PTX" python setup.py install
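The arch list ends up as nvcc -gencode options. Each listed compute capability gets machine code (SASS) for that exact architecture. An entry marked +PTX additionally embeds PTX, which newer GPUs can JIT-compile at load time. The C++ sketch below shows that mapping. It is assumed to be roughly what torch.utils.cpp_extension does internally. gencodeFlags is a hypothetical helper, and it only handles space-separated entries.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: expand a TORCH_CUDA_ARCH_LIST-style string such as
// "6.1 7.5 8.6+PTX" into nvcc -gencode flags (illustration, not PyTorch code).
std::vector<std::string> gencodeFlags(const std::string& archList) {
    const std::string ptxSuffix = "+PTX";
    std::vector<std::string> flags;
    std::istringstream in(archList);
    std::string arch;
    while (in >> arch) {  // splits on whitespace only
        bool withPtx = arch.size() > ptxSuffix.size() &&
                       arch.compare(arch.size() - ptxSuffix.size(),
                                    ptxSuffix.size(), ptxSuffix) == 0;
        if (withPtx) arch.erase(arch.size() - ptxSuffix.size());
        // "8.6" -> "86": drop the dot between major and minor version.
        std::string num;
        for (char c : arch) {
            if (c != '.') num += c;
        }
        assert(!num.empty());
        // SASS for exactly this compute capability.
        flags.push_back("-gencode=arch=compute_" + num + ",code=sm_" + num);
        // PTX as well, so GPUs newer than the list can JIT-compile it.
        if (withPtx) {
            flags.push_back("-gencode=arch=compute_" + num + ",code=compute_" + num);
        }
    }
    return flags;
}
```

For "6.1 7.5 8.6+PTX" this yields four flags: sm_61, sm_75 and sm_86 machine code, plus compute_86 PTX.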

Querying device properties.

cudaDeviceGetAttribute is argued to be faster, though it is more verbose:

  // devIdx: index of the GPU to query (0 for the first device)
  int maxBlockDimX;
  int maxBlockDimY;
  int maxBlockDimZ;
  cudaDeviceGetAttribute(&maxBlockDimX, cudaDevAttrMaxBlockDimX, devIdx);
  cudaDeviceGetAttribute(&maxBlockDimY, cudaDevAttrMaxBlockDimY, devIdx);
  cudaDeviceGetAttribute(&maxBlockDimZ, cudaDevAttrMaxBlockDimZ, devIdx);
  printf("  Max Block Dim: %d %d %d\n", maxBlockDimX, maxBlockDimY, maxBlockDimZ);

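versus cudaGetDeviceProperties, which fills a whole cudaDeviceProp struct in one call (maxThreadsDim holds the same three values):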

  cudaDeviceProp prop;
  cudaGetDeviceProperties(&prop, devIdx);
  printf("  Max Threads Dim: %d %d %d\n", prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
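A common reason to query these limits is to validate a block shape before launching a kernel. The per-dimension limits and the total-threads-per-block limit (prop.maxThreadsPerBlock) apply together. Current GPUs typically report maxThreadsDim = 1024 x 1024 x 64 and maxThreadsPerBlock = 1024, so a 32 x 32 x 2 block is rejected even though every dimension fits. BlockLimits and blockFits are hypothetical names used only for this sketch.

```cpp
#include <cassert>

// Hypothetical struct holding the queried per-block limits:
// prop.maxThreadsDim[0..2] and prop.maxThreadsPerBlock.
struct BlockLimits {
    int maxDim[3];
    int maxThreadsPerBlock;
};

// True if a bx * by * bz block respects both the per-dimension
// limits and the total number of threads allowed per block.
bool blockFits(int bx, int by, int bz, const BlockLimits& lim) {
    assert(bx > 0 && by > 0 && bz > 0);
    if (bx > lim.maxDim[0] || by > lim.maxDim[1] || bz > lim.maxDim[2]) {
        return false;
    }
    // long long avoids overflow for very large (invalid) requests.
    long long total = 1LL * bx * by * bz;
    return total <= lim.maxThreadsPerBlock;
}
```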

Reference of CUDA device properties.

Hemi: a library for writing reusable CPU and GPU code, where a single kernel function can execute on both device types. More in this blog post.
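The core idea behind such libraries can be sketched with a macro that expands to `__host__ __device__` under nvcc and to nothing under a plain C++ compiler. This is not Hemi's actual API. HOST_DEVICE, saxpyElem, saxpyHost and saxpyKernel are hypothetical names.

```cpp
#include <cassert>

// Hypothetical macro in the spirit of Hemi: under nvcc the function becomes
// callable from host and device; elsewhere it is an ordinary C++ function.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// One definition of the per-element SAXPY math, shared by both paths.
HOST_DEVICE inline float saxpyElem(float a, float x, float y) {
    return a * x + y;
}

// CPU path: a plain loop calling the shared function.
void saxpyHost(int n, float a, const float* x, float* y) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) y[i] = saxpyElem(a, x[i], y[i]);
}

#ifdef __CUDACC__
// GPU path: the same function called from a kernel (only compiled by nvcc).
__global__ void saxpyKernel(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = saxpyElem(a, x[i], y[i]);
}
#endif
```

Keeping the arithmetic in one function means the CPU and GPU results come from the same source, which is handy for checking a kernel against a host reference.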

Compile CUDA kernels

nvcc device_info.cu -o device_info

Theory

Thread Hierarchy with easy examples.

1d(Dx=15)
like finding a point on a line
threadIdx = (7)
threadId = threadIdx.x
threadId = 7

-> -------x-------

2d(Dx=15, Dy=4)
like finding a line on a plane and then finding a point on it.
threadIdx = (y, x) = (2, 7)
threadId = threadIdx.x + threadIdx.y * Dx
threadId = 7 + 2 * 15 = 37
   ---------------
   ---------------
-> -------x-------
   ---------------
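The same arithmetic as plain host C++ is shown below. In a kernel, x, y and Dx would be threadIdx.x, threadIdx.y and blockDim.x. The names threadId2d and coords2d are illustrative. The inverse mapping shows that the formula is one-to-one: dividing the id by Dx gives back the row, and the remainder gives back the position in that row.

```cpp
#include <cassert>

// Linear thread id in a 2D block of width Dx: rows of Dx threads are laid
// out one after another, y selects the row and x the position within it.
int threadId2d(int x, int y, int Dx) {
    assert(0 <= x && x < Dx && y >= 0);
    return x + y * Dx;
}

// Inverse mapping: recover (x, y) from the linear id.
void coords2d(int id, int Dx, int* x, int* y) {
    assert(id >= 0 && Dx > 0);
    *x = id % Dx;
    *y = id / Dx;
}
```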



3d(Dx=15, Dy=4, Dz=2)
like finding a plane in a volume, then a line on that plane, and finally a point on the line.
threadIdx = (z, y, x) = (0, 2, 7)
threadId = threadIdx.x + threadIdx.y * Dx + threadIdx.z * Dx * Dy
threadId = 7 + 2 * 15 + 0 * 15 * 4 = 37

   --------------- --
   ---------------   | first
-> -------x-------   | find a slice (z = 0)
   --------------- --

   ---------------
   ---------------
   ---------------
   ---------------
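The 3D formula can be checked the same way. The sketch below also enumerates a whole block with x varying fastest and confirms that the ids come out as 0, 1, 2, ... with no gaps or repeats. This ordering matters because CUDA forms warps (32 threads) from consecutive thread ids, so neighbours in threadIdx.x end up in the same warp. threadId3d and idsAreConsecutive are illustrative names.

```cpp
#include <cassert>

// Linear thread id in a Dx x Dy x Dz block: z picks a slice of Dx*Dy
// threads, y picks a row of Dx threads in that slice, x picks the thread.
int threadId3d(int x, int y, int z, int Dx, int Dy) {
    assert(0 <= x && x < Dx && 0 <= y && y < Dy && z >= 0);
    return x + y * Dx + z * Dx * Dy;
}

// Walk the block with x fastest, then y, then z, and check that the
// resulting ids are exactly 0, 1, 2, ..., Dx*Dy*Dz - 1 in order.
bool idsAreConsecutive(int Dx, int Dy, int Dz) {
    int expected = 0;
    for (int z = 0; z < Dz; ++z) {
        for (int y = 0; y < Dy; ++y) {
            for (int x = 0; x < Dx; ++x) {
                if (threadId3d(x, y, z, Dx, Dy) != expected) return false;
                ++expected;
            }
        }
    }
    return true;
}
```

For the example above, (z, y, x) = (0, 2, 7) maps to 37, and the same point in the second slice, (1, 2, 7), maps to 37 + 15 * 4 = 97.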

Folders and files

  .vscode
  black_and_white
  blur
  cuda_by_example
  intro_to_cuda_1
  julia_set_2
  matMul/src
  memory_allocation
  saxpy/src
  vector_addition
  README.md
  device_info.cu