Several optimization methods for half-precision general matrix multiplication (HGEMM) using tensor cores with the WMMA API and MMA PTX instructions.
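A minimal sketch of the WMMA route, assuming a row-major A, a column-major B, dimensions that are multiples of 16, an FP32 accumulator, and a block size whose x dimension is a multiple of the warp size; the kernel name and launch layout are illustrative, not taken from the repository.

```cuda
#include <mma.h>
#include <cuda_fp16.h>

using namespace nvcuda;

// Hypothetical minimal kernel: each warp computes one 16x16 tile of C = A * B
// with a single WMMA fragment, accumulating in FP32. Assumes M, N, K are
// multiples of 16, A is row-major (M x K), B is column-major (K x N), and
// blockDim.x is a multiple of warpSize so warpM is uniform within a warp.
__global__ void wmma_hgemm_naive(const half *A, const half *B, float *C,
                                 int M, int N, int K) {
    // One warp per 16x16 output tile.
    int warpM = (blockIdx.x * blockDim.x + threadIdx.x) / warpSize;
    int warpN = blockIdx.y * blockDim.y + threadIdx.y;
    if (warpM * 16 >= M || warpN * 16 >= N) return;

    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc_frag;
    wmma::fill_fragment(acc_frag, 0.0f);

    // Walk along K in 16-wide steps, issuing one tensor-core MMA per step.
    for (int k = 0; k < K; k += 16) {
        wmma::load_matrix_sync(a_frag, A + warpM * 16 * K + k, K);
        wmma::load_matrix_sync(b_frag, B + warpN * 16 * K + k, K);
        wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);
    }

    // Write the accumulated 16x16 tile back to row-major C.
    wmma::store_matrix_sync(C + warpM * 16 * N + warpN * 16, acc_frag, N,
                            wmma::mem_row_major);
}
```

With a launch such as `dim3 block(128, 4)` and a matching grid, each block covers a 64x64 patch of C (a 4x4 grid of warp tiles); this is the common introductory structure, not the repository's tuned kernels.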
Several common matrix multiplication methods implemented on the CPU and on NVIDIA GPUs using C++11 and CUDA.
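As a point of reference for the GPU kernels, a naive CPU implementation is the triple loop below; the function name and row-major layout are assumptions for illustration.

```cuda
#include <vector>
#include <cstddef>

// Naive O(M*N*K) reference on the CPU: C = A * B with all matrices stored
// row-major in contiguous vectors. Typically used to verify GPU results.
void gemm_cpu_naive(const std::vector<float> &A, const std::vector<float> &B,
                    std::vector<float> &C, std::size_t M, std::size_t N,
                    std::size_t K) {
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += A[m * K + k] * B[k * N + n];
            }
            C[m * N + n] = acc;
        }
    }
}
```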
Multiple GEMM operators built with CUTLASS to support LLM inference.
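A typical way to build such an operator is the CUTLASS 2.x device-level `cutlass::gemm::device::Gemm` template; the element types, layouts, architecture tag, and function name below are assumptions for illustration, not the repository's actual configuration.

```cuda
#include <cutlass/gemm/device/gemm.h>
#include <cutlass/half.h>

// Hypothetical FP16 GEMM operator with FP32 accumulation on tensor cores.
// Layouts and the Sm80 arch tag are illustrative assumptions.
using GemmOp = cutlass::gemm::device::Gemm<
    cutlass::half_t, cutlass::layout::RowMajor,     // A: M x K
    cutlass::half_t, cutlass::layout::ColumnMajor,  // B: K x N
    cutlass::half_t, cutlass::layout::RowMajor,     // C/D: M x N
    float,                                          // accumulator
    cutlass::arch::OpClassTensorOp,                 // use tensor cores
    cutlass::arch::Sm80>;                           // target architecture

cutlass::Status run_gemm(int M, int N, int K,
                         const cutlass::half_t *A, const cutlass::half_t *B,
                         cutlass::half_t *C, float alpha, float beta) {
    GemmOp gemm_op;
    // Computes D = alpha * A * B + beta * C, writing the result back into C.
    GemmOp::Arguments args({M, N, K},
                           {A, K},   // A, leading dimension K (row-major)
                           {B, K},   // B, leading dimension K (column-major)
                           {C, N},   // C, leading dimension N
                           {C, N},   // D, leading dimension N
                           {alpha, beta});
    return gemm_op(args);
}
```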
Uses tensor cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instructions.
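At the PTX level, the core warp-wide tensor-core operation for one 16x8x16 FP16 tile can be wrapped in inline assembly roughly as below; this shows only the MMA primitive, not the back-to-back fusion, and the fragment registers must already hold each lane's slice in the layout the PTX ISA requires (e.g. produced via `ldmatrix`), which is omitted here. The wrapper name is illustrative.

```cuda
#include <cstdint>

// Sketch of a warp-level m16n8k16 FP16 MMA issued via inline PTX.
// ra*/rb*/rc* are each thread's packed half2 register fragments in the
// layout defined by the PTX ISA for mma.sync.aligned.m16n8k16; filling them
// (e.g. with ldmatrix) is not shown. Requires sm_80 or newer.
__device__ __forceinline__ void hmma_m16n8k16(uint32_t &rd0, uint32_t &rd1,
                                              uint32_t ra0, uint32_t ra1,
                                              uint32_t ra2, uint32_t ra3,
                                              uint32_t rb0, uint32_t rb1,
                                              uint32_t rc0, uint32_t rc1) {
    asm volatile(
        "mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16 "
        "{%0, %1}, {%2, %3, %4, %5}, {%6, %7}, {%8, %9};\n"
        : "=r"(rd0), "=r"(rd1)
        : "r"(ra0), "r"(ra1), "r"(ra2), "r"(ra3),
          "r"(rb0), "r"(rb1), "r"(rc0), "r"(rc1));
}
```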
A C library for matrix calculations.