ggml-cuda : perform cublas mat mul of quantized types as f16 (ggerganov#3412)

* ggml-cuda : perform cublas matrix multiplication of quantized types as fp16

* rename CC_TURING to CC_VOLTA

* disable fp16 mat mul completely with multi GPU
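
The three points above suggest a simple gating rule for the fp16 cuBLAS path. The commit message does not spell out the exact condition, so the following is only a minimal sketch under stated assumptions: the function name `use_fp16_mat_mul`, its parameters, and the encoding of `CC_VOLTA` as 700 (compute capability 7.0 written as major*100 + minor*10) are illustrative and are not taken from the diff.

```cpp
// Hypothetical sketch, not code from the commit.
// Assumption: compute capability is encoded as major*100 + minor*10,
// so Volta (compute capability 7.0) becomes 700.
constexpr int CC_VOLTA = 700;

// Returns true when the fp16 mat mul path would be selected under the
// rules listed in the commit message:
//   - the device is Volta or newer
//   - only a single GPU is in use (fp16 mat mul is disabled with multi GPU)
bool use_fp16_mat_mul(int compute_capability, int device_count) {
    if (device_count > 1) {
        return false; // multi GPU: fp16 path disabled completely
    }
    return compute_capability >= CC_VOLTA;
}
```

Under these assumptions, a single device reporting 700 or higher would take the fp16 path, while an older device or any multi-GPU setup would fall back to the existing path.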
slaren authored Sep 30, 2023
1 parent 40e07a6 commit f5ef5cf
Showing 1 changed file with 122 additions and 72 deletions.