ggml-cuda : perform cublas mat mul of quantized types as f16 (#3412) · yxq321/skywork.cpp@f5ef5cf · GitHub

Commit

ggml-cuda : perform cublas mat mul of quantized types as f16 (ggerganov#3412)

* ggml-cuda : perform cublas matrix multiplication of quantized types as fp16

* rename CC_TURING to CC_VOLTA

* disable fp16 mat mul completely with multi GPU
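
The three changes above suggest a dispatch rule along these lines. This is only a minimal sketch of the decision, not the code from this commit: the `TensorType` enum and the `use_fp16_mat_mul` function are hypothetical names, and encoding compute capability 7.0 (Volta) as `700` is an assumption about how `CC_VOLTA` is defined.

```cpp
// Assumption: compute capability is encoded as major*100 + minor*10,
// so Volta (7.0) becomes 700. The commit message only gives the name CC_VOLTA.
constexpr int CC_VOLTA = 700;

// Hypothetical stand-in for the tensor types relevant to this decision.
enum class TensorType { F32, F16, Quantized };

// Hypothetical helper: returns true when the fp16 cuBLAS path would be used.
// - with more than one GPU the fp16 path is disabled completely
// - the device must be Volta or newer
// - the weights must be fp16 or a quantized type (converted to fp16 first)
constexpr bool use_fp16_mat_mul(TensorType src0_type, int compute_capability, int gpu_count) {
    return gpu_count == 1
        && compute_capability >= CC_VOLTA
        && (src0_type == TensorType::F16 || src0_type == TensorType::Quantized);
}
```

Under these assumptions, a quantized tensor on a single compute capability 7.0 device would take the fp16 path, while the same tensor on a two-GPU setup or on an older device would fall back to the fp32 path.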

slaren authored Sep 30, 2023

1 parent 40e07a6 commit f5ef5cf