I've successfully written some wrappers that extend CLBlast to become a drop-in BLAS replacement that runs Armadillo matrix multiplies on an AMD GPU. Is there interest in adding that to this project?
I'm on OS X and a long-time Armadillo user, and I've been using NVIDIA's NVBLAS library as a drop-in BLAS replacement that lets Armadillo use an NVIDIA GPU. Since NVIDIA and Apple are parting ways, last week I purchased an AMD RX 580 and went searching for a similar BLAS replacement that would work on the AMD card.
I was very happy to find CLBlast and to see that it provides a BLAS-compliant library. It does not use the same function names as BLAS, though, which makes sense: the APIs of CLBlast's netlib routines are very close to BLAS, but they are not identical.
So I wrote some wrappers in a new clblast_netlib_armadillo.h file (attached). They provide an API that looks like BLAS, with the function names Armadillo expects, and they call the CLBlast netlib routines in clblast_netlib_c.cpp with the appropriate argument translation. I've only done this for xgemm, and have only actually run zgemm ('cause that's all my project needs), but it works!
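To make the idea of such a wrapper concrete, here is a minimal sketch of the argument translation involved. It is not the attached header. It assumes Armadillo links against Fortran-convention BLAS symbols (all arguments passed by pointer, column-major storage, transpose flags as characters). The names zgemm_sketch, reference_zgemm, to_transpose and the Transpose enum are hypothetical stand-ins; in a real wrapper the exported symbol would typically be zgemm_ (depending on the Fortran name-mangling configuration), and the call would forward to CLBlast's netlib routine instead of the CPU reference used here so the sketch runs without a GPU.

```cpp
#include <complex>
#include <cstddef>

using cd = std::complex<double>;

// Hypothetical stand-in for the CBLAS-style transpose enum; the real enum
// names and values live in clblast_netlib_c.h and are not reproduced here.
enum class Transpose { No, Yes, Conjugate };

// Hypothetical stand-in for the CBLAS-style routine the wrapper forwards to.
// Plain column-major CPU reference: C = alpha * op(A) * op(B) + beta * C.
void reference_zgemm(Transpose ta, Transpose tb, int m, int n, int k,
                     cd alpha, const cd* a, int lda, const cd* b, int ldb,
                     cd beta, cd* c, int ldc) {
  // Element (row, col) of op(X) for a column-major matrix X.
  auto at = [](const cd* x, int ld, Transpose t, int row, int col) -> cd {
    if (t == Transpose::No) return x[row + static_cast<std::size_t>(col) * ld];
    const cd v = x[col + static_cast<std::size_t>(row) * ld];
    return t == Transpose::Conjugate ? std::conj(v) : v;
  };
  for (int j = 0; j < n; ++j) {
    for (int i = 0; i < m; ++i) {
      cd sum = 0.0;
      for (int p = 0; p < k; ++p) sum += at(a, lda, ta, i, p) * at(b, ldb, tb, p, j);
      cd& out = c[i + static_cast<std::size_t>(j) * ldc];
      out = alpha * sum + beta * out;
    }
  }
}

// Translate a Fortran-style transpose character into the enum.
static Transpose to_transpose(char t) {
  switch (t) {
    case 'T': case 't': return Transpose::Yes;
    case 'C': case 'c': return Transpose::Conjugate;
    default:            return Transpose::No;
  }
}

// Fortran-convention entry point: every argument arrives by pointer and the
// data is column-major. The wrapper dereferences the scalars, translates the
// transpose flags, and forwards to the CBLAS-style routine.
extern "C" void zgemm_sketch(const char* transa, const char* transb,
                             const int* m, const int* n, const int* k,
                             const cd* alpha, const cd* a, const int* lda,
                             const cd* b, const int* ldb,
                             const cd* beta, cd* c, const int* ldc) {
  reference_zgemm(to_transpose(*transa), to_transpose(*transb), *m, *n, *k,
                  *alpha, a, *lda, b, *ldb, *beta, c, *ldc);
}
```

Calling zgemm_sketch("N", "N", &n, &n, &n, &alpha, a, &n, b, &n, &beta, c, &n) on column-major arrays then behaves like a Fortran zgemm call. Because the layout is always column-major on the Fortran side, only the transpose flags and the by-pointer scalars need translating.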
I'm sure my approach is not the best: I inserted
#include <clblast_netlib_armadillo.h>
at line 992 of clblast_netlib_c.h (mostly because I'm not confident about the details of CMakeLists.txt), and this approach requires adding
#define NETLIB_PERSISTENT_OPENCL
at the top of clblast_netlib_c.cpp to allow repeated calls to cblas_xgemm.
But this new capability is extremely important to me, and I'm sure I'm not the only one.
I've only done this for xgemm, but I would be happy to extend the wrapper code to many other routines. I just need some guidance on the best way to do that. I would be honored to contribute to this most excellent library. Would that be welcome?
Thanks for your request and sorry for my late reply.
I think that could be nice indeed. You probably noticed that some of these header and definition files are very repetitive (for each routine and for each precision), so I've actually scripted the generation of such files using Python. Not sure if that was the best decision, but it does have some benefits. You can see here which files it currently produces: https://github.com/CNugteren/CLBlast/blob/master/scripts/generator/generator.py#L11
And you can run the script as follows: python ./scripts/generator/generator.py . (it shouldn't change anything).
So I would propose extending this Python script with the Armadillo interface for the sake of consistency and to save typing. Since this is a bit of a hacky thing, I could make a very first stub of the implementation based on your GEMM example, and perhaps you can finish it and test it?
clblast_netlib_armadillo.h.zip