Extending CLBlast to include BLAS symbols expected by Armadillo #365

Open
stevepur opened this issue Jul 19, 2019 · 1 comment

Comments

@stevepur

I've successfully written some wrappers that extend CLBlast into a drop-in BLAS replacement that runs Armadillo matrix multiplies on an AMD GPU. Is there interest in adding that to this project?

I'm on OS X and I'm a long-time Armadillo user, and I've been using NVIDIA's NVBLAS library as a drop-in BLAS replacement that lets Armadillo use an NVIDIA GPU. Of course NVIDIA and Apple are parting ways, so last week I purchased an AMD RX 580. I then went searching for a similar BLAS replacement that would work on the AMD card.

I was very happy to find CLBlast and to see that it provides a BLAS-compliant library. But it does not use the same function names as BLAS, which makes sense: the APIs of CLBlast's netlib routines are very close to BLAS, but they are not the same.

So I wrote some wrappers in a new clblast_netlib_armadillo.h file (attached) that create an API that looks like BLAS, with the names expected by Armadillo, and that call the CLBlast netlib routines in clblast_netlib_c.cpp with the appropriate argument translation. I've only done this for xgemm, and have only actually run zgemm ('cause that's all my project needs), but it works!
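For readers without the attachment, the kind of argument translation described above can be sketched as follows. This is not the attached file. It assumes Armadillo is configured to call the usual Fortran-style symbol `zgemm_` (every argument passed by pointer, transposes given as characters), and it uses a hypothetical CPU stand-in named `backend_zgemm` in place of CLBlast's CBLAS-style routine so the sketch compiles without OpenCL.

```cpp
#include <cassert>
#include <complex>

using cxd = std::complex<double>;

// Standard CBLAS numeric values for layout and transpose options.
enum Layout { kColMajor = 102 };
enum Transpose { kNoTrans = 111, kTrans = 112, kConjTrans = 113 };

// Element (row, col) of op(X) for a column-major matrix X.
static cxd op_at(Transpose t, const cxd* x, int ld, int row, int col) {
  if (t == kNoTrans) return x[row + col * ld];
  const cxd v = x[col + row * ld];
  return t == kConjTrans ? std::conj(v) : v;
}

// HYPOTHETICAL stand-in for a CBLAS-style backend routine (in the real setup
// this role is played by CLBlast's netlib zgemm). Plain CPU loops computing
// C = alpha * op(A) * op(B) + beta * C, column-major only.
void backend_zgemm(Layout layout, Transpose ta, Transpose tb,
                   int m, int n, int k,
                   const cxd* alpha, const cxd* a, int lda,
                   const cxd* b, int ldb,
                   const cxd* beta, cxd* c, int ldc) {
  assert(layout == kColMajor);
  for (int j = 0; j < n; ++j) {
    for (int i = 0; i < m; ++i) {
      cxd acc(0.0, 0.0);
      for (int l = 0; l < k; ++l) {
        acc += op_at(ta, a, lda, i, l) * op_at(tb, b, ldb, l, j);
      }
      c[i + j * ldc] = *alpha * acc + *beta * c[i + j * ldc];
    }
  }
}

// Translate a Fortran BLAS transpose character into the CBLAS-style enum.
static Transpose to_transpose(char t) {
  switch (t) {
    case 'N': case 'n': return kNoTrans;
    case 'T': case 't': return kTrans;
    default:
      assert(t == 'C' || t == 'c');
      return kConjTrans;
  }
}

// Fortran-style entry point: dereference the pointer arguments, translate the
// transpose characters, fix the layout to column-major, and forward the call.
extern "C" void zgemm_(const char* transa, const char* transb,
                       const int* m, const int* n, const int* k,
                       const cxd* alpha, const cxd* a, const int* lda,
                       const cxd* b, const int* ldb,
                       const cxd* beta, cxd* c, const int* ldc) {
  backend_zgemm(kColMajor, to_transpose(*transa), to_transpose(*transb),
                *m, *n, *k, alpha, a, *lda, b, *ldb, beta, c, *ldc);
}
```

The wrapper itself is only the last function; everything above it exists to make the sketch self-contained. The exact symbol name and integer width that Armadillo expects depend on how it was configured, so treat those as assumptions to check against your build.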

I'm sure my approach is not the best: I inserted
#include <clblast_netlib_armadillo.h>
at line 992 of clblast_netlib_c.h (mostly because I'm not confident about the details of CMakeLists.txt), and this approach requires adding
#define NETLIB_PERSISTENT_OPENCL
at the top of clblast_netlib_c.cpp to allow repeated calls to cblas_xgemm.
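As a rough illustration of why such a define can matter, here is a generic sketch of a compile-time switch between rebuilding expensive state on every call and creating it once and reusing it. It assumes, based only on the macro's name, that NETLIB_PERSISTENT_OPENCL keeps OpenCL state alive between calls; the names `SKETCH_PERSISTENT_CONTEXT`, `FakeContext`, `get_context`, and `run_gemm_like_call` are hypothetical and do not come from CLBlast.

```cpp
#include <cassert>

// HYPOTHETICAL stand-in for expensive OpenCL state (device, context, queue).
struct FakeContext {
  static int created;  // counts how many times setup has run
  FakeContext() { ++created; }
};
int FakeContext::created = 0;

// HYPOTHETICAL analogue of a "persistent state" switch; remove this line to
// get the rebuild-on-every-call behaviour instead.
#define SKETCH_PERSISTENT_CONTEXT

#ifdef SKETCH_PERSISTENT_CONTEXT
// Created on first use, then shared by every later call.
FakeContext& get_context() {
  static FakeContext ctx;
  return ctx;
}
#else
// A fresh context is built (and later destroyed) on every call.
FakeContext get_context() { return FakeContext(); }
#endif

// HYPOTHETICAL routine that needs the context each time it is invoked.
// Returns the number of contexts created so far.
int run_gemm_like_call() {
  auto&& ctx = get_context();
  (void)ctx;  // a real routine would enqueue work on the context here
  assert(FakeContext::created >= 1);
  return FakeContext::created;
}
```

With the switch defined, calling `run_gemm_like_call()` any number of times leaves `FakeContext::created` at 1; without it, the counter grows by one per call.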

But this new capability is extremely important to me, and I'm sure I'm not the only one.

I've only done this for xgemm, but I would be happy to extend the wrapper code to many other routines. I just need some guidance on the best way to do that. I would be honored to contribute to this most excellent library. Would that be welcome?

clblast_netlib_armadillo.h.zip

@CNugteren
Owner

Thanks for your request and sorry for my late reply.

I think that could be nice indeed. You probably noticed that some of these header and definition files are very repetitive (for each routine and for each precision), so I've actually scripted the generation of such files using Python. Not sure if that was the best decision, but it does have some benefits. You can see here which of the files it currently already produces:
https://github.com/CNugteren/CLBlast/blob/master/scripts/generator/generator.py#L11
And you can run the script as follows: python ./scripts/generator/generator.py . (it shouldn't change anything).

So I would propose extending this Python script with the Armadillo interface, for the sake of consistency and to save typing. Since this is a bit of a hacky thing, I could make a very first stub of the implementation based on your GEMM example, and perhaps you can finish it and test it?
