
Vector dot is much slower than built-in operation #69

Open · learning-chip opened this issue Apr 20, 2022 · 1 comment

@learning-chip

I can get decent parallel speed-up for sparse matmul and sparse matvec, but the dot product between two vectors seems very slow:

using SuiteSparseGraphBLAS
using BenchmarkTools

gbset(:nthreads, 16)  # let GraphBLAS use 16 threads

b = ones(10000)     # dense Julia vector
b_gb = GBVector(b)  # GraphBLAS vector with the same contents

@btime b' * b        #  1 μs (built-in)
@btime b_gb' * b_gb  # 15 μs (GraphBLAS)

Is this expected? Or can it be tuned to be faster?

Version: [email protected]
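
For reference, a sketch of the kind of sparse matvec benchmark where the parallel speed-up does show up (the size and density here are illustrative, not my exact case):

using SuiteSparseGraphBLAS
using SparseArrays
using BenchmarkTools

gbset(:nthreads, 16)

A = sprand(10_000, 10_000, 1e-3)  # sparse matrix, roughly 1e5 stored entries
x = rand(10_000)
A_gb = GBMatrix(A)  # GraphBLAS copy of the sparse matrix
x_gb = GBVector(x)

@btime $A * $x        # single-threaded SparseArrays matvec
@btime $A_gb * $x_gb  # multithreaded GraphBLAS matvec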

@rayegun (Member) commented Apr 20, 2022

I do see this behavior (although more like 10x on my device). The big thing is that SuiteSparse:GraphBLAS is not a replacement for BLAS1 operations: it's a sparse matrix library, so it will always be somewhat slower on simple dense operations like a dot product.

That being said, we can probably do better here, perhaps by unpacking the vectors, calling an actual BLAS1 routine, and repacking the result, at least for the basic arithmetic (+, *) semiring.
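
A minimal sketch of that route (this is not what the library does today, and Vector(::GBVector) here copies, where a real implementation would unpack the raw buffer in place):

using SuiteSparseGraphBLAS
using LinearAlgebra

# Hypothetical helper, illustration only: route the (+, *) semiring
# dot product of two dense GBVectors through dense BLAS1.
function blas1_dot(u::GBVector{T}, v::GBVector{T}) where {T<:Union{Float32,Float64}}
    # Vector(...) copies the contents out via the generic AbstractArray
    # conversion; a real fast path would unpack the underlying dense
    # buffer without copying and repack it afterwards.
    dot(Vector(u), Vector(v))  # dispatches to BLAS dot for Float32/Float64
end

Whether this wins in practice depends on the unpacking cost; the point is just to show the unpack-then-BLAS1 route.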

It's also possible we're not compiling at -O3 for some reason; I'll check on that, and talk to Tim Davis as well.
