
Vector dot is much slower than built-in operation #69

Open · learning-chip opened this issue Apr 20, 2022 · 1 comment

@learning-chip

I can get decent parallel speed-up for sparse matmul and sparse matvec, but the dot product between two vectors seems very slow:

using SuiteSparseGraphBLAS
using BenchmarkTools

gbset(:nthreads, 16)  # let GraphBLAS use 16 threads

b = ones(10000)     # dense Julia vector
b_gb = GBVector(b)  # GraphBLAS vector with the same contents

@btime b' * b        #  1 μs (built-in)
@btime b_gb' * b_gb  # 15 μs (GraphBLAS)

Is this expected? Or can it be tuned to be faster?

Version: [email protected]
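
For reference, a sketch of the kind of sparse matvec benchmark where the parallel speed-up does show up (the size and density here are illustrative, not my exact case):

using SuiteSparseGraphBLAS
using SparseArrays
using BenchmarkTools

gbset(:nthreads, 16)

A = sprand(10_000, 10_000, 1e-3)  # sparse matrix, roughly 1e5 stored entries
x = rand(10_000)
A_gb = GBMatrix(A)  # GraphBLAS copy of the sparse matrix
x_gb = GBVector(x)

@btime $A * $x        # single-threaded SparseArrays matvec
@btime $A_gb * $x_gb  # multithreaded GraphBLAS matvec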

@rayegun (Member) commented Apr 20, 2022

I do see this behavior (although more like 10x on my device). The big thing is that SuiteSparse:GraphBLAS is not a replacement for BLAS1 operations: it's a sparse matrix library, so it will always be somewhat slower on simple dense operations like a dot product.

That being said, we can probably do better here, perhaps by unpacking the vectors, calling an actual BLAS1 routine, and repacking the result, at least for the basic arithmetic (+, *) semiring.
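
A minimal sketch of that route (this is not what the library does today, and Vector(::GBVector) here copies, where a real implementation would unpack the raw buffer in place):

using SuiteSparseGraphBLAS
using LinearAlgebra

# Hypothetical helper, illustration only: route the (+, *) semiring
# dot product of two dense GBVectors through dense BLAS1.
function blas1_dot(u::GBVector{T}, v::GBVector{T}) where {T<:Union{Float32,Float64}}
    # Vector(...) copies the contents out via the generic AbstractArray
    # conversion; a real fast path would unpack the underlying dense
    # buffer without copying and repack it afterwards.
    dot(Vector(u), Vector(v))  # dispatches to BLAS dot for Float32/Float64
end

Whether this wins in practice depends on the unpacking cost; the point is just to show the unpack-then-BLAS1 route.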

It's also possible we're not compiling at -O3 for some reason; I'll check on that, and talk to Tim Davis as well.
