Alternative Allocators #182
Conversation
Small note to myself as well: I think there are some …
I moved the Bumper implementation to a package extension and found a way to keep the macro defined in the base package while its implementation lives in the extension.
Looks great. Maybe we can use the same macro-in-extension-package technique for …
I think that's definitely a good idea, but I would move that to a separate PR or commit ;)
OK, if the tests pass I think this completes this PR.
Left some minor comments, otherwise definitely looks okay to me.
I don't think we currently test the Bumper functionality at all, and `@butensor` seems not to be exported.
src/implementation/base.jl (Outdated)
```diff
- Atemp = tensoralloc_add(eltype(A), A, pA, conjA, true, allocator)
- Ã = permutedims!(Atemp, A, linearize(pA))
+ Atemp = tensoralloc_add(eltype(A), A, pA, conjA, Val(true), allocator)
+ Atemp = permutedims!(Atemp, A, linearize(pA))
```
Technically, there is no guarantee that the lhs is the same object as the one we allocated; in that case we would have created a memory leak here. In practice this probably does not happen, but this is exactly why I used different variable names in my initial change.
I am not sure; I do think `permutedims!` does guarantee to store the result in its first argument, i.e. this should work without the `Atemp =` part. From the docstring:

```
permutedims!(dest, src, perm)

Permute the dimensions of array src and store the result in the array dest.
perm is a vector specifying a permutation of length ndims(src). The
preallocated array dest should have size(dest) == size(src)[perm] and is
completely overwritten. No in-place permutation is supported and unexpected
results will happen if src and dest have overlapping memory regions.
```
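Indeed, a quick check (not part of this PR) confirms that the returned value is the destination itself:

```julia
src = reshape(collect(1:6), 2, 3)
dest = Matrix{Int}(undef, 3, 2)
out = permutedims!(dest, src, (2, 1))
out === dest  # true: permutedims! fills and returns its first argument
```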
I know the docstring guarantees this, but, for example, in TensorOperations with AD we abuse the fact that the macro automatically inserts `C = tensoradd!(C, ...)` to hook into the system and make the lhs a copy. In principle anyone could do something like this for a custom type, and then the memory-management chain would be broken (doom thinking here).
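To illustrate the worry with a hypothetical override (the names here are made up for illustration, not code from this PR):

```julia
# A custom overload could legally intercept the `C = f!(C, ...)` pattern and
# return a fresh array instead of the destination it was handed:
shady_permute!(dest, src, perm) = permutedims(src, perm)  # ignores dest entirely

buf = Matrix{Float64}(undef, 3, 2)  # imagine this came from a manual allocator
buf = shady_permute!(buf, rand(2, 3), (2, 1))
# `buf` is now rebound to a fresh Array; the allocator-owned memory is
# unreachable from here, so any later manual `free` keyed on `buf` would leak.
```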
This PR attempts to add some additional allocator strategies to the toolbox, focused around dense arrays of isbits types.

I added some rudimentary support for PtrArrays.jl, which provides a manual way of implementing `malloc` and `free` for the temporaries. My first tests seem to indicate, however, that this does not improve performance (it even becomes slightly worse in most cases). This probably requires more investigation, as it seems unlikely that this should be happening.

I also added support for Bumper.jl. Here, I use their buffer types as an allocator, which makes it quite easy to manually use the Bumper interface.
Nevertheless, for further automation, I also added the convenience `@butensor` macro, which does exactly that.

Some implementation notes:
- In order to make this work, the current way of dispatching with `StridedView`s and choosing between GPU and CPU definitely does not work. Both these options require a parent type of the `StridedView` which is not `Array` (but is `DenseArray`!), which would now be unsupported. I could add manual `select_backend` procedures for this, but I am a bit scared of the ambiguities, as I don't want to have to deal with the many combinations. This should probably be reconsidered.
- In principle the current implementation of the Bumper methods could be part of a package extension, as it does not even require a definition of an `Allocator` type. Is this something we would like?
- We should probably invest some time in a proper benchmark suite, as it is quite hard to gauge the effectiveness of these methods.
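For reference, a usage sketch of the `@butensor` convenience macro (the macro name comes from this PR; the exact expansion shown in the comment is an assumption):

```julia
using Bumper, TensorOperations

A = rand(4, 4)
B = rand(4, 4)

# Presumably expands to an @no_escape block in which @tensor temporaries are
# drawn from the task-local bump buffer, with the result as a normal Array:
@butensor C[i, j] := A[i, k] * B[k, j]
```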