
Alternative Allocators #182

Merged: 25 commits into master from ld-ptrarrays, Jul 12, 2024
Conversation

@lkdvos (Collaborator) commented on Jul 5, 2024

This PR adds some additional allocator strategies to the toolbox, focused on dense arrays with `isbits` element types.

I added some rudimentary support for PtrArrays.jl, which provides a manual way of implementing `malloc` and `free` for the temporaries. However, my first tests seem to indicate that this does not improve performance (and even makes it slightly worse in most cases). This probably requires more investigation, as it seems unlikely that this should be happening.
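For reference, the kind of manual memory management PtrArrays.jl enables looks roughly like this (a sketch assuming PtrArrays' `malloc`/`free` API, not the actual allocator hooks in this PR):

```julia
using PtrArrays

# Allocate an uninitialized 4×4 Float64 buffer on the C heap;
# unlike a regular Array, it is not managed by Julia's GC.
tmp = malloc(Float64, 4, 4)
tmp .= 0.0  # must be initialized before use

# ... use tmp as a temporary ...

# The caller is responsible for releasing the memory explicitly;
# forgetting this leaks, and freeing twice is undefined behavior.
free(tmp)
```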

I also added support for Bumper.jl. Here, I use their buffer types as the allocator, which makes it quite easy to use the Bumper interface manually, as follows:

```julia
buf = Bumper.default_buffer()
@no_escape buf begin
    @tensor allocator = buf tensorexpr...
end
```

Nevertheless, for further automation, I also added the convenience macro `@butensor`, which does exactly that.
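Usage would then look something like this (a sketch; `A` and `B` are placeholder arrays):

```julia
using TensorOperations, Bumper

A = randn(5, 5)
B = randn(5, 5)

# Equivalent to wrapping the @tensor call in @no_escape with the
# default buffer: temporaries are bump-allocated and reclaimed on exit.
@butensor C[i, j] := A[i, k] * B[k, j]
```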


Some implementation notes:

In order to make this work, the current way of dispatching with StridedViews and choosing between GPU and CPU definitely does not work. Both of these options require a parent type of the `StridedView` that is not `Array` (but is a `DenseArray`!), which would now be unsupported. I could add manual `select_backend` procedures for this, but I am a bit wary of the ambiguities, as I don't want to deal with the many combinations. This should probably be reconsidered.

In principle, the current implementation of the Bumper methods could be part of a package extension, as it does not even require defining an `Allocator` type. Is this something we would like?
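A package extension here would follow the standard Julia ≥ 1.9 weak-dependency mechanism; a minimal sketch (the module name and method signature are assumptions, not the code in this PR):

```julia
# ext/TensorOperationsBumperExt.jl (hypothetical file layout)
module TensorOperationsBumperExt

using TensorOperations, Bumper

# Hypothetical: route temporary allocations through a Bumper buffer.
# Bumper.alloc! carves an array out of the buffer without involving the GC.
function TensorOperations.tensoralloc(::Type{A}, structure, istemp,
                                      buf::Bumper.AllocBuffer) where {A<:AbstractArray}
    return Bumper.alloc!(buf, eltype(A), structure...)
end

end # module
```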

We should probably invest some time in a proper benchmark suite, as it is quite hard to gauge the effectiveness of these methods.

@lkdvos lkdvos requested a review from Jutho July 5, 2024 15:26
@lkdvos (Collaborator, Author) commented on Jul 8, 2024

Small note to myself as well: I think there are some `tensoralloc` calls that don't have a matching `free` statement in the base implementations. I'll try to fix this in this PR.
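The invariant being fixed is that every allocation is paired with a free, roughly like this (a sketch using the `tensoralloc_add`/`tensorfree!` names from this codebase; exact signatures may differ):

```julia
tmp = tensoralloc_add(eltype(A), A, pA, conjA, Val(true), allocator)
try
    permutedims!(tmp, A, linearize(pA))
    # ... use the temporary ...
finally
    # the matching free; without it, manual allocators leak
    tensorfree!(tmp, allocator)
end
```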

Review thread on src/implementation/base.jl (outdated, resolved)
@lkdvos (Collaborator, Author) commented on Jul 10, 2024

I moved the Bumper implementation to a package extension and found a way to define the macro in the base package while its implementation lives in the extension.
I think the main missing ingredient now is just a couple of tests, after which this could be ready to go.
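One common way to achieve this is the stub-function pattern: the base package defines a macro that expands via a function with no methods, and the extension supplies the method once Bumper is loaded. A generic sketch with hypothetical names, not necessarily the exact approach taken here:

```julia
# In the base package:
function _butensor end  # stub: no methods until the extension is loaded

macro butensor(ex...)
    if !hasmethod(_butensor, Tuple{LineNumberNode,Module,Vararg{Any}})
        error("@butensor requires Bumper.jl to be loaded")
    end
    # The extension implements the actual expansion, wrapping the
    # @tensor call in @no_escape with Bumper's default buffer.
    return esc(_butensor(__source__, __module__, ex...))
end
```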

@Jutho (Owner) commented on Jul 10, 2024

Looks great. Maybe we can use the same macro-in-extension-package technique for `@cutensor`?

@lkdvos (Collaborator, Author) commented on Jul 10, 2024

I think that's definitely a good idea, but I would move that to a separate PR or commit ;)

Review thread on docs/src/man/backends.md (outdated, resolved)
@Jutho (Owner) commented on Jul 10, 2024

OK, if the tests work, I think this completes the PR.

@Jutho Jutho marked this pull request as ready for review July 10, 2024 15:30
@lkdvos (Collaborator, Author) left a review:
Left some minor comments; otherwise this definitely looks okay to me.
I don't think we currently test the Bumper functionality at all, and `@butensor` seems not to be exported.

Review thread on docs/src/man/backends.md (resolved)
Suggested change under discussion:

```diff
- Atemp = tensoralloc_add(eltype(A), A, pA, conjA, true, allocator)
- = permutedims!(Atemp, A, linearize(pA))
+ Atemp = tensoralloc_add(eltype(A), A, pA, conjA, Val(true), allocator)
+ Atemp = permutedims!(Atemp, A, linearize(pA))
```
@lkdvos (Collaborator, Author) commented:

Technically, there is no guarantee that the left-hand side is the same object as the one we allocated. In that case, we would have created a memory leak here. In practice this probably does not happen, but this is definitely why I used different variable names in my initial change.

@Jutho (Owner) commented:

I am not sure; I do think `permutedims!` guarantees to store the result in the first argument, i.e. this should work without the `Atemp =` part.

```
permutedims!(dest, src, perm)

Permute the dimensions of array src and store the result in the array dest. perm is a vector specifying a permutation of length ndims(src). The preallocated array dest should have size(dest) == size(src)[perm] and is completely overwritten. No in-place permutation is supported and unexpected results will happen if src and dest have overlapping memory regions.
```

@lkdvos (Collaborator, Author) commented:

I know the docstring guarantees this, but, for example, in TensorOperations with AD we abuse the fact that the macro automatically adds `C = tensoradd!(C, ...)` to hook into the system and make the left-hand side a copy. In principle, anyone could do something like this for a custom type, and then the memory-management chain would be broken (doom thinking here).
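The failure mode being described is roughly the following (a contrived sketch; `rebinding_permutedims!` is a hypothetical method that violates the docstring contract):

```julia
# A method that returns a fresh array instead of mutating dest,
# as an AD-style rewrite of the expression might effectively do:
rebinding_permutedims!(dest, src, perm) = permutedims(src, perm)

tmp = tensoralloc_add(eltype(A), A, pA, conjA, Val(true), allocator)
tmp = rebinding_permutedims!(tmp, A, linearize(pA))
# `tmp` is now the copy; the originally allocated buffer is unreachable,
# and a later tensorfree!(tmp, allocator) would release the wrong object.
```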

Resolved review threads (outdated): src/implementation/strided.jl (two threads), src/implementation/ncon.jl, test/butensor.jl
@Jutho Jutho merged commit 754aa96 into master Jul 12, 2024
15 checks passed
@lkdvos lkdvos deleted the ld-ptrarrays branch July 12, 2024 13:02
Merging this pull request may close these issues:

- TensorOperationscuTENSORExt fails to compile
- Manual allocation strategy