
Alternative Allocators #182

Merged: 25 commits into master from ld-ptrarrays, Jul 12, 2024
Conversation

@lkdvos (Collaborator) commented on Jul 5, 2024

This PR adds some additional allocator strategies to the toolbox, focused on dense arrays with `isbits` element types.

I added some rudimentary support for PtrArrays.jl, which provides a manual way of implementing `malloc` and `free` for the temporaries. However, my first tests seem to indicate that this does not improve performance (and even makes it slightly worse in most cases). This probably requires more investigation, as it seems unlikely that this should be happening.
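For reference, the kind of manual memory management PtrArrays.jl enables looks roughly like this (a sketch assuming PtrArrays' `malloc`/`free` API, not the actual allocator hooks in this PR):

```julia
using PtrArrays

# Allocate an uninitialized 4×4 Float64 buffer on the C heap;
# unlike a regular Array, it is not managed by Julia's GC.
tmp = malloc(Float64, 4, 4)
tmp .= 0.0  # must be initialized before use

# ... use tmp as a temporary ...

# The caller is responsible for releasing the memory explicitly;
# forgetting this leaks, and freeing twice is undefined behavior.
free(tmp)
```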

I also added support for Bumper.jl. Here, I use their buffer types as the allocator, which makes it quite easy to use the Bumper interface manually, as follows:

```julia
buf = Bumper.default_buffer()
@no_escape buf begin
    @tensor allocator = buf tensorexpr...
end
```

Nevertheless, for further automation, I also added the convenience macro `@butensor`, which does exactly that.
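Usage would then look something like this (a sketch; `A` and `B` are placeholder arrays):

```julia
using TensorOperations, Bumper

A = randn(5, 5)
B = randn(5, 5)

# Equivalent to wrapping the @tensor call in @no_escape with the
# default buffer: temporaries are bump-allocated and reclaimed on exit.
@butensor C[i, j] := A[i, k] * B[k, j]
```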


Some implementation notes:

In order to make this work, the current way of dispatching with StridedViews and choosing between GPU and CPU definitely does not work. Both of these options require a parent type of the `StridedView` that is not `Array` (but is a `DenseArray`!), which would now be unsupported. I could add manual `select_backend` procedures for this, but I am a bit wary of the ambiguities, as I don't want to deal with the many combinations. This should probably be reconsidered.

In principle, the current implementation of the Bumper methods could be part of a package extension, as it does not even require defining an `Allocator` type. Is this something we would like?
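A package extension here would follow the standard Julia ≥ 1.9 weak-dependency mechanism; a minimal sketch (the module name and method signature are assumptions, not the code in this PR):

```julia
# ext/TensorOperationsBumperExt.jl (hypothetical file layout)
module TensorOperationsBumperExt

using TensorOperations, Bumper

# Hypothetical: route temporary allocations through a Bumper buffer.
# Bumper.alloc! carves an array out of the buffer without involving the GC.
function TensorOperations.tensoralloc(::Type{A}, structure, istemp,
                                      buf::Bumper.AllocBuffer) where {A<:AbstractArray}
    return Bumper.alloc!(buf, eltype(A), structure...)
end

end # module
```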

We should probably invest some time in a proper benchmark suite, as it is quite hard to gauge the effectiveness of these methods.

@lkdvos lkdvos requested a review from Jutho July 5, 2024 15:26
@lkdvos (Collaborator, Author) commented on Jul 8, 2024

Small note to myself as well: I think there are some `tensoralloc` calls that don't have a matching `free` statement in the base implementations. I'll try to fix this in this PR.
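The invariant being fixed is that every allocation is paired with a free, roughly like this (a sketch using the `tensoralloc_add`/`tensorfree!` names from this codebase; exact signatures may differ):

```julia
tmp = tensoralloc_add(eltype(A), A, pA, conjA, Val(true), allocator)
try
    permutedims!(tmp, A, linearize(pA))
    # ... use the temporary ...
finally
    # the matching free; without it, manual allocators leak
    tensorfree!(tmp, allocator)
end
```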

Review thread on src/implementation/base.jl (outdated, resolved)
@lkdvos (Collaborator, Author) commented on Jul 10, 2024

I moved the Bumper implementation to a package extension and found a way to define the macro in the base package while its implementation lives in the extension.
I think the main missing ingredient now is just a couple of tests, after which this could be ready to go.
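One common way to achieve this is the stub-function pattern: the base package defines a macro that expands via a function with no methods, and the extension supplies the method once Bumper is loaded. A generic sketch with hypothetical names, not necessarily the exact approach taken here:

```julia
# In the base package:
function _butensor end  # stub: no methods until the extension is loaded

macro butensor(ex...)
    if !hasmethod(_butensor, Tuple{LineNumberNode,Module,Vararg{Any}})
        error("@butensor requires Bumper.jl to be loaded")
    end
    # The extension implements the actual expansion, wrapping the
    # @tensor call in @no_escape with Bumper's default buffer.
    return esc(_butensor(__source__, __module__, ex...))
end
```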

@Jutho (Owner) commented on Jul 10, 2024

Looks great. Maybe we can use the same macro-in-extension-package technique for `@cutensor`?

@lkdvos (Collaborator, Author) commented on Jul 10, 2024

I think that's definitely a good idea, but I would move that to a separate PR or commit ;)

Review thread on docs/src/man/backends.md (outdated, resolved)
@Jutho (Owner) commented on Jul 10, 2024

OK, if the tests work, I think this completes the PR.

@Jutho Jutho marked this pull request as ready for review July 10, 2024 15:30
@lkdvos (Collaborator, Author) left a review:
Left some minor comments; otherwise this definitely looks okay to me.
I don't think we currently test the Bumper functionality at all, and `@butensor` seems not to be exported.

Review thread on docs/src/man/backends.md (resolved)
Suggested change under discussion:

```diff
- Atemp = tensoralloc_add(eltype(A), A, pA, conjA, true, allocator)
- = permutedims!(Atemp, A, linearize(pA))
+ Atemp = tensoralloc_add(eltype(A), A, pA, conjA, Val(true), allocator)
+ Atemp = permutedims!(Atemp, A, linearize(pA))
```
@lkdvos (Collaborator, Author) commented:

Technically, there is no guarantee that the left-hand side is the same object as the one we allocated. In that case, we would have created a memory leak here. In practice this probably does not happen, but this is definitely why I used different variable names in my initial change.

@Jutho (Owner) commented:

I am not sure; I do think `permutedims!` guarantees to store the result in the first argument, i.e. this should work without the `Atemp =` part.

```
permutedims!(dest, src, perm)

Permute the dimensions of array src and store the result in the array dest. perm is a vector specifying a permutation of length ndims(src). The preallocated array dest should have size(dest) == size(src)[perm] and is completely overwritten. No in-place permutation is supported and unexpected results will happen if src and dest have overlapping memory regions.
```

@lkdvos (Collaborator, Author) commented:

I know the docstring guarantees this, but, for example, in TensorOperations with AD we abuse the fact that the macro automatically adds `C = tensoradd!(C, ...)` to hook into the system and make the left-hand side a copy. In principle, anyone could do something like this for a custom type, and then the memory-management chain would be broken (doom thinking here).
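The failure mode being described is roughly the following (a contrived sketch; `rebinding_permutedims!` is a hypothetical method that violates the docstring contract):

```julia
# A method that returns a fresh array instead of mutating dest,
# as an AD-style rewrite of the expression might effectively do:
rebinding_permutedims!(dest, src, perm) = permutedims(src, perm)

tmp = tensoralloc_add(eltype(A), A, pA, conjA, Val(true), allocator)
tmp = rebinding_permutedims!(tmp, A, linearize(pA))
# `tmp` is now the copy; the originally allocated buffer is unreachable,
# and a later tensorfree!(tmp, allocator) would release the wrong object.
```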

Resolved review threads (outdated): src/implementation/strided.jl (two threads), src/implementation/ncon.jl, test/butensor.jl
@Jutho Jutho merged commit 754aa96 into master Jul 12, 2024
15 checks passed
@lkdvos lkdvos deleted the ld-ptrarrays branch July 12, 2024 13:02
Merging this pull request may close these issues:

- TensorOperationscuTENSORExt fails to compile
- Manual allocation strategy