-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Manual allocation strategy #143
Comments
As promised, I've started experimenting with some strategies, it is still very much WIP but you can follow it here: AllocationKit So far I've tried just plain malloc/free, and the finalizer version which is less prone to memory leaks, and initial benchmarks on my laptop report the underlying times. The testcase is the application of a single-site update encountered for example in DMRG or VUMPS for an Ising-like hamiltonian, repeated 30 times. (benchmark here ) In summary, it seems as if just doing this already leads to various improvements, depending on the tensor sizes. I might continue playing around this week with something like Bumper.jl, allthough I ran into some issues trying to get the D = 64: julia> result["default"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
Range (min … max): 400.062 μs … 715.187 μs ┊ GC (min … max): 13.62% … 35.15%
Time (median): 509.756 μs ┊ GC (median): 11.73%
Time (mean ± σ): 503.847 μs ± 51.078 μs ┊ GC (mean ± σ): 13.61% ± 3.73%
▁ ▂▂█▅▁▁ ▁
▃▂▄▄█▄▇▄▇▆▅▅▅▄▃▃▃▄▄▄▄▄▅▄▄▄▄▆██████▆▅█▇█▇▇█▅▆▅▅▅▄▄▁▃▃▄▂▃▃▁▃▃▃▃ ▄
400 μs Histogram: frequency by time 617 μs <
Memory estimate: 1.63 MiB, allocs estimate: 38.
julia> result["malloc"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
Range (min … max): 271.277 μs … 727.405 μs ┊ GC (min … max): 0.00% … 37.25%
Time (median): 345.764 μs ┊ GC (median): 13.97%
Time (mean ± σ): 360.340 μs ± 61.898 μs ┊ GC (mean ± σ): 8.75% ± 6.93%
▁▄▅ ▃▄▆▃█▅ ▁ ▂▁ ▃
▃████▄▆▆█▆▅▆▄▄▇████████▇████▆▅▅▅▆▃▅▄▃▃▄██▇▅▄▄▁▃▁▁▁▁▁▁▁▁▄█▇█▇▃ ▅
271 μs Histogram: frequency by time 493 μs <
Memory estimate: 899.11 KiB, allocs estimate: 39.
julia> result["safemalloc"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
Range (min … max): 276.067 μs … 568.403 μs ┊ GC (min … max): 0.00% … 37.13%
Time (median): 362.419 μs ┊ GC (median): 13.79%
Time (mean ± σ): 365.961 μs ± 39.841 μs ┊ GC (mean ± σ): 8.66% ± 6.82%
▃▃▃▇▅▁ ▁ ▁▇█▇▃▃
▃▅▄▄▆▄▄▂▃▃▂▁▅▃▃▂▃▃▅▆██████▇█▅▇▄▆▅▄▃▄▃▄▂▆████████▅▄▄▃▃▄▂▁▁▁▁▁▂ ▄
276 μs Histogram: frequency by time 448 μs <
Memory estimate: 899.11 KiB, allocs estimate: 39. D = 128 julia> result["default"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
Range (min … max): 1.929 ms … 3.871 ms ┊ GC (min … max): 11.19% … 6.77%
Time (median): 2.403 ms ┊ GC (median): 10.51%
Time (mean ± σ): 2.416 ms ± 271.936 μs ┊ GC (mean ± σ): 10.48% ± 1.48%
▁ ▃▄▇█▅
▄▅▅▅▄▄▄▃▄▃▃▃▃▃▄▆███████▇▆▅▅▄▄▅▅▃▃▂▃▂▃▃▂▁▃▃▃▂▁▃▃▃▃▃▂▃▂▂▃▁▂▁▃ ▃
1.93 ms Histogram: frequency by time 3.29 ms <
Memory estimate: 6.50 MiB, allocs estimate: 38.
julia> result["malloc"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
Range (min … max): 1.625 ms … 3.481 ms ┊ GC (min … max): 5.90% … 3.47%
Time (median): 1.734 ms ┊ GC (median): 6.02%
Time (mean ± σ): 1.886 ms ± 299.257 μs ┊ GC (mean ± σ): 6.39% ± 1.41%
▂█▃▇▁
▅█████▆▄▃▄▅▄▄▃▃▄▃▃▃▃▃▂▃▂▂▃▃▃▃▃▃▃▃▂▃▃▃▃▃▃▄▂▂▂▂▂▂▁▃▂▃▂▂▂▂▁▁▁▂ ▃
1.62 ms Histogram: frequency by time 2.85 ms <
Memory estimate: 3.50 MiB, allocs estimate: 39.
julia> result["safemalloc"][D]
BenchmarkTools.Trial: 500 samples with 30 evaluations.
Range (min … max): 1.625 ms … 2.556 ms ┊ GC (min … max): 6.02% … 7.28%
Time (median): 1.716 ms ┊ GC (median): 6.02%
Time (mean ± σ): 1.761 ms ± 134.372 μs ┊ GC (mean ± σ): 6.81% ± 1.37%
▂▄█▄▄▃▆▆
▃▄█████████▇▇▅▄▄▃▃▃▃▃▃▄▃▁▂▃▃▃▃▃▁▂▃▃▃▃▃▃▂▂▃▂▂▃▂▁▂▃▂▁▂▂▁▂▂▁▂▃ ▃
1.63 ms Histogram: frequency by time 2.26 ms <
Memory estimate: 3.50 MiB, allocs estimate: 39. |
Nice! And cool that this is very simple to implement with the |
Regarding the discussion in #140, do you think the
tensoralloc
,tensorfree
interface is compatible with approaches like this?Could one alternatively write backend code that uses
malloc
andfree
to avoid GC? Would this require introducing a special array wrapper type?Is it possible to guarantee that manually allocated intermediates get freed in case of an exception, within the current interface? Would this involve defining a finalizer for the array wrapper type?
The text was updated successfully, but these errors were encountered: