Reduce peak memory usage in single vector case #2
This PR updates both the CPU and GPU versions of the dist matrix calculation so that the pairwise distances are accumulated in a 1D array of size (m**2 - m) // 2 (i.e. m*(m-1) // 2, the upper triangle without the diagonal) rather than the full m**2 array, reducing peak memory usage by roughly half.
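As a sketch of the condensed layout described above (the `condensed_index` helper is illustrative, not code from this PR): each pair (i, j) with i < j maps to one slot in a flat array of length m*(m-1) // 2, so the diagonal and the mirrored lower triangle are never stored.

```python
import numpy as np

def condensed_index(i, j, m):
    # Map pair (i, j) with i < j to its position in the condensed
    # 1D array of length m*(m-1)//2 (upper triangle, no diagonal).
    return m * i - (i * (i + 1)) // 2 + (j - i - 1)

m = 5
dists = np.zeros(m * (m - 1) // 2)  # condensed accumulator, ~half of m**2
pts = np.arange(m, dtype=float)     # toy 1D points for illustration
for i in range(m):
    for j in range(i + 1, m):
        # Accumulate each pairwise distance into its condensed slot.
        dists[condensed_index(i, j, m)] = abs(pts[i] - pts[j])
```

This is the same indexing convention `scipy.spatial.distance.pdist` uses for its condensed output.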
Also includes all the changes from #1, which should be merged first. Commit d5034b9 contains all the substantive changes of this PR.
Outstanding questions:

- Should this return the condensed 1D array directly, mirroring `scipy.spatial.pdist` rather than `cdist`? It would be simple enough to then create the big 2D array afterwards (by default?) if we want to keep the same API, although of course in the CPU case this requires even more peak memory than the current approach.

There's one outstanding wrinkle where the CUDA sparse calculation works but the dense one has an issue; will investigate soon.
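If the API were switched to return the condensed form, rebuilding the square matrix afterwards is straightforward. A minimal numpy-only sketch of a `squareform`-style expansion (the helper name `to_square` is hypothetical; `scipy.spatial.distance.squareform` does this for real):

```python
import numpy as np

def to_square(condensed, m):
    # Expand a condensed upper-triangle vector into the full symmetric
    # m x m distance matrix with a zero diagonal.
    full = np.zeros((m, m))
    iu = np.triu_indices(m, k=1)      # (row, col) pairs in pdist order
    full[iu] = condensed              # fill upper triangle
    full[(iu[1], iu[0])] = condensed  # mirror into lower triangle
    return full

m = 4
condensed = np.arange(m * (m - 1) // 2, dtype=float)
full = to_square(condensed, m)
```

Note this materializes both the condensed and the full array at once, which is why doing it eagerly on the CPU would undo the peak-memory savings.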