Revise KDTree implementation to store coordinates in dense arrays #333
Conversation
Just tested this using BigStitcher's interest point-based registration, works!
check this out @minnerbe ... we can do the math on how many points we can maximally support
You should be able to use 2^31-8 points (2,147,483,640).
Would it be too hard to back it with a RandomAccessibleInterval instead? |
Let's talk about it when you have used it for 2^31-8 points.
Looks great! The number of points that can be stored depends on the storage layout.
Do you mean positions, values, or both? This doesn't look too hard, though, since there are dedicated wrappers for accessing both of those data structures. Also, I noticed two things while glancing over the code:
Yes, the improvements are with FLAT layout. (You can experiment for yourself a bit with KDTreeBenchmark. The options for forcing NESTED layout are there for building KDTreeData somewhere, but not pulled all the way through to the KDTree constructor; it shouldn't be too hard to tweak.) I did benchmarks for this at some point; I found this in some old notes. I think it could be branch
But I don't remember what state this was from (I would be surprised if the difference is still that much).
Yes, it should be easy to do. I took as the subtext of @StephanPreibisch's comment: "... because then we could use CellImg and have as many points as we want". And then the subtext of my reply would be "... because, try to build a tree with 2G points and see how long that takes; it may take a prohibitively long time." For using >2G points, besides backing the positions with a CellImg, we would need to change more. My preferred solution for >2G points would be a combined oct-tree with kd-trees in the leaves.
I think this split makes sense.
Yes, good point.
Thanks for the (very) detailed answer! I ran the benchmarks locally with 1M points:
You're absolutely right, the difference is not as stark anymore. However, it's still worth noting that the nested layout (almost) doubles the memory consumption in 3D: 3·8 = 24 bytes per point for the flat layout, vs. 4 or 8 bytes (reference, on systems with <32G / ≥32G heap, respectively) + 16 bytes (array header) + 3·8 = 44 or 48 bytes per point for the nested layout.
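The per-point arithmetic above can be spelled out in a short sketch (assuming 8-byte doubles, a 16-byte array object header, and 4- or 8-byte references depending on compressed oops; exact sizes are JVM-dependent, and these names are made up for the illustration):

```java
public class LayoutMemory {
    // Flat layout: just the doubles, stored contiguously in one big array.
    static long flatBytesPerPoint(int numDimensions) {
        return 8L * numDimensions;
    }

    // Nested layout: a reference to a per-point double[] array,
    // plus that array's object header, plus the doubles themselves.
    static long nestedBytesPerPoint(int numDimensions, boolean compressedOops) {
        long reference = compressedOops ? 4 : 8; // heap < 32G vs >= 32G
        long arrayHeader = 16;                   // typical array object header
        return reference + arrayHeader + 8L * numDimensions;
    }

    public static void main(String[] args) {
        System.out.println(flatBytesPerPoint(3));          // 24
        System.out.println(nestedBytesPerPoint(3, true));  // 44
        System.out.println(nestedBytesPerPoint(3, false)); // 48
    }
}
```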
Thanks, that was the part that I overlooked! In my head, backing the positions with a CellImg didn't interfere with the existing implementations.
I do see your point. Still, my feeling is that a split into an object holding the positions and one holding the values could also be beneficial, especially when adding a backing option for positions and/or values. Right now, the number of constructors of
Yes, cool! That looks as I would have expected. Which Java version did you benchmark it with?
I think you have the nesting order wrong. For 1000 3D points you would have a
That is a good point. Splitting into an object holding the positions and one holding the values makes sense. Probably having … If you want to play around with splitting up
Azul Zulu JDK 17.
Jep, my bad. So, the memory overhead is negligible and the runtime is also only slightly worse. That's great!
The "correct" thing (modeling-wise) would probably be to eagerly compute the bounding box and pass it to the constructor. But from a performance perspective, this is strictly worse than the current lazy computation, which is definitely "pragmatically okay". Also,
Sounds good, I'll give it a try in the next few days.
For consistency, because the other NeighborSearch interfaces have copy() methods.
I incorporated changes from #334.
This PR revises the implementation of KDTree.

The old implementation created `KDTreeNode` objects for each point, with references to children to represent the tree structure.

The new implementation does not store explicit links. Instead, the coordinates of all points are stored in a flat `double[]` array (or alternatively, one `double[]` array per dimension). The tree structure is represented purely by the order of points in the array. The nodes are arranged in Eytzinger layout (the children of node `i` are at `2i` and `2i+1`). Additionally, pivot indices are chosen such that "leaf layers" are filled from the left. For example, 10 nodes will always be arranged with the bottom layer filled from the left, never with gaps in the bottom layer.

By choosing pivots in this way, the tree structure is fully determined. For every node index, the child indices can be calculated without dependent reads. And iff the calculated child index is less than the number of nodes, the child exists.
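The index arithmetic described above can be sketched as follows. This is a minimal illustration of the 1-based Eytzinger rule (children of `i` at `2i` and `2i+1`, a child existing iff its index does not exceed the node count) plus flat coordinate access; it is not the PR's actual code, and the PR's indexing convention may differ in off-by-one details:

```java
public class EytzingerIndex {
    // 1-based Eytzinger layout: the children of node i are 2i and 2i+1.
    static int leftChild(int i)  { return 2 * i; }
    static int rightChild(int i) { return 2 * i + 1; }

    // No pointer chasing: a child exists iff its computed index
    // does not exceed the number of nodes.
    static boolean exists(int childIndex, int numNodes) {
        return childIndex <= numNodes;
    }

    // Flat layout: coordinate d of (1-based) node i in an n-dimensional tree.
    static double coordinate(double[] positions, int n, int i, int d) {
        return positions[(i - 1) * n + d];
    }

    public static void main(String[] args) {
        int numNodes = 10;
        // Node 5 has a left child (index 10 <= 10) but no right child (11 > 10).
        System.out.println(exists(leftChild(5), numNodes));  // true
        System.out.println(exists(rightChild(5), numNodes)); // false

        double[] positions = { 1, 2, 3, 4, 5, 6 }; // two 3D points, flat layout
        System.out.println(coordinate(positions, 3, 2, 0));  // 4.0
    }
}
```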
Less memory overhead

The new representation has considerably less memory overhead. For example, store 1,000,000 3D `RealPoint`s into a KDTree:

- The list of points (the `ArrayList` object and the `RealPoint` objects) requires ~82MB of memory.
- The old KDTree added ~90MB on top of that, just for storing the tree structure.
- Despite copying the point coordinates, the new KDTree adds only ~22MB (which is basically just the `double[3_000_000]` array required for storing the coordinates).

Better performance
Performance of building trees is 2x better than the old implementation. Neighbor searches are slightly faster.

By using a different `kthElement` implementation, this PR also resolves #326: building the tree remains fast even when many points have identical coordinates in a particular dimension.
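The PR does not show its `kthElement` implementation here, so as an illustration of why equal keys matter, this is a hedged sketch of k-th element selection using three-way (Dutch national flag) partitioning, which avoids the quadratic blow-up that a naive two-way quickselect hits when many keys are identical. All names are invented for the example:

```java
import java.util.Random;

public class KthElement {
    /** Rearranges values such that values[k] holds the k-th smallest element. */
    static void kthElement(double[] values, int k) {
        int lo = 0, hi = values.length - 1;
        Random rnd = new Random(42);
        while (lo < hi) {
            double pivot = values[lo + rnd.nextInt(hi - lo + 1)];
            // Three-way partition: [lo..lt) < pivot, [lt..gt] == pivot, (gt..hi] > pivot.
            int lt = lo, gt = hi, i = lo;
            while (i <= gt) {
                if (values[i] < pivot) swap(values, lt++, i++);
                else if (values[i] > pivot) swap(values, i, gt--);
                else i++;
            }
            if (k < lt) hi = lt - 1;      // k-th element is in the "< pivot" part
            else if (k > gt) lo = gt + 1; // k-th element is in the "> pivot" part
            else return;                  // values[k] == pivot: done, even with many duplicates
        }
    }

    static void swap(double[] a, int i, int j) {
        double t = a[i]; a[i] = a[j]; a[j] = t;
    }

    public static void main(String[] args) {
        double[] v = { 5, 1, 1, 1, 9, 1, 3, 1, 1 }; // mostly duplicate keys
        kthElement(v, 4);                           // select the median index
        System.out.println(v[4]);                   // 1.0
    }
}
```

The key property for the duplicate-heavy case is the middle branch: as soon as `k` falls inside the run of pivot-equal values, selection terminates instead of recursing into a partition that barely shrank.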
Easy serialization

Note that these `KDTree`s can be serialized very efficiently (for example into N5), by just storing/loading the flattened coordinate array as-is, avoiding the overhead of serializing the object structure of the interlinked `KDTreeNode`s.
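Since the tree structure is implicit in the point order, serialization reduces to dumping the coordinate array. A minimal sketch using plain `java.io` (not the PR's or N5's actual API; class and method names are invented):

```java
import java.io.*;
import java.util.Arrays;

public class FlatTreeIO {
    // Write the flattened coordinate array; no node objects need to be
    // serialized because the tree structure is implicit in the point order.
    static void write(DataOutput out, double[] positions, int numDimensions) throws IOException {
        out.writeInt(numDimensions);
        out.writeInt(positions.length);
        for (double p : positions)
            out.writeDouble(p);
    }

    static double[] read(DataInput in) throws IOException {
        int numDimensions = in.readInt(); // stored alongside the data
        double[] positions = new double[in.readInt()];
        for (int i = 0; i < positions.length; i++)
            positions[i] = in.readDouble();
        return positions;
    }

    public static void main(String[] args) throws IOException {
        double[] positions = { 1, 2, 3, 4, 5, 6 }; // two 3D points, flat layout
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        write(new DataOutputStream(bytes), positions, 3);
        double[] restored = read(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(Arrays.equals(positions, restored)); // true
    }
}
```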