speed up Relate/Contains with an rstar backed edge set intersector #829

Merged (5 commits) on May 27, 2022


@michaelkirk (Member) commented May 10, 2022

  • I agree to follow the project's code of conduct.
  • I added an entry to CHANGES.md if knowledge of this change could be valuable to users.

Fixes #649 (includes some good context too)

Please note that I'm changing the bounds on GeoFloat to be compatible with RTreeNum. I think it's unlikely that anyone is using unbounded or unsigned Floats, but let me know if you think otherwise.

Perf Highlights:

Larger overlapping geometries can benefit hugely compared to the naive O(n²) version; for example, there is a ~80x speedup in these two:

large rotated polygons  time:   [8.0515 ms 8.0733 ms 8.0978 ms]
                        change: [-98.779% -98.776% -98.772%] (p = 0.00 < 0.05)
                        Performance has improved.

offset polygons         time:   [7.8104 ms 7.8154 ms 7.8218 ms]
                        change: [-98.818% -98.817% -98.816%] (p = 0.00 < 0.05)
                        Performance has improved.

But not everything is faster. In particular, small geometries don't benefit much from the lower O(n log n) complexity, while still paying the tax of loading the RTree.
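To illustrate where the large-geometry speedups come from, here is a minimal sketch. It is not the code in this PR. It counts bounding-box tests for the naive all-pairs approach and for a simple sort-and-sweep on x. Sort-and-sweep stands in for the R-tree only because it fits in a few lines of std Rust. Like an R-tree, it avoids comparing edges that are far apart. `Bbox`, `naive_pairs`, and `sweep_pairs` are hypothetical names.

```rust
/// Bounding box of one edge (hypothetical helper type for this sketch).
#[derive(Clone, Copy, Debug)]
struct Bbox {
    min_x: f64,
    min_y: f64,
    max_x: f64,
    max_y: f64,
}

impl Bbox {
    fn of_segment(a: (f64, f64), b: (f64, f64)) -> Self {
        Bbox { min_x: a.0.min(b.0), min_y: a.1.min(b.1), max_x: a.0.max(b.0), max_y: a.1.max(b.1) }
    }

    fn overlaps(&self, o: &Bbox) -> bool {
        self.min_x <= o.max_x && o.min_x <= self.max_x && self.min_y <= o.max_y && o.min_y <= self.max_y
    }
}

/// Naive O(n*m): test every pair. Returns (overlapping pairs, number of box tests).
fn naive_pairs(a: &[Bbox], b: &[Bbox]) -> (Vec<(usize, usize)>, usize) {
    let mut pairs = Vec::new();
    let mut tests = 0usize;
    for (i, ea) in a.iter().enumerate() {
        for (j, eb) in b.iter().enumerate() {
            tests += 1;
            if ea.overlaps(eb) {
                pairs.push((i, j));
            }
        }
    }
    (pairs, tests)
}

/// Indices of `boxes`, ordered by min_x (the O(n log n) step).
fn sorted_by_min_x(boxes: &[Bbox]) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..boxes.len()).collect();
    idx.sort_by(|&i, &j| boxes[i].min_x.total_cmp(&boxes[j].min_x));
    idx
}

/// Sort-and-sweep on x: each box is only tested against not-yet-processed boxes
/// of the other set whose min_x falls inside its own x-range.
fn sweep_pairs(a: &[Bbox], b: &[Bbox]) -> (Vec<(usize, usize)>, usize) {
    let (sa, sb) = (sorted_by_min_x(a), sorted_by_min_x(b));
    let mut pairs = Vec::new();
    let mut tests = 0usize;
    let (mut i, mut j) = (0, 0);
    while i < sa.len() && j < sb.len() {
        if a[sa[i]].min_x <= b[sb[j]].min_x {
            // Next box in x order comes from `a`: scan forward through `b`.
            let ea = &a[sa[i]];
            let mut k = j;
            while k < sb.len() && b[sb[k]].min_x <= ea.max_x {
                tests += 1;
                if ea.overlaps(&b[sb[k]]) {
                    pairs.push((sa[i], sb[k]));
                }
                k += 1;
            }
            i += 1;
        } else {
            // Next box in x order comes from `b`: scan forward through `a`.
            let eb = &b[sb[j]];
            let mut k = i;
            while k < sa.len() && a[sa[k]].min_x <= eb.max_x {
                tests += 1;
                if a[sa[k]].overlaps(eb) {
                    pairs.push((sa[k], sb[j]));
                }
                k += 1;
            }
            j += 1;
        }
    }
    pairs.sort_unstable();
    (pairs, tests)
}
```

On two offset zig-zag chains of 200 edges each, the naive version performs 40,000 box tests and the sweep performs about 400. Both report the same candidate pairs. In a real intersector, the exact segment-intersection test would then run only on those candidates.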

The JTS test suite has a bunch of tests, but they're almost all comparing small geometries with other small geometries:

entire jts test suite   time:   [1.0457 ms 1.0502 ms 1.0549 ms]
                        change: [+24.311% +24.820% +25.351%] (p = 0.00 < 0.05)

Perhaps the worst case is comparing a big geometry to a small one. These comparisons used to be fast because n was small. Now we pay the tax of loading the big geometry into the RTree but perform only a few queries against it, which makes this benchmark roughly 6x slower (+490%):

line across complex polygon
                        time:   [415.72 us 417.35 us 419.23 us]
                        change: [+487.88% +490.58% +494.00%] (p = 0.00 < 0.05)
                        Performance has regressed.
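A back-of-the-envelope model makes the big-versus-small tradeoff concrete. This is only a sketch, and it rests on stated assumptions. It counts abstract "units of work" and ignores constant factors, allocation, and the real bulk-load algorithm, so it cannot reproduce the measured numbers above. Following the description, it assumes the big geometry's n edges are loaded into the index and each of the small geometry's m edges becomes one query. With n = 1024, the model predicts the index costs more than brute force for m = 4 (10,280 vs 4,096) and far less for m = 1024 (20,480 vs 1,048,576). Under this model, indexing pays off roughly once m exceeds log2(n).

```rust
/// Crude work model (assumption): naive = one edge-pair test per (big, small) pair.
fn naive_cost(n_big: u64, m_small: u64) -> u64 {
    n_big * m_small
}

/// Crude work model (assumption): load the big geometry's n edges into an index
/// at ~n*log2(n), then run one ~log2(n) query per small-geometry edge.
/// Constant factors and output size are ignored.
fn indexed_cost(n_big: u64, m_small: u64) -> u64 {
    let log_n = n_big.max(2).ilog2() as u64;
    n_big * log_n + m_small * log_n
}
```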

Despite the regressions with small geometries, I think this change is likely to be a big win for the kinds of operations people do in the real world. I've included the benchmarked regressions to show some of the tradeoffs and opportunities for future work.

Full bench output
$ cargo bench --bench "*" -- --baseline rstar-edge-set-intersector-baseline
   Compiling geo v0.20.1 (/Users/mkirk/src/georust/geo/geo)
   Compiling jts-test-runner v0.1.0 (/Users/mkirk/src/georust/geo/jts-test-runner)
    Finished bench [optimized] target(s) in 13.22s
     Running unittests (target/release/deps/area-0b8c1a820085a672)
Gnuplot not found, using plotters backend
area                    time:   [8.5296 us 8.5386 us 8.5485 us]
                        change: [-0.1712% +0.0029% +0.1847%] (p = 0.98 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
 Running unittests (target/release/deps/concave_hull-2f7149869f3a316b)

Gnuplot not found, using plotters backend
concave hull f32 time: [3.8781 ms 3.8865 ms 3.8957 ms]
change: [-0.1281% +0.1487% +0.4656%] (p = 0.34 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
8 (8.00%) high mild

concave hull f64 time: [4.4244 ms 4.4315 ms 4.4392 ms]
change: [-0.6194% -0.2200% +0.1702%] (p = 0.29 > 0.05)
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild

 Running unittests (target/release/deps/contains-e4f5f16e04689e1f)

Gnuplot not found, using plotters backend
point in simple polygon time: [31.625 ns 31.649 ns 31.675 ns]
change: [-0.1575% +0.0085% +0.1803%] (p = 0.92 > 0.05)
No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
5 (5.00%) high mild
2 (2.00%) high severe

point outside simple polygon
time: [5.9312 ns 5.9391 ns 5.9481 ns]
change: [-2.8268% -2.5942% -2.3294%] (p = 0.00 < 0.05)
Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
1 (1.00%) low mild
5 (5.00%) high mild
7 (7.00%) high severe

point inside complex polygon
time: [11.856 us 11.865 us 11.875 us]
change: [-0.2277% -0.0962% +0.0346%] (p = 0.15 > 0.05)
No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
2 (2.00%) high mild

point outside complex polygon
time: [9.1280 us 9.1374 us 9.1468 us]
change: [+0.0772% +0.2399% +0.3912%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low severe
2 (2.00%) low mild
1 (1.00%) high severe

line across complex polygon
time: [415.72 us 417.35 us 419.23 us]
change: [+487.88% +490.58% +494.00%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe

complex polygon contains polygon
time: [635.00 us 636.57 us 638.18 us]
change: [-90.360% -90.336% -90.307%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

 Running unittests (target/release/deps/convex_hull-8bbe3f1942adc5d6)

Gnuplot not found, using plotters backend
convex hull f32 time: [249.90 us 254.40 us 261.37 us]
change: [+0.2974% +1.5514% +3.2978%] (p = 0.03 < 0.05)
Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
8 (8.00%) high mild
6 (6.00%) high severe

convex hull f64 time: [248.96 us 249.61 us 250.24 us]
change: [-0.8463% -0.4822% -0.1187%] (p = 0.01 < 0.05)
Change within noise threshold.

convex hull with collinear random i64
time: [50.107 ms 50.178 ms 50.255 ms]
change: [-0.0373% +0.1351% +0.2966%] (p = 0.13 > 0.05)
No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe

 Running unittests (target/release/deps/euclidean_distance-54438be5bae59208)

Gnuplot not found, using plotters backend
Polygon Euclidean distance RTree f64
time: [7.7357 us 7.7431 us 7.7524 us]
change: [-0.3278% -0.2016% -0.0743%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe

Polygon Euclidean distance rotating calipers f64
time: [4.0159 us 4.0252 us 4.0360 us]
change: [-1.0307% -0.7785% -0.5178%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
8 (8.00%) high mild
3 (3.00%) high severe

 Running unittests (target/release/deps/extremes-8dffeefd89ec04f4)

Gnuplot not found, using plotters backend
extremes f32 time: [17.528 us 17.539 us 17.551 us]
change: [-0.3366% -0.1541% +0.0197%] (p = 0.09 > 0.05)
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe

extremes f64 time: [17.424 us 17.430 us 17.436 us]
change: [-0.6778% -0.5158% -0.3505%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 17 outliers among 100 measurements (17.00%)
2 (2.00%) low mild
11 (11.00%) high mild
4 (4.00%) high severe

 Running unittests (target/release/deps/frechet_distance-9e7bcb0179a3b5ed)

Gnuplot not found, using plotters backend
Benchmarking frechet distance f32: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.5s, enable flat sampling, or reduce sample count to 50.
frechet distance f32 time: [1.6973 ms 1.6988 ms 1.7006 ms]
change: [-1.3785% -1.0898% -0.7948%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe

Benchmarking frechet distance f64: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.8s, enable flat sampling, or reduce sample count to 50.
frechet distance f64 time: [1.7609 ms 1.7647 ms 1.7685 ms]
change: [-0.0329% +0.1653% +0.3541%] (p = 0.09 > 0.05)
No change in performance detected.
Found 19 outliers among 100 measurements (19.00%)
3 (3.00%) high mild
16 (16.00%) high severe

 Running unittests (target/release/deps/geodesic_distance-1c6fe3c7253f5c46)

Gnuplot not found, using plotters backend
geodesic distance f64 time: [526.69 ns 529.63 ns 533.96 ns]
change: [-0.7018% -0.3981% -0.0753%] (p = 0.02 < 0.05)
Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
5 (5.00%) high mild
1 (1.00%) high severe

 Running unittests (target/release/deps/intersection-802c59c2adfd8419)

Gnuplot not found, using plotters backend
Benchmarking intersection: Warming up for 3.0000 s
Warning: Unable to complete 10 samples in 5.0s. You may wish to increase target time to 9.4s.
intersection time: [934.42 ms 935.71 ms 937.97 ms]
change: [-0.7515% -0.4668% -0.1439%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 2 outliers among 10 measurements (20.00%)
1 (10.00%) high mild
1 (10.00%) high severe

 Running unittests (target/release/deps/relate-ceffe38ba496cd43)

Gnuplot not found, using plotters backend
relate overlapping 50-point polygons
time: [28.875 us 28.904 us 28.936 us]
change: [-7.8495% -7.6973% -7.5476%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild

Benchmarking entire jts test suite: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.3s, enable flat sampling, or reduce sample count to 60.
entire jts test suite time: [1.0457 ms 1.0502 ms 1.0549 ms]
change: [+24.311% +24.820% +25.351%] (p = 0.00 < 0.05)
Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
6 (6.00%) high mild
5 (5.00%) high severe

jts test suite matching Relate
time: [554.63 us 556.97 us 559.57 us]
change: [+28.194% +28.932% +29.682%] (p = 0.00 < 0.05)
Performance has regressed.

disjoint polygons time: [103.25 us 103.38 us 103.50 us]
change: [-0.6161% -0.4141% -0.2065%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) low mild
1 (1.00%) high mild
1 (1.00%) high severe

large rotated polygons time: [8.0515 ms 8.0733 ms 8.0978 ms]
change: [-98.779% -98.776% -98.772%] (p = 0.00 < 0.05)
Performance has improved.
Found 20 outliers among 100 measurements (20.00%)
5 (5.00%) low severe
3 (3.00%) high mild
12 (12.00%) high severe

offset polygons time: [7.8104 ms 7.8154 ms 7.8218 ms]
change: [-98.818% -98.817% -98.816%] (p = 0.00 < 0.05)
Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
5 (5.00%) low severe
2 (2.00%) low mild
3 (3.00%) high mild
2 (2.00%) high severe

 Running unittests (target/release/deps/rotate-040ec92c58b8cab0)

Gnuplot not found, using plotters backend
rotate f32 time: [43.596 us 43.599 us 43.603 us]
change: [-0.5593% -0.4650% -0.3753%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low mild
1 (1.00%) high mild
5 (5.00%) high severe

rotate f64 time: [50.061 us 50.075 us 50.090 us]
change: [+0.4022% +0.7342% +1.0530%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe

 Running unittests (target/release/deps/simplify-aeb9769932233ce2)

Gnuplot not found, using plotters backend
simplify simple f32 time: [89.241 us 89.359 us 89.480 us]
change: [-0.5195% -0.4112% -0.2970%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 17 outliers among 100 measurements (17.00%)
11 (11.00%) low severe
3 (3.00%) high mild
3 (3.00%) high severe

simplify simple f64 time: [95.772 us 95.782 us 95.794 us]
change: [-0.5634% -0.4103% -0.2657%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
3 (3.00%) high mild
3 (3.00%) high severe

 Running unittests (target/release/deps/simplifyvw-4faef7b96d25d158)

Gnuplot not found, using plotters backend
simplify vw simple f32 time: [188.27 us 188.77 us 189.21 us]
change: [-0.3144% -0.0655% +0.1750%] (p = 0.60 > 0.05)
No change in performance detected.

simplify vw simple f64 time: [203.72 us 203.84 us 203.97 us]
change: [-1.7891% -1.0598% -0.1814%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) high mild
3 (3.00%) high severe

Benchmarking simplify vwp f32: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s, enable flat sampling, or reduce sample count to 60.
simplify vwp f32 time: [1.1728 ms 1.1732 ms 1.1738 ms]
change: [-1.1690% -0.9161% -0.6695%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) low severe
4 (4.00%) low mild
4 (4.00%) high severe

Benchmarking simplify vwp f64: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.9s, enable flat sampling, or reduce sample count to 60.
simplify vwp f64 time: [1.1519 ms 1.1527 ms 1.1535 ms]
change: [-0.7080% -0.5331% -0.3648%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe

 Running unittests (target/release/deps/vincenty_distance-c2705d043172c943)

Gnuplot not found, using plotters backend
vincenty distance f32 time: [151.17 ns 151.17 ns 151.18 ns]
change: [-0.4311% -0.3368% -0.2455%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 18 outliers among 100 measurements (18.00%)
2 (2.00%) low mild
4 (4.00%) high mild
12 (12.00%) high severe

vincenty distance f64 time: [267.05 ns 267.05 ns 267.06 ns]
change: [-0.5073% -0.3750% -0.2535%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 16 outliers among 100 measurements (16.00%)
1 (1.00%) low mild
6 (6.00%) high mild
9 (9.00%) high severe

@michaelkirk michaelkirk force-pushed the mkirk/rstar-edge-set-intersector branch from 2b45341 to 207226b, May 10, 2022 02:29
@michaelkirk michaelkirk changed the title rstar edge set intersector speed up Relate/Contains with an rstar backed edge set intersector May 10, 2022
@lnicola (Member) commented May 10, 2022

I might be missing something, but what's up with louisiana.geojson? We still seem to use the WKT version.

@lnicola (Member) commented May 10, 2022

> Perhaps the worst case is comparing a big geometry to a small one.

Could we have a user-friendly API for that? I.e. loading a geometry into an R-tree, in order to be able to reuse it.

@urschrei (Member)

> Perhaps the worst case is comparing a big geometry to a small one.
>
> Could we have a user-friendly API for that? I.e. loading a geometry into an R-tree, in order to be able to reuse it.

#803

Copied the "SimpleEdgeSetIntersector" into a new struct. I'll add the
actual RTree implementation in a follow-up; hopefully this way it'll
be easier to compare the two implementations.
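Keeping both intersectors behind one interface is what makes a side-by-side comparison easy. Here is a simplified sketch of that pattern. The trait below is hypothetical and is NOT geo's actual `EdgeSetIntersector` signature. The second implementation is also a stand-in: it only rejects pairs whose bounding boxes are disjoint, whereas an R-tree would avoid visiting those pairs at all.

```rust
/// A segment as a pair of (x, y) endpoints.
type Segment = ((f64, f64), (f64, f64));

/// Hypothetical, simplified interface (not geo's API).
trait EdgeSetIntersector {
    /// Index pairs (i, j) such that a[i] and b[j] intersect, in ascending order.
    fn intersections_between(&self, a: &[Segment], b: &[Segment]) -> Vec<(usize, usize)>;
}

/// > 0 if r is left of p->q, < 0 if right, 0 if collinear.
fn orient(p: (f64, f64), q: (f64, f64), r: (f64, f64)) -> f64 {
    (q.0 - p.0) * (r.1 - p.1) - (q.1 - p.1) * (r.0 - p.0)
}

/// For r collinear with p-q: does r lie within the segment's extent?
fn on_segment(p: (f64, f64), q: (f64, f64), r: (f64, f64)) -> bool {
    r.0 >= p.0.min(q.0) && r.0 <= p.0.max(q.0) && r.1 >= p.1.min(q.1) && r.1 <= p.1.max(q.1)
}

fn opposite(x: f64, y: f64) -> bool {
    (x > 0.0 && y < 0.0) || (x < 0.0 && y > 0.0)
}

/// Exact test: proper crossings plus touching/collinear overlaps.
fn intersects((p1, p2): Segment, (q1, q2): Segment) -> bool {
    let (d1, d2) = (orient(q1, q2, p1), orient(q1, q2, p2));
    let (d3, d4) = (orient(p1, p2, q1), orient(p1, p2, q2));
    if opposite(d1, d2) && opposite(d3, d4) {
        return true;
    }
    (d1 == 0.0 && on_segment(q1, q2, p1))
        || (d2 == 0.0 && on_segment(q1, q2, p2))
        || (d3 == 0.0 && on_segment(p1, p2, q1))
        || (d4 == 0.0 && on_segment(p1, p2, q2))
}

fn bbox_disjoint((p1, p2): Segment, (q1, q2): Segment) -> bool {
    p1.0.max(p2.0) < q1.0.min(q2.0)
        || q1.0.max(q2.0) < p1.0.min(p2.0)
        || p1.1.max(p2.1) < q1.1.min(q2.1)
        || q1.1.max(q2.1) < p1.1.min(p2.1)
}

/// Baseline: exact test on every pair, O(n*m).
struct SimpleEdgeSetIntersector;

impl EdgeSetIntersector for SimpleEdgeSetIntersector {
    fn intersections_between(&self, a: &[Segment], b: &[Segment]) -> Vec<(usize, usize)> {
        let mut out = Vec::new();
        for (i, &sa) in a.iter().enumerate() {
            for (j, &sb) in b.iter().enumerate() {
                if intersects(sa, sb) {
                    out.push((i, j));
                }
            }
        }
        out
    }
}

/// Hypothetical stand-in for an indexed variant: cheap bbox rejection first.
struct BboxFilteredIntersector;

impl EdgeSetIntersector for BboxFilteredIntersector {
    fn intersections_between(&self, a: &[Segment], b: &[Segment]) -> Vec<(usize, usize)> {
        let mut out = Vec::new();
        for (i, &sa) in a.iter().enumerate() {
            for (j, &sb) in b.iter().enumerate() {
                if !bbox_disjoint(sa, sb) && intersects(sa, sb) {
                    out.push((i, j));
                }
            }
        }
        out
    }
}

/// Edges of a closed ring given its vertices.
fn ring_edges(pts: &[(f64, f64)]) -> Vec<Segment> {
    (0..pts.len()).map(|i| (pts[i], pts[(i + 1) % pts.len()])).collect()
}
```

Because both types implement the same trait, a test can run them on the same input and require identical output. That is the kind of comparison the copied struct sets up.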
If we want to rely on using an RTree for some of our operations (like
the Relate trait) we have to ensure our numeric types are RTree
compatible.
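One common reason an R-tree wants `Bounded` numbers is that a bounding envelope is often initialised "inside out" from the type's extreme values and then grown point by point. The sketch below shows that pattern with a hypothetical std-only `Bounded` trait and `Envelope` type. It is not rstar's code, although rstar's `RTreeNum` does require `Bounded` (along with `Signed`, among others).

```rust
/// Hypothetical stand-in for a `Bounded` numeric trait (std-only so it compiles alone).
trait Bounded: Copy + PartialOrd {
    fn min_value() -> Self;
    fn max_value() -> Self;
}

impl Bounded for f64 {
    fn min_value() -> Self { f64::MIN }
    fn max_value() -> Self { f64::MAX }
}

impl Bounded for i32 {
    fn min_value() -> Self { i32::MIN }
    fn max_value() -> Self { i32::MAX }
}

/// Axis-aligned envelope of 2D points.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Envelope<T> {
    lower: (T, T),
    upper: (T, T),
}

impl<T: Bounded> Envelope<T> {
    /// "Empty" envelope: lower corner at the largest value and upper corner at the
    /// smallest, so the first point added replaces both. This needs `Bounded`.
    fn new_empty() -> Self {
        Envelope {
            lower: (T::max_value(), T::max_value()),
            upper: (T::min_value(), T::min_value()),
        }
    }

    /// Grow the envelope to include `(x, y)`.
    fn add_point(&mut self, (x, y): (T, T)) {
        if x < self.lower.0 { self.lower.0 = x; }
        if y < self.lower.1 { self.lower.1 = y; }
        if x > self.upper.0 { self.upper.0 = x; }
        if y > self.upper.1 { self.upper.1 = y; }
    }
}
```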

== Alternative considered

Since RTreeNum isn't necessarily a float, we could add these new
bounds to GeoNum instead of GeoFloat.

However, doing so would mean dropping support for unsigned ints from
GeoNum. Note that using unsigned ints now, while supported, can easily
lead to underflow if you're using one of the many operations that
involve subtraction.
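Here is a sketch of that underflow hazard. It assumes the operation is a 2D orientation (cross product) test, a typical building block for geometric predicates. The result is inherently signed, so with `u32` coordinates even valid input can need a negative intermediate. Checked arithmetic surfaces this as `None`. A plain `-` on `u32` would instead panic in debug builds and wrap in release builds. Both function names are hypothetical.

```rust
/// Orientation of c relative to the directed line a -> b, in signed arithmetic:
/// > 0 counter-clockwise, < 0 clockwise, 0 collinear.
fn cross_i64(a: (i64, i64), b: (i64, i64), c: (i64, i64)) -> i64 {
    (b.0 - a.0) * (c.1 - a.1) - (b.1 - a.1) * (c.0 - a.0)
}

/// The same formula with unsigned coordinates. Every step is checked, so anything
/// that would go below zero (or overflow) yields None.
fn cross_u32_checked(a: (u32, u32), b: (u32, u32), c: (u32, u32)) -> Option<u32> {
    let dx1 = b.0.checked_sub(a.0)?;
    let dy1 = b.1.checked_sub(a.1)?;
    let dx2 = c.0.checked_sub(a.0)?;
    let dy2 = c.1.checked_sub(a.1)?;
    dx1.checked_mul(dy2)?.checked_sub(dy1.checked_mul(dx2)?)
}
```

A clockwise triangle such as (0,0), (0,1), (1,0) has orientation -1, which `u32` simply cannot represent.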

It would also put one more barrier in the way of ever getting BigDecimal
support in geo, since BigDecimal is not Bounded.

Also, apparently Float isn't necessarily Signed, but having never
personally encountered unsigned floating point in the wild, I don't have
strong feelings about retaining support for it.

And since Relate doesn't currently support non-floats, this would be a
cost with no benefit. If that changes, we could reconsider this
decision, or perhaps add the required behavior to some derivative type,
like one of the HasKernel implementations.
@michaelkirk michaelkirk force-pushed the mkirk/rstar-edge-set-intersector branch from 207226b to 68a4c40, May 10, 2022 16:23
@michaelkirk (Member, Author)

> I might be missing something, but what's up with louisiana.geojson? We still seem to use the WKT version.

Thank you for catching this @lnicola. I've amended the commit to remove it.

@michaelkirk (Member, Author)

> Could we have a user-friendly API for that? I.e. loading a geometry into an R-tree, in order to be able to reuse it.

I think @urschrei's proposal for some kind of pre-indexed geometry makes sense for this. I'm happy to work on that unless someone else is planning on it. I'd prefer it to be a followup PR though, as I feel this one stands as a net win on its own.
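One possible shape for such a pre-indexed geometry is sketched below. It is hypothetical: `PreparedEdges` is not an existing geo type, and a sorted-interval index stands in for an R-tree. The idea is to pay the build cost once for the large geometry's edges, then reuse the index across many small queries so that cost is amortised.

```rust
/// Bounding box of one edge.
#[derive(Clone, Copy, Debug)]
struct Bbox {
    min_x: f64,
    min_y: f64,
    max_x: f64,
    max_y: f64,
}

impl Bbox {
    fn overlaps(&self, o: &Bbox) -> bool {
        self.min_x <= o.max_x && o.min_x <= self.max_x && self.min_y <= o.max_y && o.min_y <= self.max_y
    }
}

/// Hypothetical pre-indexed geometry: edge boxes sorted by min_x, built once.
struct PreparedEdges {
    boxes: Vec<(Bbox, usize)>, // (edge box, original edge index), sorted by min_x
    max_width: f64,            // widest edge in x, bounds the scan window
}

impl PreparedEdges {
    /// O(n log n) build cost, paid once.
    fn new(edges: &[Bbox]) -> Self {
        let mut boxes: Vec<(Bbox, usize)> = edges.iter().copied().zip(0..).collect();
        boxes.sort_by(|a, b| a.0.min_x.total_cmp(&b.0.min_x));
        let max_width = boxes.iter().map(|(b, _)| b.max_x - b.min_x).fold(0.0, f64::max);
        PreparedEdges { boxes, max_width }
    }

    /// Indices of edges whose boxes overlap `q`, ascending. Two binary searches
    /// restrict the scan to min_x in [q.min_x - max_width, q.max_x]; boxes
    /// outside that window cannot overlap `q` in x.
    fn query(&self, q: &Bbox) -> Vec<usize> {
        let lo = self.boxes.partition_point(|(b, _)| b.min_x < q.min_x - self.max_width);
        let hi = self.boxes.partition_point(|(b, _)| b.min_x <= q.max_x);
        let mut hits: Vec<usize> = self.boxes[lo..hi]
            .iter()
            .filter(|(b, _)| b.overlaps(q))
            .map(|&(_, i)| i)
            .collect();
        hits.sort_unstable();
        hits
    }
}
```

The x-sorted window degrades if a single edge is very wide, because `max_width` widens every query's window. A real R-tree does not have this weakness, which is one reason to prefer it for the actual implementation.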

@lnicola (Member) commented May 10, 2022

Yeah, of course, no need to block this one.

@michaelkirk michaelkirk requested a review from rmanoka May 26, 2022 00:12
@rmanoka (Contributor) left a comment

lgtm!

@michaelkirk (Member, Author)

bors r=rmanoka

bors bot (Contributor) commented May 27, 2022

Build succeeded:

@bors bors bot merged commit 8b701a1 into main May 27, 2022
@bors bors bot deleted the mkirk/rstar-edge-set-intersector branch May 27, 2022 16:08
Successfully merging this pull request may close these issues:
perf: speed up Relate trait with an R-Tree backed EdgeSetIntersector

4 participants