-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature request: allow other predicates for determining leaf nodes than node capacity #6
Comments
Hi! :) Why do you think this would increase the io perf on your ds? Can you explain a bit more where the bottleneck is? |
I have a problem where I need to compute a statistic across a large, lazy, very slow to load raster (a 30m height map) over 250,000 polygons. I want to partition these polygons in space such that I can load sufficiently small segments of the raster, process that over the polygons in that segment, and then move on to the next segment. This drastically increases my processing times and can be made generic to essentially any dataset. This means that I basically do a depth-first search across a tree, and stop the search for a particular path down the tree if I either encounter a leaf node, or a node with small enough extent that it would fit into memory. The problem is that if the leaf node is too large, I can't fit my data into memory, and so I end up needing to process each polygon individually and load data within its bounding box from the raster. This is a time-consuming and painful operation. For reference, if I manually map the computation across 2x2 degree segments of the raster, I can get this done in an hour. Using the tiling approach, it takes me around 11 hours for the same thing. |
Ok, I see. The striping is bad indeed. Maybe an R* tree would perform better, might be worth to look into https://infolab.usc.edu/csci599/Fall2001/paper/rstar-tree.pdf R* trees seem to look less striped (best viewed with white background): Actually, I'm not an expert in spatial trees, just coded this up in an afternoon cause I needed some minimal solution. So there could also be a bug in the splitting logic, but I'm pretty sure I nailed https://apps.dtic.mil/sti/tr/pdf/ADA324493.pdf Feel free to check! I don't know what the best way forward is:
The easiest way to achieve what you need would probably be to implement the custom predicate you mentioned. Maybe like so:
Or how did you imagine it? If you can come up with a clean idea how different splitting logics could be implemented, I wouldn't hold up the merge ;-) However, this sounds like it's rather something for a more abstract package like an imaginary "SpatialTrees". |
I want to partition a dataset of geometries into, finally, leaf nodes whose areas are approximately the same. I don't really care about the number of geometries per node, since my problem is I/O on the raster dataset I'm using.
My understanding is that this would need a custom predicate that decides when to split, and it could receive a vector of the extents of the geometries as input.
Would there be interest in a PR that adds this feature? I could code it up, but curious to hear your thoughts @maxfreu.
The text was updated successfully, but these errors were encountered: