How to debug index usage crashes? #220
@laurikoobas How big a cluster are you using, and what is the node configuration?
Running it as an AWS Glue job on 40 DPUs. It makes sense that the polygon dataset is the cause of this, but I can't share it. What would it be in the polygons, though, that makes index use an issue?
I'm not familiar with Glue, but I think the amount of memory you need for these polygons might be tipping you over the 5 GB limit you have set for the YARN job. What index precision are you using?
I used just the 30 that's in the example. Do you have guidelines or documentation on what it means and which values make sense for which use cases?
You want to pick a precision that can eliminate a large fraction of polygons. E.g., if your polygons are US states and you pick a precision of, say, 10 or 15, each polygon roughly falls into O(1) grids at that precision. If you pick precision 30, that still holds true, but we now spend more time computing the grids that overlap with each polygon, and more space storing those grids, since there will be a lot more of them.
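As a rough back-of-the-envelope illustration (not the library's actual indexing code), the sketch below estimates how many grid cells cover a polygon's bounding box at a given bit precision. Geohash interleaves longitude and latitude bits, so each extra bit halves one cell dimension; the covering-cell count explodes as precision grows, which is where the extra time and memory go:

```python
import math

def covering_cells(bbox_deg, precision_bits):
    """Rough estimate of how many geohash grid cells at a given bit
    precision are needed to cover a square bounding box (side length
    in degrees). Longitude gets the extra bit when precision is odd,
    matching the geohash bit-interleaving order."""
    lon_bits = math.ceil(precision_bits / 2)
    lat_bits = precision_bits // 2
    cell_w = 360.0 / (2 ** lon_bits)  # cell width in degrees
    cell_h = 180.0 / (2 ** lat_bits)  # cell height in degrees
    return math.ceil(bbox_deg / cell_w) * math.ceil(bbox_deg / cell_h)

# A polygon with a ~5-degree bounding box (roughly a mid-size US state):
for p in (15, 20, 25, 30):
    print(p, covering_cells(5.0, p))
```

With this estimate, moving from precision 15 to 30 grows the cell count from tens to hundreds of thousands, which is why a modest precision that still discriminates between polygons is usually the right choice.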
Precision is nothing but the geohash precision (see https://gis.stackexchange.com/questions/115280/what-is-the-precision-of-a-geohash). Instead of characters, we are using the bit size, so to convert to geohash character length simply divide by 5. E.g., a precision of 35 = a 7-character geohash.
My code was running successfully with 350 million points and 300 polygons.
Then the number of polygons went up to 450 and it started crashing. I did some tests and it still crashes with 10 points (not 10 million, just 10) and those 450 polygons. It's still fine if I limit the number of polygons to 300, though.
Right now I've just disabled index use, but I'd like to get to the root of the issue. Could the problem be in a weird polygon? The largest polygon we have has 174 points.
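One way to track down a single problematic polygon without sharing the data is to bisect the polygon list against the job. The sketch below is a generic, hypothetical harness: `run_job` stands in for whatever callable runs the Glue/Spark job against a polygon subset and returns whether it succeeded.

```python
def find_failing_subset(polygons, run_job):
    """Bisect a polygon list to isolate a small subset that makes
    run_job fail. run_job(subset) returns True on success, False on
    failure (e.g. the crash seen with the index enabled)."""
    if run_job(polygons):
        return None  # the whole set passes; nothing to isolate
    while len(polygons) > 1:
        mid = len(polygons) // 2
        left, right = polygons[:mid], polygons[mid:]
        if not run_job(left):
            polygons = left
        elif not run_job(right):
            polygons = right
        else:
            # Failure only reproduces with polygons from both halves,
            # so stop narrowing and return the current subset.
            break
    return polygons
```

Each round halves the candidate set, so isolating one bad polygon out of 450 takes on the order of ten job runs; if the crash needs a combination of polygons, the loop stops at the smallest subset it can prove faulty.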
During my tests, these were some of the error messages: