-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inconsistent class attributes across countries #26
Comments
This is surprising - in our geoBoundaries builds, we actually cast everything to multi: self.geomDta["geometry"] = [MultiPolygon([feature]) if type(feature) == Polygon else feature for feature in self.geomDta["geometry"]] This is applied to every single data layer, so I'm not sure how we could have Polygons (not MultiPolygon's) slipping through. As you point out, in our geoJSON PRY is cast to Polygon: We're in the middle of refactoring our build code, so I'll put this high on our list of bugs to fix. This won't be an immediate turnaround (as it impacts the actual core builds, not just the R package), and my suspicion is this is actually an upstream issue with Mapshaper, which is what we use to build the final files. If that is the case, the solution is going to be - in the build around line 875 (https://github.com/wmgeolab/geoBoundaryBot/blob/86f05d9bd2a9df4c075b6c3c247b0ca548ef5e5c/geoBoundaryBuilder/builderClass.py#L875), we'll need to add a routine that re-loads each file, recasts to multiPolygon (again?), and then re-saves. At least until we can figure out why mapshaper isn't pushing to Multi in some cases (which we will need to confirm is the case, by looking at the tmpJson that we output from geoPandas). Also of note - we actually already reload the files after mapshaper builds them to fix a projection issue with the JSONs (lines 872-873, and then again on 887-888). Adding a multiPolygon recast there wouldn't be hard to implement at all, if needed in the near-term to fix this. I'll look into doing that for a short-term bandaid. @rohith4444 see the above notes for our package development and k8s refactor. |
I have made a small edit to the build, and am going to try and run one this evening to see if it at least gives a short-term resolution: tmpGeomJSONproj_simplified_multi = tmpGeomJSONproj_simplified.cast("MultiPolygon") This is then written out with the correct EPSG codes in the next step. This is a pretty hacky / short-term solution, but may give us an immediate resolution here, and then we can work on a better/longer term fix during our refactor. I'll try to remember to loop back to this tomorrow morning after a build is run with the new code. |
Update: this wasn't quite as simple as I had thought, so trying some new solutions this AM. Current test - after mapshaper json warp - is reloading and running: def to_multipolygon(geom):
if isinstance(geom, Polygon):
return MultiPolygon([geom])
return geom
tmpGeomJSONproj_multi = tmpGeomJSONproj.geometry.apply(to_multipolygon) If this goes, I'll do a full build this AM. |
Ok - that seems to do it, so moving forward with this bandaid fix. |
Thanks so much, @DanRunfola! |
Just FYI, this fix is now in for our geoJSONs, but I need to do a little more testing before I can apply it to the shapefiles, which the R package relies on. Hoping to have time to do that in the coming day or two (this afternoon with some real luck). |
To followup here - this is actually trickier than originally thought, as the "simple" fix to change all types results in corrupted attribute tables. I have isolated the issue to our use of mapshaper for our simplification, and we will probably need to look into deprecating that in favor of some of our own algorithms to really "fix" this, which may be a ways off. Open to any suggestions on how we might make this easier, but I don't have any "quick" fixes in mind right this moment. Mapshaper is just too important for the process to do a drop-in replacement (it's wicked fast!). |
I would imagine that simply calling |
We'll give that a shot - just pinged @rohith4444 to prioritize this one. |
Hi @noahdasanaike, Thank you for bringing this to us. I've implemented the suggested modification to cast all polygons to a minimum common geometry type (like multipolygon) before returning. I've tested the functionality with the query you provided countries <- map(c("Canada", "Mexico", "Brazil", "Argentina", "Uruguay", "Paraguay"), ~rgeoboundaries::gb_adm0(.)) , and I didn't encounter any errors. It seems like the issue has been resolved. Please feel free to try it out on your end and let me know if you encounter any further issues. Best regards, |
When querying certain countries, as so:
countries <- map(c("Canada", "Mexico", "Brazil", "Argentina", "Uruguay", "Paraguay"), ~rgeoboundaries::gb_adm0(.))
,the resulting geometry columns have inconsistent class attributes (multi polygon or polygon) and therefore cannot be easily grouped together; i.e., the following call fails:
data.table::rbindlist(countries)
.This may be an intended feature rather than an issue per say but there should be no harm in casting all polygons to a minimum common geometry type (like multi polygon) before returning.
The text was updated successfully, but these errors were encountered: