emergency services deduplication code #13

Open · wants to merge 12 commits into flooding

Conversation

aileenmcd (Contributor):

  • Adding a function to utils.R that follows the multipolygon, polygon, points method for OSM data.
  • Additional steps taken (line 119) to further reduce duplication in cases where multiple polygons appear to relate to the same building, on the assumption that it is unlikely there would be more than one building for the same service within the same OA.
  • When a polygon or multipolygon spans multiple OAs, it is assigned to the OA with the largest intersection area so that the output can be produced at LSOA level (see the sketch after this list).
  • Re-ran the build index script and checked the effect of the new service code on the 'Community Support' domain deciles and the flooding vulnerability output deciles.
  • Corrected the social renters data, which appears to have been overwritten with the Wales data since the last England index build and therefore affected the index build results when compared with the previous run.
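
A minimal sketch of the largest-intersection assignment described above; services_sf and oa_boundaries are placeholder names rather than the actual objects in this repo, and both are assumed to be sf layers in the same CRS.

library(sf)
library(dplyr)

# Illustrative sketch: assign each service footprint to the single OA it
# overlaps most. `services_sf` holds OSM polygons/multipolygons and
# `oa_boundaries` holds OA polygons with an OA11CD column (both hypothetical).
assign_largest_oa <- function(services_sf, oa_boundaries) {
  services_sf |>
    mutate(service_row = row_number()) |>
    st_intersection(oa_boundaries) |>        # one row per service x OA overlap
    mutate(overlap_area = st_area(geometry)) |>
    group_by(service_row) |>
    slice_max(overlap_area, n = 1, with_ties = FALSE) |>  # keep largest overlap
    ungroup() |>
    select(-service_row, -overlap_area)
}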

@MikeJohnPage left a comment:

Left you some feedback to review. Interested in your thoughts 👍

Comment on lines +378 to +379
points_not_polygon_multipolygon_overlap <- points |>
  st_join(polys_multipolys) |>
Contributor:

Could some defensive programming for this join be introduced to help potential debugging?
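
For example, something along these lines (just a sketch of the idea, not a prescription): check the layers share a CRS before joining, and flag if the join inflates the row count.

library(sf)

# Sketch only: fail fast on CRS mismatches before the spatial join
if (st_crs(points) != st_crs(polys_multipolys)) {
  stop("points and polys_multipolys must share a CRS before st_join()")
}

joined <- points |>
  st_join(polys_multipolys)

# st_join() duplicates a point that falls in several polygons, which can
# silently inflate counts downstream
if (nrow(joined) > nrow(points)) {
  warning("st_join() returned more rows than `points`: check for overlapping polygons")
}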

aileenmcd (Author):

Added in f030d5b.

Comment on lines +346 to +355
# Check if error on joins
tryCatch(
  {
    polygons |>
      st_join(multipolygons)
  },
  error = function(e) {
    message("There is a joining error, you may need to turn off s2 processing using sf::sf_use_s2(FALSE)")
  }
)
Contributor:

Good job on the defensive programming style 😃

Comment on lines 121 to 137
services_eng_dups <- services_eng |>
  group_by(OA11CD, service) |>
  mutate(count_id = n()) |>
  filter(count_id > 1) |>
  arrange(desc(count_id), OA11CD, name) |>
  st_transform(crs = 4326)

# Keep one row per OA/service pair: rows with a name sort ahead of NAs,
# then the largest footprint wins
services_eng_dedup <- services_eng_dups |>
  mutate(size = st_area(geometry)) |>
  group_by(OA11CD, service) |>
  arrange(OA11CD, name, desc(size)) |>
  slice(1)

services_eng_dedup <- services_eng |>
  filter(!osm_id %in% services_eng_dups$osm_id) |>
  bind_rows(services_eng_dedup)
Contributor:

I wonder if this method should also be bundled into a separate R function?

Are we safe in the assumption that it is unlikely that the same building will be used for the same service in the same OA?

Contributor:

My intuition is that this assumption is sound, and is probably something we want to generically apply to all of the outputs from the osm_data_reduce() function.
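
For instance, if osm_data_reduce() returns a named list of sf layers (an assumption on my part), a bundled-up dedup helper could simply be mapped over it:

library(purrr)

# `reduced` stands in for the output of osm_data_reduce() and `dedup_fn`
# for the bundled dedup helper discussed above; both names are hypothetical
deduped <- map(reduced, dedup_fn)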

aileenmcd (Author):

Yeah, I suppose it's difficult to validate the assumption, and there may be exceptions where there is more than one of a service in a single OA. But the thinking was: there are about 171k OAs in England, and taking fire stations (of which there are more than police or ambulance stations) there are around 1,400, so if they are well distributed there should in theory be little crossover (OAs cover roughly ~125 households with a population of ~300). So it's a balance between this assumption, which may have some exceptions, and the duplicated cases within the OSM data, which would otherwise perhaps need checking manually.
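
As a rough back-of-envelope check on those numbers:

# If ~1,400 fire stations were spread uniformly at random across England's
# ~171,000 OAs, the expected number of OAs holding two or more stations is
# approximately choose(n, 2) / N (birthday-problem style approximation)
n_stations <- 1400
n_oas <- 171000
choose(n_stations, 2) / n_oas
#> [1] 5.726901   # i.e. well under 1% of stations would coincide by chance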

aileenmcd (Author):

Have put this deduplication part into a separate function, osm_oa_deduping(), in case we want to use the logic in osm_data_reduce() and osm_oa_deduping() separately. Updated in 558a30a.
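
For reference, roughly the shape of it (a sketch reconstructed from the diff above, not the exact code in 558a30a):

library(sf)
library(dplyr)

osm_oa_deduping <- function(services) {
  # Rows that share an OA and service with at least one other row
  dups <- services |>
    group_by(OA11CD, service) |>
    filter(n() > 1) |>
    ungroup()

  # Keep one row per OA/service: rows with a name sort ahead of NAs, then
  # the largest footprint wins
  deduped <- dups |>
    mutate(size = st_area(geometry)) |>
    group_by(OA11CD, service) |>
    arrange(name, desc(size), .by_group = TRUE) |>
    slice(1) |>
    ungroup()

  # Reattach the rows that were never duplicated
  services |>
    filter(!osm_id %in% dups$osm_id) |>
    bind_rows(deduped)
}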

@MikeJohnPage (Contributor) commented Apr 19, 2022:

@aileenmcd is this ready for merging?

@aileenmcd (Author) replied:
> @aileenmcd is this ready for merging?
Yup, sorry, good to go :)

@MikeJohnPage (Contributor):

Great. @matthewgthomas I've done an initial review (see my comments above). Please can you do a final review and check you are happy with the code/logic?
