emergency services deduplication code #13

Open · wants to merge 12 commits into flooding

Conversation

aileenmcd (Contributor):

  • Adding a function to utils.R that follows the multipolygon, polygon, points method for OSM data.
  • Additional steps taken (line 119) to further reduce duplication in cases where multiple polygons appear to relate to the same building, on the assumption that it is unlikely there would be more than one building for the same service within the same OA.
  • When a polygon or multipolygon spans multiple OAs, it is assigned to the OA with the largest intersection area so that the output can be produced at LSOA level (see the sketch after this list).
  • Re-ran the build index script and checked the effect of the new service code on the 'Community Support' domain deciles and the flooding vulnerability output deciles.
  • Corrected the social renters data, which appears to have been overwritten with the Wales data since the last England index build and therefore affected the index build results when compared with the previous run.
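
A minimal sketch of the largest-intersection assignment described above; services_sf and oa_boundaries are placeholder names rather than the actual objects in this repo, and both are assumed to be sf layers in the same CRS.

library(sf)
library(dplyr)

# Illustrative sketch: assign each service footprint to the single OA it
# overlaps most. `services_sf` holds OSM polygons/multipolygons and
# `oa_boundaries` holds OA polygons with an OA11CD column (both hypothetical).
assign_largest_oa <- function(services_sf, oa_boundaries) {
  services_sf |>
    mutate(service_row = row_number()) |>
    st_intersection(oa_boundaries) |>        # one row per service x OA overlap
    mutate(overlap_area = st_area(geometry)) |>
    group_by(service_row) |>
    slice_max(overlap_area, n = 1, with_ties = FALSE) |>  # keep largest overlap
    ungroup() |>
    select(-service_row, -overlap_area)
}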

@MikeJohnPage left a comment:

Left you some feedback to review. Interested in your thoughts 👍

Comment on lines +378 to +379
points_not_polygon_multipolygon_overlap <- points |>
  st_join(polys_multipolys) |>
Contributor:

Could some defensive programming for this join be introduced to help potential debugging?
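
For example, something along these lines (just a sketch of the idea, not a prescription): check the layers share a CRS before joining, and flag if the join inflates the row count.

library(sf)

# Sketch only: fail fast on CRS mismatches before the spatial join
if (st_crs(points) != st_crs(polys_multipolys)) {
  stop("points and polys_multipolys must share a CRS before st_join()")
}

joined <- points |>
  st_join(polys_multipolys)

# st_join() duplicates a point that falls in several polygons, which can
# silently inflate counts downstream
if (nrow(joined) > nrow(points)) {
  warning("st_join() returned more rows than `points`: check for overlapping polygons")
}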

aileenmcd (Author):

Added in f030d5b.

Comment on lines +346 to +355
# Check if error on joins
tryCatch(
  {
    polygons |>
      st_join(multipolygons)
  },
  error = function(e) {
    message("There is a joining error, you may need to turn off s2 processing using sf::sf_use_s2(FALSE)")
  }
)
Contributor:

Good job on the defensive programming style 😃

Comment on lines 121 to 137
services_eng_dups <- services_eng |>
  group_by(OA11CD, service) |>
  mutate(count_id = n()) |>
  filter(count_id > 1) |>
  arrange(desc(count_id), OA11CD, name) |>
  st_transform(crs = 4326)

# Keep one row per OA/service pair: rows with a name sort ahead of NAs,
# then the largest footprint wins
services_eng_dedup <- services_eng_dups |>
  mutate(size = st_area(geometry)) |>
  group_by(OA11CD, service) |>
  arrange(OA11CD, name, desc(size)) |>
  slice(1)

services_eng_dedup <- services_eng |>
  filter(!osm_id %in% services_eng_dups$osm_id) |>
  bind_rows(services_eng_dedup)
Contributor:

I wonder if this method should also be bundled into a separate R function?

Are we safe in the assumption that it is unlikely that the same building will be used for the same service in the same OA?

Contributor:

My intuition is that this assumption is sound, and is probably something we want to generically apply to all of the outputs from the osm_data_reduce() function.
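
For instance, if osm_data_reduce() returns a named list of sf layers (an assumption on my part), a bundled-up dedup helper could simply be mapped over it:

library(purrr)

# `reduced` stands in for the output of osm_data_reduce() and `dedup_fn`
# for the bundled dedup helper discussed above; both names are hypothetical
deduped <- map(reduced, dedup_fn)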

aileenmcd (Author):

Yeah, I suppose it's difficult to validate the assumption, and there may be exceptions where there is more than one of a service in a single OA. But the thinking was: there are about 171k OAs in England, and taking fire stations (of which there are more than police or ambulance stations) there are around 1,400, so if they are well distributed there should in theory be little crossover (OAs cover roughly ~125 households with a population of ~300). So it's a balance between this assumption, which may have some exceptions, and the duplicated cases within the OSM data, which would otherwise perhaps need checking manually.
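
As a rough back-of-envelope check on those numbers:

# If ~1,400 fire stations were spread uniformly at random across England's
# ~171,000 OAs, the expected number of OAs holding two or more stations is
# approximately choose(n, 2) / N (birthday-problem style approximation)
n_stations <- 1400
n_oas <- 171000
choose(n_stations, 2) / n_oas
#> [1] 5.726901   # i.e. well under 1% of stations would coincide by chance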

aileenmcd (Author):

Have put this deduplication part into a separate function, osm_oa_deduping(), in case we want to use the logic in osm_data_reduce() and osm_oa_deduping() separately. Updated in 558a30a.
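
For reference, roughly the shape of it (a sketch reconstructed from the diff above, not the exact code in 558a30a):

library(sf)
library(dplyr)

osm_oa_deduping <- function(services) {
  # Rows that share an OA and service with at least one other row
  dups <- services |>
    group_by(OA11CD, service) |>
    filter(n() > 1) |>
    ungroup()

  # Keep one row per OA/service: rows with a name sort ahead of NAs, then
  # the largest footprint wins
  deduped <- dups |>
    mutate(size = st_area(geometry)) |>
    group_by(OA11CD, service) |>
    arrange(name, desc(size), .by_group = TRUE) |>
    slice(1) |>
    ungroup()

  # Reattach the rows that were never duplicated
  services |>
    filter(!osm_id %in% dups$osm_id) |>
    bind_rows(deduped)
}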

@MikeJohnPage (Contributor) commented Apr 19, 2022:

@aileenmcd is this ready for merging?

@aileenmcd (Author) replied:
> @aileenmcd is this ready for merging?
Yup, sorry, good to go :)

@MikeJohnPage (Contributor):

Great. @matthewgthomas I've done an initial review (see my comments above). Please can you do a final review and check you are happy with the code/logic?
