Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Converting s2_geography using supported methods is not guaranteed to get back the original data #260

Open
tzakharko opened this issue Jul 12, 2024 · 1 comment

Comments

@tzakharko
Copy link

tzakharko commented Jul 12, 2024

I am trying to work around the lack of serialization support (#259) by converting s2 geography data into other formats (such as using simple features or wkb), but the data can get corrupted under certain circumstances. That is, doing s2_as_geography(st_as_sfc(g)) or s2_as_geography(s2_as_binary(g)) is not guaranteed to recover the original data.

Consider the following example from real-world code:

g <- geodesic_grid$geography
all(s2_is_valid(g))
#> [1] TRUE

g1 <- as_s2_geography(s2_as_binary(g), check = FALSE)
all(s2_is_valid(g1))
#> [1] FALSE

g2 <- as_s2_geography(sf::st_as_sfc(g), check = FALSE)
all(s2_is_valid(g2))
#> [1] FALSE

Unfortunately, I cannot provide an MRE because I do not know how to accurately persist s2 geography data. In the cases I am dealing with, it seems to be a precision issue when converting the data between various formats. The affected geographies have some points that are very close by, and it appears that these are detected as duplicates when the data is converted back to s2. While I can fix this by running s2_simplify(), it would be good to have a way to persist the data as is.

@paleolimbot
Copy link
Collaborator

paleolimbot commented Jul 13, 2024

Thanks for opening this!

We store s2 geometries as pointers to C++ structures for performance reasons. This is usually OK since s2 is most commonly used as an intermediary format (usually the start and end is an sfc). This is usually OK but, as you've found out, it is not possible to serialize it using serialize().

The issue you're seeing here is I think because S2's internal representation is a unit vector (e.g., 3 doubles), whereas the WKB export and import by default take the more ubiquitous longitude/latitude. The conversion between those is slightly lossy and looses a tiny bit of precision (in your case enough to trigguer a difference!).

It is possible to override this projection but it's not obvious how to do this. I've pasted an example below which may help!

library(s2)
set.seed(1234)

lng <- c(-64, 15) + runif(2)
lat <- c(45, 52) + runif(2)

(geog <- s2::s2_make_line(lng, lat))
#> <geodesic s2_geography[1] with CRS=OGC:CRS84>
#> [1] LINESTRING (-63.8862966 45.6092747, 15.6222994 52.6233794)

(wkb_storage <- wk::wk_handle(geog, wk::wkb_writer(), s2_projection = NULL))
#> <wk_wkb[1]>
#> [1] <LINESTRING Z (0.3079087 -0.6281395 0.7145859, 0.5846258 0.1634758 0.7946624)>

serialized <- serialize(wkb_storage, NULL)
geog2 <- wk::wk_handle(
  unserialize(serialized),
  s2::s2_geography_writer(oriented = TRUE, projection = NULL)
)

geog2
#> <geodesic s2_geography[1] with CRS=OGC:CRS84>
#> [1] LINESTRING (-63.8862966 45.6092747, 15.6222994 52.6233794)

s2_equals(geog, geog2)
#> [1] TRUE

Created on 2024-07-12 with reprex v2.1.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants