-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Some thoughts on specifying tile size #83
Comments
For option 2 we could default to using library(terra)
#> terra 1.7.78 x <- rast(system.file("ex/elev.tif", package = "terra"))
getTileExtents(x, y = fileBlocksize(x))
#> xmin xmax ymin ymax
#> [1,] 5.741667 6.533333 49.83333 50.19167
#> [2,] 5.741667 6.533333 49.47500 49.83333
#> [3,] 5.741667 6.533333 49.44167 49.47500 Created on 2024-07-17 with reprex v2.1.0 |
I like the idea of a template object being passed along. I think that if we have default behaviour for not specifying nrow/ncol we would be introducing multiple APIs in the same function
This also introduces additional complexity into the code now to manage these two approaches. It's not a lot of extra complexity, but I would prefer to avoid scope creep early. As a user I think this could be harder to understand / predict what tar_target(
elev_file,
system.file("ex/elev.tif", package="terra"),
format = "file"
),
tar_terra_rast(
elev_rast,
terra::rast(elev_file)
),
tar_terra_tiles(
name = rast_2x2,
raster = elev_rast,
ncol = 2,
nrow = 2
),
tar_terra_tiles(
name = rast_blocksize,
raster = elev_rast
) So I think your idea of replacing nrow/ncol is good, we might need to make the user work a little harder and have to supply a tar_target(
elev_file,
system.file("ex/elev.tif", package="terra"),
format = "file"
),
tar_terra_rast(
elev_rast,
terra::rast(elev_file)
),
tar_target(tile_2x2,
terra::rast(
terra::ext(elev_rast),
ncol = 2,
nrow = 2,
crs = terra::crs(elev_rast)
)
),
tar_target(
ext_2x2,
terra::getTileExtents(elev_rast, template)
),
tar_target(
ext_blocksize,
getTileExtents(elev_rast, y = fileBlocksize(elev_rast))
),
tar_terra_tiles(
name = rast_2x2,
raster = elev_rast,
template = ext_2x2
),
tar_terra_tiles(
name = rast_blocked,
raster = elev_rast,
template = ext_blocksize
) |
The thing I don't love about this is that we're no longer taking advantage of an opportunity to introduce some consistency that isn't there with I also think it would be ideal if you could just supply the template directly for simplicity of tar_terra_tiles(
name = rast_blocked,
raster = elev_rast,
template = fileBlocksize(elev_rast) #this would be default if `template` is omitted or NULL.
) or tar_terra_tiles(
name = rast_blocked,
raster = elev_rast,
template = rast(ext(elev_rast), ncol = 2, nrow = 2, crs = crs(elev_rast))
) Or maybe, as I think you're suggesting, there are some helper functions so we could do something like: tar_terra_tiles(
name = rast_blocked,
raster = elev_rast,
template = \(x) tile_blocksize(x) #or just `template = tile_blocksize`
) OR tar_terra_tiles(
name = rast_blocked,
raster = elev_rast,
template = \(x) tile_n(x, ncol = 2, nrow = 2)
) |
I think perhaps we have overloaded the term I was thinking that template would just be a vector that is the extent: xmin, xmax, ymin, ymax, or perhaps using the tile index information from grout: https://github.com/hypertidy/grout. The reason I like the output from grout is that we have an abstraction around tiling that provides flexibility. Potentially, you could craft your own extents that aren't tiles exactly, I cannot imagine a solid/common usecase for that, but it might be handy? Annoyingly, extent as returned by terra is an pointer, which means we might need a custom target function to handle terra::ext, so you can't just supply that as I have in my above example. Another reason that just returning data, as grout does, might be a goer. library(terra)
#> terra 1.7.78
elev_file <- system.file("ex/elev.tif", package = "terra")
r <- rast(elev_file)
r_ext <- ext(r)
class(r_ext)
#> [1] "SpatExtent"
#> attr(,"package")
#> [1] "terra"
as.numeric(r_ext)
#> Error in as.numeric(r_ext): cannot coerce type 'S4' to vector of type 'double' Created on 2024-07-19 with reprex v2.1.1 Session infosessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.1 (2024-06-14)
#> os macOS Sonoma 14.5
#> system aarch64, darwin20
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz Australia/Hobart
#> date 2024-07-19
#> pandoc 3.1.11 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> cli 3.6.3 2024-06-21 [1] CRAN (R 4.4.0)
#> codetools 0.2-20 2024-03-31 [2] CRAN (R 4.4.1)
#> digest 0.6.36 2024-06-23 [1] CRAN (R 4.4.0)
#> evaluate 0.24.0 2024-06-10 [1] CRAN (R 4.4.0)
#> fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
#> fs 1.6.4 2024-04-25 [1] CRAN (R 4.4.0)
#> glue 1.7.0 2024-01-09 [1] CRAN (R 4.4.0)
#> htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
#> knitr 1.48 2024-07-07 [1] CRAN (R 4.4.0)
#> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0)
#> Rcpp 1.0.12 2024-01-09 [1] CRAN (R 4.4.0)
#> reprex 2.1.1 2024-07-06 [1] CRAN (R 4.4.0)
#> rlang 1.1.4 2024-06-04 [1] CRAN (R 4.4.0)
#> rmarkdown 2.27 2024-05-17 [1] CRAN (R 4.4.0)
#> rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.4.0)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0)
#> terra * 1.7-78 2024-05-22 [1] CRAN (R 4.4.0)
#> withr 3.0.0 2024-01-16 [1] CRAN (R 4.4.0)
#> xfun 0.45 2024-06-16 [1] CRAN (R 4.4.0)
#> yaml 2.3.9 2024-07-05 [1] CRAN (R 4.4.0)
#>
#> [1] /Users/njt/Library/R/arm64/4.4/library
#> [2] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
Yes I think I like this as a default for I was not imagining an anonymous function, but rather something like: tar_target(
rast_blocksize,
tile_blocksize(x)
),
tar_terra_tiles(
name = rast_blocked,
raster = elev_rast,
template = rast_blocksize # or tile_blocksize(x) or tile_n(x, ncol = 2, nrow = 2)
) |
Tagging @mdsumner in here as he wrote grout and I'd be curious to hear his thoughts on this |
I was thinking that the
Yeah, this is exactly why I ended up writing
This would be fine (although I think |
I think specifying tile size is a good default, check that it's a valid value - a reasonable fallback is to treat each row as a tile and that's probably what terra does (it's what raster did, and is what GDAL does). It doesn't make sense to me to have the user specify a tiling dimension, because the opportunity for optimization is about the size of each tile and a multiplier of that is very simple. Also the terra interface for it has some unexpected behaviours for centring the scheme, and I think that logic belongs elsewhere ( like terra itself). I personally need the logic for the schemes completely independently of any package or library, it's extremely powerful (see how much the xarray community cares about "chunking" as one example, but even determining the number and size of tiles and where they go - might be in memory to S3 storage for example) |
It seems there could be 3 practical ways to specify tile size
The text was updated successfully, but these errors were encountered: