[draft] tar_terra_rast_wrap: multi-target method to preserve SpatRaster metadata #63

Draft
wants to merge 2 commits into
base: master
3 changes: 3 additions & 0 deletions NAMESPACE
@@ -1,8 +1,11 @@
# Generated by roxygen2: do not edit by hand

+ export(geotargets_destroy_cache)
+ export(geotargets_init_cache)
export(geotargets_option_get)
export(geotargets_option_set)
export(tar_terra_rast)
+ export(tar_terra_rast_wrap)
export(tar_terra_sprc)
export(tar_terra_vect)
importFrom(rlang,"%||%")
47 changes: 47 additions & 0 deletions R/geotargets-cache.R
@@ -0,0 +1,47 @@
#' Manage geotargets Cache Files
#'
#' The `geotargets` cache is a folder containing one subfolder for each target created with `tar_terra_rast_wrap()`. Each subfolder holds one or more files containing the data and metadata that back a `terra` `SpatRaster` object.
#'
#' The cache directory can be customized with `geotargets_option_set(cache_dir = "path/to/cache")` or the environment variable `GEOTARGETS_CACHE_DIR`.
#'
#' Periodically you may want to purge files in the cache using `geotargets_destroy_cache()`.
#'
#' @param name character. Target name (default: `NULL` will delete all target cache files)
#' @param init logical. Re-create empty cache directory? Default: `FALSE`
#'
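#' @return `geotargets_destroy_cache()` invisibly returns an integer: 0 for success, 1 for failure. `geotargets_init_cache()` invisibly returns the logical result of `dir.create()`.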
#' @export
#' @rdname geotargets-cache
#' @examples
#' \dontrun{
#'
#' # delete cache folder for target named "foo"
#' geotargets_destroy_cache("foo")
#'
#' # create empty folder
#' geotargets_init_cache("foo")
#'
#' # delete and recreate folder
#' geotargets_destroy_cache("foo", init = TRUE)
#'
#' # delete all cache files
#' geotargets_destroy_cache()
#'
#' }
geotargets_destroy_cache <- function(name = NULL, init = FALSE) {
  cachedir <- geotargets_option_get("cache.dir")
  target_cache_dir <- file.path(cachedir %||% "geotargets_cache", name %||% "")
  res <- unlink(target_cache_dir, recursive = TRUE)
  if (init) geotargets_init_cache(name = name)
  invisible(res)
}
Comment on lines +31 to +37
Collaborator

I wonder if there is a way that this could set some "flag" that could be used to invalidate the "upstream" target through a custom cue? Or perhaps it runs tar_invalidate() on all targets created with tar_terra_rast_wrap() when run? I think it's fine that manually deleting a file from the cache breaks the pipeline, but I think any "official" way of deleting the cache should correctly invalidate targets.

Collaborator

@Aariq Aariq Jun 7, 2024

Actually, now that I think of it, the directory names inside the cache are target names, yeah? So if this could get all those dir names and pass them to tar_invalidate(any_of(dirnames)) I think it would make this function a lot more useful.
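
The suggestion above could look roughly like the following. This is a minimal sketch, not code from the PR: `cache_target_names()` and `destroy_cache_and_invalidate()` are hypothetical names, and it assumes the function is called from the pipeline's project root so that `targets::tar_invalidate()` finds the default data store.

```r
# Hypothetical helper (not part of the PR): list the target names that
# currently have a subfolder in the geotargets cache.
cache_target_names <- function(cachedir = "geotargets_cache") {
  if (!dir.exists(cachedir)) {
    return(character(0))
  }
  # recursive = FALSE returns only the immediate subfolders, i.e. one per target
  list.dirs(cachedir, full.names = FALSE, recursive = FALSE)
}

# Hypothetical variant of geotargets_destroy_cache() that also invalidates
# the targets whose cache folders are being removed.
destroy_cache_and_invalidate <- function(cachedir = "geotargets_cache") {
  dirnames <- cache_target_names(cachedir)
  res <- unlink(cachedir, recursive = TRUE)
  # Only touch pipeline metadata when there is something to invalidate
  # and a targets data store exists in the working directory.
  if (length(dirnames) > 0 &&
      requireNamespace("targets", quietly = TRUE) &&
      targets::tar_exist_meta()) {
    targets::tar_invalidate(tidyselect::any_of(dirnames))
  }
  invisible(res)
}
```

Collecting the folder names before calling `unlink()` matters here, because the names are the only record of which targets used the cache.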


#' @rdname geotargets-cache
#' @export
geotargets_init_cache <- function(name = NULL) {
  cachedir <- geotargets_option_get("cache.dir")
  target_cache_dir <- file.path(cachedir %||% "geotargets_cache", name %||% "")
Collaborator

Suggested change
- target_cache_dir <- file.path(cachedir %||% "geotargets_cache", name %||% "")
+ target_cache_dir <- file.path(cachedir %||% "_geotargets", name %||% "")

Maybe? Just for consistency with _targets/—both being directories you shouldn't edit manually.

  dir.create(target_cache_dir,
             showWarnings = FALSE,
             recursive = TRUE)
}
14 changes: 10 additions & 4 deletions R/geotargets-option.R
@@ -21,7 +21,10 @@
#' a unique set of creation options. For example, with the default `"GeoJSON"`
#' driver:
#' <https://gdal.org/drivers/vector/geojson.html#layer-creation-options>
- #'
+ #' @param cache_dir character. Path to directory where file sources for
+ #' `PackedSpatRaster` objects can be stored. Default: `"geotargets_cache"`
+ #' when `geotargets::geotargets_option_get("cache.dir")`
+ #' is not set.
#' @details
#' These options can also be set using `options()`. For example,
#' `geotargets_option_set(gdal_raster_driver = "GTiff")` is equivalent to
@@ -34,14 +37,16 @@ geotargets_option_set <- function(
gdal_raster_driver = NULL,
gdal_raster_creation_options = NULL,
gdal_vector_driver = NULL,
- gdal_vector_creation_options = NULL
+ gdal_vector_creation_options = NULL,
+ cache_dir = NULL
) {

options(
"geotargets.gdal.raster.driver" = gdal_raster_driver,
"geotargets.gdal.raster.creation.options" = gdal_raster_creation_options,
"geotargets.gdal.vector.driver" = gdal_vector_driver, # was gdal_raster_creation_options
- "geotargets.gdal.vector.creation.options" = gdal_raster_creation_options
+ "geotargets.gdal.vector.creation.options" = gdal_vector_creation_options, # was gdal_raster_creation_options
+ "geotargets.cache.dir" = cache_dir
)

}
@@ -58,7 +63,8 @@ geotargets_option_get <- function(name) {
"geotargets.gdal.raster.driver",
"geotargets.gdal.raster.creation.options",
"geotargets.gdal.vector.driver",
- "geotargets.gdal.vector.creation.options"
+ "geotargets.gdal.vector.creation.options",
+ "geotargets.cache.dir"
))

env_name <- gsub("\\.", "_", toupper(option_name))
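
For reference, the `gsub()`/`toupper()` line above maps the new option name to `GEOTARGETS_CACHE_DIR`. The sketch below illustrates that mapping plus one plausible lookup order (R option, then environment variable, then default). `option_to_env_name()` and `resolve_cache_dir()` are hypothetical helpers, and the lookup order is an assumption, since the rest of `geotargets_option_get()` is not shown in this hunk.

```r
# Mirrors the gsub()/toupper() line in the hunk above:
# option name -> environment variable name.
option_to_env_name <- function(option_name) {
  gsub("\\.", "_", toupper(option_name))
}

# Hypothetical resolver (assumed order): R option first, then the
# environment variable, then the package default.
resolve_cache_dir <- function(default = "geotargets_cache") {
  opt <- getOption("geotargets.cache.dir")
  if (!is.null(opt)) {
    return(opt)
  }
  env <- Sys.getenv(option_to_env_name("geotargets.cache.dir"), unset = NA)
  if (!is.na(env) && nzchar(env)) {
    return(env)
  }
  default
}
```

With neither the option nor the environment variable set, `resolve_cache_dir()` falls back to `"geotargets_cache"`, which matches the `%||%` fallback used elsewhere in this PR.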
153 changes: 153 additions & 0 deletions R/tar-terra-wrap.R
@@ -0,0 +1,153 @@
#' Create a wrapped terra SpatRaster target
#'
#' Provides a target format for [terra::SpatRaster-class] objects backed by
#' [terra::PackedSpatRaster-class] and file-based targets in the `geotargets`
#' cache folder (`cachedir`).
#'
#' @param filetype character. File format expressed as GDAL driver names passed
#' to [terra::writeRaster()]
#' @param gdal character. GDAL driver specific datasource creation options
#' passed to [terra::writeRaster()]
#' @param cachedir character. Path to directory where file sources for `PackedSpatRaster` objects can be stored. Default: `"geotargets_cache"` when `geotargets::geotargets_option_get("cache.dir")` is not set.
#' @param ... Additional arguments not yet used
#' @seealso [geotargets_destroy_cache()] [geotargets_init_cache()]
#' @note Although you may pass any supported GDAL raster driver to the
#' `filetype` argument, not all formats are guaranteed to work with
#' `geotargets`. So far, `GTiff` and `GPKG` have been tested and appear
#' to work.
#'
#' @inheritParams targets::tar_target
#' @importFrom rlang %||% arg_match0
#' @seealso [targets::tar_target_raw()]
#' @export
#' @examples
#' if (Sys.getenv("TAR_LONG_EXAMPLES") == "true") {
#'   targets::tar_dir({ # tar_dir() runs code from a temporary directory.
#'     library(geotargets)
#'     targets::tar_script({
#'       list(
#'         geotargets::tar_terra_rast_wrap(
#'           terra_rast_example,
#'           system.file("ex/elev.tif", package = "terra") |> terra::rast()
#'         )
#'       )
#'     })
#'     targets::tar_make()
#'     x <- targets::tar_read(terra_rast_example)
#'   })
#' }
tar_terra_rast_wrap <- function(name,
                                command,
                                pattern = NULL,
                                filetype = "GTiff",
                                gdal = geotargets::geotargets_option_get("gdal.raster.creation.options"),
                                cachedir = geotargets::geotargets_option_get("cache.dir"),
                                ...,
                                tidy_eval = targets::tar_option_get("tidy_eval"),
                                packages = targets::tar_option_get("packages"),
                                library = targets::tar_option_get("library"),
                                repository = targets::tar_option_get("repository"),
                                iteration = targets::tar_option_get("iteration"),
                                error = targets::tar_option_get("error"),
                                memory = targets::tar_option_get("memory"),
                                garbage_collection = targets::tar_option_get("garbage_collection"),
                                deployment = targets::tar_option_get("deployment"),
                                priority = targets::tar_option_get("priority"),
                                resources = targets::tar_option_get("resources"),
                                storage = targets::tar_option_get("storage"),
                                retrieval = targets::tar_option_get("retrieval"),
                                cue = targets::tar_option_get("cue")) {
Collaborator

Suggested change
- cue = targets::tar_option_get("cue")) {
+ cue = targets::tar_option_get("cue"),
+ description = targets::tar_option_get("description")) {

  filetype <- filetype %||% "GTiff"
  gdal <- gdal %||% character(0)
  cachedir <- cachedir %||% "geotargets_cache"

  # check that filetype option is available
  drv <- get_gdal_available_driver_list("raster")
  filetype <- rlang::arg_match0(filetype, drv$name)

  check_pkg_installed("terra")

  name <- targets::tar_deparse_language(substitute(name))

  envir <- targets::tar_option_get("envir")

  command <- targets::tar_tidy_eval(
    expr = as.expression(substitute(command)),
    envir = envir,
    tidy_eval = tidy_eval
  )

  pattern <- targets::tar_tidy_eval(
    expr = as.expression(substitute(pattern)),
    envir = envir,
    tidy_eval = tidy_eval
  )

  .format_terra_rast_wrap_write <- eval(substitute(function(object, path) {
    # TODO: provide mapping of major file types to extensions
    extension <- switch(filetype,
                        "GTiff" = ".tif",
                        "GPKG" = ".gpkg",
                        "")
Comment on lines +88 to +91
Collaborator

Is this necessary? We don't need file extensions when writing to _targets/, why do we need them here?
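
If the extension is kept, the TODO in the hunk above could be handled with a small lookup table instead of a `switch()`. The following is only a sketch: `.filetype_extensions` and `filetype_extension()` are hypothetical names, and only the `GTiff` and `GPKG` entries come from the PR; the other two entries are illustrative.

```r
# Hypothetical lookup table from GDAL driver name to file extension.
# Only GTiff and GPKG appear in the PR; netCDF and PNG are illustrative.
.filetype_extensions <- c(
  GTiff  = ".tif",
  GPKG   = ".gpkg",
  netCDF = ".nc",
  PNG    = ".png"
)

# Return the extension for a single driver name, or "" when the driver is
# not in the table (the same fallback as the switch() in the PR).
filetype_extension <- function(filetype, table = .filetype_extensions) {
  ext <- unname(table[filetype])
  if (is.na(ext)) "" else ext
}
```

A table like this keeps the fallback behaviour of the `switch()` while making it easier to add drivers as they are tested.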

    saveRDS(terra::wrapCache(object,
                             filename = file.path(cachedir,
                                                  basename(path),
                                                  paste0(basename(path),
                                                         extension)),
                             filetype = filetype,
                             gdal = gdal,
                             overwrite = TRUE),
            file = path)
  }, list(cachedir = cachedir, filetype = filetype, gdal = gdal)))

  dir.create(file.path(cachedir, name), showWarnings = FALSE, recursive = TRUE) # honor the `cachedir` argument, not only the option

# rast_cache_init <- targets::tar_target_raw(
# paste0(name, "_cache_init"),
# str2expression(paste0("normalizePath(file.path(",
# shQuote(cachedir), ", ",
# shQuote(name), "))")),
# format = "file"
# )

  rast_wrap <- targets::tar_target_raw(
    name = name,
    command = command,
    pattern = pattern,
    packages = packages,
    library = library,
    format = targets::tar_format(
      read = function(path) terra::unwrap(readRDS(path)),
      write = .format_terra_rast_wrap_write,
      marshal = function(object) terra::wrap(object),
      unmarshal = function(object) terra::unwrap(object)
    ),
    repository = repository,
    iteration = iteration,
    error = error,
    memory = memory,
    garbage_collection = garbage_collection,
    deployment = deployment,
    priority = priority,
    resources = resources,
    storage = storage,
    retrieval = retrieval,
    cue = cue
  )

  rast_cache_files <- targets::tar_target_raw(
    paste0(name, "_cache_files"),
    str2expression(paste0("
      list.files(
        file.path(", shQuote(cachedir), ", ", shQuote(name), "),
        full.names = TRUE,
        recursive = TRUE
      )")),
    format = "file_fast",
Collaborator

You could allow the option of either "file" or "file_fast" here, but I'm guessing it doesn't really matter since it seems like nothing will ever depend on this target.

    deps = name
  )
Comment on lines +138 to +148
Collaborator

I'm questioning whether this target even needs to exist. Unless there is a way to make it be "upstream" of the wrapCache target, then it doesn't really serve a purpose. Invalidating this target will never do anything, and there's no reason to use this target rather than the upstream one in a pipeline. So maybe this doesn't need to return multiple targets.


  list(#rast_cache_init,
       rast_wrap,
       rast_cache_files)
}
44 changes: 44 additions & 0 deletions man/geotargets-cache.Rd

8 changes: 7 additions & 1 deletion man/geotargets-options.Rd
