Error reporting and CFA workflows #76

Open
bnlawrence opened this issue Jun 21, 2022 · 3 comments

@bnlawrence
Collaborator

Consider the following workflow:

  • User "opens" an aggregation file
  • User "subsets" into that aggregation file
  • (all lazy thus far)
  • Now user operates on the subset (the data must be present)
  • An aggregation file can point to data that is not present.
  • CFA should return a useful error if the data is not present, so that tools accessing data via CFA can themselves return a useful error.

Potentially we want a tool which can be used with a given subsetting command to generate a quark manifest and optionally migrate it to a particular "present" tier.

The error could then include a command, using that tool, which would make the necessary data present.
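
As a rough illustration of the kind of error this implies, here is a minimal Python sketch. Everything named in it is an assumption made for illustration: the exception class `FragmentNotPresentError`, its attributes, and the `cfa-fetch` command are hypothetical and are not part of CFA or any existing library. The point is only that the error carries the list of missing fragments and, optionally, a command that would make them present, so that a calling tool can pass both on to its user.

```python
class FragmentNotPresentError(Exception):
    """Hypothetical error: a subset needs fragments that cannot be read."""

    def __init__(self, aggregation, missing, suggested_command=None):
        self.aggregation = aggregation              # the aggregation file being read
        self.missing = list(missing)                # fragment files that are not present
        self.suggested_command = suggested_command  # optional command to fetch them
        message = (
            f"{len(self.missing)} fragment(s) of {aggregation} are not present: "
            + ", ".join(self.missing)
        )
        if suggested_command:
            message += f"\nTo make them present, try: {suggested_command}"
        super().__init__(message)


# A calling tool can catch the error and re-report it in its own terms.
try:
    raise FragmentNotPresentError(
        "aggregation.nc",
        ["July-December.nc"],
        suggested_command="cfa-fetch aggregation.nc",  # hypothetical tool name
    )
except FragmentNotPresentError as err:
    print(err)
```

Keeping the missing fragments and the suggested command as attributes, rather than only in the message text, lets downstream tools build their own messages without parsing strings.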

To close this ticket we need to have defined

  1. The appropriate error reporting protocol, and
  2. The necessary API for the tool.
@bnlawrence
Collaborator Author

NB: the cfa-aggregation is described here!

@bnlawrence
Collaborator Author

For example, if we take Example 1 in the formal document, we see something like:

...
data:
  temp = _ ;
  time = 0, 31, 59, 90, 120, 151, 181, 212, 243, 273, 304, 334 ;
  aggregation_location = 6, 6,
                         1, _,
                         73, _,
                         144, _ ;
  aggregation_file = "January-June.nc", "July-December.nc" ;
  aggregation_format = "nc" ;
  aggregation_address = "temp", "temp" ;

So there are two issues that a tool would need to address if you wanted to subset into this aggregation:

  1. How do you invoke it to do the subset, and tell it where to put the data if it is not "local", and
  2. Doing it.
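
To make the "doing it" part concrete, here is a small Python sketch of the first step: working out which fragment files a subset touches. It assumes that the first row of `aggregation_location` (6, 6) gives the number of time steps held by each fragment, and that the subset is a half-open index range along time. The function name `fragments_for_slice` is invented for this illustration.

```python
from itertools import accumulate


def fragments_for_slice(fragment_sizes, start, stop):
    """Return indices of fragments overlapping the index range [start, stop)."""
    if stop <= start:
        return []  # an empty subset touches no fragments
    ends = list(accumulate(fragment_sizes))  # exclusive end index of each fragment
    starts = [0] + ends[:-1]                 # inclusive start index of each fragment
    return [
        i
        for i, (frag_start, frag_end) in enumerate(zip(starts, ends))
        if frag_start < stop and frag_end > start
    ]


# Values taken from the example above: two fragments of six time steps each.
time_fragment_sizes = [6, 6]
aggregation_file = ["January-June.nc", "July-December.nc"]

# The first three months only need the first fragment file.
needed = [aggregation_file[i] for i in fragments_for_slice(time_fragment_sizes, 0, 3)]
print(needed)  # ['January-June.nc']
```

Only the files in `needed` have to be checked for presence, and only those that are not present would need to be moved.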

@bnlawrence
Collaborator Author

The Command Line API

Semantically there are three things to pass in: the aggregation file of interest, the subset required, and "the definition of local". Taking these one at a time:

  • The aggregation file of interest needs to be a "resolvable" URL to the aggregation file, that is, the tool needs to be able to open it using the NetCDF (or HDF) API, and be able to extract the variables and attributes so as to determine the fragment files of interest.
  • The subset required needs to be specified using "normal" operators. We'll come back to "normal".
  • The "definition of local" has one implicit and one explicit characteristic.
    • The implicit part: any fragment file which can be opened and read is local. "Can be opened and read" is another thing to come back to.
    • The explicit part is where you want to put fragment files which are not local, so that they become local and a future run of the tool would have nothing to do. This differs from the implicit part: implicitly we might be able to open a file using either S3 or local POSIX, but explicitly we want to bring non-local files to local POSIX or to S3, not both.
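
The three inputs above could map onto a command line along the following lines. This is only a sketch using Python's standard `argparse`: the program name `cfa-fetch` and the option names `--subset` and `--target` are hypothetical, and the subset is treated as an opaque string because its syntax is still undecided.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical 'cfa-fetch' command."""
    parser = argparse.ArgumentParser(
        prog="cfa-fetch",  # hypothetical tool name
        description="Make the fragments needed by a subset of an aggregation local.",
    )
    # 1. The aggregation file of interest, as a resolvable URL or path.
    parser.add_argument("aggregation", help="resolvable URL or path of the aggregation file")
    # 2. The subset required; kept as an opaque string until a syntax is agreed.
    parser.add_argument("--subset", required=True, help="subset expression (syntax undecided)")
    # 3. The explicit part of "local": exactly one destination, POSIX or S3.
    parser.add_argument("--target", required=True, help="single destination for fetched fragments")
    return parser


# Example invocation with made-up values.
args = build_parser().parse_args(
    ["https://example.org/aggregation.nc", "--subset", "time=0:3", "--target", "/data/cache"]
)
print(args.aggregation, args.subset, args.target)
```

Taking a single `--target` value reflects the "local POSIX or S3, not both" requirement: there is one destination per run.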

Can be opened and read? We can establish that via two routes: trying to do it, or having the tool know from the URI what is local (that is, the tool is pre-configured to recognise local URIs).
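
The two routes might look like this in Python. It is a sketch under stated assumptions: the function names and the `LOCAL_PREFIXES` configuration are hypothetical, and the "try it" route shown here covers POSIX paths only (an S3 fragment would need an S3 client in place of `open`).

```python
# Hypothetical pre-configuration: URI prefixes the tool treats as local.
LOCAL_PREFIXES = ("file://", "/")


def is_local_by_uri(uri, local_prefixes=LOCAL_PREFIXES):
    """Route 2: decide from the URI alone, without touching the file."""
    return uri.startswith(tuple(local_prefixes))


def is_local_by_trying(path):
    """Route 1: try to open the fragment and read one byte (POSIX paths only)."""
    try:
        with open(path, "rb") as handle:
            handle.read(1)
        return True
    except OSError:
        return False


print(is_local_by_uri("/data/January-June.nc"))         # True
print(is_local_by_uri("s3://bucket/July-December.nc"))  # False
```

The URI route is cheap but depends on the configuration being right; the "try it" route is authoritative but costs an open (and possibly a network round trip) per fragment.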

Normal API? One option would be to consider the NCO "netCDF Kitchen Sink" tool, ncks, but given this is a CF convention, we should probably use an appropriately CF-compliant subsetting syntax. Would these differ much? That's what we need to consider.
