Need a better way to get dataset keys #67
Comments
Since #91, the |
@dougiesquire is there any reason that |
Is there value in having them as methods on the class since they don't mutate the state of the object? |
I think so. Having them as a
I don't know if this is likely in the future, but it also has the added bonus of making life easier if a Builder ever needs a customized |
I'm probably being dense, but I'm not sure how it makes sense for those to be |
Just noticed this. I actually used a similar approach in my fork to add a builder class for data post-processed with our tool mopper: I made the parser method a class method so I could pass, for example https://github.com/paolap/access-nri-intake-catalog/blob/aus2200/config/access-mopper.yaml
This information then gets used to create a regex string. In my case the dates in the filenames are defined by CMOR itself when it writes the data, so I don't need to pass that information as I already know the logic.
Currently, Intake-ESM dataset keys are constructed using a flaky approach: timestamp information is redacted from each filename to construct a file id, which is then combined with the frequency to uniquely define a dataset, see e.g.
access-nri-intake-catalog/src/access_nri_intake/source/utils.py
Line 71 in 581633e
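To make the problem concrete, here is a minimal sketch of the kind of filename redaction described above. It is not the code at the referenced line: the patterns, the `X` placeholder, and the function names `file_id_from_filename` and `dataset_key` are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Hypothetical timestamp patterns, tried from most to least specific.
# The real patterns in utils.py may differ.
_TIMESTAMP_PATTERNS = [
    r"\d{4}[-_]\d{2}[-_]\d{2}",  # e.g. 1990-01-01 or 1990_01_01
    r"\d{4}[-_]\d{2}",           # e.g. 1990-01
    r"\d{8}",                    # e.g. 19900101
    r"\d{6}",                    # e.g. 199001
    r"\d{4}",                    # e.g. 1990
]


def file_id_from_filename(filename: str) -> str:
    """Redact anything that looks like a timestamp from the file stem."""
    stem = Path(filename).stem
    for pattern in _TIMESTAMP_PATTERNS:
        # Lookarounds stop a pattern from matching inside a longer digit run.
        stem = re.sub(rf"(?<!\d)(?:{pattern})(?!\d)", "X", stem)
    # Normalise separators so the id is a simple underscore-joined string.
    return re.sub(r"[-.]", "_", stem)


def dataset_key(filename: str, frequency: str) -> str:
    """Combine the redacted file id with the frequency (assumed key layout)."""
    return f"{file_id_from_filename(filename)}.{frequency}"


print(dataset_key("ocean_daily_1990_01_01.nc", "1day"))
print(dataset_key("ocean_daily_1991_01_01.nc", "1day"))
# A four-digit experiment number is indistinguishable from a year:
print(file_id_from_filename("exp_1234_ocean.nc"))
```

The last call shows why this is fragile: any four-digit token is treated as a year, so unrelated identifiers in the filename are silently redacted too.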
I'm not sure how to reliably get dataset keys when the data are so messy. Going forward, better solutions might be to require that those generating model output:
file_id attribute in files that identifies a file as part of a dataset
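If files did carry such an attribute, key construction could become a simple lookup. The sketch below assumes the file's global attributes are available as a plain mapping (for example, what xarray exposes as `ds.attrs`). The function name `dataset_key_from_attrs`, the `fallback_file_id` parameter, and the `file_id.frequency` key layout are hypothetical.

```python
from typing import Mapping, Optional


def dataset_key_from_attrs(
    attrs: Mapping[str, str],
    frequency: str,
    fallback_file_id: Optional[str] = None,
) -> str:
    """Build a dataset key from a `file_id` global attribute.

    `fallback_file_id` is a hypothetical escape hatch for legacy files
    that lack the attribute, e.g. an id derived from the filename.
    """
    file_id = attrs.get("file_id")
    if file_id is None:
        if fallback_file_id is None:
            raise KeyError("file has no 'file_id' attribute and no fallback was given")
        file_id = fallback_file_id
    return f"{file_id}.{frequency}"


# Attribute present: no filename parsing needed.
print(dataset_key_from_attrs({"file_id": "ocean_daily"}, "1day"))
# Attribute missing: fall back to an id obtained some other way.
print(dataset_key_from_attrs({}, "1mon", fallback_file_id="iceh_X"))
```

With the attribute present, no filename heuristics are involved, so the key cannot be thrown off by digits in the filename that only look like dates.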