Proposal to use NVS P07 as the canonical source of CF standard names #366
Replies: 7 comments 9 replies
-
I am particularly interested in the units question. CF has referred to UDUNITS for units since the beginning (yes?), but over that time UDUNITS saw a major upgrade that reworked the unit database and, at least as far as I could find, there is no nice human-readable form, nor a clear definition of which variations are acceptable (e.g. both "second" and "seconds" are accepted). So the canonical source could be the XML in the UDUNITS source, but we really should have a way to extract something human-readable from it. I'm willing to take a look at that, unless someone points us to something I missed :-).
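As a rough illustration of what "extracting something human-readable" from an XML unit database might look like, here is a minimal sketch using only the Python standard library. The XML fragment is an inline sample, and its element names (`unit`, `name`, `singular`, `plural`, `symbol`) are assumptions made for illustration; they may not match the real UDUNITS database files, which would need to be checked first.

```python
import xml.etree.ElementTree as ET

# Inline sample only. The element names are assumed for illustration and are
# not taken from the actual UDUNITS source files.
SAMPLE = """<unit-system>
  <unit>
    <name><singular>second</singular><plural>seconds</plural></name>
    <symbol>s</symbol>
  </unit>
</unit-system>"""


def accepted_spellings(xml_text):
    """Return, for each <unit>, the list of spellings found in the XML."""
    root = ET.fromstring(xml_text)
    wanted = ("singular", "plural", "symbol")
    rows = []
    for unit in root.iter("unit"):
        # Collect the text of every name/symbol element, in document order.
        rows.append([el.text for el in unit.iter() if el.tag in wanted and el.text])
    return rows


for spellings in accepted_spellings(SAMPLE):
    print(", ".join(spellings))
```

A listing like this would at least make visible which variations (such as "second" and "seconds") a given database entry declares.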
-
I get a 404 error when I try to reach the XML file: https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.xml
Where does the P07 file (http://vocab.nerc.ac.uk/collection/P07/current/) show the names of aliases? Why is the JSON-LD version of the file so complicated? Couldn't we simply host the standard name information in a simpler structure, with the keys being the standard names and each one having a few "attributes": canonical units, description, aliases, and a flag indicating whether or not the name has been deprecated? I'm sure I'm hopelessly naive, but it seems like we should host the standard name information in a form that other tools could build on without learning how to interact with the NERC vocabulary server.
Is it wise to tie CF to the support of a single institution (which has been a critically important and reliable partner for so long now), which might at some time be unable to continue with CF? In that event, how hard would it be for someone to take over responsibility for the NERC-hosted vocabulary?
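To make the "simpler structure" concrete, here is a minimal sketch of one possible layout, keyed by standard name with the four attributes listed above. The key names (`canonical_units`, `description`, `aliases`, `deprecated`), the description text, the alias `example_old_name`, and the `lookup` helper are all hypothetical, chosen for illustration rather than taken from any agreed CF format.

```python
import json

# Hypothetical layout: one entry per standard name, four attributes each.
# The description and alias values are placeholders, not real table content.
standard_names = {
    "air_temperature": {
        "canonical_units": "K",
        "description": "Placeholder description text.",
        "aliases": ["example_old_name"],
        "deprecated": False,
    },
}


def lookup(table, name):
    """Return (standard_name, entry) for a name or for one of its aliases."""
    if name in table:
        return name, table[name]
    for standard_name, entry in table.items():
        if name in entry["aliases"]:
            return standard_name, entry
    raise KeyError(name)


# The structure round-trips through plain JSON without any special tooling.
serialized = json.dumps(standard_names, indent=2, sort_keys=True)
restored = json.loads(serialized)
print(lookup(restored, "example_old_name")[0])
```

A tool would only need a JSON parser to consume a file like this, which is the point of the suggestion; whether aliases belong inside each entry or in a separate mapping is a design choice left open here.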
-
Dear standard names hackathon group
While I recognise the advantages of the NERC vocabulary server, I think we should change things more gradually than implied by this summary. When we discussed this possibility in the CF committee, my understanding was that the proposal was initially only to "designate" the NERC vocab server as the primary repository. The primary one is the one you'd trust in the case of an inconsistency, but mostly there would be no inconsistency. Designating NVS as primary would not change our actual practice significantly, I believe, since the standard names team prepare the updates in the CF editor and publish them to the NERC vocab server and the CF website at the same time.
In the earlier discussions, I believe we understood that we would keep the HTML and XML versions on the CF website, just as they are now. I don't see why we shouldn't do that, even if we don't regard them as primary. If we change the process of preparing updates to use NVS facilities instead, could we not continue to publish to our XML and HTML as well? There isn't a need to change Appendix B if we keep our XML files. Our HTML file is very useful for searching, especially since Abel @bzah made large and widely applauded improvements in its capabilities within the last year. In #296, Andrew @DocOtak and Antonio @cofinoa have between them devised ways in which we could keep all versions of HTML and XML accessible online by generating them on the fly from the GitHub repo. That would solve the problem with space on GitHub Pages.
The first advantage listed is the unique permanent identifier for each standard name. I agree that is valuable. When Alison and I discussed this with Gwen, some months ago, we agreed that CF would like unique URIs containing the standard name. My memory is that this had formerly been offered, although it isn't now. For instance, as an alternative to https://vocab.nerc.ac.uk/collection/P07/current/CFSN0023, we should be able to use https://vocab.nerc.ac.uk/collection/P07/current/air_temperature. For some purposes that would be more convenient, and it's definitely more CF-like.
If the standard names are stored in the NVS as well as our existing files, we can explore the advantages of NVS without losing any of our current functionality. I'm sure there are advantages, such as you outline. In time we may find that NVS can take things over from us in a satisfactory way; then we can phase out our own.
Despite my reservations, I appreciate your exploration of this subject! Thanks.
Best wishes
Jonathan
-
I've made a first attempt at generating the standard name XML from information contained only in the NVS. A commented Jupyter notebook is here: https://github.com/DocOtak/nvs-to-cf-xml/blob/master/nvs_to_xml.ipynb and the output is here: https://github.com/DocOtak/nvs-to-cf-xml/blob/master/nvs-to-std-names.xml
I found most of the implementation to be pretty straightforward, but did find some issues that I think would need to be resolved or addressed before P07 is blessed.
Header info
NVS lacks a contact email, has a different institution, and does not have separate first_published and last_modified.
Standard Names
These were mostly straightforward to create. NVS lacks GRIB and AMIP mappings, so these were not included. I found that 12 of the non-deprecated standard names have no unit information at all; I didn't check the deprecated names for units.
Aliases
I also found it somewhat difficult to map deprecated names to non-deprecated names (aliasing). Not all the deprecated names have a "replaced by" relationship or a "same as" relationship, and I got caught by some cyclical relationships. I suspect "same as" will be better for our use cases, since I believe that "replaced by" is a 1:1 relationship. The notebook has some more information on what I found difficult, but I didn't fully explore this. I think we do need a consistent mechanism for representing the aliases in NVS; this might just be clarification and cleanup on their side (Gwen?).
Units
I only used the content in P06 and found the altLabel to be pretty close to udunits already. There were some (15ish) exceptions that I manually mapped in the notebook. When investigating QUDT, it appeared that many of the concepts lack the udunitsCode property, for example J m-2 and mol kg-1. We would need to contribute these mappings back to QUDT, and probably verify that the P06 to QUDT relationships are all in place too, if we want to enable fully programmatic creation of the XML.
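For the cyclical relationships mentioned under Aliases, one defensive approach is to follow the replacement links while remembering which names have already been visited. The sketch below is not the code in the notebook; the function name `resolve_alias`, the dictionary layout, and every name in the sample data are hypothetical.

```python
def resolve_alias(name, replaced_by):
    """Follow replacement links from a deprecated name to a current one.

    `replaced_by` maps a deprecated name to the name that replaces it.
    Returns the final name, or None if the chain loops back on itself.
    """
    seen = set()
    current = name
    while current in replaced_by:
        if current in seen:
            return None  # cycle detected, no usable target
        seen.add(current)
        current = replaced_by[current]
    return current


# Hypothetical relations: one two-step chain and one two-name cycle.
replaced_by = {
    "old_name_a": "old_name_b",
    "old_name_b": "current_name",
    "loop_x": "loop_y",
    "loop_y": "loop_x",
}

print(resolve_alias("old_name_a", replaced_by))
print(resolve_alias("loop_x", replaced_by))
```

Names that come back as None could then be collected into a report for cleanup on the NVS side, rather than silently dropped.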
-
I expressed some concern above about using the NERC server P07 as the reference source for standard name information. I now understand that the P07 NERC server provides useful information about standard names well beyond just recording the standard name and describing what it means, specifying its canonical units, and listing any aliases (and indicating whether it has been deprecated). But I'm not convinced it is wise to consider the additional information provided by P07 of essential importance to the conventions. I would much prefer that we host on GitHub a simple JSON file (or some similarly popular format) containing the essential standard name information. We would encourage others (e.g., NERC) to build on that the very valuable services they provide. We could steer users to the NERC services if they want to interact with other vocabularies, without making those services essential for the existence of CF. I really appreciate that NERC supports CF, but can't we make the CF standard and any underlying databases independent of those who support it?
-
Along the same lines as @taylor13's comment: I can't quite tell from context, but all the references to IRIs with P07 in the name have me worried. I distinguish where the standard names (meaning the local identifier parts) are served from the persistent identifiers of the standard names. I think those two things are entirely separable. With apologies if this is decided somewhere already, I hope the 'final' identifier of CF standard names (vocabulary and individual names) has a relatively short domain name+locator. Implementation-specific elements like P07 are undesirable in this context. I strongly prefer a vocabulary-specific IRI a la sweetontology.net, though something like http://vocab.nerc.ac.uk/ would be bearable if it's forever the authoritative location. The key point is that where you serve the names from shouldn't affect their identifiers, which can be redirected to the actual service. This allows the service to change its DNS over time and insulates it from the vagaries of the supporting organization.
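The separation between identifier and service can be sketched as a small redirect rule. The persistent base `https://example.org/cf/standard_name/` is a made-up placeholder, not a proposed domain, and `redirect_target` is a hypothetical helper. The service base and the pairing of CFSN0023 with air_temperature are the ones quoted earlier in this thread.

```python
# Hypothetical persistent base: a placeholder, not a proposed identifier scheme.
PERSISTENT_BASE = "https://example.org/cf/standard_name/"

# Where the names happen to be served today; free to change later.
SERVICE_BASE = "https://vocab.nerc.ac.uk/collection/P07/current/"

# Service-specific local identifiers, kept out of the persistent IRI.
LOCAL_IDS = {"air_temperature": "CFSN0023"}


def redirect_target(persistent_iri):
    """Map a persistent IRI to the URL of the service currently hosting it."""
    if not persistent_iri.startswith(PERSISTENT_BASE):
        raise ValueError("not a recognised persistent identifier")
    name = persistent_iri[len(PERSISTENT_BASE):].strip("/")
    return SERVICE_BASE + LOCAL_IDS[name]


print(redirect_target(PERSISTENT_BASE + "air_temperature"))
```

If the hosting service moved, only `SERVICE_BASE` and `LOCAL_IDS` would change; identifiers already written into datasets would stay valid.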
-
Would it be useful to provide a URL for each standard name, like https://vocab.nerc.ac.uk/standard_name/air_temperature (which is an alias for the P07 page for that standard name - thanks, @DocOtak), but with only the CF information? In that page, we could give the link to https://vocab.nerc.ac.uk/standard_name/air_temperature for the sake of the extra information and facilities that offers. If it is useful, I suppose those URLs could be provided on GitHub only as static pages - is that right? That might not be ideal. If it were hosted somewhere else, could we create a subdomain of https://cfconventions.org to point to it? I don't remember who acquired and is paying for our domain! I hope that https://cfconventions.org will continue to exist for as long as the conventions are useful.
-
Topic for discussion
Currently the "official" source of the CF standard names is the XML-formatted file, e.g., https://cfconventions.org/Data/cf-standard-names/current/src/cf-standard-name-table.xml. A new version of the XML file is produced every time the standard name table is updated. At the same time as the XML file is published, identical content is submitted to the NERC Vocabulary Server (NVS), where the standard names form collection P07: http://vocab.nerc.ac.uk/collection/P07/current/. The proposal is to move to recognising P07 as the canonical source of standard names.
The points below were raised in a hackathon at the CF 2024 workshop.
Advantages of adopting P07 as canonical source of standard names
Possible consequences of moving to P07
Practical steps we will need to take