Reference UGRID conventions in CF #153

Closed
rsignell-usgs opened this issue Dec 13, 2018 · 102 comments · Fixed by #459
Labels
change agreed: Issue accepted for inclusion in the next version and closed
enhancement: Proposals to add new capabilities, improve existing ones in the conventions, improve style or format

Comments

@rsignell-usgs
Member

As discussed in Trac ticket 171 we would like to associate a specific version of UGRID with each version of CF.

We propose to simply add a section 1.5 to the Conventions Document called "Relationship to the UGRID Conventions" which would say:

UGRID is a convention for unstructured (e.g. triangular) grids that supplements the CF Conventions, including specification of grid topology and location of data on grid elements. Each version of CF is associated with a particular version of UGRID through the Conventions attribute in 2.6.1.

Then in Section 2.6.1, modify the beginning to read:

We recommend that netCDF files that follow these conventions indicate this by setting the NUG defined global attribute Conventions to the string value "CF-1.8" which also implies "UGRID-1.0".
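To illustrate the proposed wording, a reader could expand a Conventions string into the conventions it implies. This is only a minimal sketch: the function name `implied_conventions` and the lookup table are hypothetical, and the only pairing taken from the proposal is CF-1.8 with UGRID-1.0.

```python
# Hypothetical lookup table: only the CF-1.8 -> UGRID-1.0 pairing comes from
# the proposal text; later pairings would be set by the governance process.
IMPLIED_UGRID = {"CF-1.8": "UGRID-1.0"}


def implied_conventions(conventions: str) -> list[str]:
    """Return the listed conventions plus any UGRID version they imply."""
    names = conventions.split()
    result = list(names)
    for name in names:
        implied = IMPLIED_UGRID.get(name)
        # Avoid duplicates when the file already names the UGRID version.
        if implied is not None and implied not in result:
            result.append(implied)
    return result


print(implied_conventions("CF-1.8"))
```

A file that names an older CF version, for which no pairing is defined, is returned unchanged.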

@JonathanGregory added the enhancement label Dec 13, 2018
@davidhassell
Contributor

I support this change. The aim (mentioned on the Trac ticket) of "ensuring they [CF and UGRID] remain consistent and complementary" is a very good goal. A couple of questions:

How will UGRID ensure that it is up to date with CF? I presume that this would not be the responsibility of proposers of changes to CF. And vice versa.

What would be the policy for matching versions of CF and UGRID? CF-1.8 and UGRID-1.0 are proposed for now - when CF-1.9 arises, would the latest UGRID version be specified by default, or would it stay at 1.0?

Thanks, David

@rsignell-usgs
Member Author

@davidhassell thanks for the comments. I think this would be worked out on a case-by-case basis with joint approval by the CF and UGRID Governance committees.

@davidhassell
Contributor

OK - in that case, I think an explicit addition to the governance rules (https://github.com/cf-convention/cf-convention.github.io/blob/master/rules.md) would be useful.

I've been thinking about the similar issue that arises when considering if a change to CF is compatible with the CF data model, for which I will soon be suggesting these additional rules (just think "UGRID" instead of "CF data model"):

* All new proposals will be assessed to see if the new features defined in the proposal map onto the CF data model.

* The assessment will be carried out by a member of the conventions committee or another suitably qualified person. If no-one volunteers, the chairman of the committee will ask someone to do it.

For the CF data model case, there are more rules than this, about what to do depending on the result of the assessment - which I'm happy to share - but you get the drift.

UGRID also has the issue (unlike the CF data model) that it may change independently of CF, so there needs to be some sort of a symmetry in these rules, too.

These rules may seem a bit over the top, but they're not really more draconian than the existing rules for changes to CF, and only a very few people will ever need to worry about them, i.e. the UGRID (or CF data model) experts, and it gives those people a checklist to make sure that we don't wander off piste.

Thanks, David

@rsignell-usgs
Member Author

rsignell-usgs commented Jan 29, 2019

@davidhassell, that sounds reasonable. It's also a reminder to us at UGRID that we need to add rules similar to https://github.com/cf-convention/cf-convention.github.io/blob/master/rules.md. I've raised the issue here.

@davidhassell
Contributor

I think that the biggest issue concerning versioning is that every enhancement to CF needs to be checked for UGRID inconsistencies, and every enhancement to UGRID needs to be checked for CF inconsistencies.

How much a proposer of a change needs to know about this is up for discussion.

For the CF data model, I have suggested (cf-convention/cf-convention.github.io@a170505) that the proposer does not need to know about data model issues - by default someone will look at the proposal on their behalf and decide if any action needs to be taken. Action could mean changing the proposal or changing the data model. This approach could also work for how CF changes might affect UGRID, and vice versa, but there might be alternative approaches that are better suited to the UGRID (and CF data model) case - I would welcome any thoughts on this.

For the most part, I imagine that no action would be required - all 28 tickets that contributed to CF-1.7 had no impact on the CF data model.

That said, there are two issues for CF-1.8 that will require changes to the data model - geometries and UGRID. These changes have been worked out and are backwards compatible (more on that another time), so all is well! Such structurally challenging changes are unusual, though.

Thanks, David

@davidhassell
Contributor

davidhassell commented Jan 8, 2020

Hello,

I have been looking at the Finite Element based CF proposal for Unstructured Grid data model (https://publicwiki.deltares.nl/display/NETCDF/Finite+Element+based+CF+proposal+for+Unstructured+Grid+data+model) which was written up some time ago by Bert. This proposes an encoding for the information required for a consistent spatial interpretation of the values.

Given that UGRID is going to be incorporated into CF [1], I was looking at this to see if backward incompatible changes could occur if this new proposal (or something like it) became part of UGRID at a later date.

[1] it seems like this will indeed happen, once the management side is sorted out ...

My conclusion from a quick read of Bert's document was that adding the "Function Space" capability, would not impact on files encoded using UGRID 1.0 - which would be good. However, the proposal does suggest renaming certain special attributes (e.g. face_node_connectivity becomes element_vertex_connectivity). This could be bad for CF backwards compatibility, but, I presume, is easily avoided with a little thought at this stage.

Does all this sound like a reasonable assessment?

Thanks,
David

@dham

dham commented Jan 8, 2020

I think that this is essentially correct. The naming issues are also not insurmountable. Finite element has its own conventions, but they are just naming conventions. If UGRID has finite difference naming then this will merely be a bit confusing for finite element users.

In the specific cases above, there are two relevant differences to note:

  1. Vertex vs node. Node means something quite different in finite element (a node is a basis function in the dual space to the finite element space), which is why finite element codes usually talk about vertices when discussing the mesh topology.

  2. face vs element or cell. node, edge, face, volume are names for mesh entities of a given dimension. Finite element is often more concerned with codimension, which counts downwards from the mesh dimension. A cell (or element) is an entity of codimension 0, i.e. an entity of maximal dimension. On a 3D mesh the cells are volumes and on a 2D mesh they are faces (cell is also defined for 2D or 1D meshes). A facet is an entity of codimension 1. Facets form the boundaries between cells. On a 3D mesh, the facets are faces while on a 2D mesh they are edges, and on a 1D mesh they are nodes. Given that a UGRID mesh knows its dimension, it is possible for software to identify the cells or facets so the difference in naming convention is not so significant.
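The correspondence between the two naming schemes can be sketched as a lookup from mesh dimension and codimension to the dimension-based entity name. This is only an illustration of the point above that software can identify cells and facets once the mesh dimension is known; the function name `entity_name` is hypothetical.

```python
# Entity names by topological dimension, as listed in the comment above.
ENTITY_BY_DIMENSION = ("node", "edge", "face", "volume")


def entity_name(mesh_dimension: int, codimension: int) -> str:
    """Dimension-based name of the entity with the given codimension."""
    dimension = mesh_dimension - codimension
    if not 1 <= mesh_dimension <= 3 or not 0 <= dimension <= mesh_dimension:
        raise ValueError("codimension out of range for this mesh dimension")
    return ENTITY_BY_DIMENSION[dimension]


# Cells have codimension 0 and facets have codimension 1.
for mesh_dim in (1, 2, 3):
    print(mesh_dim, "cell:", entity_name(mesh_dim, 0), "facet:", entity_name(mesh_dim, 1))
```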

@davidhassell
Contributor

Hello,

Some colleagues were asking after the status of this proposal. As far as I'm aware, there are no outstanding objections other than the requirement to spell out some rules for the co-management of the two conventions: CF and UGRID.

The CF data model has now been accepted, and the rules for its management will be in CF-1.9. I think there are some similarities between the rules required for evolving the data model and those for evolving UGRID.

I am happy to draft some rules for UGRID, if that helps to get the ball rolling.

Thanks,
David

@JonathanGregory
Contributor

Yes, please. That would be most helpful. Jonathan

@davidhassell
Contributor

Here are my proposed additions to https://github.com/cf-convention/cf-convention.github.io/blob/master/rules.md. I probably haven't quite got it yet, but it's a start.

A key thing to note is the second line: "The assessment will be carried out by a member of the conventions committee or another suitably qualified person. If no-one volunteers, the chairman of the committee will ask someone to do it."

What this means is that every enhancement proposal must be "signed off" for any UGRID issues. Almost always this will be a trivial task (e.g. the introduction of a new grid mapping attribute parameter - no problem!), but it needs to be done.

For simple cases, as in the example just mentioned, someone who is familiar (rather than expert) with UGRID could take care of this, but for more complicated proposals, the opinion of an expert from the UGRID community must be available.

The first and last sentences cover how to increment the version of UGRID that is acceptable to a given version of CF.


Additional rules relating to the UGRID conventions

All new proposals will be assessed to see if the new features defined in the proposal are compatible with the named version of UGRID that is defined for the current version of the CF conventions.

The assessment will be carried out by a member of the conventions committee or another suitably qualified person. If no-one volunteers, the chairman of the committee will ask someone to do it.

If the proposal is deemed to be not compatible with UGRID in some way, then an attempt must be made to modify the proposal so that its new features are compatible with UGRID, and in such a way that the proposal's intent is not compromised.

If the proposal cannot be acceptably modified to conform to the UGRID conventions, then UGRID will need to be modified to accommodate the new features. If UGRID is extended or generalized in some way that allows the new features but does not affect its existing structure and functionality, the proposal is considered backwards compatible. This is the preferred solution.

Any such changes to UGRID must be defined in general terms, and preferably with a detailed description of the UGRID alterations. However, to facilitate the progress of a proposal that requires UGRID changes, it is sufficient for the general nature of the UGRID modifications to be identified, on the understanding that the UGRID conventions will be updated in detail at a later date, possibly after the proposal has been accepted in all other aspects. Final acceptance will always rely on the completion of changes to the UGRID conventions, which is at the discretion of the UGRID community.

The UGRID conventions exist independently from CF and have their own repository and governance. Therefore the acceptance of a new version of UGRID, whether it arises from a change to CF or from an independent change to UGRID itself, must be raised and discussed in its own enhancement proposal in the usual manner. It follows that a change to CF that requires a change to UGRID will be associated with two GitHub issues - one for the change to CF and one for accepting the new version of UGRID.


@erget
Member

erget commented Oct 2, 2020

In general, I'm not a huge fan of tight coupling but I don't have objections to the thrust of these changes - they aren't very onerous and won't force us to increment if we have no reason to.

The specifics make me slightly more nervous. My understanding is that, if proposed changes to CF are not compatible with the currently referenced version of UGRID,

then UGRID will need to be modified to accommodate the new features

via a proposal to UGRID, but

to facilitate the progress of a proposal that requires UGRID changes, it is sufficient for the general nature of the UGRID modifications to be identified, on the understanding that the UGRID conventions will be updated in detail at a later date, possibly after the proposal has been accepted in all other aspects. Final acceptance will always rely on the completion of changes to the UGRID conventions, which is at the discretion of the UGRID community.

Emphasis is my own.

If this is the case, does that mean that we would accept the proposal formally from CF's side when we have verified that a proposal to UGRID has been made, and then simply wait to merge the CF proposal until the UGRID proposal has been accepted? The danger is that this might take a long time, and then the CF baseline might have drifted, so that the original CF proposal would need revisiting. In all cases I think that this would have to be checked again to ensure that we don't introduce inconsistencies into the Conventions.

Alternatively we could merge immediately in the hopes that the UGRID proposal would be accepted - a bit too optimistic for my taste though and the risk would be that the two standards diverge.

@davidhassell
Contributor

Hi @erget,

Thanks for highlighting these concerns. I agree with them. I guess I was thinking that the UGRID community is more integrated into the CF community than, say, the CRS-WKT community. In the latter example, we don't claim any influence on the "other" community, don't specify a CRS-WKT version, and have carefully instructed the user of a dataset on what to expect.

I checked back to Trac ticket 171, and there is not really an in depth discussion there, nor is there here other than on these governance rules. These concerns were discussed very carefully for the CRS-WKT case, and should be given more thought here, as well.

The CRS-WKT case is simpler, as it is more self contained. There are, however, many aspects of CF which, if changed, could affect UGRID. Perhaps the recent proposal for a domain variable could be one, for example.

It would be great to hear from some folks who work on UGRID.

I would like to see the two conventions evolve simultaneously, but also do not want to see the delay of new CF features that are needed by user communities that have no interest in UGRID.

It should be noted that if UGRID was formally moved into CF (e.g. as a new chapter 10), then all of this governance stuff goes away. I don't know if this has been discussed as an option elsewhere, but it should be stated in this issue why that's not desirable (if that is indeed the case).

Thanks,
David

@davidhassell
Contributor

It would be great to hear from some folks who work on UGRID.

I see that the UGRID conventions GitHub repository has not been updated for 2 years, and the version being recommended for CF is UGRID 1.0, which was released 4 1/2 years ago and is also the latest version. At this time CF was at CF-1.6. This is absolutely not a criticism (CF has gone for longer periods in the past with no readily available signs of advancement, although progress was always going on), but we do need to be sure that UGRID 1.0 is compatible with the draft of CF-1.9. Has anyone looked at this?

@davidhassell
Contributor

The original proposal suggests

We recommend that netCDF files that follow these conventions indicate this by setting the NUG defined global attribute Conventions to the string value "CF-1.8" which also implies "UGRID-1.0".

This says to me that the proposal is functionally equivalent to UGRID 1.0 being incorporated into CF in a new chapter that describes the UGRID that CF recognizes.

It follows, I think, that CF checkers will be expected to check the UGRID conventions (unlike CRS-WKT, which does not need to be checked). Therefore there is a need for conformance rules and recommendations that cover UGRID in relation to CF. Does UGRID already have its own conformance rules?

For example, when a data variable has mesh topology, it would need to be stated unambiguously whether or not the coordinates attribute is mandatory, and when it is present whether or not it should always contain the coordinate variables implied by the values of the mesh, location, and location_index_set data variable attributes.

I do very much support UGRID being a part of CF, but think there are some important structural details that need to be worked out first.

At the current time, having thought about all this again, I wonder if the best two options are:

  • Incorporate UGRID into CF as a new chapter.

  • If that is not acceptable, simply drop this issue, and if people want to use UGRID they should say so in the Conventions attribute "CF-1.9 UGRID-1.0". This would be the status quo approach, I think. In this case it would be up to the UGRID developers alone to ensure consistency with CF.
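Under the status quo option, software would read the UGRID version from the Conventions attribute itself. A minimal sketch of that parsing follows; the function name `parse_conventions` is hypothetical, and the handling of separators (blank space, or commas when present) and of unversioned names is an assumption of this sketch.

```python
def parse_conventions(attribute: str) -> dict[str, str]:
    """Split a Conventions attribute into {convention name: version}."""
    # Assumption: commas are the separator when present, otherwise whitespace.
    if "," in attribute:
        parts = [part.strip() for part in attribute.split(",")]
    else:
        parts = attribute.split()
    versions = {}
    for part in parts:
        if not part:
            continue
        # Split on the last hyphen, e.g. "UGRID-1.0" -> ("UGRID", "1.0").
        name, _, version = part.rpartition("-")
        if name:
            versions[name] = version
        else:
            # No hyphen: treat the whole string as an unversioned name.
            versions[part] = ""
    return versions


print(parse_conventions("CF-1.9 UGRID-1.0"))
```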

Thanks,
David

@JonathanGregory
Contributor

Dear @davidhassell

The proposal was made because the UGRID developers at that time decided that they wanted UGRID to become part of CF. Regarding UGRID as part of CF, even though not in the same document, means that new developments in CF are obliged to consider UGRID, as well as the reverse. For both conventions it has both costs and benefits. However it was proposed not to incorporate it as a new chapter in the CF document because that would require reworking and rehosting their document, which would take substantial effort. Could we frame the rules as though UGRID was in effect a chapter of CF? We could even insert a chapter in CF that says "Chapter N, UGRID. See UGRID document at URL".

I agree with you that the conformance document needs rules relating to UGRID. That is necessary to making it part of CF. I also agree that it would be very helpful to hear from the UGRID developers and users about what they now think is the best way to proceed.

Best wishes

Jonathan

@hrajagers

Dear David, Jonathan,

Thank you for moving this forward. In the very beginning we were thinking of moving the definition for unstructured grids into the CF conventions; later the discussion moved more in the direction of a separate but affiliated convention, and in the end the discussion seems to have settled on a separately documented but linked convention. Since documentation for the CF conventions has shifted to GitHub as well, I don't know whether the effort of porting the documents over to the CF repository is still a major issue, but there is a bit of a difference in documentation style. Since UGRID has received quite some uptake already, it may be advantageous to keep the separate identity, whereas being more integrated into CF may increase the uptake in the wider OGC and GIS domains ... although via MDAL support (closely linked to CF) we have already made a good step forward in that direction. Keeping the conventions separate helps to lower the threshold for implementing either convention. The coin can still flip either way for me, but the concept of modularity introduces a slight preference for keeping the conventions somewhat separate. Many of the core developers of the UGRID conventions are well embedded in the CF conventions, so I think it would be quite acceptable to set out some formal rules as you describe to ensure that the two conventions develop in a mutually compatible way.

The UGRID 1.0 convention has indeed remained quite stable; the main reason for this is that we already had quite some discussion and uptake before we released version 1.0 ... and ... conventions for specific topics can remain stable for much longer periods than the CF conventions that cover a wide range of features. I don't know of any recent changes in CF that would make UGRID less compatible with 1.9 than with 1.6. We use it ourselves in combination with 1.8 on a daily basis. The geometries are the only development closely related to UGRID and at some time it used a very similar definition format, but given the considered use cases it made more sense there to drop the concept of shared nodes in favour of simplicity and consistency with other storage formats for GIS feature sets. At Deltares we have a draft extension to allow the 1D discretisations within UGRID to be defined on a curved 1D space ... for this extension we build on those geometries introduced in CF 1.8 to describe the shape of the space (i.e. river) before discretisation. So, we try to adopt new CF features as much as possible.

There is one element in UGRID that is not fully compatible with CF in any version and that's the use of the attribute "cf_role" to identify the "mesh_topology" and various types of connectivity variables such as the "face_node_connectivity". The name of this attribute was chosen when it was anticipated that UGRID was to move into CF, but if it remains a separate convention we may have to reconsider this attribute name ... or CF would have to formally permit this type of use.

The question about a UGRID checker has popped up twice or so in discussions I had regarding UGRID over the last year. We haven't implemented a formal UGRID checker, but it would indeed be useful to have.

Best regards,

Bert

@davidhassell
Contributor

Dear Bert,

Thank you for describing all of the history that has occurred here - it really is very helpful, particularly the interactions you have had on the geometries front.

A summary of my position would be that I would support UGRID being moved into CF ("chapter 10"), but if this is not possible then I think that we would still be able to find a way to make things work satisfactorily.

Governance

If UGRID were incorporated into the CF text, there is no governance issue. It's an issue that only arises if it lives outside.

If UGRID lived outside, I genuinely do not expect any problems in the CF/UGRID working relationship, for the same reasons described by @hrajagers, but part of the point of the governance rules is to prevent problems arising, however unlikely. We would need to cover thorny situations, such as those raised by @erget.

One of my own concerns with UGRID not being inside CF is its general invisibility to users. The proposed introductory text is very short and will surely not result in most users seeking out and reading the full UGRID conventions. For instance, what would happen if a user applied mesh and location attributes to a non-UGRID data variable in good faith, having checked in Appendix A that these attributes are not standardised? The UGRID-aware CF checker might tell them their dataset is broken. OK, they could then rename the attributes with this new-found knowledge, but they need to be able to ascertain this before creating datasets, and I don't think that this is easy enough were UGRID to exist elsewhere.

cf_role

This comes down to variable identification, clearly. From a CF perspective, we know that a data variable employs UGRID because it has location_index_set, or both mesh and location, attributes. One of these attributes identifies the relevant mesh container variable, that in turn identifies other required variables (such as edge node connectivity variables). I don't need cf_role to make any of these connections, so the use of cf_role comprises a redundancy.
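The identification chain described here can be sketched without reference to cf_role at all. In this minimal sketch a file is stood in for by a plain dictionary mapping variable names to attribute dictionaries; the function name `mesh_container_name` and the example variable names are hypothetical, and it assumes that a location index set variable carries its own mesh attribute.

```python
from typing import Optional


def mesh_container_name(data_attrs: dict, variables: dict) -> Optional[str]:
    """Return the mesh topology variable a data variable refers to, or None.

    `variables` maps variable names to attribute dicts (a stand-in for a file).
    """
    if "location_index_set" in data_attrs:
        # Assumption: the location index set variable names its own mesh.
        lis_attrs = variables.get(data_attrs["location_index_set"], {})
        return lis_attrs.get("mesh")
    if "mesh" in data_attrs and "location" in data_attrs:
        return data_attrs["mesh"]
    return None  # not a UGRID data variable


variables = {
    "Mesh2": {"topology_dimension": 2, "face_node_connectivity": "Mesh2_face_nodes"},
    "Mesh2_set": {"mesh": "Mesh2", "location": "face"},
}
print(mesh_container_name({"mesh": "Mesh2", "location": "face"}, variables))
print(mesh_container_name({"location_index_set": "Mesh2_set"}, variables))
```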

So, if we were starting ab initio, I would argue for dropping the cf_role attributes altogether. But we are not! So I can see there is an argument for keeping it for backwards UGRID-compatibility, i.e. extending the use of cf_role to include UGRID (as well as DSGs). Or perhaps it could be dropped without problems?

What do others think?

Conformance

I presume from what you say that there are no conformance rules (rather than there are rules but no existing software)? Is that right? We only need the rules to get UGRID (in any form) into CF. If it were helpful, I would be happy to draft some CF-style conformance rules.

@JonathanGregory
Contributor

Dear all

@davidhassell makes a good point about the attributes. If UGRID is to be regarded as part of CF (whether within the document or as a linked document with a consistent version) it would make sense for UGRID's attributes to be included in Appendix A, or listed in a separate Appendix (like the grid-mapping ones are) since they aren't general-purpose attributes. As David also said, its requirements should appear as a section of the CF conformance document. Also any important terms which it needs to define could be added to section 1.2, and throughout the CF document any relevant references to UGRID should be inserted. These things would naturally be done if the main UGRID description were included in the CF document, and it would help with visibility and consistency to do them in any case. If UGRID isn't completely moved into the CF document as a new chapter, I think it would be worthwhile adding a subsection to describe it. That could be in section 1, like the subsection we have there about COARDS.

I agree with David that if the presence or function of UGRID variables can be identified by the presence of particular attributes, cf_role isn't needed. It's redundant and therefore could cause inconsistency. A possible approach would be to deprecate it, which means the CF checker (when made UGRID-aware) would emit a warning if cf_role was included in these roles. The checker should also give an error if cf_role is present and wrongly used - that would be a consistency rule that would appear in the conformance document.
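The deprecation approach could look roughly like the following check. This is a sketch under stated assumptions, not an existing checker: the function name, the message wording and the reading of "wrongly used" (a cf_role value that disagrees with the mesh attribute pointing at the variable) are all hypothetical, and only two connectivity attributes are listed for brevity.

```python
# Subset of connectivity attributes, for illustration only.
CONNECTIVITY_ATTRS = ("edge_node_connectivity", "face_node_connectivity")


def check_connectivity_cf_role(mesh_attrs: dict, variables: dict) -> list[tuple[str, str]]:
    """Return (severity, message) findings for cf_role on connectivity variables."""
    findings = []
    for attr in CONNECTIVITY_ATTRS:
        target = mesh_attrs.get(attr)
        if target is None:
            continue
        role = variables.get(target, {}).get("cf_role")
        if role is None:
            continue  # cf_role absent: nothing to report
        if role != attr:
            # Present but inconsistent with the attribute that points here.
            findings.append(("ERROR", f"{target}: cf_role '{role}' conflicts with '{attr}'"))
        else:
            # Present and consistent, but redundant.
            findings.append(("WARNING", f"{target}: cf_role is deprecated on connectivity variables"))
    return findings


mesh = {"face_node_connectivity": "Mesh2_face_nodes"}
print(check_connectivity_cf_role(mesh, {"Mesh2_face_nodes": {"cf_role": "face_node_connectivity"}}))
```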

Best wishes

Jonathan

@hrajagers

Dear @davidhassell and @JonathanGregory,

I'm not afraid of UGRID getting too little traction as it has already received quite some uptake outside CF, but always formulated as closely linked to CF. However, it would definitely be good to discuss the different use cases for pure CF and CF extended with UGRID somewhere clearly in the CF document.

Governance

I agree on the need for governance documents when UGRID wouldn't be integrated. Regarding the comment made by @erget, I do understand the need for the statement "Final acceptance will always rely on the completion of changes to the UGRID conventions, which is at the discretion of the UGRID community." since one community can never decide what another community must do, but I also agree that your progress shouldn't be limited by another slower community. That's exactly the same reason -- but looking from the other side -- why some may prefer to keep UGRID as a lean and mean separate convention instead of moving it into the bigger CF community. The CF community can always decide at any time to take all the UGRID ideas and merge them into the CF docs (the reverse is extremely unlikely), but my expectation is that as long as the main contributors for the two conventions don't change significantly we'll be able to work out a shared path forward. We can ask developers from the other convention to evaluate proposed modifications if we expect that such changes may cause incompatibilities. If such incompatibilities are detected only in hindsight due to misjudgement then it's in our shared interest and collaboration to resolve them as soon as possible -- most likely by the community that caused the incompatibility in the first place, but the two communities may decide otherwise in collaboration. If desirable changes in one convention would require changes in the other convention, then we can propose such changes to the other community and discuss them together as part of the shared interests. The second community may accept those adjustments, or propose an alternative. If none of the alternatives is acceptable for both communities ... then it would be time to merge the last set of accepted UGRID conventions into CF. Someone else may be able to capture this approach in a more formal way.

cf_role

Regarding the usage of cf_role I agree that the face_node_connectivity role and similar connectivity roles could be dropped since their purpose and meaning should be clear from the corresponding attribute names pointing to them, and most UGRID implementations will probably not check them anyway. However, I'm hesitant about dropping the role mesh_topology since I expect that this will break most if not all current UGRID readers. Your reasoning that one could follow the mesh attributes on the data variables to identify the mesh(es) in the file works in many cases, but wouldn't work if there is only a mesh in the file and no data variable (except for the node, edge and face coordinates) which may or may not refer back to the mesh container variable depending on your philosophy (UGRID doesn't specify whether that is allowed, required or optional). In regular CF the mesh is also only implied by the auxiliary coordinates listed on the data variable (or the dimensions used), so that pathway would indeed be consistent but we do have use cases with only mesh and node coordinates in the file. If we must drop the use of cf_role for CF compatibility, I would be in favour of introducing a new attribute called mesh_type or grid_type to replace it ... the only allowed value would initially be ugrid, but ...

sgrid

... it could be extended with sgrid for staggered structured meshes (see description of the SGRID conventions). The style of attribute was copied from the geometry_type attribute introduced for the geometry container variable.
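The mesh-only use case raised under cf_role can be made concrete by comparing the two discovery routes on a file that holds a mesh but no data variable. This is a sketch only: the file is modelled as a dictionary of attribute dictionaries, and the function and variable names are hypothetical.

```python
def meshes_by_cf_role(variables: dict) -> set[str]:
    """Mesh topology variables found through cf_role = "mesh_topology"."""
    return {
        name for name, attrs in variables.items()
        if attrs.get("cf_role") == "mesh_topology"
    }


def meshes_by_data_reference(variables: dict) -> set[str]:
    """Mesh topology variables found by following mesh/location attributes."""
    return {
        attrs["mesh"] for attrs in variables.values()
        if "mesh" in attrs and "location" in attrs and attrs["mesh"] in variables
    }


# A file holding only a mesh and its node coordinates, with no data variable.
mesh_only = {
    "Mesh2": {"cf_role": "mesh_topology", "node_coordinates": "Mesh2_node_x Mesh2_node_y"},
    "Mesh2_node_x": {"standard_name": "longitude"},
    "Mesh2_node_y": {"standard_name": "latitude"},
}
print(meshes_by_cf_role(mesh_only))
print(meshes_by_data_reference(mesh_only))
```

Only the cf_role route finds the mesh in this file, which is the concern about existing readers expressed above.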

conformance

Reading through the UGRID conventions document, I realize that we have sometimes used the word "should" when we actually intended "must", but overall we tried to be very explicit about required and optional attributes so writing down the conformance requirements in a form consistent with these CF pages should be fairly straightforward.

Best regards,

Bert

@davidhassell
Contributor

Dear Bert,

Keeping UGRID separate clearly has some advantages, as you describe. In addition (and I think that you implied this), CF could always stick to an older version of UGRID, if newer features are not to its liking. The "nuclear" option, of merging the last set of accepted UGRID conventions into CF if resolution can not be reached, is a good backstop.

Assuming that UGRID were to remain separate, I think that the governance framework that you describe would work well - thanks.

So with my CF hat on I would prefer incorporation, but I'm fine with UGRID being separate if we can work out how to deal with any structural items that have been mentioned, such as the issue of how to let the rest of CF know about attributes reserved for UGRID.

Another area of possible friction is highlighted by the use of datasets with meshes but no data. Right now, storing a mesh without a data variable is not allowed - there's no encoding for it, and (more pertinently) it is not allowed by the CF data model. However, help is at hand with the proposed introduction of a new "domain" variable (#301), that will allow domains (meshes) to exist on their own. Would UGRID want to change to adopt the domain variable approach, or could we allow a special case for UGRID meshes?

Thanks,
David

@davidhassell
Contributor

Dear Bert, Jonathan, and all,

I would like to try to summarize the ideas that have been discussed in the form of some broad proposals that I hope could be acceptable to allow us conclude this issue. I welcome your feedback.

In no particular order:

(A) The governance is written up along the lines of @hrajagers ideas: #153 (comment)

(B) Comprehensive conformance rules are written up for UGRID. These should be maintained alongside UGRID in its repository, and referenced from (not copied into) the CF conformance document.

(C) Update the aforementioned CF Appendix A to include the relevant UGRID attributes, thereby making them visible to all users. Mention in the governance rules that this table needs maintaining.

(D) Based on @hrajagers previous comment, dropping the standardisation of cf_role on the "connectivity" variables, but retaining it on the mesh topology variable. This is related to my previous comments about the use of datasets with meshes but no data, which I now withdraw. A mesh topology variable can actually contain multiple domains in the CF-sense, one of which can be "picked out" by a data variable. This makes it sufficiently different, I realise, to the proposed CF domain variable (#301) that we shouldn't try to unify them at this time.

(E) Add some text to CF 5.8 (Domain Variables) (currently being proposed in #301) to explain the UGRID mesh topology variable and how it relates to a domain variable. It may be the case that the occasional note relating to UGRID would be useful in other sections. I don't propose to review for these, but they could always be added as and when it was felt to be useful.

Thanks,
David

@davidhassell
Contributor

A note on the CF data model:

I think that UGRID need not affect the CF data model at this time.

This is because CF does not currently formalise connections between data variables, on the same or different domains. A mesh topology variable collates multiple domains (one for faces, one for edges, etc.), but a given data variable only refers to one of them (e.g. data:location = "face" ;). How you relate a "face" data variable to an "edge" one is moot when you abstract out the netCDF encoding - you get to the same place if you do it by inspection of the coordinate values, or by inspection of the mesh topology and data variable attributes.

I realise that you could say that the point of UGRID is to make these relations explicit, but if that is to be the case then it should be propagated to other areas of CF (e.g. as SGRID proposes), and so should be considered in the round at a later stage.

Does that sound reasonable?

Thanks,
David

@rabernat

rabernat commented Oct 19, 2020

Thanks to everyone who is working through these important issues.

I strongly support incorporating SGRID into this same framework.

@JonathanGregory
Contributor

Dear @davidhassell

Thanks for this summary. I agree with (A), (B) and (E) as given.

Re (C), I would suggest that only the UGRID attributes which can appear on data variables should be added to CF Appendix A. If I have read the document correctly, these are mesh, location and location_index_set. The other attributes belong to mesh variables. To prevent accidental collision with CF, it would nonetheless be useful to tabulate them, but I'd suggest we put them in a new CF appendix specifically about the UGRID mesh topology variable, consisting of the table with an introductory sentence or two. This would be like the treatment of the attributes of the grid mapping variable, which are in a table in Appendix F, not in Appendix A. I note that the geometry variable attributes appear in Appendix A, but there are only five of them, whereas there are 18 attributes of the mesh topology variable.

Re (D), if Bert @hrajagers and UGRID colleagues are OK with dropping cf_role for connectivity variables, that's good. They could continue to be allowed but deprecated, rather than disallowed. That's a decision to be made when the conformance rules (B) are written. It seems to me that cf_role is also redundant on the mesh topology variable, because it must also have a topology_dimension attribute, it seems. Couldn't that be used as the defining characteristic of a mesh topology variable? If so, this cf_role could also be deprecated; it can't be disallowed if current software depends on it, as Bert says.
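To make the two candidate identification rules in (D) concrete, here is a minimal sketch, assuming the netCDF variables have already been read into a plain dictionary of attributes. The variable names (`mesh2d`, `face_nodes`, `temperature`) and both helper functions are hypothetical, used only for illustration; only the attribute names come from the discussion above.

```python
# Hypothetical in-memory view of a dataset: variable name -> attributes.
variables = {
    "mesh2d": {
        "cf_role": "mesh_topology",
        "topology_dimension": 2,
        "node_coordinates": "node_x node_y",
        "face_node_connectivity": "face_nodes",
    },
    "face_nodes": {"cf_role": "face_node_connectivity", "start_index": 0},
    "temperature": {"mesh": "mesh2d", "location": "face"},
}


def mesh_topologies_by_cf_role(variables):
    """Identify mesh topology variables from their cf_role attribute."""
    return sorted(
        name
        for name, attrs in variables.items()
        if attrs.get("cf_role") == "mesh_topology"
    )


def mesh_topologies_by_dimension(variables):
    """Identify mesh topology variables from the presence of topology_dimension."""
    return sorted(
        name for name, attrs in variables.items() if "topology_dimension" in attrs
    )


print(mesh_topologies_by_cf_role(variables))
print(mesh_topologies_by_dimension(variables))
```

In this example both rules select the same variable, which is the sense in which cf_role would be redundant on the mesh topology variable; software that keys on cf_role would still depend on it being present.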

(F) I proposed earlier that we could add a short subsection of CF section 1 to introduce UGRID and its purpose, to make clear its special synergy with CF, to remark on the appearance of attributes in CF appendices, and to say that it has its own conformance document which complements the CF conformance document. What do you think?

I agree with you that the relationship between domains of different data variables is not currently considered in the CF data model, but not inconsistent with the data model. If UGRID is not being included in CF, we don't have to consider it at the moment.

Regarding Ryan @rabernat's comment, I think it would be fine to consider SGRID as well, but let's do it as a separate issue, and perhaps after UGRID, because we may not have enough mental capacity to deal with both at once.

Best wishes

Jonathan

@davidhassell
Contributor

Dear @JonathanGregory,

Thanks for these comments

I agree with your updated (C).

I'm also fine in (D) with a mesh topology variable getting its canonical identity from the topology_dimension

@ChrisBarker-NOAA
Contributor

So I wonder if that, as far as the data model is concerned, the data array is unconditionally optional.

Is it? Or is it optional in the data model? Don't you need SOME way of specifying the location of the data at hand?

@davidhassell
Contributor

I think that @ChrisBarker-NOAA is right - we do need some way of specifying the cell locations, and that can be with either coordinates (C), or bounds (B), or both coordinates and bounds. I.e. omitting both coordinates and bounds is disallowed, but any other combination is OK.

Other constructs, such as domain ancillary constructs, insist on having a data array present. So it makes sense that we insist on that for coordinate constructs - but noting that the data can be provided in one of three ways (i.e. C, B, or C & B).

Then also note, as @JonathanGregory said, "Although the data array of the coordinate values is optional in the data model, it is mandatory in CF-netCDF, with two exceptions: simple geometries, and UGRID cells described by a mesh topology variable."

@ChrisBarker-NOAA
Contributor

@JonathanGregory wrote:

So I wonder if that, as far as the data model is concerned, the data array is unconditionally optional.

Maybe getting sidetracked here, but why is it unconditionally optional in the data model? Isn't a variable with no specification of its location kind of useless?

But I may be getting confused about what exactly is being talked about.

@JonathanGregory
Contributor

JonathanGregory commented Apr 19, 2023

Dear @davidhassell and @ChrisBarker-NOAA

I can't think of an existing case where you would use a coordinate construct with neither coordinates nor bounds, so I agree with David that we must have one or the other although they are individually optional. So I was wrong to suggest that the coordinates are unconditionally optional. The first paragraph of the coordinate constructs in the data model could be something like this:

Coordinate constructs (Figure I.3) provide information which locate the cells of the domain and which depend on a subset of the domain axis constructs. A coordinate construct consists of an optional data array of the coordinate values spanning the subset of the domain axis constructs, properties to describe the coordinates (in the same sense as for the field construct), an optional data array of cell bounds recording the extents of each cell, and any extra arrays needed to interpret the cell bounds values. In the data model, all the components of the coordinate construct are optional, but it is mandatory to include either the coordinate array or the bounds array, and both may be included. In CF-netCDF, the coordinate array is mandatory, except for simple geometries and UGRID cells described by mesh topology variables.
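The two rules in that paragraph can be written down as a small check. This is only a sketch of the logic as worded above, not part of any CF software; the function names and the `geometry_or_ugrid` flag are hypothetical.

```python
from typing import Optional, Sequence


def valid_in_data_model(
    coordinates: Optional[Sequence[float]],
    bounds: Optional[Sequence[Sequence[float]]],
) -> bool:
    """Data model rule: coordinates and bounds are each optional,
    but at least one of the two arrays must be present."""
    return coordinates is not None or bounds is not None


def valid_in_cf_netcdf(
    coordinates: Optional[Sequence[float]],
    bounds: Optional[Sequence[Sequence[float]]],
    geometry_or_ugrid: bool = False,
) -> bool:
    """CF-netCDF rule: the coordinate array is mandatory, except for
    simple geometries and UGRID cells (flagged by geometry_or_ugrid)."""
    if not valid_in_data_model(coordinates, bounds):
        return False
    return coordinates is not None or geometry_or_ugrid


# Bounds only: acceptable in the data model, but in CF-netCDF only for
# the two exempt cases.
print(valid_in_data_model(None, [[0.0, 1.0]]))
print(valid_in_cf_netcdf(None, [[0.0, 1.0]]))
print(valid_in_cf_netcdf(None, [[0.0, 1.0]], geometry_or_ugrid=True))
```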

Best wishes

Jonathan

@davidhassell
Contributor

davidhassell commented Jul 18, 2023

Hello,

Since the beginning of May there has been a lot of discussion on this topic over at ugrid-conventions/ugrid-conventions#65 and ugrid-conventions/ugrid-conventions#66 and off-line - many thanks to @ChrisBarker-NOAA, @JonathanGregory, @hrajagers, @pp-mo and @drf5n for taking the time to think about this.

The result of all this is that we've had to change the nature of the CF data model Domain Topology construct, and need to make a note about "boundary_node_connectivity". None of this changes the agreed principles and approach of incorporating UGRID into CF.

It is very desirable to get this into CF-1.11 which will be released later this year (potentially as soon as the end of September), not least because a variety of general circulation models that need to archive in CF-netCDF are now starting to use UGRID to store their outputs.


Domain Topology construct description

Here is the new description of the Domain Topology construct (see #153 (comment) for the original description):

Domain topology construct

A domain topology construct describes logically and explicitly the contiguity of domain cells indexed by a single domain axis construct, where two cells are described as contiguous if and only if they share at least one common boundary vertex. A domain construct allows contiguity to be ascertained without comparison of boundary vertices, which may be co-located for non-contiguous cells. A domain construct may contain at most one domain topology construct.

A domain topology construct contains an array that spans a single domain axis construct with the addition of an extra dimension that indexes the cell bounds for the corresponding coordinates. Identical array values indicate that the corresponding cell vertices map to the same node of the domain, but otherwise the array values are arbitrary.

In CF-netCDF a domain topology can only be provided for a domain defined by a UGRID mesh topology variable, supplied by a node connectivity variable, such as is named by a "face_node_connectivity" attribute. The indices contained in a node connectivity variable may be used directly to create a domain topology construct but the CF data model attaches no significance to the values, other than the fact that not all indices are the same.

The old version described a Boolean array which indicated which pairs of cells were contiguous, whilst the new version is an array that has the exact form of a UGRID node connectivity variable.

The old version seemed at the time like a clear abstraction of what is going on, but there were still some subtleties which hadn't been considered (such as co-located but non-contiguous cells); and it turned out that with the old version it was not always possible to make the round trip from UGRID dataset -> CF data model constructs -> UGRID dataset. The first step was OK, but the second step still relied on inspection of cell bounds and examples were found which did not work, even given unambiguous bounds comparisons.
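A small illustration of the co-located but non-contiguous case may help. The sketch below assumes two triangular faces whose node indices are all distinct, but where one node of each face happens to sit at the same location; the indices, coordinates and function names are made up for the example.

```python
# Node indices of each face, in the style of a face_node_connectivity array.
face_nodes = [[0, 1, 2], [3, 4, 5]]

# Hypothetical node locations: node 3 is co-located with node 1,
# but it is a different node of the mesh.
node_xy = {
    0: (0.0, 0.0),
    1: (1.0, 0.0),
    2: (0.0, 1.0),
    3: (1.0, 0.0),
    4: (2.0, 0.0),
    5: (1.0, 1.0),
}


def contiguous_by_nodes(a: int, b: int) -> bool:
    """Contiguity from the topology: the faces share a node index."""
    return bool(set(face_nodes[a]) & set(face_nodes[b]))


def contiguous_by_bounds(a: int, b: int) -> bool:
    """Contiguity guessed from cell bounds: the faces share a vertex location."""
    vertices_a = {node_xy[n] for n in face_nodes[a]}
    vertices_b = {node_xy[n] for n in face_nodes[b]}
    return bool(vertices_a & vertices_b)


print(contiguous_by_nodes(0, 1))   # topology: no shared node
print(contiguous_by_bounds(0, 1))  # bounds: a shared location
```

The two tests disagree here, which is the kind of case where rebuilding the mesh from bounds alone would not reproduce the original node connectivity.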


Edit: sent too soon - carrying on in the next post!

@davidhassell
Contributor

... carrying on from the last post!


Boundary Node Connectivity

The UGRID boundary_node_connectivity attribute provides metadata towards describing the boundary conditions that constrain the data, and which are distributed over the same sampling domain as the field itself. The boundary conditions provide information on the processes that produced the data, and may inform operations on the continuity of the data across them. Note that such boundary conditions are constrained by the domain definition, but do not contribute to the domain's definition.

I propose to explicitly exclude Boundary Node Connectivity from being recognised by CF, given no current use cases for it, for the following reasons:

  1. Practically, UGRID currently lacks a mechanism for associating the variable containing the nature of the boundary conditions with the variable containing their locations.

  2. The nature of the boundary conditions is part of the model formulation. Whilst the formulation is clearly important, CF transcends this: it says "yes, you could compare/combine these two datasets and get a meaningful result", but that the interpretation of that result is a function of model formulation, experimental design, instrument type, etc., and these are up to the user to determine outside of CF. In this feature, UGRID is geared towards storing information needed to configure a model that uses the grid (Some questions about Boundary Edges ugrid-conventions/ugrid-conventions#65 (comment)). I think that this is beyond CF at this time (but is clearly an area that could be explored later).

  3. The current UGRID specification only allows for interface conditions along (N-1)d cell interfaces of Nd cells (e.g. edges of faces), but it is already being considered to extend this to interfaces at nodes of any cell type (edge/face/volume); at volume edges; and at volume faces. As this feature is under development, CF should consider incorporating it into the data model (for it would require a new data model construct) when it is more complete.

To that end, I suggest adding this new text to the new "Mesh topology" section of chapter 5 (see https://github.com/cf-convention/cf-conventions/pull/353/files#diff-3c189abe47ef902923e4a6126a2fe909ed568bcacae933778144094935c0a9d8 for the existing changes):

The UGRID conventions <<UGRID>> allow for the specification of boundary conditions that applied to the creation of the data, via the boundary_node_connectivity attribute, but this feature is not included in this version of CF.

Also, modifying the mention of boundary_node_connectivity from Appendix K (https://github.com/cf-convention/cf-conventions/pull/353/files#diff-d67e5a9a0f7dc06129dad9631f241b99f4fa8d962d5e81a15c2bb0776149f745).

| **`boundary_node_connectivity`**
| S
| MT
| Specifies an index variable identifying the nodes that each boundary element
(i.e. the nodes that define each edge of a face, or the nodes that define each face of a volume).
This attribute is not recognized by this version of CF.

@davidhassell
Contributor

davidhassell commented Jul 18, 2023

... and lastly ...

If you could cast an eye over these changes (which are pretty small, once you strip out all of the recorded thinking!) that would be a great help. I'll rework the PR (#353) only when the proposed changes are OK with all.

Thank you!

@davidhassell
Contributor

@ChrisBarker-NOAA, is it indeed the case that data defined at nodes (which do not have bounds) are considered to be connected to the other nodes defined by whichever [edge|face|volume]_node_connectivity variables are present? If so, then the new Domain Topology construct definition does not work for this case, although the old definition would ....

@davidhassell
Contributor

Perhaps "contiguity" and "connectedness" are different concepts for which we need two CF data model constructs? The case I mentioned above maybe feels more "connected" than "contiguous".

@JonathanGregory
Contributor

Dear @davidhassell

Thanks for your work on this and the new text. Some aspects of it are unclear to me. They probably are clear given knowledge of the UGRID spec, which unfortunately I have forgotten! But the CF data model text should not depend on knowledge of the UGRID spec.

  • What does "which may be co-located for non-contiguous cells" mean? If two vertices are co-located - effectively identical - surely the cells which have these vertices are contiguous? Similarly, what are "co-located cells", mentioned later? Does this mean two cells which occupy exactly the same space (of the appropriate dimensionality: space, face, line or vertex)?

  • "an extra dimension that indexes the cell bounds for the corresponding coordinates". What are the "corresponding coordinates"? Does this mean the cells? Is the index over all the bounds i.e. any cell can refer to any bound (so this dimension is at least twice as large as the dimension of the domain axis), or is it a ragged array in which each cell refers to its own bounds only?

  • What is a "node of the domain"?

Excluding the boundary node connectivity from CF for the moment seems reasonable to me, since the definition is fluid. This information might later be treated as data in CF, or perhaps ancillary variables, rather than metadata of the domain.

Best wishes

Jonathan

@davidhassell
Contributor

Hello,

Since July, Chris Barker, Jonathan Gregory and myself have had many off-list discussions on getting UGRID within CF, and the three of us are now happy that we now have a data model extension which properly works for UGRID, with no unaccounted for corner cases.

I shall close the original PR and open a new one (#459) that contains the full integration of this new model into the conventions text, but for ease of reference, the new data model Domain topology and Cell connectivity constructs that we have devised are reproduced in this message.

We are considering the 3 week cooling-off period for merging this change as starting from now, but of course welcome feedback of any sort in the usual manner, and if that exposes anything we've missed and the clock is reset, so much the better!

All the best,
David

Domain topology construct

A domain topology construct defines the geospatial topology of cells arranged in two or three dimensions in real space but indexed by a single (discrete) domain axis construct, and at most one domain topology construct may be associated with any such domain axis.
The topology describes topological relationships between the cells - spatial relationships which do not depend on the cell locations - and is represented by an undirected graph, i.e. a mesh in which pairs of nodes are connected by links.
Each node has a unique arbitrary identity that is independent of its spatial location, and different nodes may be spatially co-located.

The topology may only describe cells that have a common spatial dimensionality, one of:

  • Point: A point is zero-dimensional and has no boundary vertices.
  • Edge: An edge is one-dimensional and corresponds to a line connecting two boundary vertices.
  • Face: A face is two-dimensional and corresponds to a surface enclosed by a set of edges.

Each type of cell implies a restricted topology for which only some kinds of mesh are allowed.
For point cells, every node corresponds to exactly one cell; and two cells have a topological relationship if and only if their nodes are connected by a mesh link.
For edge and face cells, every node corresponds to a boundary vertex of a cell; the same node can represent vertices in multiple cells; every link in the mesh connects two cell boundary vertices; and two cells have a topological relationship if and only if they share at least one node.

mesh_figure
Figure I.5 A topology defined by a mesh with five nodes and six links.

For example, the mesh depicted in Figure I.5 may be used with any of three domain topology constructs for domains comprising two face cells (one triangle and one quadrilateral), six edge cells, and five point cells respectively.

A domain topology construct contains an array defining the mesh, and properties to describe it.
There must be a property indicating the spatial dimensionality of the cells.
The array values comprise the node identities, and all array elements that refer to the same node must contain the same value, which must differ from any other value in the array.
The array spans the domain axis construct and also has a ragged dimension, whose function depends on the spatial dimensionality of the cells.

For each point cell, the first element along the ragged dimension contains the node identity of the cell, and the following elements contain in arbitrary order the identities of all the cells to which it is connected by a mesh link.

For each edge or face cell, the elements along the ragged dimension contain the node identities of the boundary vertices of the cell, in the same order that the boundary vertices are stored by the auxiliary coordinate constructs.
Each boundary vertex except the last is connected by a mesh link to the next vertex along the ragged dimension, and the last vertex is connected to the first.

When a domain topology construct is present it is considered to be definitive and must be used in preference to the topology implied by inspection of any other constructs, which is not guaranteed to be the same.

In CF-netCDF a domain topology construct can only be provided for a UGRID mesh topology variable.
The information in the construct array is supplied by the UGRID "edge_node_connectivity" variable (for edge cells) or "face_node_connectivity" variable (for face cells).
The topology for node cells may be provided by any of these three UGRID variables.
The integer indices contained in the UGRID variable may be used as the mesh node identities, although the CF data model attaches no significance to the values other than the fact that some values are the same as others.
The spatial dimensionality property is provided by the "location" attribute of a variable that references the UGRID mesh topology variable, i.e. a data variable or a UGRID location index set variable.

A single UGRID mesh topology defines multiple domain constructs and defines how they relate to each other.
For instance, when "face_node_connectivity" and "edge_node_connectivity" variables are both present there are three implied domain constructs - one each for face, edge and point cells - all of which have the same mesh and so are explicitly linked (e.g. it is known which edge cells define each face cell).
The CF data model has no mechanism for explicitly recording such relationships between multiple domain constructs, however whether or not two domains have the same mesh may be reliably determined by inspection, thereby allowing the creation of netCDF datasets containing UGRID mesh topology variables.

The restrictions on the type of mesh that may be used with a given cell spatial dimensionality excludes some meshes which can be described by an undirected graph, but is consistent with UGRID encoding within CF-netCDF.
UGRID also describes meshes for three-dimensional volume cells that correspond to a volume enclosed by a set of faces, but how the nodes relate to volume boundary vertices is undefined and so volume cells are currently omitted from the CF data model.
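As a rough sketch of how the domain topology arrays relate to the mesh, the snippet below builds a mesh with five nodes and six links from two face cells (one triangle, one quadrilateral), in the spirit of Figure I.5. The node numbering and the helper names are assumptions made for illustration; the actual figure may label its nodes differently.

```python
# Face cells: each row lists the node identities of the boundary vertices.
faces = [[0, 1, 2], [1, 3, 4, 2]]


def links_of_cell(nodes):
    """Each vertex is linked to the next one, and the last to the first."""
    n = len(nodes)
    return {frozenset((nodes[i], nodes[(i + 1) % n])) for i in range(n)}


def mesh_links(cells):
    """Union of the links implied by all cells."""
    links = set()
    for cell in cells:
        links |= links_of_cell(cell)
    return links


def point_cell_rows(links):
    """Array for point cells: the node identity first, followed by the
    nodes it is linked to (sorted here only to make the output stable)."""
    neighbours = {}
    for link in links:
        a, b = tuple(link)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return [[node] + sorted(neighbours[node]) for node in sorted(neighbours)]


links = mesh_links(faces)
edges = sorted(sorted(link) for link in links)  # array for edge cells
points = point_cell_rows(links)

print(len(links), "links")
print("edge cells:", edges)
print("point cells:", points)
```

The same mesh therefore yields three arrays: two rows for face cells, six for edge cells and five for point cells, matching the three domains described for Figure I.5.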

Cell connectivity construct

A cell connectivity construct defines explicitly how cells arranged in two or three dimensions in real space but indexed by a single (discrete) domain axis are connected.
Connectivity can only be provided when the domain axis construct also has a domain topology construct, and two cells can only be connected if they also have a topological relationship.
For instance, the connectivity of two-dimensional face cells could be characterised by whether or not they have shared edges, where the edges are defined by connected nodes of the domain topology construct.

The cell connectivity construct consists of an array recording the connectivity, and properties to describe the data.
There must be a property indicating the condition by which the connectivity is derived from the domain topology.
The array spans the domain axis construct with the addition of a ragged dimension.
For each cell, the first element along the ragged dimension contains the unique identity of the cell, and the following elements contain in arbitrary order the identities of all the other cells to which the cell is connected.
Note that the connectivity array for point cells is, by definition, equivalent to the array of the domain topology construct.

When cell connectivity constructs are present they are considered to be definitive and must be used in preference to the connectivities implied by inspection of any other constructs, apart from the domain topology construct, which are not guaranteed to be the same.

In CF-netCDF a cell connectivity construct can only be provided by a UGRID mesh topology variable.
The construct array is supplied either indirectly by any of the UGRID variables that are used to define a domain topology construct, or directly by the UGRID "face_face_connectivity" variable (for face cells).
In the direct case, the integer indices contained in the UGRID variable may be used as the cell identities, although the CF data model attaches no significance to the values other than the fact that some values are the same as others.

Restricting the types of connectivity to those implied by the geospatial topology of the cells precludes connectivity derived from any other sources, but is consistent with UGRID encoding within CF-netCDF.
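The "indirect" route, in which connectivity is derived from the domain topology, might look like the sketch below. It assumes the connectivity condition is "faces share an edge" and uses a made-up mesh of three faces; the function names and node numbering are hypothetical.

```python
from itertools import combinations

# Domain topology array for three face cells. Faces 0 and 1 share the
# edge (1, 2); faces 1 and 2 share only node 4.
faces = [[0, 1, 2], [1, 3, 4, 2], [4, 5, 6]]


def cell_edges(nodes):
    """Edges of a face: consecutive vertices, closing back to the first."""
    n = len(nodes)
    return {frozenset((nodes[i], nodes[(i + 1) % n])) for i in range(n)}


def connectivity_by_shared_edge(faces):
    """Cell connectivity rows: the cell identity first, followed by the
    identities of all cells with which it shares at least one edge."""
    edges = [cell_edges(face) for face in faces]
    connected = {i: set() for i in range(len(faces))}
    for i, j in combinations(range(len(faces)), 2):
        if edges[i] & edges[j]:
            connected[i].add(j)
            connected[j].add(i)
    return [[i] + sorted(connected[i]) for i in range(len(faces))]


rows = connectivity_by_shared_edge(faces)
print(rows)
```

Face 2 has a topological relationship with face 1 (a shared node) but is not connected to it under the shared-edge condition, so its row contains only its own identity.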

@davidhassell
Contributor

Hello

We are considering the 3 week cooling-off period for merging this change as starting from now, but of course welcome feedback of any sort in the usual manner, and if that exposes anything we've missed and the clock is reset, so much the better!

Just a reminder that these changes will be merged in a couple of weeks unless any non-editorial questions are raised, so please do have a look at the PR #459 if you are interested.

Thanks,
David

@JonathanGregory
Contributor

Dear David

Thanks very much. I agree with the contents. I have two small editorial suggestions:

  • I stumbled on "however whatever". I suggest deleting "however" and starting a new sentence with "Whatever".

  • In the sentence

When cell connectivity constructs are present they are considered to be definitive and must be used in preference to the connectivities implied by inspection of any other constructs, apart from the domain topology construct, which are not guaranteed to be the same.

it's unclear what "which" refers to. I suggest

When cell connectivity constructs are present they are considered to define the connectivity of the cells. Exactly the same connectivity information could be derived from the domain topology construct. Connectivity information inferred from inspection of any other constructs is not guaranteed to be the same.

Cheers

Jonathan

@taylor13

When you edit, you might insert a comma after the opening clause in the first sentence of the above quoted text i.e., "When cell connectivity constructs are present, they ...."

@davidhassell
Contributor

Thanks @JonathanGregory and @taylor13, I have included your suggestions in PR #459.

@davidhassell
Contributor

Hello - just a reminder that PR #459 is due to be merged in two days time, along with the PR that updates the rules: https://github.com/cf-convention/cf-convention.github.io/pull/210/files

@sadielbartholomew
Member

sadielbartholomew commented Oct 11, 2023

I'm clearly quite late to the party here, but I have been carefully reading through the Issue here and the connected PR, and I wanted to register my support for this. I've also reviewed the PR #459 from a high-level perspective and it seems very sensible to me.

I particularly think it is good that we are not hard-coding aspects of the UGRID conventions into the CF Conventions, but instead referencing them, to loosen the coupling and make it difficult for the two standards to get out of sync; and that the data models for both have been considered at the forefront for establishing and maintaining a formal association.

I raised a few questions as comments on the PR, #459 (review) which occurred to me, but on further reading here might have been answered already in the above thread (there are quite a lot of comments to work through!).

@davidhassell
Contributor

Many thanks for your review, @sadielbartholomew - there's a lot of stuff here to wade through!

In response to your good point about the Conventions attribute, I have added a line to section 2.6.1:

"The UGRID conventions, which are fully incorporated into the CF conventions, do not need to be included in the Conventions attribute. "

This was always accepted as the case, as you also point out, so I don't think we need to reset the clock for this addition.

@davidhassell
Contributor

As far as I'm aware, there are no outstanding comments on this issue and pull request, so I'll give it another 24 hours just in case and then merge #459, alongside cf-convention/cf-convention.github.io#210 (update the rules for UGRID).

So many thanks to everyone who spent time on this (a lot of time, in some cases!) - we have, I think, arrived at a very satisfactory outcome.

David

@sadielbartholomew
Member

I am glad to see this can be merged. 🎉

As some wash-up/tidying, I see that UGRID currently state in their README:

Note: NetCDF files using this convention can be given the global attribute Conventions = 'CF-1.6, UGRID-1.0' if they are CF- and UGRID-compliant, or just Conventions = 'UGRID-1.0' if they are not CF-compliant.

which can be updated now in light of the new advice in #153 (comment). Should we open an Issue or PR to suggest this over on their repository?

@JonathanGregory JonathanGregory added the change agreed Issue accepted for inclusion in the next version and closed label Dec 4, 2023