implicit bounds for contiguous cells #380

TomLav · 2024-10-14T20:23:30Z

TomLav
Oct 14, 2024

Question

Hello,

I am aware of the grid_mapping mechanism to map from (regularly spaced) xc,yc coordinates to lat,lon using a Coordinate Reference System. This is covered in 5.6.

I am also aware of the mechanism with bounds to describe that a value is not representative of a point, but of a grid cell. This is covered in 7.1.

I am worried that fully describing the bounds of my grid cells, both in xc/yc and in lat/lon, will require a lot of storage space if I must explicitly have both the 1D xc, yc, the 2D lat, lon, the associated xc_bounds(xc, 2), yc_bounds(yc, 2) and the associated lat_bounds(xc, yc, 4) and lon_bounds(xc, yc, 4).

I thus have the following questions:

Is there a CF-way to indicate that my xc, yc are not points but grid cells, and that these grid cells are contiguous, without explicitly writing their bounds?
Since CF-1.8 it is no longer mandatory to provide 2D lat and lon variables when the CRS and xc and yc are provided. Can this be extended to the lat/lon bounds when the CRS and xc_bounds and yc_bounds are provided? An application should be able to compute the bounds directly from the 1D bounds, just as it can compute the lat,lon from xc,yc.

Combining the two ideas above, I am wondering if we could design a simple scheme where all the bounds can be computed by an application assuming that the 1D xc and yc values are the centers of contiguous cells (and providing the CRS).
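
As an illustration of that idea, here is a minimal Python sketch that derives the four lat/lon corners of every cell from the 1D xc_bounds and yc_bounds alone. The projection is an assumption made for the example: a north polar stereographic projection on a sphere, with the hypothetical helper inverse_polar_stereo standing in for whatever CRS the grid_mapping describes. A real application would probably call a projection library instead.

```python
import math

R = 6371000.0  # assumed spherical Earth radius in metres (illustrative value)


def inverse_polar_stereo(x, y, lon0_deg=0.0, radius=R):
    """Invert a north polar stereographic projection on a sphere.

    Assumes true scale at the pole; lon0_deg is the longitude pointing
    along the negative y axis. Returns (lat, lon) in degrees.
    """
    rho = math.hypot(x, y)
    c = 2.0 * math.atan(rho / (2.0 * radius))
    lat = 90.0 - math.degrees(c)
    lon = lon0_deg
    if rho > 0.0:
        lon += math.degrees(math.atan2(x, -y))
    return lat, lon


def latlon_cell_corners(xc_bounds, yc_bounds, to_latlon=inverse_polar_stereo):
    """Return corners[j][i]: the four (lat, lon) vertices of cell (j, i).

    Vertices are listed anticlockwise in the x-y plane.
    """
    corners = []
    for y0, y1 in yc_bounds:
        row = []
        for x0, x1 in xc_bounds:
            vertices = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
            row.append([to_latlon(x, y) for x, y in vertices])
        corners.append(row)
    return corners


# Two cells along x, one along y (coordinates in metres).
corners = latlon_cell_corners([(0.0, 1000.0), (1000.0, 2000.0)],
                              [(-2000.0, -1000.0)])
```

Only the corners are converted here, so the cell edges between them are not traced.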

I am aware of chapter 8.3 that describes Compression by Coordinate Sub-sampling, that handles coordinates and bounds interpolation from tiepoints. It could cover all the aspects above, but I am maybe worried that it is not yet supported by most software.

Thank you for your guidance,

Armin-RS · 2024-10-15T06:11:28Z

Armin-RS
Oct 15, 2024

Hi @TomLav,
that is an interesting question.

But I'm wondering how you could have both x/y-bounds and lat-lon bounds?

Because in my understanding in most cases, a rectangular grid cell in x/y-coordinate space will be a distorted cell in lat-lon coordinate space where the "width" of the lat-lon cell is different on the southern side of the box compared to the northern side. And where the sides of the cell are not straight lines in lat-lon coordinate space.

2 replies

TomLav Oct 15, 2024
Author

Hei @Armin-RS,
Thanks. In 7.1 Cell Boundaries there is Figure 7.2 that shows how bounds of 2D "distorted" lat/lon grids are approximated by 4 sides. I agree with you that the "true" bounds are those of the x/y-bounds, and that the lat/lon bounds can get very distorted or messy close to projection singularities.

taylor13 Oct 15, 2024
Collaborator

A side note: Most supposedly "conservative regridding" codes construct cell bounds assuming cell edges follow great circles (connecting the corners), which is, of course, incorrect for lat x lon grids where the northern and southern cell edges coincide with latitude circles. For coarse grids this can lead to errors that may not be insignificant; the property supposed to be conserved isn't. See Taylor (2024).
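
To make the size of that effect concrete, here is a small Python sketch (not taken from any regridding package) that computes the latitude of the great-circle midpoint between two corners lying on the same parallel. The function names and the 10-degree cell at 60N are assumptions chosen for illustration.

```python
import math


def to_unit_vector(lat_deg, lon_deg):
    """Cartesian unit vector for a point on the sphere."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))


def great_circle_midpoint_lat(lat_deg, lon_a_deg, lon_b_deg):
    """Latitude of the great-circle midpoint between two corners on one parallel."""
    ax, ay, az = to_unit_vector(lat_deg, lon_a_deg)
    bx, by, bz = to_unit_vector(lat_deg, lon_b_deg)
    # The sum of the two vectors points at the midpoint; normalising it
    # would not change the latitude, so that step is skipped.
    mx, my, mz = ax + bx, ay + by, az + bz
    return math.degrees(math.atan2(mz, math.hypot(mx, my)))


# Northern edge of a coarse 10-degree-wide cell at 60N.
mid_lat = great_circle_midpoint_lat(60.0, 0.0, 10.0)
print(mid_lat - 60.0)  # positive: the arc bulges poleward of the latitude circle
```

On the equator the two kinds of edge coincide, and the difference grows with latitude and with cell width.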

ethanrd · 2024-10-15T23:50:01Z

ethanrd
Oct 15, 2024
Maintainer

The last paragraph of the Section 4 "Coordinate Types" intro, just before Section 4.1 "Latitude Coordinate", ends with this sentence:

If bounds are not provided, an application might reasonably assume the gridpoints to be at the centers of the cells, but we do not require that in this standard.

I think a lot of generic applications assume this. (I often forget it isn't actually a required part of the standard.)

So I think the answer to your first question is that there isn't a CF way to indicate this but a lot of tools make that assumption.

5 replies

davidhassell Oct 16, 2024
Maintainer

I'm not sure I ever knew that was in the standard :).

This brings up the question of "how do we indicate that cells have zero size?". Is omitting the bounds variables sufficient, or do we have to provide a bounds variable for which each cell's vertices are the same value? I would say omitting the bounds variable is sufficient - imagine an auxiliary coordinate variable containing weather station locations. In this case, software that assumed Voronoi bounds would perhaps not be right.

TomLav Oct 16, 2024
Author

Thanks @ethanrd.

I wonder if this sentence is maybe misplaced. It says something about assumptions the user / application is going to make in case there are no bounds provided. Maybe it would have more impact if it was placed (or at least restated) in the bounds section (section 7).

If bounds are not provided, an application might reasonably assume the gridpoints to be at the centers of the cells, but we do not require that in this standard.

There are two pieces of information here: "centers" and "of the cells". So in the absence of specific bounds, the user / application can rightly assume that there are cells involved (of which the center points are provided).

@davidhassell rightly asks how to then indicate discrete positions? I do not have the answer, but there are ways to indicate discrete geometries for station data and trajectories with featureType.

Going back to my question: if the "reasonable assumption" in the absence of bounds is that the coordinate values are the centers of cells, would it harm to extend this assumption to the centers of contiguous cells? Isn't this the most natural assumption when providing data on locations that you assume are the center of cells? Of course the cells can overlap and be disjointed, but would you assume that? I think not.

For the time being, we are just exploring what the reasonable assumptions are. But I think we might also want to think of a way to actually tell the user / application that "this coordinate variable is the center of a cell and all cells are contiguous" without the overhead of providing bounds (this is an overhead for the data producer, for data storage, and for the user / application that has to test/decide if the bounds are contiguous or not).

While we are at it, we could try and give a way for the data producer to unambiguously say that the data are for discrete locations (unless we say that all data that are at discrete locations have to use one of the featureType values).

taylor13 Oct 16, 2024
Collaborator

I haven't studied feature types, but for cell-size of zero, wouldn't cell_methods indicate that it is "area: point", and you'd simply omit bounds?

JonathanGregory Oct 16, 2024
Maintainer

I agree with Karl. If the data-writer intends the data to be interpreted as located at a point, with no associated cell, along a particular dimension (space, time or anything else), they should indicate it with point for that dimension in cell_methods. Of course, that does not prevent a processing or visualisation program from assuming there is a cell, and imagining some bounds for it.

TomLav Oct 16, 2024
Author

Thanks @taylor13 and @JonathanGregory : this clarifies things for point data (in space or time or whatever).

ChrisBarker-NOAA · 2024-10-16T16:58:43Z

ChrisBarker-NOAA
Oct 16, 2024
Collaborator

Is there a CF-way to indicate that my xc, yc are not points but grid cells, and that these grid cells are contiguous, without explicitly writing their bounds?

I'm a bit confused here -- for a rectangular grid, specifying the bounds, rather than the center points (?) of the cells requires only a single extra value per axis.

If data really is "on the cell", rather than at a point in the center of cell, I would think that the coordinates of the point are less important than the coordinates of the center points.

grid cells are contiguous

contiguous is clear, but where the cell boundaries are defined is not, unless you specify it -- you could assume that the cells are rectangular and the points are in the middle of the cells, and then calculate the bounds, but doesn't it make more sense to do it the other way around?

I guess what I'm saying is that it doesn't seem like a good practice to define a grid by the center points of the cells, rather than by the nodes -- should we be recommending that in CF??

Maybe it's time for us to consider the SGRID standard:

https://sgrid.github.io/sgrid/

it goes further, but, the OP's situation (I think) would be:

location: face
and
padding: none

I'm actually a bit surprised that this simple case isn't already covered in CF -- though perhaps that's what the standard cell bounds is.

2 replies

TomLav Oct 16, 2024
Author

Is there a CF-way to indicate that my xc, yc are not points but grid cells, and that these grid cells are contiguous, without explicitly writing their bounds?

I'm a bit confused here -- for a rectangular grid, specifying the bounds, rather than the center points (?) of the cells requires only a single extra value per axis.

I think not. For a 1D coordinate variable (say, xc of size n), you need a 2D coordinate variable (xc_bounds (n,2)) to encode the :bounds (the vertices of all cells). If you want the bounds of a non-regular lat/lon grid you need lat2d_bounds(m,n,4). This is example 7.2 and 7.3.
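
The difference between the two representations can be sketched in a few lines of Python. The helper name edges_to_bounds is hypothetical; it expands n+1 contiguous cell edges into the (n,2) layout that the bounds attribute expects.

```python
def edges_to_bounds(edges):
    """Expand n+1 contiguous cell edges into CF-style bounds of shape (n, 2)."""
    return [[lo, hi] for lo, hi in zip(edges[:-1], edges[1:])]


edges = [0.0, 10.0, 20.0, 30.0]      # 3 contiguous cells described by 4 values
xc_bounds = edges_to_bounds(edges)   # 3 pairs, one per cell
stored = sum(len(pair) for pair in xc_bounds)  # 6 values: interior edges appear twice
```

The reverse conversion is only possible when the cells really are contiguous, which is why the (n,2) form is the more general one.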

The rest of your message I am not sure I understand enough to answer without adding some confusion. The CF convention works fine. I am just worried that the data storage (and interpretation) of the :bounds mechanism is heavy for the (very common) case of contiguous cells whose center point is provided. I wished there was a shortcut to storing the :bounds explicitly.

ChrisBarker-NOAA Oct 16, 2024
Collaborator

I am not sure I understand enough to answer without adding some confusion.

It's probably me that's adding the confusion :-)

But I've always been confused by the cell bounds case in CF. It sure seems to me that for a simple rectangular grid (or not quite as simple logically rectangular curvilinear grid) there is no need for a full multi-dim specification of the cell bounds.

In practice, folks do this all the time in what I thought were CF compliant files, e.g.:

dimensions:
	lat_node = 101 ;
	lon_node = 201 ;
	lat_cell = 100 ;
	lon_cell = 200 ;

variables:

float lat(lat_node) ;
    lat:long_name = "latitude of the nodes" ;
    lat:units = "degrees_north" ;
    lat:standard_name = "latitude" ;

float lon(lon_node) ;
    lon:long_name = "longitude of the nodes" ;
    lon:units = "degrees_east" ;
    lon:standard_name = "longitude" ;

float some_data(lon_cell, lat_cell) ;
    some_data:long_name = "some data on the cells" ;

This is all you need to define data on a rectangular grid -- and maybe this is what folks mean when they say that while the bounds are not defined, it's assumed that the bounds are implicitly defined by the nodes.

NOTE: to be CF compliant, I think the above would need a coordinates attribute for the some_data variable, and then you'd need to provide lon and lat for the cells (usually the cell centers, though I've always thought that that was interpreted as the data being at the cell centers, rather than "on the cells").

If, in fact, CF doesn't actually clearly define a simple way to specify rectangular and logically rectangular grids, perhaps it is time to look at the SGRID convention (https://sgrid.github.io/sgrid/). I had thought that it was needed for the "staggered" case -- but maybe it's helpful for the non staggered case as well.

JonathanGregory · 2024-10-16T19:32:45Z

JonathanGregory
Oct 16, 2024
Maintainer

Dear @TomLav et al.

Thomas wrote that, since it isn't mandatory to provide 2D lat and lon auxiliary coordinate variables when the CRS and non-lat-lon 1D coordinate variables are provided, the same should apply to lat/lon bounds, because an application should be able to compute the bounds directly from the 1D bounds just like it can compute the lat/lon coordinates from the 1D coordinates. I agree, that is true. It's impossible anyway to provide lat/lon bounds if there are no lat/lon coordinate variables to attach them to. I suggest that we should insert a statement in Sec 5.6 "Horizontal Coordinate Reference Systems, Grid Mappings, and Projections" that the grid mapping information can be used to convert bounds as well as coordinates. At the moment it says nothing about bounds. Would that be useful, @TomLav?

Regarding @Armin-RS's point, which Thomas and Karl agreed with, I suggest we should state in Sect 7.1 that the boundaries are straight lines in the space of the 1D coordinates, which means that in general they are not straight lines in lat-lon. If the grid mapping is a continuous transformation, in principle you could approximately trace the curved lines in lat-lon space by joining a lot of points along the straight lines, individually converted from x-y to lat-lon. We could say that as well if it would be helpful.

Regarding the sentence which Ethan quoted from the preamble of Sect 4, I agree with Thomas that it would be useful to repeat it in Sect 7.1. I would say that it's useful in both places.

Thomas's main point is correct. The CF convention that bounds for 1D coordinate variables are dimensioned (n,2) means that in the common case of contiguous cells every bound is duplicated. Each of the (n,m,4) bounds of auxiliary lat-lon coordinate variables on a rectangular x-y grid has four copies. This wasn't an accident, but a deliberate decision at the start of CF. It was debated several times in early years. It is explicitly explained and acknowledged in Sect 7.1:

Note that the boundary variable for a set of N contiguous intervals is an array of shape (N,2). Although in this case there will be a duplication of the boundary coordinates between adjacent intervals, this representation has the advantage that it is general enough to handle, without modification, non-contiguous intervals, as well as intervals on an axis using the unlimited dimension.

Also it allows overlapping cells, for which we have use-cases. There is no CF way to indicate that the cells are contiguous. To find out if they are, you have to examine the bounds. We could provide a convention to indicate it, but that would be redundant, and therefore liable to become inconsistent. We have a principle of not doing that in Sect 1.2.

To avoid potential inconsistency within the metadata, the conventions should minimise redundancy.

Although the (N,2) convention is wasteful of space in the common cases, it makes the document and software simpler if we have only one convention. I think there would need to be a very strong reason to add any alternative convention, for the reasons summarised in Sect 1.2:

Because all previous versions must generally continue to be supported in software for the sake of archived datasets, and in order to limit the complexity of the conventions, there is a strong preference against introducing any new capability to the conventions when there is already some method that can adequately serve the same purpose (even if a different method would arguably be better than the existing one).

The wastefulness is not great for 2D grids made of 1D coordinates, because the bounds take size of order N, while the data is N^2. It's worse for auxiliary 2D coordinates and their bounds, which are N^2, like the data variables. However, they occupy only the size of two 2D fields, and are often shared by many fields in the file which have the same horizontal grid. Sometimes there's only a small number of data variables, or even only one. If I remember correctly, that was the main reason for changing 2D lat-lon coordinates from mandatory to optional if a grid mapping is supplied.
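
The orders of magnitude can be checked with a short Python sketch. The function name and the 1000 x 1000 grid are assumptions for illustration; the counts follow directly from the (n,2) and (n,m,4) shapes discussed in this thread.

```python
def coordinate_value_counts(nx, ny):
    """Count stored values for an nx-by-ny grid under the CF bounds conventions."""
    return {
        "data_field": nx * ny,               # one 2D data variable
        "xc_yc": nx + ny,                    # 1D coordinate variables
        "xc_yc_bounds": 2 * nx + 2 * ny,     # shapes (nx, 2) and (ny, 2)
        "lat_lon": 2 * nx * ny,              # two 2D auxiliary coordinates
        "lat_lon_bounds": 2 * 4 * nx * ny,   # two bounds arrays with 4 vertices per cell
    }


counts = coordinate_value_counts(1000, 1000)
for name, n_values in counts.items():
    print(f"{name:>15}: {n_values}")
```

For this grid the 1D bounds are negligible next to a single data field, whereas the 2D coordinates and their bounds together hold as many values as ten data fields.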

As you say, @TomLav, sect 8.3 is a new possibility for saving space. Probably not all software supports it, indeed, but that would be even more true of any new convention we introduce now.

I've prepared a PR about sect 7.1 for conventions issue #527. If you like, we could add extra text in there as part of that issue, if any of the above seems helpful.

Best wishes

Jonathan

4 replies

TomLav Oct 16, 2024
Author

Dear @JonathanGregory,

As often (always?) your summary and context are great to conclude discussions and propose actions.

I agree with all you write:

It would be very useful to insert a statement in Sec 5.6 "Horizontal Coordinate Reference Systems, Grid Mappings, and Projections" that the grid mapping information can be used to convert bounds as well as coordinates.
I agree that repeating the sentence from the preamble of Sect 4 into Sect 7.1 would be useful. We might add this to your current PR for Sect 7.1.
I understand and agree that we should not bring new ways of doing the same things. I also see that if users / applications can use the CRS to compute lat/lon bounds, then I can live with adding the xc/yc bounds of my 1D axis.

The only additional idea I would bring is about the sentence from the preamble of Sect 4 (the same we want to repeat in Sect 7.1). It currently reads:

If bounds are not provided, an application might reasonably assume the gridpoints to be at the centers of the cells, but we do not require that in this standard.

I would suggest to change it to (highlighting the text I added in italics):

If bounds are not provided (using the attribute bounds), an application might reasonably assume the gridpoints to be at the centers of the cells and cells to be contiguous, but we do not require that in this standard.

In the case where a data producer does not use bounds, and does not use a cell_method, I think this is what is meant. Then a consumer / application can reasonably assume that the center coordinate is provided, that the axis defines a cell, and that cells are contiguous. This information can be useful to draw cell boundaries or do more accurate reprojections.

But if people do not agree with my addition, it is not critical: I am now convinced to add bounds.

TomLav Oct 17, 2024
Author

After a good night's sleep, I will propose a slightly reworded sentence (same meaning):

If bounds are not provided (using attribute bounds), an application might reasonably assume the gridpoints to be at the centers of ~~the~~ cells and the cells to be contiguous, but we do not require that in this standard.

JonathanGregory Oct 17, 2024
Maintainer

Dear @TomLav

Thanks for your positive comments! I agree that, if no bounds are supplied, assuming the gridpoints to be at the centre of the cells also implies that you imagine the cells to be contiguous, and it would be reasonable to say so. I think that's fine, since it's only a suggestion anyway, not a CF recommendation or requirement.

We could also point out that assuming the gridpoints to be at the centre of the cells does not imply that the bounds lie half-way between the gridpoints. That's consistent if the cells are of equal width, but not in general. One situation where this comes up is with ocean layers. For instance, if you have layers of depth 0--10 m, 10--30 m, 30--60 m, of thickness 10 m, 20 m and 30 m, centred gridpoints are at depths of 5 m, 20 m and 45 m. The boundary at 10 m is 5 m below the first gridpoint, and 10 m above the second gridpoint, not half-way between them at 12.5 m.
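
The ocean-layer numbers above can be reproduced with a short Python sketch. The helper names are hypothetical; the layer depths are the ones quoted in the paragraph.

```python
def centres_from_bounds(bounds):
    """Gridpoints placed at the centre of each cell."""
    return [(lo + hi) / 2.0 for lo, hi in bounds]


def midpoints_between(points):
    """Interior boundaries guessed as half-way between neighbouring gridpoints."""
    return [(a + b) / 2.0 for a, b in zip(points[:-1], points[1:])]


layer_bounds = [(0.0, 10.0), (10.0, 30.0), (30.0, 60.0)]  # depths in metres
centres = centres_from_bounds(layer_bounds)               # 5, 20 and 45 m
guessed = midpoints_between(centres)                      # 12.5 and 32.5 m
true_interior = [hi for _, hi in layer_bounds[:-1]]       # 10 and 30 m
```

The guessed interior boundaries differ from the true ones even though every gridpoint is exactly at the centre of its cell.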

Best wishes

Jonathan

TomLav Oct 17, 2024
Author

Hello @JonathanGregory, Thanks.

For your 2nd comment: maybe we can add something, but we never wrote that the user / application could assume that the coordinate variable is equally spaced. Actually, we do give all the center points, so why would they jump to this conclusion? I am open to this, but at the same time, if a user / application wants to act on the reasonable assumption, do we also need to guard them against making further (unreasonable) assumptions?

We can leave this (the whole thread) open a bit for people to react, then we propose a list of actions? As far as I am concerned, this discussion was very useful and I do not have any further question / suggestion on this topic.

ChrisBarker-NOAA · 2024-10-16T20:49:09Z

ChrisBarker-NOAA
Oct 16, 2024
Collaborator

If bounds are not provided (using the attribute bounds), an application might reasonably assume the gridpoints to be at the centers of the cells and cells to be contiguous, but we do not require that in this standard.

+1 to this.

However, "center of the cells" assumes that the cells are defined, and the user can know how.

It seems we really do need something else ... But I'll maybe raise that as a separate topic.

davidhassell · 2024-10-17T12:50:53Z

davidhassell
Oct 17, 2024
Maintainer

If bounds are not provided (using the attribute bounds), an application might reasonably assume the gridpoints to be at the centers of the cells and cells to be contiguous, but we do not require that in this standard.

I'm not so happy with this :) A missing bounds variable cannot mean that there is an implied bounds variable which wasn't written to the file. What is the method for indicating cells with zero size (i.e. no bounds), if not omitting the bounds variable? I don't think that cell_methods can help us here, as they apply to the quantity defined on the cell, not the cell itself. A method of point means that the quantity is intensive on the cell, not that the cell has zero size.

taylor13 · 2024-10-17T13:18:24Z

taylor13
Oct 17, 2024
Collaborator

My understanding of "point" is different from yours. I thought "point" meant literally at a single instant in time or at a particular location (not necessarily representative of any finite area). I didn't think it had any relationship to "cells". For a point measurement there is no need to define a cell.

Now suppose we have a cell_methods with "area: mean", but no bounds. Wouldn't that imply that there must have been bounds, but they were omitted from the file? In that case, I don't think we should advise the user on how to guess the bounds; that probably depends on the circumstance. For example, if there was some notation in the file that the grid defined is "gaussian", then some users would know how to construct the bounds (and, as I recall they aren't half way between the coordinate values). Also, you always have problems at the end points. Then you can't simply construct the bound half way between two coordinate values because there only exists one value.

I think we shouldn't provide any advice about the location of bounds if they are missing.

JonathanGregory · 2024-10-20T18:17:08Z

JonathanGregory
Oct 20, 2024
Maintainer

I think the same as Karl, that the cell_methods of point indicates the data applies to a point, and says nothing about the cells. The bounds are not relevant to the interpretation of the data, but they might be there anyway. For example, the grid might be shared by some variables with cell_methods of point, and others with mean, for which bounds are informative.

Omitting the bounds implies nothing about where the bounds might lie, and it doesn't mean the cell has zero size. To indicate a zero-size cell, you would need to provide two coincident bounds for it.

Everyone's comments of the last couple of days persuade me that we shouldn't make suggestions for missing information. In that case, what should we do about the last sentence of the preamble of Sect 4? "If bounds are not provided, an application might reasonably assume the gridpoints to be at the centers of the cells, but we do not require that in this standard." It appears to be a statement about the points, but the points are always located by the coordinates! If it's actually included as an implicit statement about the location of the bounds, it should be in Sect 7.1, as @TomLav said.

However, in view of the previous discussion, I suggest we should delete that sentence in Sect 4, and put text in Sect 7.1 explicitly stating what isn't reasonable to assume, such as:

If cell boundaries are not provided (using the bounds attribute), an application can assume only that each gridpoint lies somewhere within or upon the boundaries of its own cell. Without boundaries, the extent of a cell is not known, nor whether adjacent cells are contiguous, separated by a gap, or overlapping.

1 reply

TomLav Oct 25, 2024
Author

Dear @JonathanGregory,

I agree that it does not seem ideal for a convention to be partly based on "reasonable assumptions", since a data consumer has to guess what the data producer meant when preparing the files. In this example, the absence of bounds can reasonably be assumed to stand for cells (of unknown dimension and shape) whose center points have been provided (the sentence in Sec. 4), or interpreted as points (as by @davidhassell in an earlier post). Both interpretations are reasonable.

If we cannot live with this uncertainty, there are two solutions: remove the reasonable assumption and clearly state that no assumption can be made (as you suggest above), OR turn the reasonable assumption (that has been there for some time) into a clearly stated default. I foresee there will be strong opposition to defining a default behavior, and maybe it should never be done. But I will nevertheless explore what we (data producers and consumers) would gain with a well-defined default behaviour in the absence of bounds:

This could be something like:

If bounds are not provided (using the attribute bounds), the positions in the coordinate variables are the centers of cells. If the coordinate variable has 2 elements or more, the grid cells are contiguous: the cell boundaries are exactly in between two subsequent values in the coordinate variable. Bounds are required when there is only one cell, or to describe any other cell configuration (e.g. overlapping or disjointed cells, coordinate axis values not at the center of the cell, cell with zero size,...).

In addition to clarifying the situation, such a default behavior would immediately cover the vast majority (my claim) of files prepared with CF, including model fields (on contiguous cells), gridded satellite products, etc... This would greatly simplify the work for data producers and consumer / applications if they can safely infer cells and cell bounds, rather than reading and inferring. The data producer still has the freedom to describe other types of cells (using bounds). Of course other types of observations with discrete geometries such as stations or trajectories will use the featureType mechanism.

Something tells me the above will not be agreeable. But I thought I would write it anyway so that we can capture why it was not possible to create a default behavior around this.

ChrisBarker-NOAA · 2024-10-21T23:18:18Z

ChrisBarker-NOAA
Oct 21, 2024
Collaborator

I really think we need something new here. The fact is that rectangular grid models are very common. And logically rectangular (curvilinear grids) are quite common. So to have CF not clearly define how to specify these is a major gap.

Yes -- the existing cell bounds approach can indeed describe either of these grid types, but not in a clear or easy way. And honestly, I've NEVER seen anyone use them. So we are left with making assumptions, and they are not good assumptions, e.g., from this thread:

We could also point out that assuming the gridpoints to be at the centre of the cells does not imply that the bounds lie half-way between the gridpoints

Exactly -- only providing the center of the cells is not helpful, and if you provide a cell_method, such as mean, then you are really dead in the water if you don't know the cell bounds.

Yes, it's true that:

Note that the boundary variable for a set of N contiguous intervals is an array of shape (N,2). Although in this case there will be a duplication of the boundary coordinates between adjacent intervals, this representation has the advantage that it is general enough to handle, without modification, non-contiguous intervals, as well as intervals on an axis using the unlimited dimension.

And I understand that:

We have a principle of not doing that in Sect 1.2.

To avoid potential inconsistency within the metadata, the conventions should minimise redundancy.

Although the (N,2) convention is wasteful of space in the common cases, it makes the document and software simpler if we have only one convention. I think there would need to be a very strong reason to add any alternative convention, for the reasons summarized in Sect 1.2:

Because all previous versions must generally continue to be supported in software for the sake of archived datasets, and in order to limit the complexity of the conventions, there is a strong preference against introducing any new capability to the conventions when there is already some method that can adequately serve the same purpose (even if a different method would arguably be better than the existing one).

I think there is also the principle of "meeting data providers where they are" -- and where model results providers are is specifying the grid in terms of nodes, etc., and NOT providing bounds.

it makes the document and software simpler if we have only one convention.

In principle, yes, but a convention that is overly complex makes software harder, not simpler, and this is an example of that.

For a rectangular grid -- it can be defined as such:

dimensions:
	lat_node = 101 ;
	lon_node = 201 ;
	lat_cell = 100 ;
	lon_cell = 200 ;

variables:

float lat(lat_node) ;
    lat:long_name = "latitude of the nodes" ;
    lat:units = "degrees_north" ;
    lat:standard_name = "latitude" ;

float lon(lon_node) ;
    lon:long_name = "longitude of the nodes" ;
    lon:units = "degrees_east" ;
    lon:standard_name = "longitude" ;

float some_data(lon_cell, lat_cell) ;
    some_data:long_name = "some data on the cells" ;

And this is, in fact, how I've seen every rectangular grid model I've used described. So I think we should codify it.

Although the (N,2) convention is wasteful of space in the common cases

I don't think that should be so lightly dismissed -- not so much the wasted space, but the "common cases" -- yes, redundancy is bad, but so is complexity -- I think it's perfectly reasonable to make the simple things simple, and the complex things possible -- with more than one standard -- use the simple one for the simple (and more common) case, and use the complex one where you need it.

Anyway, space considerations are the least of it:

There is no CF way to indicate that the cells are contiguous. To find out if they are, you have to examine the bounds.

Exactly -- when working with these data, it is VERY helpful to know that the cells are contiguous, and regular (or not). In fact, I would bet almost all software that deals with these grids special-cases the rectangular case (so that, for example, cell location is trivial: index = (x - start_x) / delta_x, etc.), and probably doesn't even support the "general" case. That is, software that can handle arbitrary cells that may or may not overlap, and may or may not be contiguous, and may be in any order in the file is rare indeed.
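
The constant-time lookup for a regular grid can be sketched in Python as follows. The function name, and the convention that start_x is the lower edge of cell 0, are assumptions for the example. Note the parentheses around the subtraction and the floor that turns the quotient into an index.

```python
import math


def cell_index(x, start_x, delta_x, n_cells):
    """Index of the cell containing x on a regular, contiguous axis.

    start_x is the lower edge of cell 0 and delta_x the constant cell width.
    """
    i = math.floor((x - start_x) / delta_x)
    if not 0 <= i < n_cells:
        raise ValueError(f"{x} lies outside the grid")
    return i


i = cell_index(25.0, start_x=0.0, delta_x=10.0, n_cells=5)  # cell 2 spans 20 to 30
```

No search over the bounds is needed, which is why software tends to special-case this situation.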

So proper software that wanted to work with gridded data in a "CF correct" way would need to:

Examine all the cell bounds, determine whether or not the cells are rectangular, or curvilinear, and if they are contiguous, and put them in the correct order. And then reject the file if it doesn't work.

And all this to then possibly reject the data as unusable, and/or have errors due to coordinates not exactly matching due to floating point errors.
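
A minimal Python sketch of just the contiguity part of such a check, for a 1D axis, might look like this. It assumes the cells are already stored in ascending order, and the function name and tolerance defaults are illustrative rather than taken from any existing CF library.

```python
import math


def bounds_are_contiguous(bounds, rel_tol=1e-9, abs_tol=0.0):
    """True if each cell's upper bound matches the next cell's lower bound.

    The comparison uses a tolerance because shared edges may differ by
    floating-point rounding when they were computed independently.
    """
    return all(
        math.isclose(upper, next_lower, rel_tol=rel_tol, abs_tol=abs_tol)
        for (_, upper), (next_lower, _) in zip(bounds[:-1], bounds[1:])
    )


# The second cell's upper edge was computed as 0.1 + 0.2, which is not exactly 0.3.
bounds = [(0.0, 0.1), (0.1, 0.1 + 0.2), (0.3, 0.4)]
print(bounds_are_contiguous(bounds))               # tolerant comparison
print(bounds_are_contiguous(bounds, rel_tol=0.0))  # exact comparison
```

The choice of tolerance is itself a guess about what the data producer intended.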

I would bet that there is exactly zero software that does this -- I sure know that mine doesn't.

A key point is that:

"these two cells happen to share coordinate values to some precision"

and

"these two cells are known to be adjacent" aren't exactly the same thing. Even if they were, then having to write and run a bunch of code to figure out the structure of the grid is just asking too much.

Anyway -- for your consideration:

https://sgrid.github.io/sgrid/

I note that that doesn't actually cover the simpler, grid-aligned rectangular grids -- we may want to capture that too.

Should I start a new discussion for this?

JonathanGregory · 2024-10-29T22:55:06Z

JonathanGregory
Oct 29, 2024
Maintainer

Dear @TomLav

Thanks for setting out the question: for clarity, either we should state a default, or we should state there is no default. I don't remember why the "reasonable assumption" was included. Possibly it's related to a question that has been asked by data-writers who are interested in cells and bounds but not points e.g. for an interval of rainfall accumulation, what should the coordinate be, if really only the bounds matter? The answer has been, putting it in the middle is a reasonable choice.

If the points are evenly spaced, it's indeed usually the case that the points are exactly in the middle of the cells. Also, it is usual for the bounds to be exactly half-way between the points. That is, even spacing usually goes with equal-sized cells. It is the latter property which tells you the bounds, given the points; it's only for working out the outermost bounds that you need to know that the points are in the middle of the cells.

If the points are not evenly spaced, saying that the cells are contiguous and the points are in the middle of the cells is not enough information to work out the bounds, because there are n points but n+1 bounds. For instance, consider the points (1,2,4). One possible set of bounds is [0.5,1.5], [1.5,2.5], [2.5,5.5]. Two others are [0.6,1.4], [1.4,2.6], [2.6,5.4] and [0.0,2.0], [2.0,2.0], [2.0,6.0]. In general the bounds could be [1-x,1+x], [1+x,3-x], [3-x,5+x], for 0<x<1, and in all cases the points are in the middle of the cells. If there are only two points it's not meaningful to say they are "evenly spaced" but you can assume the cells are the same size and hence work out what size. If there is just one point, there is obviously no way to work out the bounds.
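
The non-uniqueness in the (1,2,4) example can be checked with a short sketch (the function names are made up for illustration):

```python
def bounds_for(x):
    """One member of the family of contiguous bounds for the points (1, 2, 4)."""
    return [(1 - x, 1 + x), (1 + x, 3 - x), (3 - x, 5 + x)]


def midpoints(bounds):
    """Centre of each cell."""
    return [(lo + hi) / 2 for lo, hi in bounds]


# Different choices of x give different bounds but identical cell centres.
for x in (0.25, 0.5, 0.75):
    b = bounds_for(x)
    print(x, b, midpoints(b))
```

Every choice of x gives contiguous cells whose centres are 1, 2 and 4, so the points alone cannot tell an application which set of bounds the data-writer had in mind.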

Therefore implicit bounds for contiguous cells would only work for equal-size cells when n>1. That may describe the majority of gridded data, but there's a substantial minority which have unevenly spaced points and cells of unequal size.
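
For illustration only, this is roughly what an application would have to do if it chose to infer bounds on its own responsibility. CF does not define any such default, and the function name, the tolerance and the error messages here are assumptions of the sketch:

```python
import math


def implied_bounds(points, rel_tol=1e-9):
    """Guess contiguous bounds for evenly spaced points (not a CF rule).

    Raises ValueError when the guess is not possible: fewer than two
    points, or points that are not evenly spaced.
    """
    if len(points) < 2:
        raise ValueError("a single point says nothing about the cell size")
    step = points[1] - points[0]
    if step == 0:
        raise ValueError("repeated coordinate values")
    for a, b in zip(points[1:], points[2:]):
        if not math.isclose(b - a, step, rel_tol=rel_tol):
            raise ValueError("points are not evenly spaced; bounds are ambiguous")
    half = step / 2
    return [(p - half, p + half) for p in points]


print(implied_bounds([1.0, 2.0, 3.0]))
```

Calling it with the points (1, 2, 4) from the example above, or with a single point, raises an error rather than returning a guess.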

You anticipated that there would be opposition to introducing a convention for implicit bounds. (I realise that you weren't proposing it; you're just asking the question.) I wouldn't be in favour of it, for these reasons:

We haven't had this convention up to now. This would be a problem for the principle, "Because many datasets remain in use for a long time after production, it is desirable that metadata written according to previous versions of the convention should also be compliant with and have the same interpretation under later versions." We wouldn't be entitled to assume that previously written data had implicit bounds, but it would certainly be assumed by some data-users.
The existing convention, of requiring bounds to be explicit if they are wanted, is adequate for all situations, including equal-sized cells. On principle, "there is a strong preference against introducing any new capability to the conventions when there is already some method that can adequately serve the same purpose (even if a different method would arguably be better than the existing one)," because it makes the convention more complex. The latter point, about complexity, would apply even if we were starting with a blank sheet of paper and no existing CF data. Why have two conventions when one is enough? There's no use-case which requires this change.
Furthermore, having two conventions would make software more complicated for use of the data, since both methods would need to be supported. Also, the implicit-bounds convention is more complicated for software to implement. It would make less work for some data-writers (those who only write data with equal-sized cells), but more work for most data-users, who are the majority.

Therefore I'd prefer us to state that there is no default.

Best wishes

Jonathan


ChrisBarker-NOAA · 2024-10-29T23:15:51Z

ChrisBarker-NOAA
Oct 29, 2024
Collaborator

Therefore I'd prefer us to state that there is no default.

+1 from me -- a "default" or "implication" is a Bad Idea.

If you want cells, you need to define the bounds. Simple as that.

Is there a consensus to remove the statement?:

If bounds are not provided, an application might reasonably assume the gridpoints to be at the centers of the cells, but we do not require that in this standard.

"Reasonable assumptions" are not great in a metadata convention.

And in this case, as has been discussed, the bounds of the cells at the edges are not defined, and it might not be clear what they should be -- even if you assume points at the center of cells.

Note: this isn't hypothetical -- at least for curvilinear quadrilateral cells, I've struggled with defining bounds when they were not provided, in regions of high curvature.


taylor13 · 2024-10-29T23:20:21Z

taylor13
Oct 29, 2024
Collaborator

I too vote for no default, and for removing the statement. Anybody who wants to make up bounds could come up with this "reasonable assumption"; we don't have to suggest it in the convention.


TomLav · 2024-10-30T08:27:30Z

TomLav
Oct 30, 2024
Author

Dear @JonathanGregory, dear colleagues,

Thank you for the response to my question. I feel we went to the bottom of things with this discussion.

Therefore I'd prefer us to state that there is no default.

+1 from me. We should remove the sentence from Section 4, and add a sentence in Section 7.

You suggested the following for Section 7 earlier, I think it does the job.

If cell boundaries are not provided (using the bounds attribute), an application can assume only that each gridpoint lies somewhere within or upon the boundaries of its own cell. Without boundaries, the extent of a cell is not known, nor whether adjacent cells are contiguous, separated by a gap, or overlapping.

You mentioned earlier your PR for clarifying section 7. I have no preference whether the change above should be part of your existing PR or whether we should open a new one (since the change touches two sections, and removes a reasonable assumption that others might have taken for granted). What do you say, Jonathan and others?

Still, let us remember that we progressed on two other aspects in this thread:

In Section 7.1, Jonathan suggested that we would state "that the boundaries are straight lines in the space of the 1D coordinates, which means that in general they are not straight lines in lat-lon". And possibly that "If the grid mapping is a continuous transformation, in principle you could approximately trace the curved lines in lat-lon space by joining a lot of points along the straight lines, individually converted from x-y to lat-lon". We could say that as well if it would be helpful. Would that go in your existing Sect 7 PR, Jonathan?
Since it isn't mandatory to provide 2D lat and lon auxiliary coordinate variables when the CRS and non-lat-lon 1D coordinate variables are provided, the same should apply to lat/lon bounds, because an application should be able to compute the bounds directly from the 1D bounds just like it can compute the lat/lon coordinates from the 1D coordinates. It's impossible anyway to provide lat/lon bounds if there are no lat/lon coordinate variables to attach them to. [I suggest that] we should insert a statement in Sec 5.6 "Horizontal Coordinate Reference Systems, Grid Mappings, and Projections" that the grid mapping information can be used to convert bounds as well as coordinates. At the moment it says nothing about bounds. Should this then be a new PR? If yes, I could make it happen (I've never done that, but there's no reason I can't figure it out).
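
As a sketch of what "joining a lot of points along the straight lines" could look like in practice: the code below traces the outline of one x-y cell and converts each point to lon/lat. The projection used here (a spherical north-polar stereographic inverse with an assumed Earth radius) is only a stand-in so that the example is self-contained. A real application would take the CRS from the grid_mapping variable and use a projection library. All function names are hypothetical.

```python
import math

R = 6371000.0  # sphere radius in metres (an assumption of this sketch)


def xy_to_lonlat(x, y, lon0=0.0):
    """Inverse spherical north-polar stereographic projection (stand-in CRS)."""
    rho = math.hypot(x, y)
    c = 2.0 * math.atan2(rho, 2.0 * R)
    lat = math.pi / 2.0 - c
    lon = math.radians(lon0) + math.atan2(x, -y)
    return math.degrees(lon), math.degrees(lat)


def densify_edge(p0, p1, n):
    """n points from p0 towards p1 along a straight x-y line, excluding p1."""
    (x0, y0), (x1, y1) = p0, p1
    return [(x0 + (x1 - x0) * k / n, y0 + (y1 - y0) * k / n) for k in range(n)]


def cell_outline_lonlat(x_bounds, y_bounds, n=10):
    """Trace one rectangular x-y cell as a ring of 4*n lon/lat points."""
    (xl, xr), (yb, yt) = x_bounds, y_bounds
    corners = [(xl, yb), (xr, yb), (xr, yt), (xl, yt)]
    ring_xy = []
    for i in range(4):
        ring_xy.extend(densify_edge(corners[i], corners[(i + 1) % 4], n))
    return [xy_to_lonlat(x, y) for x, y in ring_xy]


# One 100 km cell, described only by its 1D x and y bounds.
ring = cell_outline_lonlat((1.0e6, 1.1e6), (-2.0e6, -1.9e6), n=4)
print(len(ring), ring[0])
```

Only the 1D bounds and the CRS are needed as input; the edges come out curved in lat-lon because each intermediate point is converted individually.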

Finally, @ChrisBarker-NOAA mentioned the SGRID convention twice (https://sgrid.github.io/sgrid/). But to me, it sounds like a whole new discussion thread should be started to discuss possible adoption of SGRID or subsets of it for CF. I could join that new discussion, once we've closed this one.


davidhassell · 2024-10-30T08:34:28Z

davidhassell
Oct 30, 2024
Maintainer

I support "no default", too.


JonathanGregory · 2024-10-30T14:59:04Z

JonathanGregory
Oct 30, 2024
Maintainer

Dear @TomLav

Apparently there is agreement so far that it would be good to remove the statement about the reasonable assumption in sect 4 and add an explicit statement of no default in 7.1. This is something between correcting a defect and making an improvement. I'm happy to add it to my conventions issue #527, which is similarly about clarifying the intention in 7.1. Does that seem appropriate?

On the other hand, the suggested improvements in 7.1 and 5.6 regarding curvilinear lat-lon bounds are a somewhat different matter, I feel. If you'd like to start a conventions issue to propose that change, please go ahead! We might overlap in 7.1, but I'm sure we can negotiate amicably.

We open issues for any change except the most trivial (like a typo), to agree on what should be done. There's a template for enhancement issues, which is supplied when you open one here. (Looking at that menu, I notice that it needs tidying up, since the introduction of discussions. I'll do that.) The issues are useful as a record of our detailed discussions, and they're indexed by the history at the end of the document. I think it generally works well to discuss the words in the issue, especially when it affects various paragraphs, possibly in different sections, because it's helpful to see them all together, but that depends on the subject and the proposer. A PR can be started and linked to the issue at any time when it seems useful to do so. One is required in the end, but not at the beginning! In another issue, I wrote a recipe for doing a PR, in case it's helpful.

Best wishes and thanks

Jonathan


ChrisBarker-NOAA · 2024-10-31T16:48:54Z

ChrisBarker-NOAA
Oct 31, 2024
Collaborator

Finally, @ChrisBarker-NOAA mentioned the SGRID convention twice (https://sgrid.github.io/sgrid/). But to me, it sounds like a whole new discussion thread should be started to discuss possible adoption of SGRID or subsets of it for CF.

Agreed.

I could join that new discussion, once we've closed this one.

Thanks!

In the meantime, I've found that there is confusion (at least to me) on how CF handles even simple rectangular grids.

So I've started writing up some notes on that:

https://github.com/ChrisBarker-NOAA/CF_conventions_notes/blob/main/rect_grid/rectangular_grids_in_CF.md

(I've only just started - and I'm not sure when I'll have time to move it forward, but I more than welcome help!)

The goal(s) of that document is to:

1. Document how to specify rectangular grids in CF, and how to use a CF compliant file to work with rectangular grids.
2. Determine, after documenting that, if CF is complete and robust for this use case.

If the answer to (2) is "not quite", then perhaps I'll come up with a proposal.

Again, I welcome help early and often, but otherwise, I'll for sure be reaching out here for review / help!

NOTE: That proposal may be SGRID -- it goes farther than the basics, but does cover the basics as well, so maybe that's the way forward -- but I want to start at the bottom for now.

NOTE 2: I think this kind of document -- call it a "How To" or a "Best Practices" -- would be great to have as a more formal part of the CF documentation -- maybe we can move that forward some day.

BTW: SGRID is not my work -- it was developed by others, but I've found it useful in my work.


JonathanGregory · 2024-11-06T23:06:03Z

JonathanGregory
Nov 6, 2024
Maintainer

Following the above comment

Apparently there is agreement so far that it would be good to remove the statement about the reasonable assumption in sect 4 and add an explicit statement of no default in 7.1. This is something between correcting a defect and making an improvement.

I have added these two changes to the PR of conventions issue 527.

Could those who are interested, e.g. @TomLav and @taylor13, please review, comment on and (I hope) support the changes proposed there, so that they can be included in CF 1.12? Thanks.

