implicit bounds for contiguous cells #380
-
Hi @TomLav, But I'm wondering how you could have both x/y-bounds and lat-lon bounds? Because in my understanding in most cases, a rectangular grid cell in x/y-coordinate space will be a distorted cell in lat-lon coordinate space where the "width" of the lat-lon cell is different on the southern side of the box compared to the northern side. And where the sides of the cell are not straight lines in lat-lon coordinate space. |
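The distortion described above can be illustrated with a small sketch. It assumes, purely for illustration, a spherical north-polar stereographic projection with true scale at the pole as the grid mapping; the function names, the sphere radius and the sample edge are hypothetical and not taken from the thread. It converts points along one straight x/y cell edge to lat-lon, showing that the edge is not a line of constant latitude.

```python
import math

R = 6371000.0  # sphere radius in metres (illustrative assumption)

def xy_to_latlon(x, y, lon0=0.0):
    """Inverse spherical north-polar stereographic projection (true scale at the pole)."""
    rho = math.hypot(x, y)
    lat = 90.0 - 2.0 * math.degrees(math.atan(rho / (2.0 * R)))
    lon = lon0 + math.degrees(math.atan2(x, -y)) if rho > 0 else lon0
    return lat, lon

def trace_edge(p0, p1, n=10):
    """Convert n+1 evenly spaced points on the straight x/y segment p0 -> p1 to (lat, lon)."""
    pts = []
    for i in range(n + 1):
        t = i / n
        x = p0[0] + t * (p1[0] - p0[0])
        y = p0[1] + t * (p1[1] - p0[1])
        pts.append(xy_to_latlon(x, y))
    return pts

# A "southern" cell edge: straight in x/y, from x = -1000 km to x = +1000 km at y = -2000 km.
edge = trace_edge((-1.0e6, -2.0e6), (1.0e6, -2.0e6), n=10)
```

The two end points of `edge` share the same latitude, but the middle point lies roughly two degrees further north, so joining only the corners in lat-lon space would misplace the boundary.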
-
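The last paragraph of the Section 4 "Coordinate Types" intro, just before Section 4.1 "Latitude Coordinate", ends with this sentence: "If bounds are not provided, an application might reasonably assume the gridpoints to be at the centers of the cells, but we do not require that in this standard."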
I think a lot of generic applications assume this. (I often forget it isn't actually a required part of the standard.) So I think the answer to your first question is that there isn't a CF way to indicate this but a lot of tools make that assumption. |
-
I'm a bit confused here -- for a rectangular grid, specifying the bounds, rather than the center points (?) of the cells requires only a single extra value per axis. If data really is "on the cell", rather than at a point in the center of cell, I would think that the coordinates of the point are less important than the coordinates of the center points.
Contiguous is clear, but where the cell boundaries are defined is not, unless you specify it -- you could assume that the cells are rectangular and the points are in the middle of the cells, and then calculate the bounds, but doesn't it make more sense to do it the other way around? I guess what I'm saying is that it doesn't seem like a good practice to define a grid by the center points of the cells, rather than by the nodes -- should we be recommending that in CF? Maybe it's time for us to consider the SGRID standard: https://sgrid.github.io/sgrid/. It goes further, but the OP's situation (I think) would be:
I'm actually a bit surprised that this simple case isn't already covered in CF -- though perhaps that's what the standard cell bounds is. |
-
Dear @TomLav et al. Thomas wrote that, since it isn't mandatory to provide 2D lat and lon auxiliary coordinate variables when the CRS and non-lat-lon 1D coordinate variables are provided, the same should apply to lat/lon bounds, because an application should be able to compute the bounds directly from the 1D bounds just like it can compute the lat/lon coordinates from the 1D coordinates. I agree, that is true. It's impossible anyway to provide lat/lon bounds if there are no lat/lon coordinate variables to attach them to. I suggest that we should insert a statement in Sec 5.6 "Horizontal Coordinate Reference Systems, Grid Mappings, and Projections" that the grid mapping information can be used to convert bounds as well as coordinates. At the moment it says nothing about bounds. Would that be useful, @TomLav? Regarding @Armin-RS's point, which Thomas and Karl agreed with, I suggest we should state in Sect 7.1 that the boundaries are straight lines in the space of the 1D coordinates, which means that in general they are not straight lines in lat-lon. If the grid mapping is a continuous transformation, in principle you could approximately trace the curved lines in lat-lon space by joining a lot of points along the straight lines, individually converted from x-y to lat-lon. We could say that as well if it would be helpful. Regarding the sentence which Ethan quoted from the preamble of Sect 4, I agree with Thomas that it would be useful to repeat it in Sect 7.1. I would say that it's useful in both places. Thomas's main point is correct. The CF convention that bounds for 1D coordinate variables are dimensioned
Also it allows overlapping cells, for which we have use-cases. There is no CF way to indicate that the cells are contiguous. To find out if they are, you have to examine the bounds. We could provide a convention to indicate it, but that would be redundant, and therefore liable to become inconsistent. We have a principle of not doing that in Sect 1.2.
Although the
The wastefulness is not great for 2D grids made of 1D coordinates, because the bounds take size of order N, while the data is N^2. It's worse for auxiliary 2D coordinates and their bounds, which are N^2, like the data variables. However, they occupy only the size of two 2D fields, and are often shared by many fields in the file which have the same horizontal grid. Sometimes there's only a small number of data variables, or even only one. If I remember correctly, that was the main reason for changing 2D lat-lon coordinates from mandatory to optional if a grid mapping is supplied. As you say, @TomLav, sect 8.3 is a new possibility for saving space. Probably not all software supports it, indeed, but that would be even more true of any new convention we introduce now. I've prepared a PR about sect 7.1 for conventions issue #527. If you like, we could add extra text in there as part of that issue, if any of the above seems helpful. Best wishes Jonathan |
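To put rough numbers on the sizes discussed above, here is a minimal sketch that counts stored values per variable, using the shapes given in the original question (`xc_bounds(xc, 2)`, `lat_bounds(xc, yc, 4)` and so on). The function name and the 1000 x 1000 grid size are illustrative assumptions.

```python
def value_counts(nx, ny):
    """Number of stored values per variable for an nx-by-ny grid."""
    return {
        "data": nx * ny,            # one 2D data field
        "xc": nx,                   # 1D coordinate variables
        "yc": ny,
        "xc_bounds": 2 * nx,        # 1D bounds, two vertices per cell
        "yc_bounds": 2 * ny,
        "lat": nx * ny,             # 2D auxiliary coordinates
        "lon": nx * ny,
        "lat_bounds": 4 * nx * ny,  # 2D bounds, four vertices per cell
        "lon_bounds": 4 * nx * ny,
    }

counts = value_counts(1000, 1000)
one_d_total = sum(counts[k] for k in ("xc", "yc", "xc_bounds", "yc_bounds"))
two_d_total = sum(counts[k] for k in ("lat", "lon", "lat_bounds", "lon_bounds"))
print(one_d_total, two_d_total, two_d_total // counts["data"])
```

Under these assumptions the 1D coordinates and bounds are negligible next to a single data field, while the 2D lat/lon coordinates plus their four-vertex bounds together take the space of ten data fields, shared by every data variable on that grid.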
-
+1 to this. However, "center of the cells" assumes that the cells are defined, and the user can know how. It seems we really do need something else ... But I'll maybe raise that as a separate topic. |
-
I'm not so happy with this :) A missing bounds variable can not mean that there is an implied bounds variable which wasn't written to the file. What is the method for indicating cells with zero size (i.e. no bounds), if not omitting the bounds variable? I don't think that
-
My understanding of "point" is different from yours. I thought "point" meant literally at a single instant in time or at a particular location (not necessarily representative of any finite area). I didn't think it had any relationship to "cells". For a point measurement there is no need to define a cell. Now suppose we have a cell_methods with "area: mean", but no bounds. Wouldn't that imply that there must have been bounds, but they were omitted from the file? In that case, I don't think we should advise the user on how to guess the bounds; that probably depends on the circumstance. For example, if there was some notation in the file that the grid defined is "gaussian", then some users would know how to construct the bounds (and, as I recall they aren't half way between the coordinate values). Also, you always have problems at the end points. Then you can't simply construct the bound half way between two coordinate values because there only exists one value. I think we shouldn't provide any advice about the location of bounds if they are missing. |
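The end-point problem mentioned above can be seen in a short sketch of the "half way between" guess. This is only an illustration of why the guess is incomplete, not a recommended procedure; the function name and sample points are hypothetical, and as noted above the midpoint rule is not even correct for grids such as Gaussian ones.

```python
def interior_edges(points):
    """Edges placed half way between neighbouring coordinate values."""
    return [(a + b) / 2 for a, b in zip(points, points[1:])]

points = [1.0, 2.0, 4.0]
edges = interior_edges(points)             # only n - 1 edges can be found this way
missing = (len(points) + 1) - len(edges)   # n + 1 edges are needed for n cells
print(edges, missing)
```

For these points the guess yields two interior edges and leaves both outermost edges undetermined. The middle cell would span 1.5 to 3.0, so its coordinate value 2.0 would not be at its centre either.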
-
I think the same as Karl, that the Omitting the Everyone's comments of the last couple of days persuade me that we shouldn't make suggestions for missing information. In that case, what should we do about the last sentence of the preamble of Sect 4? "If bounds are not provided, an application might reasonably assume the gridpoints to be at the centers of the cells, but we do not require that in this standard." It appears to be a statement about the points, but the points are always located by the coordinates! If it's actually included as an implicit statement about the location of the bounds, it should be in Sect 7.1, as @TomLav said. However, in view of the previous discussion, I suggest we should delete that sentence in Sect 4, and put text in Sect 7.1 explicitly stating what isn't reasonable to assume, such as:
-
I really think we need something new here. The fact is that rectangular grid models are very common. And logically rectangular (curvilinear grids) are quite common. So to have CF not clearly define how to specify these is a major gap. Yes -- the existing cell bounds approach can indeed describe either of these grid types, but not in a clear or easy way. And honestly, I've NEVER seen anyone use them. So we are left with making assumptions, and they are not good assumptions, e.g., from this thread:
Exactly -- only providing the center of the cells is not helpful, and if you provide a cell_method, such as mean, then you are really dead in the water if you don't know the cell bounds. Yes, it's true that:
And I understand that:
I think there is also the principle of "meeting data providers where they are" -- and where model results providers are is specifying the grid in terms of nodes, etc., and NOT providing bounds.
In principle, yes, but a convention that is overly complex makes software harder, not simpler, and this is an example of that. For a rectangular grid -- it can be defined as such:
And this is, in fact, how I've seen every rectangular grid model I've used described. So I think we should codify it.
I don't think that should be so lightly dismissed -- not so much the wasted space, but the "common cases" -- yes, redundancy is bad, but so is complexity -- I think it's perfectly reasonable to make the simple things simple, and the complex things possible -- with more than one standard -- use the simple one for the simple (and more common) case, and use the complex one where you need it. Anyway, space considerations are the least of it:
Exactly -- when working with these data, it is VERY helpful to know that the cells are contiguous, and regular (or not). In fact, I would bet almost all software that deals with these grids special-cases the rectangular case (so that, for example, cell location is trivial: So proper software that wanted to work with gridded data in a "CF correct" way would need to: Examine all the cell bounds, determine whether or not the cells are rectangular, or curvilinear, and if they are contiguous, and put them in the correct order. And then reject the file if it doesn't work. And all this to then possibly reject the data as unusable, and/or have errors due to coordinates not exactly matching due to floating point errors. I would bet that there is exactly zero software that does this -- I sure know that mine doesn't. A key point is that: "these two cells happen to share coordinate values to some precision" and "these two cells are known to be adjacent" aren't exactly the same thing. Even if they were, then having to write and run a bunch of code to figure out the structure of the grid is just asking too much. Anyway -- for your consideration: https://sgrid.github.io/sgrid/. I note that that doesn't actually cover the simpler, grid-aligned rectangular grids -- we may want to capture that too. Should I start a new discussion for this?
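As a rough illustration of the floating-point concern above, the following sketch checks whether 1D cell bounds are contiguous, first exactly and then within a tolerance. The function name, the tolerance value and the sample bounds are assumptions made for this example, not part of CF or of any particular library.

```python
import math

def is_contiguous(bounds, abs_tol=0.0):
    """True if each cell's upper bound matches the next cell's lower bound.

    With abs_tol=0.0 the comparison is exact; a positive abs_tol accepts
    small floating-point differences.
    """
    return all(
        math.isclose(bounds[i][1], bounds[i + 1][0], rel_tol=0.0, abs_tol=abs_tol)
        for i in range(len(bounds) - 1)
    )

# The third cell's lower bound is computed as 0.1 + 0.2, which is not exactly 0.3.
bounds = [(0.0, 0.1), (0.1, 0.3), (0.1 + 0.2, 0.4)]
exact = is_contiguous(bounds)
tolerant = is_contiguous(bounds, abs_tol=1e-12)
print(exact, tolerant)
```

The exact test reports a gap between cells that were presumably intended to be adjacent, while the tolerant test accepts them. Neither result proves that the data producer meant the cells to be contiguous, which is the distinction drawn above.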
-
Dear @TomLav Thanks for setting out the question: for clarity, either we should state a default, or we should state there is no default. I don't remember why the "reasonable assumption" was included. Possibly it's related to a question that has been asked by data-writers who are interested in cells and bounds but not points e.g. for an interval of rainfall accumulation, what should the coordinate be, if really only the bounds matter? The answer has been, putting it in the middle is a reasonable choice. If the points are evenly spaced, it's indeed usually the case that the points are exactly in the middle of the cells. Also, it is usual for the bounds to be exactly half-way between the points. That is, even spacing usually goes with equal-sized cells. It is the latter property which tells you the bounds, given the points; it's only for working out the outermost bounds that you need to know that the points are in the middle of the cells. If the points are not evenly spaced, saying that the cells are contiguous and the points are in the middle of the cells is not enough information to work out the bounds, because there are n points but n+1 bounds. For instance, consider the points (1,2,4). One possible set of bounds is [0.5,1.5], [1.5,2.5], [2.5,5.5]. Two others are [0.6,1.4], [1.4,2.6], [2.6,5.4] and [0.0,2.0], [2.0,2.0], [2.0,6.0]. In general the bounds could be [1-x,1+x], [1+x,3-x], [3-x,5+x], for 0<x≤1 (the last example above is the limiting case x=1, where the middle cell has zero size), and in all cases the points are in the middle of the cells. If there are only two points it's not meaningful to say they are "evenly spaced" but you can assume the cells are the same size and hence work out what size. If there is just one point, there is obviously no way to work out the bounds. Therefore implicit bounds for contiguous cells would only work for equal-size cells when n>1. That may describe the majority of gridded data, but there's a substantial minority which have unevenly spaced points and cells of unequal size.
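The ambiguity for the points (1,2,4) can be checked numerically. The sketch below builds the family of bounds given above for several values of x and confirms that each set is contiguous and has the points at the cell midpoints; the function names are illustrative only.

```python
import math

def bounds_family(x):
    """Contiguous bounds for the points (1, 2, 4), parameterised by x."""
    return [(1 - x, 1 + x), (1 + x, 3 - x), (3 - x, 5 + x)]

def midpoints(bounds):
    """Centre of each cell."""
    return [(lo + hi) / 2 for lo, hi in bounds]

def is_contiguous(bounds):
    """True if each cell's upper bound equals the next cell's lower bound."""
    return all(bounds[i][1] == bounds[i + 1][0] for i in range(len(bounds) - 1))

points = [1.0, 2.0, 4.0]
for x in (0.4, 0.5, 1.0):
    b = bounds_family(x)
    centred = all(math.isclose(m, p) for m, p in zip(midpoints(b), points))
    print(x, b, is_contiguous(b), centred)
```

Every value of x gives a different but equally valid set of bounds, so the points alone, even with the statements "contiguous" and "centred", do not determine the cells.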
You anticipated that there would be opposition to introducing a convention for implicit bounds. (I realise that you weren't proposing it; you're just asking the question.) I wouldn't be in favour of it, for these reasons:
Therefore I'd prefer us to state that there is no default. Best wishes Jonathan |
-
+1 from me -- a "default" or "implication" is a Bad Idea. If you want cells, you need to define the bounds. Simple as that. Is there a consensus to remove the statement?:
"Reasonable assumptions" are not great in a metadata convention. And in this case, as has been discussed, the bounds of the cells at the edges are not defined, and it might not be clear what they should be -- even if you assume points at the center of cells. Note: this isn't hypothetical -- at least for curvilinear quadrilateral cells, I've struggled with defining bounds when they were not provided, in regions of high curvature.
-
I too vote for no default, and for removing the statement. Anybody who wants to make up bounds could come up with this "reasonable assumption"; we don't have to suggest it in the convention. |
-
Dear @JonathanGregory, dear colleagues, Thank you for the response to my question. I feel we went to the bottom of things with this discussion.
+1 from me. We should remove the sentence from Section 4, and add a sentence in Section 7. You suggested the following for Section 7 earlier, I think it does the job.
You mentioned earlier your PR for clarifying section 7. I have no preference whether the change above becomes part of your existing PR or whether we open a new one (since the change touches two sections, and removes a reasonable assumption that others might have taken for granted). What do you say, Jonathan and others? Still, let us remember that we progressed on two other aspects in this thread:
Finally, @ChrisBarker-NOAA mentioned the SGRID convention twice (https://sgrid.github.io/sgrid/). But to me, it sounds like a whole new discussion thread should be started to discuss possible adoption of SGRID or subsets of it for CF. I could join that new discussion, once we've closed this one.
-
I support "no default", too. |
-
Dear @TomLav Apparently there is agreement so far that it would be good to remove the statement about the reasonable assumption in sect 4 and add an explicit statement of no default in 7.1. This is something between correcting a defect and making an improvement. I'm happy to add it to my conventions issue #527, which is similarly about clarifying the intention in 7.1. Does that seem appropriate? On the other hand, the suggested improvements in 7.1 and 5.6 regarding curvilinear lat-lon bounds are a somewhat different matter, I feel. If you'd like to start a conventions issue to propose that change, please go ahead! We might overlap in 7.1, but I'm sure we can negotiate amicably. We open issues for any change except the most trivial (like a typo), to agree on what should be done. There's a template for enhancement issues, which is supplied when you open one here. (Looking at that menu, I notice that it needs tidying up, since the introduction of discussions. I'll do that.) The issues are useful as a record of our detailed discussions, and they're indexed by the history at the end of the document. I think it generally works well to discuss the words in the issue, especially when it affects various paragraphs, possibly in different sections, because it's helpful to see them all together, but that depends on the subject and the proposer. A PR can be started and linked to the issue at any time when it seems useful to do so. One is required in the end, but not at the beginning! In another issue, I wrote a recipe for doing a PR, in case it's helpful. Best wishes and thanks Jonathan
-
Agreed.
Thanks! In the meantime, I've found that there is confusion (at least to me) on how CF handles even simple rectangular grids. So I've started writing up some notes on that: (I've only just started - and I'm not sure when I'll have time to move it forward, but I more than welcome help!) The goal(s) of that document is to:
If the answer to (2) is not quite, then perhaps I'll come up with a proposal. Again, I welcome help early and often, but otherwise, once I sure I'll be reaching out here for review / help! NOTE: That proposal may be SGRID -- it goes farther than the basics, but does cover the basics as well, so maybe that's the way forward -- but I want to start at the bottom for now. NOTE 2: I think this kind of document -- call it a "How To" or a "Best Practices" -- would be great to have as a more formal part of the CF documentation -- maybe we can move that forward some day. BTW: SGRID is not my work -- it was developed by others, but I've found it useful in my work.
-
Following the above comment
I have added these two changes to the PR of conventions issue 527. Could those who are interested, e.g. @TomLav @taylor13, please review, comment on and (I hope) support the changes proposed there, so that they can be included in CF 1.12? Thanks.
-
Question
Hello,
I am aware of the grid_mapping mechanism to map from (regularly spaced) xc, yc coordinates to lat, lon using a Coordinate Reference System. This is covered in 5.6.
I am also aware of the mechanism with bounds to describe that a value is not representative of a point, but of a grid cell. This is covered in 7.1.
I am worried that fully describing the bounds of my grid cells, both in xc/yc and in lat/lon, will require a lot of storage space if I must explicitly have both the 1D xc, yc, the 2D lat, lon, the associated xc_bounds(xc, 2), yc_bounds(yc, 2) and the associated lat_bounds(xc, yc, 4) and lon_bounds(xc, yc, 4).
I thus have the following questions:
xc, yc are not points but grid cells, and that these grid cells are contiguous, without explicitly writing their bounds?
lat and lon variables when the CRS and xc and yc are provided. Can this be extended to the lat/lon bounds when the CRS and xc_bounds and yc_bounds are provided? An application should be able to compute the bounds directly from the 1D bounds just like it can compute the lat, lon from xc, yc?
Combining the two ideas above, I am wondering if we could design a simple scheme where all the bounds can be computed by an application assuming that the 1D xc and yc values are the centers of contiguous cells (and providing the CRS).
I am aware of chapter 8.3 that describes Compression by Coordinate Sub-sampling, that handles coordinates and bounds interpolation from tiepoints. It could cover all the aspects above, but I am maybe worried that it is not yet supported by most software.
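As a sketch of the first half of that computation, the four x/y corners of every cell can be built from the 1D bounds alone. The function name and the sample values below are hypothetical, and the corner ordering assumes increasing bounds on both axes. Each corner would still have to be converted to lat/lon with the grid mapping, which is not shown here.

```python
def cell_corners(xc_bounds, yc_bounds):
    """corners[j][i] holds the four (x, y) vertices of cell (i, j).

    Vertices run anticlockwise in x/y space, assuming x0 < x1 and y0 < y1.
    """
    return [
        [[(x0, y0), (x1, y0), (x1, y1), (x0, y1)] for (x0, x1) in xc_bounds]
        for (y0, y1) in yc_bounds
    ]

# Illustrative bounds: two cells along x, one cell along y.
xc_bounds = [(0.0, 1.0), (1.0, 2.0)]
yc_bounds = [(10.0, 20.0)]
corners = cell_corners(xc_bounds, yc_bounds)
print(corners[0][1])
```

The result has shape (ny, nx, 4), matching the four-vertex layout of lat_bounds and lon_bounds in the question once each vertex has been transformed.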
Thank you for your guidance,