Clarification of cell_methods #414
Comments
Karl, since you proposed it, I am happy to do the moderation!
Thanks @bnlawrence !
I just noticed that I omitted my suggested revisions to section 7.3.1. I'll add those shortly.
Perhaps we should also mention in the renumbered section 7.3.2 that the default area-weighting applied in this case refers to the area of the portion of the cell indicated by the "where" directive.
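To make the weighting described above concrete, here is a minimal Python sketch. It assumes each cell carries a total area and a fraction of that area belonging to the selected type; the function name and the sample numbers are hypothetical and are not taken from the conventions text.

```python
def weighted_mean_where(values, cell_areas, fractions):
    """Mean weighted by the area of the selected portion of each cell.

    fractions[i] is the assumed fraction of cell i covered by the type
    named after "where" (for example land), so the weight of cell i is
    cell_areas[i] * fractions[i] rather than the full cell area.
    """
    weights = [area * frac for area, frac in zip(cell_areas, fractions)]
    total = sum(weights)
    if total == 0:
        raise ValueError("no cell contains the selected type")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical three-cell example: the third cell has no land at all,
# so it does not contribute to a "mean where land".
values = [10.0, 20.0, 30.0]
cell_areas = [1.0, 1.0, 2.0]
land_fraction = [1.0, 0.5, 0.0]
print(weighted_mean_where(values, cell_areas, land_fraction))
```

Weighting by the full cell area instead would let the land-free third cell dominate the result, which is the ambiguity the comment asks the text to rule out.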
About 5 months ago, I got started trying to create a pull request for the above proposed changes. Not knowing what I was doing, I created https://github.com/cf-convention/cf-conventions/blob/taylor13cell_methods_edits/ch07.adoc . I think @davidhassell gave me some advice offline on how to do things correctly and how to proceed, but I think I've lost that email. I need advice on how to proceed and get the proposed changes implemented. I think the revised version of the cell_methods section pointed to earlier in this paragraph reflects the text above but may need some further editing to clean up the way other portions of the conventions document are referenced. As already noted, these changes were introduced to try to clarify how exactly to interpret cell_methods for the purpose of clearly defining variables requested as part of CMIP. I would really like to see this happen before the end of summer or sooner since they are needed for CMIP7, with work on the data request well underway.
Dear Karl @taylor13

Thanks for your proposal. It is regrettable that it hasn't received any substantive comments up to now. For me, the reason for not commenting is that it's rather a large proposal, and there have always been smaller issues to be considered. Maybe this reason applies to others as well. I suggest that this difficulty could be mitigated by splitting up the issue into smaller pieces. At the end of your first contribution, you've helpfully set out a summary in five points. These could logically constitute five separate issues, which are of different sorts, and some could be resolved more quickly than others. If they're all discussed in one issue, I suspect that multiple threads of discussion will become entangled. Of your five, I agree with you that the first is the largest and most substantial change, and perhaps this is the one which you would like to see resolved most urgently.
At the moment, as you say, there is no information in
Therefore, rather than defining defaults, I think we should introduce new syntax for indicating the weights explicitly. If no weighting was indicated, it would mean the same as now i.e. undefined. The syntax could be e.g. "name

Best wishes
Jonathan
Thanks, Jonathan, for your input on how weighting might be included without violating principle 9. We would want to consider whether to include it within the parentheses (the way we include "interval:") or whether it would directly follow the "where" directive. Also, we need to think about what "key words" would be needed and the procedure for expanding the list if need be (e.g., "weighted_by mass" might not be specific enough; we might need "weighted_by mass_of_snow", or "weighted_by mass_of_seaice", etc.). You are right that clearly specifying the weights is the highest priority, although summary point 4 is also particularly urgent and doesn't require any major revisions. Thanks for responding and suggesting we consider the 5 summary points one at a time in hopes of provoking additional input.
I think breaking this up into multiple issues is an excellent idea, and will likely speed things up. If we do go that route, there is another (slightly tangential) issue to consider, and that is the difference between cell methods and cell output frequency. The latter appears nowhere in CF and needs to be inferred by examining the time coordinate, and so in CMOR and XIOS other ways have been introduced to guide the user (e.g. 3hr in the filename, or interval-write in the attributes). This is intimately related to the interval discussion above. It would be good to be clear about that relationship in whichever sub-issue picks this up.
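As a rough illustration of inferring the output frequency from the time coordinate, the following Python sketch checks whether the time values are evenly spaced. The function name, the tolerance, and the sample values (hours since some reference time) are assumptions made for illustration only.

```python
def infer_output_interval(time_values, tolerance=1e-9):
    """Return the constant spacing of the time coordinate, or None.

    None is returned when there are fewer than two values or when the
    spacing is irregular, in which case no single output frequency
    can be inferred from the coordinate alone.
    """
    if len(time_values) < 2:
        return None
    steps = [later - earlier
             for earlier, later in zip(time_values, time_values[1:])]
    first = steps[0]
    if all(abs(step - first) <= tolerance for step in steps):
        return first
    return None


# Hypothetical time coordinate in hours, one value per 3-hourly record.
hours = [1.5, 4.5, 7.5, 10.5]
print(infer_output_interval(hours))  # 3.0
```

Note that this recovers only the spacing of the written records. It says nothing about the sampling interval of the data that went into each record, which is what the "interval:" information in cell_methods is meant to describe.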
I agree that breaking up this issue would be helpful (maybe even necessary to make it more manageable). I was just reading section 7.3.2 Recording the spacing of the original data and other information when Bryan's comment popped in. And I totally agree that if/when we are dealing with this part of CF, the cell output frequency should be considered. In 7.3.2 the following appears
If I understand these two examples correctly they are to be interpreted as Now, for the standard deviation calculation the daily/annual data is the input data, but is that to be interpreted as the "original data values"? In particular, I think that this would be problematic given that the default interpretation depends on whether it is an intensive or extensive variable. E.g. what does it mean if it is an intensive quantity from a model? Tentatively this could be clarified in the two examples above by rewriting them as So, to sum up, I think that it is necessary to
The cell methods construct is already complex and difficult to understand and interpret, which means that we have to be careful and keep different user communities' needs in mind when making changes.
I have created a new issue 447 for discussion of Karl @taylor13's first point, about weighting. I hope it's OK to continue with discussion of that point in #447 rather than here.
Moderator: @bnlawrence
Last updated: 2022-11-22 (initiated proposal)
Requirement Summary
The current description of the cell_methods attribute is unclear and sometimes less definitive than it could be. Changes are proposed to remedy this.
Technical Proposal Summary
Rewording of text of conventions is proposed to provide better guidance on how cell_methods should be defined.
Benefits:
Those writing and reading CF-compliant data will have clearer guidance and more definitive rules for interpreting the cell_methods.
Status Quo:
???
Associated pull request:
None yet.
Detailed Proposal
For more than 15 years now I have had trouble understanding exactly how to define cell_methods that correctly describe variables included in the CMIP request for model output. I have been recently reviewing the variables defined for CMIP6, in preparation for a possible CMIP7. Again, I'm not sure if we've defined cell_methods consistent with the intentions of those requesting the variables. I suspect others have also had difficulties correctly defining their cell_methods. Below I suggest specific rewording of the CF conventions text, and in a few places define default interpretation of the cell_methods that specify more definitively how the cell methods should be interpreted. I include further rationale for the suggested changes below.

The "cell_methods" section opens with:
I have offered only minor non-substantive edits in the above, but I think the rewording reads better.
@JonathanGregory copied this part to #447
I then suggest inserting a paragraph while not modifying the paragraph immediately following:
The inserted paragraph explains how by default the grid-cell values have been computed from the contributing samples. This greatly reduces the need to include the so-called "non-standardized information" regarding the cell_methods.

I next suggest removing the next two paragraphs:
We advise at the end that users not rely on the default (different) treatments of intensive and extensive variables, and I suspect no careful data writer has relied on this. Furthermore, I think most readers simply give up trying to understand these paragraphs, so why not delete them?
I have no suggested changes to the example (7.5) that follows.
In section 7.3.1, I have made no changes to the 1st paragraph, but I have edited the 2nd paragraph to improve clarity.
@JonathanGregory copied this part to #447
In the 3rd paragraph of 7.3.1 I define default weighting for 2-d (area) means and 3-d means, which also apply to other statistics involving sums. Without this default specification of weighting, data writers would have to provide parenthetical non-standardized information for most of the variables they write.
The next subsection (7.3.2), I suggest, should be placed after section 7.3.3. The reason is that this sub-section discusses how to record supplemental (sometimes non-standardized) information about the method. It seems to me that is much less important and less-often used than specifying what portions of a cell are reported on by a statistic, which is the subject of the current subsection 7.3.3. I have thus renumbered that section 7.3.2, and also suggest the following changes to its first two paragraphs:
The first paragraph has been modified to make it clear that "where" can also apply to the time dimension. The other changes add clarity (I hope) to exactly how to apply "where" in practice.
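For readers who want to see where a "where" clause sits in a cell_methods string, here is a small Python sketch of a parser. It is an illustrative simplification, not the grammar from the conventions: it assumes entries of the form "name: method [where type] [over type] [(comment)]" and does not handle the climatological "within"/"over" forms. The function name and the returned dictionary keys are hypothetical.

```python
import re

# One entry: "name: [name: ...] method [where type] [over type] [(comment)]"
ENTRY = re.compile(
    r"((?:\w+:\s+)+)"            # one or more "name: " prefixes
    r"(\w+)"                     # the method, e.g. mean or maximum
    r"(?:\s+where\s+(\w+))?"     # optional "where" type
    r"(?:\s+over\s+(\w+))?"      # optional "over" type
    r"(?:\s+\(([^)]*)\))?"       # optional parenthesised comment
)


def parse_cell_methods(text):
    """Split a cell_methods string into a list of simple entry dicts."""
    entries = []
    for match in ENTRY.finditer(text):
        prefixes, method, where, over, comment = match.groups()
        entries.append({
            "names": re.findall(r"(\w+):", prefixes),
            "method": method,
            "where": where,
            "over": over,
            "comment": comment,
        })
    return entries


for entry in parse_cell_methods("time: mean area: mean where land"):
    print(entry)
```

In this example the "where land" clause attaches to the area entry only; the time entry has no "where" type.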
I have made no changes to the next paragraph or Example 7.7 that follows it, which should, however, now be renumbered Example 7.6. Here is the unaltered text (without the example):
Within the current example 7.7 is some text that really belongs after the example rather than inside it. I have suggested a few changes to that text:
Most of the suggested edits follow this discussion.
The section currently numbered 7.3.2 would become section 7.3.3 in my revision. It is largely unchanged except for the third paragraph, where the example described has been modified since the original example is now already handled by the default weighting imposed in section 7.3.1.
I suggest no changes to the remaining section 7.3.4.
In summary, the proposal is to: