UTC leap-seconds (again!) #304
Replies: 52 comments 13 replies
-
Nice summary @ChrisBarker-NOAA! Thanks for getting the ball rolling on this. I agree that what we are actually doing is encoding time durations, and that the spec should reflect that. I think the correct way to think about it is that time durations are real geophysical quantities, and datetimes are just arbitrary labels that you apply to time durations according to a scheme defined by a calendar and an associated epoch. I also agree with your assessment of the current state of affairs.
Now, I may be missing something because I haven't had time to think about it deeply, but it seems to me that the only difference between a proper TAI calendar and the standard/kinda-UTC calendar is that the epoch is defined with a UTC timestamp rather than a TAI timestamp. (I had been thinking that standard == TAI, but I hadn't taken UTC epochs into account, and I agree with your point about the need to accommodate that in existing real-world data and instruments.)
So what about this idea: we define a TAI calendar in the very straightforward way we've been discussing. We then declare that the default/standard calendar is a TAI calendar, BUT with the epoch declared in UTC. All you need to do to convert from a UTC epoch to a TAI epoch is count the number of leap seconds up to that epoch, which is a known and fixed quantity that you can easily look up, so that gives us a clear interoperability path between all of our existing calendars and TAI. Going forward, people could start using TAI epochs if they want to make things simpler, but all the existing data with UTC epochs is still easy to handle.
Then, separately, we define a UTC calendar that has all the caveats about leap seconds, and for which we recommend against using a unit longer than seconds. Add some discussion about how some units become variable-length under certain calendars, and now we've clarified and codified the existing situation without needing to make any big changes.
Plus, anybody who has data where the time coordinates do properly handle leap seconds can fix it just by changing the calendar to "UTC". To convert from UTC to any other calendar you need to take leap seconds into account, but anybody who's actually operating in for-real UTC (not just kinda-UTC) is probably already dealing with that. Would that work? Does that match up with both the existing state of things and the data we expect to be produced in the future?
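The UTC-epoch-to-TAI-epoch conversion described above really is just a table lookup. Here is a minimal sketch in Python — the table is abridged and illustrative (a real implementation would read the full list from an authoritative source such as the IERS bulletins or tzdata), and the names `LEAP_TABLE`, `tai_minus_utc`, and `utc_epoch_to_tai` are mine, not from any CF library:

```python
from datetime import datetime, timedelta

# Abridged table of (UTC instant, cumulative TAI-UTC offset in seconds).
# Only a few entries are shown; the real list has 28 entries since 1972.
LEAP_TABLE = [
    (datetime(1972, 1, 1), 10),
    (datetime(2015, 7, 1), 36),
    (datetime(2017, 1, 1), 37),
]

def tai_minus_utc(utc_epoch):
    """Cumulative TAI-UTC offset in effect at a given UTC instant."""
    offset = 0
    for start, total in LEAP_TABLE:
        if utc_epoch >= start:
            offset = total
    return offset

def utc_epoch_to_tai(utc_epoch):
    """Re-express a UTC epoch as the equivalent TAI instant."""
    return utc_epoch + timedelta(seconds=tai_minus_utc(utc_epoch))
```

The key property is the one the comment relies on: for any epoch in the past, the offset is known and fixed, so the conversion is deterministic and loss-free.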
-
yes, I think that's it exactly.
well, yes, but the difference is that "klubwls" specifically is based on a UTC epoch -- so it's weird: the epoch has leap seconds, but anything after it doesn't.
In either case, though, there is an encoding/decoding process, going from a time coordinate to datetimes and back again. And depending on how the file is generated, either one could be the "original" correct, precise value. As long as the transformation between the two is defined, then BOTH are correct, and it doesn't matter -- that's the state I'd like to get to. But if we can't get there (I think we can), I'd rather we say that the durations expressed in the time coordinate are correct, but that the timestamps resulting from a decoding may not be exactly correct if done wrong, rather than the opposite, which is what we have now.
-
Dear @ChrisBarker-NOAA and @sethmcg, Thanks for the discussion. I agree with a lot of what you both say. I probably disagree with you about which calendars I am happy to use, but that's not as important as our agreeing on what calendars CF should support and their exact definitions. I'll make some brief statements here as clearly as I can, and perhaps you can say whether you think they're correct.
What do you think? Happy weekend, Jonathan
-
An addition to the above list of points:
Cheers, Jonathan
-
On most *nix systems, the "tzdata" package will contain a list of leap seconds, e.g. as "/usr/share/zoneinfo/leapseconds". But at least on my CentOS system, the "udunits2" package does not require the "tzdata" package, which explains why udunits is not aware of leap seconds.
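For reference, the data lines in tzdata's leapseconds file have a simple tab-separated shape, roughly `Leap 2016 Dec 31 23:59:60 + S`, so they are easy to consume. A small illustrative parser (the `SAMPLE` text is an abridged stand-in for the real file, and the function names are mine):

```python
# Parse the tzdata "leapseconds" format (e.g. /usr/share/zoneinfo/leapseconds).
# The real file is tab-separated; split() handles tabs or spaces alike.
SAMPLE = """\
# Allowance for leap seconds added to each time zone file.
Leap 2015 Jun 30 23:59:60 + S
Leap 2016 Dec 31 23:59:60 + S
"""

MONTHS = {m: i + 1 for i, m in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split())}

def parse_leapseconds(text):
    """Return (year, month, day, sign) for each leap-second entry;
    sign is +1 for an inserted second, -1 for a deleted one."""
    out = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "Leap":
            continue
        year, mon, day = int(fields[1]), MONTHS[fields[2]], int(fields[3])
        sign = +1 if fields[5] == "+" else -1
        out.append((year, mon, day, sign))
    return out
```

In practice you would open the real file instead of `SAMPLE` — the point is only that the raw data is readily available on systems that ship tzdata, even if udunits never looks at it.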
-
OK -- on to my point:
This is what I'm trying to wrap my head around -- a time coordinate represents a bunch of points in time on the Time Axis (now defined above). And each point is encoded as a timedelta since an epoch. What I think is that it should, well, represent exactly what it appears to represent -- how much time has passed since that epoch. And I think that should be the canonical "truth".
The challenge comes in when you want to decode the values to timestamps in some calendar (or encode them from timestamps). As Jonathan points out, this is a defined 1:1 process with all calendars except the "standard" one. So that's the only issue at hand. My point is that if "... a time coordinate value does not necessarily exactly equal the actual length of the interval of time between the reference date/time and the date/time it represents", then that means there was an error in the encoding of coordinate values, rather than an inherent property of the way we are storing the data -- and I think in CF we should express it that way.
My point is that the coordinate values should represent specific points along the Time Axis -- how those points might be expressed in a given calendar is up to the decoding software -- the entire idea of a leap second (or a leap year, or ...) is utterly irrelevant to the time coordinate when encoded this way. Well, not utterly, as you still need to express the epoch somehow.
Anyway, if everyone had leap-second-aware software, we could all just use UTC "correctly" and all would be good -- but that is not the case, hence the "standard" calendar. But I think we can do a bit better than just stating that leap seconds could mess things up.
Thinking out loud now: how do we end up with these "wonky" time coordinates? They can come about because of the challenge of encoding/decoding timestamps that are in UTC without using leap-second-aware software. For instance:
The challenge (again, thinking out loud -- or thinking on computer) is when a time coordinate has a bunch of time-stamped measurements, some from before the leap second and some after. If we take it that all the timestamps are correct UTC, then when non-leap-second-aware software encodes them, there will be a discrepancy between the ones before and after the leap second -- this is the source of the "time coordinate value [that] does not exactly equal the actual length of the interval of time between the reference date/time and the date/time it represents" -- the ones after the leap second are a second off (one way or the other -- my brain hurts).
But: what CAN be done is to re-create the original timestamps, by decoding the time axis using the Gregorian calendar without leap seconds (i.e. standard software) -- which is the CF "standard" calendar.
OK -- I still think that an uneven time axis is an error, but an unavoidable one, given the software available now, so we do need to talk about it in CF.
Also: I mentioned that we might want a "kinda-like-utc-but-without-leap-seconds" calendar -- I realize now that that was rejected back then because it's what the "standard" calendar already is. So no real change/addition required, but the language could be clarified. More on that later -- it's dinner time for me.
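The before/after discrepancy described here is easy to reproduce with any non-leap-second-aware library — e.g. Python's stdlib `datetime`, whose arithmetic ignores leap seconds entirely (variable names here are mine, for illustration):

```python
from datetime import datetime, timezone

# Epoch and two correct UTC timestamps straddling the leap second
# inserted at the end of 2016-12-31.
epoch = datetime(2016, 12, 31, 0, 0, tzinfo=timezone.utc)
before = datetime(2016, 12, 31, 12, 0, tzinfo=timezone.utc)
after = datetime(2017, 1, 1, 12, 0, tzinfo=timezone.utc)

# "Seconds since epoch" as computed by leap-second-unaware software:
enc_before = (before - epoch).total_seconds()  # 43200.0 -- correct
enc_after = (after - epoch).total_seconds()    # 129600.0

# But 129601 SI seconds actually elapsed between epoch and `after`,
# because of the leap second in between:
true_after = enc_after + 1
```

So the pre-leap-second values are exact elapsed durations and the post-leap-second values are each one second short — exactly the uneven axis described above.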
-
Dear @ChrisBarker-NOAA, Thanks for your further thoughts. Picking up your last remark: if we can add more words to the convention in order to make it clearer and avoid confusion, that would be very useful. Given how much difficulty we have in discussions of this subject, I expect it would be beneficial to explain things in several ways, in order to suit different readers' expectations or preconceptions. From my point of view (which I am afraid may baffle or enrage some others, maybe you -- sorry about that), it seems that CF … The confusion comes from the syntax we have adopted from UDUNITS, of "time-unit since timestamp".
Let's not assume it means anything in the first place. It's just a number which encodes a timestamp. It's useful to store a timestamp as a number because it takes less storage, and numbers can be easily and reliably sorted into monotonic order, as required for coordinates. The encoding we choose to use is elapsed time since a reference timestamp. This is a very useful choice, because differences between time coordinates give us time intervals. However, it's tricky to use in practice for UTC, because of not having convenient software. Therefore for UTC we define a different encoding, which ignores leap seconds. That means the time coordinate is not always exactly equal to the time interval, and timestamps applying to leap seconds can't be encoded.
I'm not saying anything new here, just suggesting a different perspective. Instead of …, timestamps should be encoded as elapsed time since a reference, and therefore the CF …
Best wishes, Jonathan
-
I don't think it's a mistake, but I do think that the result is an "incorrect" encoding (or maybe a "limited" one [*]) -- but it is incorrect in a defined way, so still useful. I think we can probably find wording that avoids editorial words like "mistake" or "incorrect" :-), and rather describes the situation precisely instead. The fact is that folks for whom seconds precision matters should be using TAI, not UTC or the "standard" calendar at all. Hopefully I'll get a chance to suggest some different language soon.
[*] limited -- while using UTC for the epoch, it is impossible to encode the point on the Time Axis that is a leap second in UTC. But that time does, in fact, exist in the "real" world, so that's kind of a problem :-(.
-
Dear Chris, I'd certainly agree that the … I agree with you that for applications where precision to the second is important, the TAI calendar is needed, or a true UTC calendar (if someone asks for that). Since we already have a use-case for TAI, does everyone agree we should add it to CF? Best wishes, Jonathan
-
I think so, and I don't think anyone involved with the recent discussion has said otherwise. I didn't start a PR, 'cause I thought someone that actually used TAI should probably do it :-). Or does it need an issue first?
-
I agree that a TAI calendar is needed. I also wonder if it might be a good idea to introduce the UTC calendar at the same time, even without a use case (do we know that the remote sensing community wouldn't want this?), because it would help make clear what the TAI calendar is, since it is defined in relation to UTC.
-
I support having a TAI calendar in CF. We have a clear use case for it. And, actually, I was thinking the same as David: a TAI calendar should preferably be complemented with a UTC calendar ("explicit is better than implicit"). Having said that, and without diminishing my support, I am still curious about how the TAI calendar is intended to be used. I do realize that there is a need for very high-precision timekeeping in satellite navigation and control. But as the CF time coordinate is seconds … Many thanks,
-
Hi Lars, The argument given by @JonathanGregory above is that the logical content in CF is in fact the timestamp, and that "seconds …"
-
In theory, this is the obvious thing to do -- it's widely used, and clearly defined (if undefined for the future). But my concern with including it is that many people think they are using UTC (OK -- are using UTC), but if their software (on both the production and consumption side) doesn't support leap seconds, then they are not really using UTC, and there will be errors in the time axis. And unlike Jonathan's point about the standard calendar -- these would be actual errors. So providing a UTC calendar would invite misuse -- is that a problem? Not sure -- consenting adults and all that, and we can clearly note in the docs that the UTC calendar should only be used with software that handles UTC fully and properly. But if there's no actual use case, maybe we shouldn't provide the temptation.
-
NOTE: in a five-minute Google search, the only library I found that appears to handle leap seconds is this Java library: https://github.com/MenoData/Time4J There's also this: … Though it's not clear to me if that's embedded in code that can be used to convert timestamps to/from the timedelta-since encoding. Until/unless there are readily available libraries for true UTC time (ideally supported for C, Fortran, Python, Java, others?), we probably shouldn't add it to CF.
[*] For intellectual curiosity, I took a quick look at the Python cftime library -- I don't think it would be a huge lift to add leap-second support for a proper UTC calendar, but someone would have to write the code, and that would only help a (subset of) Python users. Now that I think about it, a small subset -- I think, for instance, that xarray/pandas uses numpy datetime64 internally -- so if it doesn't support leap seconds, it may not work right -- though I'd have to think on that more.
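For what it's worth, the gap is even more basic than conversion arithmetic: most standard libraries cannot even represent a leap-second timestamp. Python's stdlib `datetime`, for example, restricts the seconds field to 0–59:

```python
from datetime import datetime

# The leap-second timestamp 2016-12-31T23:59:60 cannot be constructed;
# the datetime constructor rejects second == 60 with a ValueError.
try:
    leap = datetime(2016, 12, 31, 23, 59, 60)
except ValueError:
    leap = None
```

So a proper UTC calendar would need not just a leap-second table but a datetime representation that permits `:60` at all — which is why so few libraries support it.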
-
Haven't had time to fully follow, but I am curious ... Is it true that someone recording the time of sunset today and 50 years ago would assign the same two TIMESTAMPs to these measurements, regardless of calendar? For example, in a model with a 360-day calendar, you would assign a certain timestamp for today and the same timestamp used in the model to identify a day 50*360 days ago. It's only when we compute the interval of elapsed time between two events that we get a difference, which depends on the calendar. In models with a 360-day calendar, the elapsed time would be more than 200 fewer days, but in comparing models to observations for these dates, you would not care about the elapsed time, only the TIMESTAMPS, and the TIMESTAMPS would allow you to determine which times to compare between the model and the observed. A good model would have the sun exactly in the right place in the sky, setting at the right time of day, if samples with the same timestamp were compared. Is this at all helpful or trivially irrelevant?
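The "more than 200 fewer days" figure checks out with quick arithmetic (using 365.25 days per real year as a rough average that includes leap years):

```python
# Elapsed days over 50 years: 360-day model calendar vs. the real world.
years = 50
real_days = years * 365.25     # 18262.5
model_days = years * 360       # 18000
shortfall = real_days - model_days  # 262.5 days fewer in the model
```

That 262.5-day gap between identical timestamps is the same phenomenon as the leap-second gap, just five orders of magnitude larger — which is what makes the timestamps-are-primary view compelling.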
-
I think that is correct, Karl @taylor13. The first aim of CF is to enable comparison of data from different sources. For that aim, when comparing models and observations, you would often regard data with the same timestamp as "comparable". The timestamps themselves are important for comparison of data from different calendars, when they all refer to the real world for the purposes of the comparison. The encoding of the timestamps in CF as elapsed time since a reference is irrelevant for that purpose; it's simply an encoding.
-
Dear Jonathan, Referring to your comment a few days ago: I am glad that we agree on many points, but it seems not on all aspects regarding how to interpret/explain and use the CF …
The calendar as such is of course as precise as a data producer chooses when writing the data. What I meant by "imprecise" was more how it is used when a data user wants to combine data from various sources. That is, what I aimed at is more like what you wrote in your response to Karl. However, it is not possible to simply "cut off" part of a datetimestamp from the right to indicate some intrinsic [lack] of precision in the time coordinate of the data. I think that we largely agree on this. Then you continue:
Yes, I agree. And either any such clock is actually able to "display" the leap-second timestamp when one occurs, in which case it is a proper UTC clock, or it will "absorb" the leaps by adjusting the internal tick mechanism (or something else), in which case the normal march of "displayed seconds" will show some kind of irregularity in connection with the leap second, indicating that it is not a proper UTC clock. If I again use a table to illustrate where I think the problem is (this time as a screen clip): to the left are the SI seconds, and to the right is how I interpret your suggestion for the CF timestamp, with the "excluded" leap second, and three alternatives for how it can be encoded as … In an earlier comment you explain this by writing:
I think the first of your sentences would be a good starting point for improving the nebulous (at least to me!) current CF sentence "It is important to realise that a time coordinate value does not necessarily exactly equal the actual length of the interval of time between the reference date/time and the date/time it represents.". But I would say, based on the table above, that the fact that the … Kind regards,
-
Hello, Was there a consensus reached on whether time coordinates represent timestamps, or represent the distance along the time axis from the origin? Or are both interpretations "allowed"? You might say that the discussion answers this question, but I would find the answer most useful in interpreting the discussion!
(Admitting I'm not sure where I am on this, yet) I'm not sure about this -- if it were the case, then wouldn't it follow that the CF second is not an SI second in all of the current CF calendars (…)? Cheers,
-
Hello David, I do not think we actually reached a consensus, nor was there outspoken disagreement. I would argue that the timestamp is intrinsically linked to the calendar, and that the timestamp in one calendar is not the same as the timestamp in another calendar. See my comment, which only deals with differences in seconds (because of the overall focus of this issue), which one can often ignore. But I would not want to equate 2023-06-21 (summer solstice) in the …
With respect to the current topic, I would say that either we have a proper UTC calendar capable of representing the leap seconds, or we have something else. This "something else", which is what we are discussing, is not a proper UTC calendar only because it is not capable of representing the leap seconds, which means that since 1972-01-01 the number of datetimestamps is 27 fewer than that of a proper UTC clock. And this "something else" is exactly the CF "standard" calendar.
-
Dear Lars, I'm happy to see that we are largely understanding each other now! That's wonderful. With this exposition, I see what you mean when you say that the …
I agree that's a possible way of seeing it, but not the way I think of it, which is that we don't have to take … Alternatively, we could say that …
Let's imagine there is a parallel universe which is exactly the same as this one in all respects except for the constant rotation of the Earth in our era, and history and climate are all just the same there as here despite this small difference (which you would expect would actually make everything totally different after a while, because of chaos, but never mind). Because everything is the same, it would be sensible for us to compare the events of 29 June 1972 on this Earth and the other Earth, down to the hour or minute, but we neglect the leap second. To make this comparison, we use timestamps. 1972-06-29 23:00:00 and 1972-06-30 00:00:00 refer to comparable instants of time in the history of the two Earths. For convenience, we decide to encode these timestamps as … Does that help, or is it too silly?
Dear David, I too don't think a consensus has been reached, because I don't agree with Lars's view. I would argue that the same timestamp in two different calendars can sometimes refer to instants of time which we regard as comparable (as in Karl's question, and my fanciful example above). In those situations, timestamps are primary. I think that 2023-06-21 in the …
Best wishes, Jonathan
-
Indeed so, which suggests to me that the number of seconds during each of those months is immaterial, and the timestamps are the important thing.
-
Dear Jonathan, Referring to your most recent comment: I fully agree with your first paragraph. Regarding the second paragraph, it seems that we do not agree. You write:
I am not sure I understand what you mean here. The …
Regarding your third and fourth paragraphs, I fully agree in principle. But we are specifically dealing with the "real world" here, where those responsible for timekeeping have decided to have leap seconds. And how to deal with these is precisely what we are trying to tease out here. I would be very happy if we could conclude something like … Perhaps we could even include one of the tables above to illustrate the impact of a leap second on different calendars. Lars
-
Having thought a bit more about the recent direction the conversation has taken, I do think that we need to refocus on the actual issue at hand. Clearly leap seconds are not an issue for current models. It would of course be possible to create a …
However, there are many observational datasets, either coming from operational data collection networks, or from specific field campaigns with very high temporal resolution. Many observational systems produce data at 1 Hz, and there are ultrasonic anemometers measuring at 100 Hz. While the latter probably require TAI time or GPS time, for the former the …
I think that we should focus our conversation on how to explain and how to interpret leap seconds and the "standard" calendar.
-
Sorry, Lars, but I'm struggling to follow this. The standard calendar does not have leap seconds, so surely there is no such interpretation to be made. Similarly, imagine trying to define the interpretation of leap years in the 365_day calendar -- I don't think you can, because that calendar has no concept of leap years. Your example on extensive quantities is interesting, and to me supports the notion that time coordinates represent timestamps. The month lengths are different in the two calendars, yet the monthly means are still comparable. This can only be the case when we regard the time information as timestamps rather than elapsed time. Thanks,
-
@davidhassell and I have submitted a new conventions issue #542 with a proposal that attempts to address the difficulties discussed above and in previous issues. The summary of the proposal is as follows:
We would welcome your comments on the issue. It would be marvellous if we could resolve this (after so many years and tears) in time for CF 1.12!
-
Thanks! This is an absolutely fantastic response to a very vexing problem -- congrats! And clearly a lot of careful work -- kudos! I put a number of comments on the PR -- but none of them are show-stoppers.
-
My only real concern is that the UTC calendar is an "attractive nuisance": there is very little software that handles it properly, and many people use "UTC" imprecisely. But the text is very clear about the leap seconds, so buyer beware, I guess.
-
I am glad of your support, @ChrisBarker-NOAA - thanks. I will copy your comments into issue #542, where we've made the proposal. Please could anyone who wishes to comment on our proposal likewise do so in issue #542, so we keep the discussion in one place. Thanks.
-
Issue 542 is now concluded. The change will be in CF 1.12.
-
Topic for discussion
From #297, it's clear that there are still confusions / problems with CF's handling (or not handling) of leap seconds in UTC time. I am starting this discussion to, yet again, hash some of this out.
First the non-topic:
In #297 it was proposed that the TAI calendar be added to CF -- that will or will not happen, but does not need to be discussed here. What I would like to discuss is how UTC is handled/talked about in CF.
A few definitions:
Time axis: The abstraction for the passage of time. Relativity aside, the time axis captures the passage of monotonically increasing time.
Timestamp: a human-readable expression of a "point on the time axis", e.g. 2024-03-22T09:55:32 (in ISO format, but that's a different issue). What a given timestamp means depends on the calendar used.
Calendar: A way to denote passage of time in human-friendly way: e.g. years, months, days, hours. As such, the
calendar defines the set of timestamps (year-month-day-hour-minute-second) that are permitted, and how the different calendars can be mapped to / from each other.
Timedelta: A duration of time, e.g. 23 seconds, 5 years, etc.
Epoch: The starting point for a time specification -- e.g. Unix epoch is January 1st, 1970 at 00:00:00 UTC.
The problem:
(This is all a re-hashing of what I'm sure most readers know, but to put it in one place)
As we all know, the UTC calendar is widely used, essentially universal, and thus we have no choice but to use it in CF (not to mention years of legacy). UTC also includes "leap seconds", which are used to keep UTC in sync with the sun -- e.g. local noon (see the NIST discussion of leap seconds). This is not so different from leap years, except for one critical difference -- there is no way to know when leap seconds may occur in the future. Combine that with the fact that, well, it's only a few seconds, which is a precision that most frequently doesn't matter, and there is very little (none??) software that handles leap seconds properly. And this means that when one does computation with timestamps and timedeltas (e.g. seconds since a timestamp) -- it may be off by some number of seconds from the correct result (up to 37 seconds, I think, as of this writing).
This is a problem for CF because CF encodes time coordinates in "timedelta since a timestamp" format, e.g. "seconds since 1970-01-01T00:00:00", which requires computing the timedelta(s) to/from timestamps -- and with the UTC calendar that can only be done to seconds precision if you include leap seconds, which most software does not.
What this means is that many (most?) actual files in the wild are not actually correct UTC down to the leap second. For most applications, it really doesn't matter -- if you have a model with a 1 hr timestep running for 10 years, second precision does not matter. If you have data collected by an instrument -- hopefully you have used an epoch close to the time of the data collection, and thus not crossed a leap-second boundary (or crossed only one?).
Another issue is that computers most often use UTC time, and reset themselves to UTC periodically (usually using the NTP system). What this means is that if you ask the computer what the UTC time is now, it will usually provide the correct UTC time. It does this without accounting for leap seconds, by adjusting the Unix epoch so that it results in the correct time now -- which would be a bit off for times in the past (and a bit off right around the leap second itself -- see the RedHat post for details).
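Underlying this is the POSIX definition of Unix time: every day is exactly 86400 seconds, so leap seconds are simply invisible in timestamp differences. A quick stdlib demonstration:

```python
import calendar

# calendar.timegm converts a UTC time tuple to a Unix timestamp
# without any leap-second accounting (the POSIX definition).
t0 = calendar.timegm((2016, 12, 31, 0, 0, 0))
t1 = calendar.timegm((2017, 1, 1, 0, 0, 0))

# The difference is exactly 86400 (one POSIX day), even though
# 86401 SI seconds elapsed across the leap second at the end
# of 2016-12-31.
diff = t1 - t0
```

This is why "now" on an NTP-synced machine is correct while intervals computed from Unix timestamps across a leap second are a second off.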
All these complications/limitations are out of the hands of CF, of course. All we can do is use the best specifications we can, so that users know clearly what they are getting.
The current state of the issue in CF:
Is this good enough?
I don't think so -- the OP in #297 will probably be satisfied with a TAI calendar, but I think we can do better with handling of UTC. Some thoughts:
@JonathanGregory wrote elsewhere:
If you are going to use a "timedelta since" encoding, rather than a timestamp encoding, it makes a LOT more sense for the "timedelta" to be, well, an actual duration of time, not a calendar concept. And we've gone halfway there by recommending that years and months, for instance, not be used, as they are defined as a specific duration, NOT a calendar year or month. Granted, that was inherited from UDUNITS, but it makes sense, and it's a lot more consistent with most software, and how people use these data.
Most software (particularly scientific software) works with time in actual time-duration units (seconds since, microseconds since, etc.), not calendar times -- e.g. Unix time. This is both for computational efficiency (storing time in a regular data type), and because elapsed durations are usually more important than timestamps. And most CF-processing code (at least the code I deal with) transforms the CF time coordinate assuming that the values DO mean an actual elapsed duration, not a timestamp.
In fact, with UTC and leap seconds, the elapsed duration may not be accurate -- but I think that should be considered an error in the encoding, NOT a correct way to encode UTC times!
@JonathanGregory also wrote elsewhere:
When I first read this, I thought "but CF already uses UTC?!?!" -- but then I looked again, and indeed, it does not. And yet it does define UTC as the calendar for the epoch, which I think is a massive gap. So yes, we should specify the UTC calendar -- if no software supports it, that's OK; it would allow us to at least be clear that differences from UTC are errors (lack of precision) rather than expected.
A while back (#148, I think -- a very long one, that) both TAI and UTC were proposed, and IIRC, in the middle somewhere, a "kinda-like-UTC-but-without-leap-seconds" calendar was proposed. I think that should be revived: not because it's an actual good idea, but because it's what, in reality, many files -- already written, and going to be written -- are encoded with.
It would actually define what the values mean, rather than simply saying "It is important to realize that a time coordinate value does not necessarily exactly equal the actual length of the interval of time between the reference date/time and the date/time it represents." -- without any way of knowing what it represents.
Essentially, the current status-quo is:
Time coordinates may be off by some unknown number of seconds, up to 37 as of this writing -- good luck with that!
Doesn't seem ideal .....
So how would "kinda-like-utc-but-without-leap-seconds" be defined?
The epoch would be UTC. This is important, because computers (and presumably instruments) reset themselves so that "now" is correct UTC, even though they are (often) encoding it as milliseconds since 1970.
It is not the same as TAI, because TAI slowly drifts from UTC over the years, so this would only be the same as TAI if the epoch were before June 30, 1972.
Any time after the epoch would be calculated with the Gregorian calendar, and not using leap seconds.
What this would allow is:
A) a time coordinate value does exactly equal the actual length of the interval of time between the reference date/time and the date/time it represents.
B) a clear way to go to/from the time coordinate and timestamps that would be correct, and the same with data producers and data consumers.
I think this is actually pretty much the state of affairs with most existing data files and processing software -- we would simply be codifying it.
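Indeed, the proposed definition amounts to exactly what most standard libraries already do. A sketch of the encode/decode pair (the names `encode`/`decode` and the epoch choice are mine, for illustration): the epoch is a UTC timestamp, and everything else is plain Gregorian timedelta arithmetic with no leap seconds — which is precisely the arithmetic Python's stdlib `datetime` implements.

```python
from datetime import datetime, timedelta, timezone

# The epoch is stated as a UTC timestamp; all arithmetic after it is
# leap-second-free Gregorian, per the proposed definition.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def decode(seconds_since_epoch):
    """Coordinate value -> timestamp."""
    return EPOCH + timedelta(seconds=seconds_since_epoch)

def encode(timestamp):
    """Timestamp -> coordinate value; the exact inverse of decode."""
    return (timestamp - EPOCH).total_seconds()

# The round trip is exact and identical for producers and consumers,
# giving the well-defined to/from mapping described in (B):
t = decode(1_500_000_000)
assert encode(t) == 1_500_000_000
```

Nothing here is new code in practice — it is the behavior existing tools already exhibit, which is the point: the calendar definition would codify what the software already does.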
Let the lengthy discussion ensue!