
Relax nanosecond datetime restriction in CF time coding #9618

Open · wants to merge 50 commits into main

Conversation

@kmuehlbauer (Contributor) commented Oct 13, 2024

This is another attempt to resolve #7493. This goes a step further than #9580.

The idea of this PR is to automatically infer the resolution needed for decoding/encoding and to keep only the constraints pandas imposes ("s" as the lowest resolution, "ns" as the highest). There is still the idea of a default resolution, but it should only take precedence if it doesn't clash with the automatic inference. This can be discussed, though. Update: as a first try, I've implemented a time_unit kwarg for a default resolution on decode, which overrides the inferred resolution only towards higher resolution (e.g. 's' -> 'ns').
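For readers unfamiliar with the resolution constraints mentioned above, here is a minimal illustration (not code from this PR) of non-nanosecond datetime64 values, assuming only numpy is available:

```python
import numpy as np

# numpy's datetime64 supports many resolutions; pandas (>= 2.0) restricts
# them to "s" (coarsest) through "ns" (finest), which is the constraint
# this PR keeps. A date outside the datetime64[ns] range (years 1677-2262)
# is representable without issue at second resolution:
far_future = np.datetime64("2500-01-01", "s")
print(far_future.dtype)  # datetime64[s]
```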

For sanity checking, and also for my own good, I've created a documentation page on time-coding in the internal dev section. Any suggestions (especially grammar) or ideas for enhancements are much appreciated.

There still might be room for consolidation of functions/methods (mostly in coding/times.py), but I have to leave it alone for some days. I went down that rabbit hole and need to relax, too 😬.

Looking forward to getting your insights here, @spencerkclark, @ChrisBarker-NOAA, @pydata/xarray.

Todo:

  • floating point handling
  • Handling in Variable constructor
  • update decoding tests to iterate over time_units (where appropriate)
  • ...

@kmuehlbauer (Contributor, Author)

Nice, mypy 1.12 is out and breaks our typing, 😭.

@TomNicholas (Member)

Nice, mypy 1.12 is out and breaks our typing, 😭

Can we pin it in the CI temporarily?

@TomNicholas TomNicholas mentioned this pull request Oct 14, 2024
@kmuehlbauer (Contributor, Author)

Can we pin it in the CI temporarily?

Yes, 1.11.2 was the last version.

@kmuehlbauer kmuehlbauer marked this pull request as ready for review October 14, 2024 18:05
@kmuehlbauer (Contributor, Author)

This is now ready for a first round of review. I think this is already in a quite usable state.

But no rush, this should be thoroughly tested.

@spencerkclark (Member)

Sounds good @kmuehlbauer! I’ll try and take an initial look this weekend.

@kmuehlbauer (Contributor, Author)

Thanks Stephan for the review. Looking into that next week.

@ChrisBarker-NOAA

Not to throw too much of a wrench in the works here -- so feel free to disregard, but there's an issue I've faced with (single precision) float time encoding:

Folks (carelessly :-( ) sometimes encode times as "days since ..." using a single-precision float. The problem here is not unnecessary precision, as you get with double, but too little -- if you go more than a few years out, you lose second precision. (The key problem is that the precision of a float time is a function of its magnitude -- not good for this use case.)

The end result is that I get things like model timesteps that are supposed to be hourly reporting as, e.g., 12:00:18 rather than 12:00:00.

One way I've dealt with this is rounding to the minute, or even to hours (if I know the output is hourly), or perhaps to 10 minutes.

Could / should xarray provide a facility for doing this? Maybe?

I guess what I'm proposing is that there be some way to tell xarray to store / save a time variable with e.g. second precision, but to round it to something more coarse when decoding.

Maybe this could even be automatic / inferred:

If a time is in float "days since", it almost certainly is NOT millisecond precision, or even second precision -- and you could even look at the values (the first one?) and see what the minimum precision is for that timespan.

If I've done my math right, a float can only store second precision for a little over three years. So if the values are greater than three years, you don't have second precision.

Anyway, maybe way too much magic, but it would be nice for my use cases :-)

Example:

# 15 min timestep
In [57]: dates
Out[57]: 
[datetime.datetime(2024, 1, 1, 0, 0),
 datetime.datetime(2024, 1, 1, 0, 15),
 datetime.datetime(2024, 1, 1, 0, 30),
 datetime.datetime(2024, 1, 1, 0, 45)]

# common choice of units (though a bad one :-( )
In [58]: units
Out[58]: 'days since 1970-01-01T00:00:00'

# convert to numbers; float64 is used by default
In [59]: nums_double = nc4.date2num(dates, units)

# truncate to float32
In [60]: nums_float = nums_double.astype(np.float32)

# convert back to datetimes:
In [61]: dates_float = nc4.num2date(nums_float, units)

In [62]: dates_float
Out[62]: 
array([cftime.DatetimeGregorian(2024, 1, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2024, 1, 1, 0, 14, 3, 750000, has_year_zero=False),
       cftime.DatetimeGregorian(2024, 1, 1, 0, 30, 56, 250000, has_year_zero=False),
       cftime.DatetimeGregorian(2024, 1, 1, 0, 45, 0, 0, has_year_zero=False)],
      dtype=object)
In [67]: [str(dt) for dt in dates_float]
Out[67]: 
['2024-01-01 00:00:00',
 '2024-01-01 00:14:03.750000',
 '2024-01-01 00:30:56.250000',
 '2024-01-01 00:45:00']

Ouch! So what were 15-minute timesteps are now off by about one minute -- and what's too bad is that rounding to the minute wouldn't be right either -- you'd need to round to maybe 5 minutes?

Anyway, maybe this simply isn't xarray's problem to solve -- data providers shouldn't make such mistakes :-(
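The precision estimate above can be checked numerically. A small sketch (not from the thread, just numpy's `spacing`, i.e. the ulp) shows where a float32 day count can no longer resolve one second; for "days since" units the cutoff actually arrives sooner than the three-year estimate, at well under a year:

```python
import numpy as np

# One second expressed in days: second precision survives only while the
# float32 spacing (ulp) at the given day count stays below this threshold.
ONE_SECOND_IN_DAYS = 1.0 / 86400.0

for days in (100, 365, 3 * 365, 20000):
    ulp_days = float(np.spacing(np.float32(days)))
    kept = ulp_days < ONE_SECOND_IN_DAYS
    print(f"{days:6d} days out: ulp = {ulp_days * 86400:7.2f} s, "
          f"second precision kept: {kept}")
```

At ~20000 days (the 2024 dates above, with a 1970 epoch), the ulp is about 169 seconds, which matches the roughly one-minute errors in the session output.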

@spencerkclark (Member)

@ChrisBarker-NOAA yeah, I agree this kind of situation is annoying, but my feeling is that trying to fix this automatically would be too much magic. Xarray has convenient functionality for rounding times, which can be used to correct this explicitly—that would be my preference. E.g. for your example it would look like:

>>> decoded
<xarray.DataArray 'time' (time: 4)> Size: 32B
array([cftime.DatetimeGregorian(2024, 1, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2024, 1, 1, 0, 14, 3, 750000, has_year_zero=False),
       cftime.DatetimeGregorian(2024, 1, 1, 0, 30, 56, 250000, has_year_zero=False),
       cftime.DatetimeGregorian(2024, 1, 1, 0, 45, 0, 0, has_year_zero=False)],
      dtype=object)
Dimensions without coordinates: time
>>> decoded.dt.round("5min")
<xarray.DataArray 'round' (time: 4)> Size: 32B
array([cftime.DatetimeGregorian(2024, 1, 1, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2024, 1, 1, 0, 15, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2024, 1, 1, 0, 30, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2024, 1, 1, 0, 45, 0, 0, has_year_zero=False)],
      dtype=object)
Dimensions without coordinates: time

@ChrisBarker-NOAA

Xarray has convenient functionality for rounding times

Oh, nice: I had missed that! You're probably right, too much magic to do for people.

@kmuehlbauer (Contributor, Author)

@spencerkclark @ChrisBarker-NOAA I've implemented automated decoding of floating-point data to the needed resolution, even when the wanted resolution does not suffice.

Unfortunately the behaviour outlined above is too involved to be put into the decoder. Nevertheless, maybe we can distill some best practices from your vast experience with data, @ChrisBarker-NOAA, and create a nice example of how to handle these difficulties?
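The inference idea could look roughly like the following sketch (a hypothetical helper, not the PR's actual implementation): scan from the coarsest pandas-supported unit to the finest and stop at the first one where the scaled float offsets are exactly integral.

```python
import numpy as np

def infer_resolution(offsets_in_seconds):
    """Hypothetical sketch: coarsest unit at which the offsets are integral."""
    for unit, scale in (("s", 1.0), ("ms", 1e3), ("us", 1e6)):
        scaled = offsets_in_seconds * scale
        if np.array_equal(scaled, np.round(scaled)):
            return unit
    return "ns"  # fall back to the finest pandas-supported resolution

print(infer_resolution(np.array([0.0, 900.0, 1800.0])))  # "s"
print(infer_resolution(np.array([0.0, 0.5, 1.5])))       # "ms"
```

A real decoder would also have to handle NaN fill values and the float32 rounding artifacts discussed above, which is part of why this logic gets involved.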

@ChrisBarker-NOAA

create a nice example how to handle these difficulties?

Sure -- where would be a good home for that?

@kmuehlbauer (Contributor, Author)

Not sure, but https://docs.xarray.dev/en/stable/user-guide/time-series.html could have a dedicated floating point date section.
