Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add datetime encode/decode property test #9507

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

dcherian
Copy link
Contributor

I was curious and though something lime this might help with #9498. However it seems to catch some other overflow/underflow errors.

Trying example: test_CFDatetime_coder_roundtrip_numpy(
    original_array=array(['2188-01-11T16:15:17.106233897',                           'NaT',
           '1895-10-02T16:28:00.251458088'], dtype='datetime64[ns]'),
)
>   ???
E   OverflowError: value too large

This one seems weird... (i set use_cftime=False...)

original_array = array(['2188-01-11T16:15:17.106233897',                           'NaT',
       '1895-10-02T16:28:00.251458088'], dtype='datetime64[ns]')

    @given(
        original_array=npst.arrays(
            dtype=npst.datetime64_dtypes(endianness="=", max_period="ns"),
            shape=npst.array_shapes(max_dims=1),
        )
    )
    def test_CFDatetime_coder_roundtrip_numpy(original_array) -> None:
        original = xr.Variable("time", original_array)
        coder = xr.coding.times.CFDatetimeCoder(use_cftime=False)
>       roundtripped = coder.decode(coder.encode(original))

/Users/deepak/repos/xarray/properties/test_encode_decode.py:67:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/Users/deepak/repos/xarray/xarray/coding/times.py:976: in encode
    (data, units, calendar) = encode_cf_datetime(data, units, calendar, dtype)
/Users/deepak/repos/xarray/xarray/coding/times.py:725: in encode_cf_datetime
    return _eagerly_encode_cf_datetime(dates, units, calendar, dtype)
/Users/deepak/repos/xarray/xarray/coding/times.py:800: in _eagerly_encode_cf_datetime
    num = _encode_datetime_with_cftime(dates, units, calendar)
/Users/deepak/repos/xarray/xarray/coding/times.py:649: in _encode_datetime_with_cftime
    return reshape(np.array([encode_datetime(d) for d in ravel(dates)]), dates.shape)
/Users/deepak/repos/xarray/xarray/coding/times.py:644: in encode_datetime
    else cftime.date2num(d, units, calendar, longdouble=False)
src/cftime/_cftime.pyx:252: in cftime._cftime.date2num
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   ValueError: In general, units must be one of 'microseconds', 'milliseconds', 'seconds', 'minutes', 'hours', or 'days' (or select abbreviated versions of these).  For the '360_day' calendar, 'months' can also be used, or for the 'noleap' calendar 'common_years' can also be used. Got 'nanoseconds' instead, which are not recognized.

@TomNicholas TomNicholas added topic-testing topic-cftime topic-hypothesis Strategies or tests using the hypothesis library labels Sep 17, 2024
@spencerkclark
Copy link
Member

spencerkclark commented Sep 17, 2024

Thanks @dcherian—this failure is not a complete surprise to me (although it is a bit gnarly to unpack—note that use_cftime only applies to the decoding step, and this error occurs during encoding). Our eager encoding units inference procedure simply picks the first date in the array as the reference date:

reference_date = dates[0] if len(dates) > 0 else "1970-01-01"

which is 2188-01-11T16:15:17.106233897 in this case.

The difference between 2188-01-11T16:15:17.106233897 and 1895-10-02T16:28:00.251458088 is greater than the maximum timedelta that can be represented with a np.timedelta64[ns] value (~292 years). Despite silent overflow in this step:

unique_timedeltas = np.unique(np.diff(dates))

we choose encoding units of "nanoseconds since 2188-01-11T16:15:17.106233897". Later though we are careful to raise an overflow error when trying to encode the times with pandas with these units (#2519). When this happens we fall back to encoding the times with cftime. Here things fail, because cftime cannot handle nanosecond units.

This exposes a couple of issues related to units inference:

  1. One can argue that choosing a reference date of 1970-01-01 is better than an arbitrary value of the array, since it guarantees the maximum range of datetime64[ns] values can be represented by int64 values. I'm not sure whether we want to change this or not—the flip side is that it may make encoding times with smaller integer dtypes, e.g. np.int16, np.int32, require a bit more care on the part of the user—but it's a property of the existing eager encoding approach (for lazy encoding we default to a reference date of 1970-01-01).
  2. The silent overflow in the units inference step is also concerning. It happens to lead to what I think is the correct outcome in this case—the dates do require nanosecond units to be represented faithfully, which cftime does not support—but it could potentially prevent code from working that should. For example it should be possible to encode np.array(["1800-01-01", "2262-01-01"], dtype="datetime64[ns]") by converting to datetime.datetime objects and passing to cftime.date2num, but this will currently fail if the user does not explicitly provide the encoding units—due to overflow, xarray will think nanosecond units are required when they really are not.

I have been aware of (1) for a while. I hadn't thought about (2) before. One way to address (2) would be to address (1), since it would prevent overflow in the units inference step, and fully eliminate the need to fall back to cftime to encode datetime64[ns] values for xarray-chosen units (for user-specified units that might still be required, but that would be fine).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-cftime topic-hypothesis Strategies or tests using the hypothesis library topic-testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants