Relax nanosecond datetime restriction in CF time coding #9618

Open
wants to merge 50 commits into
base: main
Changes from 25 commits
7b5f323
implement default_precision_timestamp, refactor coding/times.py and c…
kmuehlbauer Oct 10, 2024
8784f33
align tests with new time resolution behaviour
kmuehlbauer Oct 10, 2024
b45ab23
timedelta decoding, fsspec handling
kmuehlbauer Oct 10, 2024
39086ef
fixes in coding/times.py
kmuehlbauer Oct 13, 2024
df49a40
add docs on time coding
kmuehlbauer Oct 13, 2024
adb8ca3
attempt fixing doc tests
kmuehlbauer Oct 13, 2024
266b1ed
fix issue where out-of-bounds floating point values slipped in the pr…
kmuehlbauer Oct 14, 2024
6d5f13b
convert to UTC first before stripping of tz in _unpack_time_units_and…
kmuehlbauer Oct 14, 2024
5d68bfe
reorganize pandas compatibility code, remove unneeded code, attempt t…
kmuehlbauer Oct 14, 2024
07bba69
another attempt to finally fix mypy
kmuehlbauer Oct 14, 2024
6e7f0bb
refactor out _check_date_is_after_shift
kmuehlbauer Oct 14, 2024
b4a49bb
refactor out _maybe_strip_tz_from_timestamp
kmuehlbauer Oct 14, 2024
2e1ff4f
more refactoring in coding.times.py
kmuehlbauer Oct 14, 2024
d5a7da0
more refactoring in coding.times.py
kmuehlbauer Oct 14, 2024
821b68d
minor fix in time-coding.rst
kmuehlbauer Oct 14, 2024
d066edf
set default resolution to "s", which actually means, use pandas lowes…
kmuehlbauer Oct 14, 2024
ed22da1
Add section for default units, fix options
kmuehlbauer Oct 14, 2024
8bf23f4
attempt to fix typing
kmuehlbauer Oct 14, 2024
c3a2b39
attempt to fix typing
kmuehlbauer Oct 14, 2024
3c44aed
fix scalar datetime/timedelta
kmuehlbauer Oct 15, 2024
48be73a
fix user docs
kmuehlbauer Oct 15, 2024
7ac9983
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Oct 18, 2024
d86ad04
Fix variable tests, mostly datetime/timedelta is inittialized with us…
kmuehlbauer Oct 18, 2024
b5d0795
revert changes in _possible_convert_objects, this needs to be checked…
kmuehlbauer Oct 18, 2024
60324f0
fix doc link
kmuehlbauer Oct 18, 2024
6f2861a
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 8, 2024
1f07500
Apply suggestions from code review
kmuehlbauer Nov 8, 2024
798b444
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 8, 2024
f487599
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 16, 2024
20d6c9d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 16, 2024
7391948
remove outdated description
kmuehlbauer Nov 16, 2024
308091c
use set instead list
kmuehlbauer Nov 16, 2024
5f40b4e
remove global option
kmuehlbauer Nov 16, 2024
2a65d8d
mypy thinks `unit` is Literal, because the pandas-stubs suggest so, b…
kmuehlbauer Nov 17, 2024
43f7d61
ignore mypy arg-type
kmuehlbauer Nov 17, 2024
59934b9
fix docstring of `default_precision_timestamp`
kmuehlbauer Nov 17, 2024
a01f9f3
add 'time_unit'-kwarg to decode_cf and descendent functions with "ns"…
kmuehlbauer Nov 17, 2024
8b91128
fix tests
kmuehlbauer Nov 17, 2024
0e351ca
fix more tests
kmuehlbauer Nov 17, 2024
07a8e9c
fix docstring
kmuehlbauer Nov 17, 2024
2be5739
use pd.Timestamp(np.datetime64(cftime)) to convert from cftime to numpy
kmuehlbauer Nov 17, 2024
b9d0a8e
use dt = np.datetime64(cftime.isoformat()) to convert from cftime to …
kmuehlbauer Nov 18, 2024
08afc3b
fix time-coding.rst
kmuehlbauer Nov 18, 2024
edc55e1
use us in to_datetimeindex
kmuehlbauer Nov 18, 2024
bffe919
revert back to us for datetimeindex tests
kmuehlbauer Nov 18, 2024
150b982
estimate fitting resolution for floating point values, when decoding …
kmuehlbauer Nov 18, 2024
7113ceb
add test
kmuehlbauer Nov 18, 2024
7f47f0b
refactor floating point decoding
kmuehlbauer Nov 18, 2024
512808d
Merge branch 'main' into any-time-resolution-2
kmuehlbauer Nov 18, 2024
63c83f4
simplify recursive function, update tests
kmuehlbauer Nov 18, 2024
1 change: 1 addition & 0 deletions doc/internals/index.rst
Expand Up @@ -26,3 +26,4 @@ The pages in this section are intended for:
how-to-add-new-backend
how-to-create-custom-index
zarr-encoding-spec
time-coding
442 changes: 442 additions & 0 deletions doc/internals/time-coding.rst

Large diffs are not rendered by default.

34 changes: 20 additions & 14 deletions doc/user-guide/time-series.rst
Expand Up @@ -21,20 +21,32 @@ core functionality.
Creating datetime64 data
------------------------

Xarray uses the numpy dtypes ``datetime64[ns]`` and ``timedelta64[ns]`` to
represent datetime data, which offer vectorized (if sometimes buggy) operations
with numpy and smooth integration with pandas.
Xarray uses the numpy dtypes ``datetime64[unit]`` and ``timedelta64[unit]``
(where unit is one of "s", "ms", "us" and "ns") to represent datetime
data, which offer vectorized (if sometimes buggy) operations with numpy and
smooth integration with pandas.

To convert to or create regular arrays of ``datetime64`` data, we recommend
using :py:func:`pandas.to_datetime` and :py:func:`pandas.date_range`:

.. ipython:: python

pd.to_datetime(["2000-01-01", "2000-02-02"])
pd.DatetimeIndex(
["2000-01-01 00:00:00", "2000-02-02 00:00:00"], dtype="datetime64[s]"
)
pd.date_range("2000-01-01", periods=365)
pd.date_range("2000-01-01", periods=365, unit="s")

.. note::
Take care to create the output with the desired resolution.
For :py:func:`pandas.date_range` the ``unit`` keyword argument has to be
specified. :py:func:`pandas.to_datetime` does not allow selecting the
resolution at all; use :py:class:`pandas.DatetimeIndex` directly instead.
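
The note above can be illustrated with a minimal sketch. It assumes pandas 2.0 or later (where non-nanosecond resolutions and the ``unit`` keyword of ``date_range`` are available); the variable names are illustrative only and not part of the PR.

```python
import pandas as pd

# Explicit second resolution via the DatetimeIndex constructor.
idx = pd.DatetimeIndex(
    ["2000-01-01 00:00:00", "2000-02-02 00:00:00"], dtype="datetime64[s]"
)

# date_range accepts a ``unit`` keyword (assumes pandas >= 2.0).
rng = pd.date_range("2000-01-01", periods=365, unit="s")

# An existing index can also be converted to another resolution afterwards.
ms_idx = idx.as_unit("ms")

print(idx.dtype, rng.dtype, ms_idx.dtype)
```

Passing the dtype explicitly keeps the resolution independent of whatever pandas would infer from the input values.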

Alternatively, you can supply arrays of Python ``datetime`` objects. These get
converted automatically when used as arguments in xarray objects:
converted automatically when used as arguments in xarray objects (with microsecond resolution):

.. ipython:: python

Expand All @@ -51,7 +63,7 @@ attribute like ``'days since 2000-01-01'``).
.. note::

When decoding/encoding datetimes for non-standard calendars or for dates
before year 1678 or after year 2262, xarray uses the `cftime`_ library.
before 1582-10-15, xarray uses the `cftime`_ library.
It was previously packaged with the ``netcdf4-python`` package under the
name ``netcdftime`` but is now distributed separately. ``cftime`` is an
:ref:`optional dependency<installing>` of xarray.
Expand All @@ -68,15 +80,9 @@ You can manual decode arrays in this form by passing a dataset to
ds = xr.Dataset({"time": ("time", [0, 1, 2, 3], attrs)})
xr.decode_cf(ds)

One unfortunate limitation of using ``datetime64[ns]`` is that it limits the
native representation of dates to those that fall between the years 1678 and
2262. When a netCDF file contains dates outside of these bounds, dates will be
returned as arrays of :py:class:`cftime.datetime` objects and a :py:class:`~xarray.CFTimeIndex`
will be used for indexing. :py:class:`~xarray.CFTimeIndex` enables a subset of
the indexing functionality of a :py:class:`pandas.DatetimeIndex` and is only
fully compatible with the standalone version of ``cftime`` (not the version
packaged with earlier versions ``netCDF4``). See :ref:`CFTimeIndex` for more
information.
From xarray 2024.11 the resolution of the dates can be set to "s", "ms", "us" or "ns". One limitation of using ``datetime64[ns]`` is that it limits the native representation of dates to those that fall between the years 1678 and 2262; this range widens significantly at coarser resolutions. When a netCDF file contains dates outside of these bounds (or dates before 1582-10-15), dates will be returned as arrays of :py:class:`cftime.datetime` objects and a :py:class:`~xarray.CFTimeIndex` will be used for indexing.
:py:class:`~xarray.CFTimeIndex` enables a subset of the indexing functionality of a :py:class:`pandas.DatetimeIndex` and is only fully compatible with the standalone version of ``cftime`` (not the version packaged with earlier versions of ``netCDF4``).
See :ref:`CFTimeIndex` for more information.
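
To make the dependence of the representable range on the resolution concrete, here is a rough sketch. It relies only on the fact that ``datetime64`` stores a signed 64-bit count of units since 1970-01-01; the helper name ``half_span_years`` and the mean-year constant are assumptions made for illustration, not part of xarray.

```python
import numpy as np

SECONDS_PER_YEAR = 365.2425 * 86400  # mean Gregorian year, in seconds
UNITS_PER_SECOND = {"s": 1, "ms": 10**3, "us": 10**6, "ns": 10**9}


def half_span_years(unit):
    """Approximate number of years representable on each side of 1970."""
    return 2**63 / UNITS_PER_SECOND[unit] / SECONDS_PER_YEAR


spans = {unit: half_span_years(unit) for unit in UNITS_PER_SECOND}

# Exact nanosecond bounds; the smallest int64 value is reserved for NaT.
ns_max = np.datetime64(np.iinfo(np.int64).max, "ns")
ns_min = np.datetime64(np.iinfo(np.int64).min + 1, "ns")

for unit, years in spans.items():
    print(f"{unit}: about +/- {years:.3g} years around 1970")
print(ns_min, ns_max)
```

Each step to a coarser unit multiplies the span by a factor of 1000, which is why the 1678 to 2262 limit only applies to nanosecond resolution.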

Datetime indexing
-----------------
Expand Down
23 changes: 9 additions & 14 deletions doc/user-guide/weather-climate.rst
Expand Up @@ -10,7 +10,7 @@ Weather and climate data

import xarray as xr

Xarray can leverage metadata that follows the `Climate and Forecast (CF) conventions`_ if present. Examples include :ref:`automatic labelling of plots<plotting>` with descriptive names and units if proper metadata is present and support for non-standard calendars used in climate science through the ``cftime`` module(Explained in the :ref:`CFTimeIndex` section). There are also a number of :ref:`geosciences-focused projects that build on xarray<ecosystem>`.
Xarray can leverage metadata that follows the `Climate and Forecast (CF) conventions`_ if present. Examples include :ref:`automatic labelling of plots<plotting>` with descriptive names and units if proper metadata is present and support for non-standard calendars used in climate science through the ``cftime`` module (explained in the :ref:`CFTimeIndex` section). There are also a number of :ref:`geosciences-focused projects that build on xarray<ecosystem>`.

.. _Climate and Forecast (CF) conventions: https://cfconventions.org

Expand Down Expand Up @@ -64,8 +64,7 @@ Through the standalone ``cftime`` library and a custom subclass of
:py:class:`pandas.Index`, xarray supports a subset of the indexing
functionality enabled through the standard :py:class:`pandas.DatetimeIndex` for
dates from non-standard calendars commonly used in climate science or dates
using a standard calendar, but outside the `nanosecond-precision range`_
(approximately between years 1678 and 2262).
using a standard calendar, but outside the `precision range`_, and dates prior to 1582-10-15.

.. note::

Expand All @@ -75,18 +74,14 @@ using a standard calendar, but outside the `nanosecond-precision range`_
any of the following are true:

- The dates are from a non-standard calendar
- Any dates are outside the nanosecond-precision range.
- Any dates are outside the nanosecond-precision range (prior to xarray version 2024.11)
- Any dates are outside the time span limited by the resolution (from xarray version 2024.11)

Otherwise pandas-compatible dates from a standard calendar will be
represented with the ``np.datetime64[ns]`` data type, enabling the use of a
:py:class:`pandas.DatetimeIndex` or arrays with dtype ``np.datetime64[ns]``
and their full set of associated features.
represented with the ``np.datetime64[unit]`` data type (where unit can be any of ["s", "ms", "us", "ns"]), enabling the use of a :py:class:`pandas.DatetimeIndex` or arrays with dtype ``np.datetime64[unit]`` and their full set of associated features.

As of pandas version 2.0.0, pandas supports non-nanosecond precision datetime
values. For the time being, xarray still automatically casts datetime values
to nanosecond-precision for backwards compatibility with older pandas
versions; however, this is something we would like to relax going forward.
See :issue:`7493` for more discussion.
values. From xarray version 2024.11, xarray makes use of these non-nanosecond precision datetime values.
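
As a small illustration of what non-nanosecond precision allows on the pandas side, the sketch below builds an index whose dates fall outside the nanosecond range. It assumes pandas 2.0 or later and shows pandas and numpy behaviour only, not the xarray decoding path changed in this PR.

```python
import numpy as np
import pandas as pd

# Both dates lie outside the 1678-2262 window of nanosecond precision.
values = np.array(["1600-01-01", "2500-01-01"], dtype="datetime64[s]")

# With pandas >= 2.0 the second resolution of the input is preserved.
index = pd.DatetimeIndex(values)

print(index.dtype)
print(index[0].year, index[1].year)
```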

For example, you can create a DataArray indexed by a time
coordinate with dates from a no-leap calendar and a
Expand Down Expand Up @@ -115,7 +110,7 @@ instance, we can create the same dates and DataArray we created above using:
Mirroring pandas' method with the same name, :py:meth:`~xarray.infer_freq` allows one to
infer the sampling frequency of a :py:class:`~xarray.CFTimeIndex` or a 1-D
:py:class:`~xarray.DataArray` containing cftime objects. It also works transparently with
``np.datetime64[ns]`` and ``np.timedelta64[ns]`` data.
``np.datetime64`` and ``np.timedelta64`` data (with "s", "ms", "us" or "ns" resolution).

.. ipython:: python

Expand All @@ -137,7 +132,7 @@ Conversion between non-standard calendar and to/from pandas DatetimeIndexes is
facilitated with the :py:meth:`xarray.Dataset.convert_calendar` method (also available as
:py:meth:`xarray.DataArray.convert_calendar`). Here, like elsewhere in xarray, the ``use_cftime``
argument controls which datetime backend is used in the output. The default (``None``) is to
use `pandas` when possible, i.e. when the calendar is standard and dates are within 1678 and 2262.
use `pandas` when possible, i.e. when the calendar is standard and dates are on or after 1582-10-15.
Review comment (Member):

We should note that with a proleptic Gregorian calendar there is no date restriction.

Reply (Contributor Author):

Yes, this is only a limitation for standard/gregorian. This needs change.


.. ipython:: python

Expand Down Expand Up @@ -241,6 +236,6 @@ For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:

da.resample(time="81min", closed="right", label="right", offset="3min").mean()

.. _nanosecond-precision range: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
.. _precision range: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
.. _ISO 8601 standard: https://en.wikipedia.org/wiki/ISO_8601
.. _partial datetime string indexing: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#partial-string-indexing
18 changes: 4 additions & 14 deletions xarray/coding/cftime_offsets.py
Expand Up @@ -64,7 +64,7 @@
from xarray.core.pdcompat import (
NoDefault,
count_not_none,
nanosecond_precision_timestamp,
default_precision_timestamp,
no_default,
)
from xarray.core.utils import emit_user_level_warning
Expand All @@ -83,21 +83,13 @@
T_FreqStr = TypeVar("T_FreqStr", str, None)


def _nanosecond_precision_timestamp(*args, **kwargs):
# As of pandas version 3.0, pd.to_datetime(Timestamp(...)) will try to
# infer the appropriate datetime precision. Until xarray supports
# non-nanosecond precision times, we will use this constructor wrapper to
# explicitly create nanosecond-precision Timestamp objects.
return pd.Timestamp(*args, **kwargs).as_unit("ns")


def get_date_type(calendar, use_cftime=True):
"""Return the cftime date type for a given calendar name."""
if cftime is None:
raise ImportError("cftime is required for dates with non-standard calendars")
else:
if _is_standard_calendar(calendar) and not use_cftime:
return _nanosecond_precision_timestamp
return default_precision_timestamp

calendars = {
"noleap": cftime.DatetimeNoLeap,
Expand Down Expand Up @@ -1475,10 +1467,8 @@ def date_range_like(source, calendar, use_cftime=None):
if is_np_datetime_like(source.dtype):
# We want to use datetime fields (datetime64 object don't have them)
source_calendar = "standard"
# TODO: the strict enforcement of nanosecond precision Timestamps can be
# relaxed when addressing GitHub issue #7493.
source_start = nanosecond_precision_timestamp(source_start)
source_end = nanosecond_precision_timestamp(source_end)
source_start = default_precision_timestamp(source_start)
source_end = default_precision_timestamp(source_end)
else:
if isinstance(source, CFTimeIndex):
source_calendar = source.calendar
Expand Down
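
The removed ``_nanosecond_precision_timestamp`` wrapper pinned every ``pd.Timestamp`` to ``"ns"`` via ``as_unit``. The body of its replacement, ``default_precision_timestamp``, is not shown in this diff, so the following is only a sketch of the general pattern with a hypothetical helper (``timestamp_with_unit`` and its ``unit`` default are assumptions, not the PR's implementation). It assumes pandas 2.0 or later.

```python
import pandas as pd


def timestamp_with_unit(*args, unit="us", **kwargs):
    """Hypothetical helper: build a Timestamp and cast it to ``unit``."""
    return pd.Timestamp(*args, **kwargs).as_unit(unit)


# Same instant, stored at two different resolutions.
ts_ns = timestamp_with_unit("2000-01-01", unit="ns")
ts_s = timestamp_with_unit("2000-01-01", unit="s")

print(ts_ns.unit, ts_s.unit, ts_ns == ts_s)
```

Parameterizing the unit, rather than hard-coding ``"ns"``, is the kind of change that lets callers such as ``get_date_type`` and ``date_range_like`` stop enforcing nanosecond precision.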
2 changes: 1 addition & 1 deletion xarray/coding/cftimeindex.py
Expand Up @@ -646,7 +646,7 @@ def to_datetimeindex(self, unsafe=False):
CFTimeIndex([2000-01-01 00:00:00, 2000-01-02 00:00:00],
dtype='object', length=2, calendar='standard', freq=None)
>>> times.to_datetimeindex()
DatetimeIndex(['2000-01-01', '2000-01-02'], dtype='datetime64[ns]', freq=None)
DatetimeIndex(['2000-01-01', '2000-01-02'], dtype='datetime64[us]', freq=None)
"""

if not self._data.size:
Expand Down