
Add support for custom seasons spanning calendar years #423

Open · tomvothecoder wants to merge 11 commits into main from feature/416-custom-season-span
Conversation

tomvothecoder
Copy link
Collaborator

@tomvothecoder tomvothecoder commented Mar 3, 2023

Description

TODO:

  • Support custom seasons that span calendar years (_shift_spanning_months())
    • Requires detecting order of the months in a season. Currently, order does not matter.
    • For example, for custom_season = ["Nov", "Dec", "Jan", "Feb", "Mar"]:
      • ["Nov", "Dec"] are from the previous year since they are listed before "Jan"
      • ["Jan", "Feb", "Mar"] are from the current year
    • We can potentially extend _shift_decembers() to shift other months too. This method shifts the previous year's December to the current year so xarray can properly group "DJF" seasons spanning calendar years.
  • Detect and drop incomplete seasons (_drop_incomplete_seasons())
    • Right now xCDAT only detects incomplete "DJF" seasons with _drop_incomplete_djf()
    • Replace boolean config drop_incomplete_djf with drop_incomplete_season
    • A possible solution for detecting incomplete seasons is to check if a season has all of the required months. If the count of months for that season does not match the expected count, then drop that season.
  • Remove requirement for all 12 months to be included in a custom season
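
The month-ordering detection described above can be sketched as follows. This is a rough illustration of the idea, not xCDAT's actual implementation; `split_season` and `MONTHS` are hypothetical names:

```python
# Hypothetical sketch: detect which months of a custom season belong to the
# previous calendar year. Months listed before the point where the month
# number decreases (e.g., "Dec" -> "Jan") span from the previous year.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def split_season(season):
    nums = [MONTHS.index(m) + 1 for m in season]
    for i in range(1, len(nums)):
        if nums[i] < nums[i - 1]:
            # Months before the wrap point would be shifted forward one year,
            # analogous to what _shift_decembers() does for "DJF".
            return season[:i], season[i:]
    return [], list(season)  # the season does not span calendar years

prev_year, curr_year = split_season(["Nov", "Dec", "Jan", "Feb", "Mar"])
# prev_year == ["Nov", "Dec"]; curr_year == ["Jan", "Feb", "Mar"]
```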

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@tomvothecoder tomvothecoder added the type: enhancement New enhancement request label Mar 3, 2023
@tomvothecoder tomvothecoder self-assigned this Mar 3, 2023
@tomvothecoder tomvothecoder force-pushed the feature/416-custom-season-span branch from 96c5eca to fa087b7 Compare March 6, 2023 20:31
@tomvothecoder
Copy link
Collaborator Author

Example result of _drop_incomplete_seasons():

# Before dropping
# -----------------
# 2000-01, 2000-02, and 2001-12 are months in incomplete "DJF" seasons, so they are dropped
ds.time
<xarray.DataArray 'time' (time: 15)>
array([cftime.DatetimeGregorian(2000, 1, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 2, 15, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 3, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 4, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 5, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 6, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 7, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 8, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 9, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 10, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 11, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 12, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2001, 1, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2001, 2, 15, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2001, 12, 16, 12, 0, 0, 0, has_year_zero=False)],
      dtype=object)
Coordinates:
  * time     (time) object 2000-01-16 12:00:00 ... 2001-12-16 12:00:00
Attributes:
    axis:           T
    long_name:      time
    standard_name:  time
    bounds:         time_bnds

# After dropping
# -----------------
ds_new.time
<xarray.DataArray 'time' (time: 12)>
array([cftime.DatetimeGregorian(2000, 3, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 4, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 5, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 6, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 7, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 8, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 9, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 10, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 11, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 12, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2001, 1, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2001, 2, 15, 0, 0, 0, 0, has_year_zero=False)],
      dtype=object)
Coordinates:
  * time     (time) object 2000-03-16 12:00:00 ... 2001-02-15 00:00:00
Attributes:
    axis:           T
    long_name:      time
    standard_name:  time
    bounds:         time_bnds

@tomvothecoder
Copy link
Collaborator Author

Hey @lee1043, this PR seemed to be mostly done when I stopped working on it last year. I just had to fix a few things and update the tests.

Would you like to check out this branch to test it out on real data? Also a code review would be appreciated.

@tomvothecoder tomvothecoder marked this pull request as ready for review April 4, 2024 21:50
Copy link

codecov bot commented Apr 4, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (cdae826) to head (388f4da).

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #423   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           15        15           
  Lines         1544      1577   +33     
=========================================
+ Hits          1544      1577   +33     


@lee1043
Copy link
Collaborator

lee1043 commented Apr 4, 2024

@tomvothecoder sure, I will test it out and review. Thank you for the update!

@lee1043
Copy link
Collaborator

lee1043 commented Apr 4, 2024

@tomvothecoder Can this be considered for v0.7.0?

Copy link
Collaborator Author

@tomvothecoder tomvothecoder left a comment


My PR self-review

Comment on lines +1012 to +1016
warnings.warn(
"The `season_config` argument 'drop_incomplete_djf' is being "
"deprecated. Please use 'drop_incomplete_seasons' instead.",
DeprecationWarning,
)
Copy link
Collaborator Author


TODO: Need to specify the version in which drop_incomplete_djf will be deprecated. Probably v0.8.0 or v0.9.0.

Comment on lines -1025 to -1028
if len(input_months) != len(predefined_months):
raise ValueError(
"Exactly 12 months were not passed in the list of custom seasons."
)
Copy link
Collaborator Author


Removed the requirement for all 12 months to be included in a custom season.

@tomvothecoder
Copy link
Collaborator Author

tomvothecoder commented Apr 4, 2024

@tomvothecoder Can this be considered for v0.7.0?

This PR still needs thorough review before I'm confident in merging it. I'll probably tag Steve at some point.

The release after v0.7.0 is more realistic and reasonable. We can always initiate a new release for this feature whenever it is merged.

@lee1043
Copy link
Collaborator

lee1043 commented Apr 4, 2024

@tomvothecoder Can this be considered for v0.7.0?

This PR still needs thorough review before I'm confident in merging it. I'll probably tag Steve at some point.

The release after v0.7.0 is more realistic and reasonable. We can always initiate a new release for this feature whenever it is merged.

@tomvothecoder no problem. Thank you for consideration.

@lee1043
Copy link
Collaborator

lee1043 commented Apr 4, 2024

@tomvothecoder it looks like when a custom season spans the calendar year (e.g., Nov, Dec, Jan), the following error occurs.

import os
import xcdat as xc

input_data = os.path.join(
    "/p/css03/esgf_publish/CMIP6/CMIP/AWI/AWI-CM-1-1-MR/historical/r1i1p1f1/Amon/psl/gn/v20181218/",
    "psl_Amon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_201301-201312.nc")

ds = xc.open_mfdataset(input_data)

# Example of a custom season that spans the calendar year:
custom_seasons = [
    ['Dec', 'Jan'],
]

season_config = {'custom_seasons': custom_seasons, 'dec_mode': 'DJF', 'drop_incomplete_djf': True}

ds.temporal.group_average("psl", "season", season_config=season_config)
CPU times: user 448 ms, sys: 32.8 ms, total: 481 ms
Wall time: 471 ms
<xarray.Dataset>
Dimensions:   (lat: 192, bnds: 2, lon: 384, time: 4)
Coordinates:
  * lat       (lat) float64 -89.28 -88.36 ... 88.36 89.28
  * lon       (lon) float64 0.0 0.9375 1.875 ... 358.1 359.1
  * time      (time) object 2013-02-01 00:00:00 ... 2013-11-...
Data variables:
    lat_bnds  (lat, bnds) float64 dask.array<chunksize=(192, 2), meta=np.ndarray>
    lon_bnds  (lon, bnds) float64 dask.array<chunksize=(384, 2), meta=np.ndarray>
    psl       (time, lat, lon) float64 dask.array<chunksize=(1, 192, 384), meta=np.ndarray>
Indexes: (3)
Attributes: (44)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xarray/core/dataarray.py:862, in DataArray._getitem_coord(self, key)
    861 try:
--> 862     var = self._coords[key]
    863 except KeyError:

KeyError: 'time.year'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<timed exec> in ?()

~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/temporal.py in ?(self, data_var, freq, weighted, keep_weights, season_config)
    400         }
    401         """
    402         self._set_data_var_attrs(data_var)
    403 
--> 404         return self._averager(
    405             data_var,
    406             "group_average",
    407             freq,

~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/temporal.py in ?(self, data_var, mode, freq, weighted, keep_weights, reference_period, season_config)
    870 
    871         if self._mode == "average":
    872             dv_avg = self._average(ds, data_var)
    873         elif self._mode in ["group_average", "climatology", "departures"]:
--> 874             dv_avg = self._group_average(ds, data_var)
    875 
    876         # The original time dimension is dropped from the dataset because
    877         # it becomes obsolete after the data variable is averaged. When the

~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/temporal.py in ?(self, ds, data_var)
   1295         dv = _get_data_var(ds, data_var)
   1296 
   1297         # Label the time coordinates for grouping weights and the data variable
   1298         # values.
-> 1299         self._labeled_time = self._label_time_coords(dv[self.dim])
   1300 
   1301         if self._weighted:
   1302             time_bounds = ds.bounds.get_bounds("T", var_key=data_var)

~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/temporal.py in ?(self, time_coords)
   1470         >>>       dtype='datetime64[ns]')
   1471         >>> Coordinates:
   1472         >>> * time     (time) datetime64[ns] 2000-01-01T00:00:00 ... 2000-04-01T00:00:00
   1473         """
-> 1474         df_dt_components: pd.DataFrame = self._get_df_dt_components(
   1475             time_coords, drop_obsolete_cols=True
   1476         )
   1477         dt_objects = self._convert_df_to_dt(df_dt_components)

~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/temporal.py in ?(self, time_coords, drop_obsolete_cols)
   1534 
   1535         # Use the TIME_GROUPS dictionary to determine which components
   1536         # are needed to form the labeled time coordinates.
   1537         for component in TIME_GROUPS[self._mode][self._freq]:
-> 1538             df[component] = time_coords[f"{self.dim}.{component}"].values
   1539 
   1540         # The season frequency requires additional datetime components for
   1541         # processing, which are later removed before time coordinates are

~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xarray/core/dataarray.py in ?(self, key)
    869     def __getitem__(self, key: Any) -> Self:
    870         if isinstance(key, str):
--> 871             return self._getitem_coord(key)
    872         else:
    873             # xarray-style array indexing
    874             return self.isel(indexers=self._item_key_to_dict(key))

~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xarray/core/dataarray.py in ?(self, key)
    861         try:
    862             var = self._coords[key]
    863         except KeyError:
    864             dim_sizes = dict(zip(self.dims, self.shape))
--> 865             _, key, var = _get_virtual_variable(self._coords, key, dim_sizes)
    866 
    867         return self._replace_maybe_drop_dims(var, name=key)

~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xarray/core/dataset.py in ?(variables, key, dim_sizes)
    212     if _contains_datetime_like_objects(ref_var):
    213         ref_var = DataArray(ref_var)
    214         data = getattr(ref_var.dt, var_name).data
    215     else:
--> 216         data = getattr(ref_var, var_name).data
    217     virtual_var = Variable(ref_var.dims, data)
    218 
    219     return ref_name, var_name, virtual_var

AttributeError: 'IndexVariable' object has no attribute 'year'
CPU times: user 240 ms, sys: 13.2 ms, total: 253 ms
Wall time: 249 ms
<xarray.Dataset>
Dimensions:   (lat: 192, bnds: 2, lon: 384, time: 1)
Coordinates:
  * lat       (lat) float64 -89.28 -88.36 ... 88.36 89.28
  * lon       (lon) float64 0.0 0.9375 1.875 ... 358.1 359.1
  * time      (time) object 2013-02-01 00:00:00
Data variables:
    lat_bnds  (lat, bnds) float64 dask.array<chunksize=(192, 2), meta=np.ndarray>
    lon_bnds  (lon, bnds) float64 dask.array<chunksize=(384, 2), meta=np.ndarray>
    psl       (time, lat, lon) float64 dask.array<chunksize=(1, 192, 384), meta=np.ndarray>
Indexes: (3)
Attributes: (44)
CPU times: user 106 ms, sys: 6.96 ms, total: 113 ms
Wall time: 110 ms
<xarray.Dataset>
Dimensions:   (lat: 192, bnds: 2, lon: 384, time: 1)
Coordinates:
  * lat       (lat) float64 -89.28 -88.36 ... 88.36 89.28
  * lon       (lon) float64 0.0 0.9375 1.875 ... 358.1 359.1
  * time      (time) object 2013-02-01 00:00:00
Data variables:
    lat_bnds  (lat, bnds) float64 dask.array<chunksize=(192, 2), meta=np.ndarray>
    lon_bnds  (lon, bnds) float64 dask.array<chunksize=(384, 2), meta=np.ndarray>
    psl       (time, lat, lon) float64 dask.array<chunksize=(1, 192, 384), meta=np.ndarray>
Indexes: (3)
Attributes: (44)

@tomvothecoder
Copy link
Collaborator Author

@lee1043 Thanks for trying this out and providing an example script! I'll debug the stack trace.

@oliviermarti
Copy link

Hello,

All of these features are very nice, and needed, improvements!

Will this be available soon?

I've made some tests: it still doesn't work in xcdat 0.7.1.

Olivier

@tomvothecoder tomvothecoder force-pushed the feature/416-custom-season-span branch from 8262a8a to 45b5cb0 Compare July 30, 2024 18:44
@tomvothecoder
Copy link
Collaborator Author

Hi @oliviermarti, thanks for your interest in this feature. I've had minimal time to work on it and plan to ramp up development soon. I don't have a set timeline on when it will be done, but I'm hoping within the next few months.

@tomvothecoder
Copy link
Collaborator Author

tomvothecoder commented Jul 30, 2024

@tomvothecoder it looks like when a custom season spans the calendar year (e.g., Nov, Dec, Jan), the following error occurs.


The issue is that this dataset only has a single year of data and does not have a complete ["Dec", "Jan"] season with drop_incomplete_djf=True.

<xarray.DataArray 'time' (time: 12)> Size: 96B
array([cftime.DatetimeProlepticGregorian(2013, 1, 16, 12, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2013, 2, 15, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2013, 3, 16, 12, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2013, 4, 16, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2013, 5, 16, 12, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2013, 6, 16, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2013, 7, 16, 12, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2013, 8, 16, 12, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2013, 9, 16, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2013, 10, 16, 12, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2013, 11, 16, 0, 0, 0, 0, has_year_zero=True),
       cftime.DatetimeProlepticGregorian(2013, 12, 16, 12, 0, 0, 0, has_year_zero=True)],
      dtype=object)
Coordinates:
  * time     (time) object 96B 2013-01-16 12:00:00 ... 2013-12-16 12:00:00
Attributes:
    bounds:         time_bnds
    axis:           T
    long_name:      time
    standard_name:  time

Resulting time coordinates after subsetting for Dec and Jan coordinates and options below:

  • custom_seasons=["Dec", "Jan"] is a season that spans the calendar year. Months that span the calendar year will shift from the previous year to the next year for proper seasonal grouping.
  • drop_incomplete_djf=True (now drop_incomplete_seasons=True) drops incomplete seasons (not all monthly time coordinates found to make up a season)
   year  season  month  expected_months  actual_months
0  2013  DecJan      1                2              1
1  2014  DecJan     12                2              1

Since the expected and actual month counts don't align, the coordinates are dropped. This results in no time coordinates remaining, leading to AttributeError: 'IndexVariable' object has no attribute 'year'.
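
The counting logic in the table above can be sketched with pandas. This is an illustrative reconstruction of the idea, not the actual xCDAT code:

```python
import pandas as pd

# Reconstruct the grouping table above: each (year, season) group is kept
# only if it contains all of the season's required months.
df = pd.DataFrame(
    {"year": [2013, 2014], "season": ["DecJan", "DecJan"], "month": [1, 12]}
)
expected_months = 2  # len(["Dec", "Jan"])

actual = df.groupby(["year", "season"])["month"].transform("count")
complete = df[actual == expected_months]
# Both groups contain only 1 of the 2 required months, so every row is
# dropped and no time coordinates remain.
```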

Workaround -- specify drop_incomplete_seasons=False

I also added a clear error message when no complete seasons are found (commit).

@tomvothecoder tomvothecoder force-pushed the feature/416-custom-season-span branch from f6858ff to ec14495 Compare July 30, 2024 19:32
@tomvothecoder
Copy link
Collaborator Author

After thorough testing, I actually think this PR is close to complete.

@tomvothecoder
Copy link
Collaborator Author

Hey @oliviermarti, @arfriedman, and @DamienIrving, would any of you be interested in checking out this branch to test the custom season functionality? If so, please check out our contributing guide. Thanks!

@oliviermarti
Copy link

@tomvothecoder : this version seems perfect to me :-)

(I've got the branch feature/416-custom-season-span : correct ?)

Thanks !!

I attach my very simple test case, which runs well:

import os, sys
import cftime
import numpy as np, xarray as xr, xcdat as xc

# Version check
print ( f'{sys.version = }' )

print ( f'{np.__version__ = }' )
print ( f'{xr.__version__ = }' )
print ( f'{xc.__version__ = }' )
print ( f'{xc.__file__ = }' )

# Create time axis
nt = 36

time = cftime.num2date ( np.arange(nt, dtype=float)*30.0+15, units="day since 2000-01-01", calendar='360_day', only_use_cftime_datetimes=True,
                only_use_python_datetimes=False, has_year_zero=None)

time = xr.DataArray (time, dims=('time',), coords=(time,) )

# Create two test variables
nmonth = np.arange (nt,dtype=float) + 1
nmod   = np.mod (np.arange (nt,dtype=float), 12) +1

nmonth = xr.DataArray (nmonth, dims=('time',), coords=(time,) )
nmod   = xr.DataArray (nmod  , dims=('time',), coords=(time,) )

# Creates dataset
dd = xr.Dataset ( {'nmonth': nmonth, 'nmod':nmod})

# Writes dataset and open it with xcdat
if os.path.exists ('test.nc') : os.remove ('test.nc')
dd.to_netcdf ( 'test.nc', mode='w')
dx = xc.open_dataset ('test.nc',  use_cftime=True, decode_times=True).bounds.add_missing_bounds()

print ( 'Check variables')
print (dx.nmonth.values)
print (dx.nmod.values)

print ( 'Year means are correct')
print (dx.temporal.group_average("nmonth", freq="year", weighted=True)['nmonth'].values)
print (dx.temporal.group_average("nmod"  , freq="year", weighted=True)['nmod'  ].values)


print ( ' Standard seasonal means are correct')
print (dx.temporal.group_average("nmonth", freq="season", weighted=True,
                                season_config={'drop_incomplete_djf':False, 'dec_mode': 'DJF',})['nmonth'].values)
print (dx.temporal.group_average("nmod"  , freq="season", weighted=True,
                                season_config={'drop_incomplete_djf':False, 'dec_mode': 'DJF',})['nmod'  ].values)

print (dx.temporal.group_average("nmonth", freq="season", weighted=True, 
                                 season_config={'drop_incomplete_djf':True, 'dec_mode': 'DJF',})['nmonth'].values)
print (dx.temporal.group_average("nmod"  , freq="season", weighted=True, 
                                 season_config={'drop_incomplete_djf':True, 'dec_mode': 'DJF',})['nmod'  ].values)


print ( '3 months custom seasons are correct' )

custom_seasons = [  ["Dec", "Jan", "Feb"], ]

print (dx.temporal.group_average("nmonth", freq="season", 
                                 season_config={'drop_incomplete_djf':False, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',}, 
                                 weighted=True)['nmonth'].values)
print (dx.temporal.group_average("nmod"  , freq="season", 
                                 season_config={'drop_incomplete_djf':False, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',}, 
                                 weighted=True)['nmod'  ].values)

print (dx.temporal.group_average("nmonth", freq="season", 
                                 season_config={'drop_incomplete_djf':True, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',}, 
                                 weighted=True)['nmonth'].values)
print (dx.temporal.group_average("nmod"  , freq="season", 
                                 season_config={'drop_incomplete_djf':True, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',}, 
                                 weighted=True)['nmod'  ].values)


print ( '4 months custom seasons are correct' )

custom_seasons = [  ["Dec", "Jan", "Feb", "Mar"], ]

print (dx.temporal.group_average("nmonth", freq="season", 
                                 season_config={'drop_incomplete_djf':False, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',}, 
                                 weighted=True)['nmonth'].values)
print (dx.temporal.group_average("nmod"  , freq="season", 
                                 season_config={'drop_incomplete_djf':False, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',}, 
                                 weighted=True)['nmod'  ].values)


print (dx.temporal.group_average("nmonth", freq="season", 
                                 season_config={'drop_incomplete_djf':True, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',}, 
                                 weighted=True)['nmonth'].values)
print (dx.temporal.group_average("nmod"  , freq="season", 
                                 season_config={'drop_incomplete_djf':True, 'custom_seasons':custom_seasons, 'dec_mode': 'DJF',}, 
                                 weighted=True)['nmod'  ].values)

Results :

sys.version = '3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:45:13) [Clang 16.0.6 ]'
np.__version__ = '1.26.4'
xr.__version__ = '2024.7.0'
xc.__version__ = '0.7.1'
xc.__file__ = '/Users/marti/Unix/XCDAT/xcdat/__init__.py'
Check variables
[ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17. 18.
 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. 29. 30. 31. 32. 33. 34. 35. 36.]
[ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12.  1.  2.  3.  4.  5.  6.
  7.  8.  9. 10. 11. 12.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12.]
Year means are correct
[ 6.5 18.5 30.5]
[6.5 6.5 6.5]
 Standard seasonal means are correct
[ 1.5  4.   7.  10.  13.  16.  19.  22.  25.  28.  31.  34.  36. ]
[ 1.5  4.   7.  10.   5.   4.   7.  10.   5.   4.   7.  10.  12. ]
[ 4.  7. 10. 13. 16. 19. 22. 25. 28. 31. 34.]
[ 4.  7. 10.  5.  4.  7. 10.  5.  4.  7. 10.]
3 months custom seasons are correct
[ 1.5 13.  25.  36. ]
[ 1.5  5.   5.  12. ]
[13. 25.]
[5. 5.]
4 months custom seasons are correct
[ 2.  13.5 25.5 36. ]
[ 2.   4.5  4.5 12. ]
[13.5 25.5]
[4.5 4.5]

@tomvothecoder
Copy link
Collaborator Author

@tomvothecoder : this version seems perfect to me :-)

(I've got the branch feature/416-custom-season-span : correct ?)

Thanks !!

I attach my very simple test case, which runs well:

Yes that is the correct branch. I'm happy to see your test cases pass! If you're interested, it'd be helpful to test on some real data too.

@oliviermarti
Copy link

@tomvothecoder : this version seems perfect to me :-)
(I've got the branch feature/416-custom-season-span : correct ?)
Thanks !!
I attach my very simple test case, which runs well:

Yes that is the correct branch. I'm happy to see your test cases pass! If you're interested, it'd be helpful to test on some real data too.

Tom,

On real data, version 0.7.1 returns a result, but a wrong one. It's hard to have an easy check on real data, hence this academic test case.

I've made a few tests on real data from ESGF, and on raw data from my IPSL model. It seems OK.

I've got problems with the IPSL model, but that's another story outside the present scope: IPSL raw outputs have one time axis, 'time_counter', but two time variables, time_counter and time_centered. add_bounds works nicely on both, but in the end I got:
File "/Users/marti/Unix/XCDAT/xcdat/temporal.py", line 1163, in _subset_coords_for_custom_seasons coords_by_month = ds.time.groupby(f"{self.dim}.month").groups
I have to rename time_counter to the time variable to make it work.

IPSL outputs processed and available on ESGF are OK.

@tomvothecoder
Copy link
Collaborator Author

On real data, version 0.7.1 returns a result, but a wrong one. It's hard to have an easy check on real data, hence this academic test case.

I've made a few tests on real data from ESGF, and on raw data from my IPSL model. It seems OK.

Thank you. @lee1043 is also going to test on CMIP data.

I've got problems with the IPSL model, but that's another story outside the present scope: IPSL raw outputs have one time axis, 'time_counter', but two time variables, time_counter and time_centered. add_bounds works nicely on both, but in the end I got: File "/Users/marti/Unix/XCDAT/xcdat/temporal.py", line 1163, in _subset_coords_for_custom_seasons coords_by_month = ds.time.groupby(f"{self.dim}.month").groups I have to rename time_counter to the time variable to make it work.

IPSL outputs processed and available on ESGF are OK.

I pushed 388f4da (#423) to fix #684. Your example code in that issue now works.

@arfriedman
Copy link

Hey @oliviermarti, @arfriedman, and @DamienIrving, would any of you be interested in checking out this branch to test the custom season functionality? If so, please check out our contributing guide. Thanks!

Thank you @tomvothecoder! -- I will take a look.

@dcherian
Copy link

Here's an upstream version: pydata/xarray#9524 . I could use some help testing it out.

Labels
type: enhancement New enhancement request
Projects
Status: In Review
5 participants