Add support for custom seasons spanning calendar years #423

tomvothecoder · 2023-03-03T00:55:53Z

Description

TODO:

Support custom seasons that span calendar years (_shift_spanning_months())
- Requires detecting order of the months in a season. Currently, order does not matter.
- For example, for custom_season = ["Nov", "Dec", "Jan", "Feb", "Mar"]:
  - ["Nov", "Dec"] are from the previous year since they are listed before "Jan"
  - ["Jan", "Feb", "Mar"] are from the current year
  - Therefore, ["Nov", "Dec"] need to be shifted a year forward for correct
    grouping.
Detect and drop incomplete seasons (_drop_incomplete_seasons())
- Right now xCDAT only detects incomplete "DJF" seasons with _drop_incomplete_djf()
- Replace boolean config drop_incomplete_djf with drop_incomplete_season
- A possible solution for detecting incomplete seasons is to check if a season has all of the required months. If the count of months for that season does not match the expected count, then drop that season.
Remove requirement for all 12 months to be included in a custom season
Refactor logic for shifting months to use Xarray instead of Pandas
The current logic maps the custom season to its middle month, represented as cftime time coordinates. Does it make sense to also keep the custom seasons with the time coordinates, similar to what Xarray does?
```
  <xarray.DataArray 'season' (season: 2)> Size: 32B
  array(['DJFM', 'AMJ'], dtype='<U4')
  Coordinates:
    * season   (season) <U4 32B 'DJFM' 'AMJ'
```

Checklist

My code follows the style guidelines of this project
I have performed a self-review of my own code
My changes generate no new warnings
Any dependent changes have been merged and published in downstream modules

If applicable:

I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass with my changes (locally and CI/CD build)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

Additional Context

Google Slides explaining logic
Refactoring this PR to use Add SeasonGrouper, SeasonResampler pydata/xarray#9524 will most likely require addressing [Refactor]: Consider using flox and xr.resample() to improve temporal averaging grouping logic #217, which involves extensive refactoring in how the time coordinates are pre-processed based on the averaging mode and frequency. As of xarray >=2024.09.0, Xarray supports grouping by multiple variables too.
https://docs.xarray.dev/en/stable/user-guide/groupby.html#grouping-by-multiple-variables

tomvothecoder · 2023-03-06T20:43:38Z

Example result of _drop_incomplete_seasons():

# Before dropping
# -----------------
# 2000-1, 2000-2, and 2001-12 months in incomplete "DJF" seasons" so they are dropped
ds.time
<xarray.DataArray 'time' (time: 15)>
array([cftime.DatetimeGregorian(2000, 1, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 2, 15, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 3, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 4, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 5, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 6, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 7, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 8, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 9, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 10, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 11, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 12, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2001, 1, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2001, 2, 15, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2001, 12, 16, 12, 0, 0, 0, has_year_zero=False)],
      dtype=object)
Coordinates:
  * time     (time) object 2000-01-16 12:00:00 ... 2001-12-16 12:00:00
Attributes:
    axis:           T
    long_name:      time
    standard_name:  time
    bounds:         time_bnds

# After dropping
# -----------------
ds_new.time
<xarray.DataArray 'time' (time: 12)>
array([cftime.DatetimeGregorian(2000, 3, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 4, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 5, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 6, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 7, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 8, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 9, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 10, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 11, 16, 0, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2000, 12, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2001, 1, 16, 12, 0, 0, 0, has_year_zero=False),
       cftime.DatetimeGregorian(2001, 2, 15, 0, 0, 0, 0, has_year_zero=False)],
      dtype=object)
Coordinates:
  * time     (time) object 2000-03-16 12:00:00 ... 2001-02-15 00:00:00
Attributes:
    axis:           T
    long_name:      time
    standard_name:  time
    bounds:         time_bnds

tomvothecoder · 2024-04-04T21:50:11Z

Hey @lee1043, this PR seemed to be mostly done when I stopped working on it last year. I just had to fix a few things and update the tests.

Would you like to check out this branch to test it out on real data? Also a code review would be appreciated.

codecov · 2024-04-04T21:51:20Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (1cfd369) to head (8d156c2).

Additional details and impacted files

@@            Coverage Diff            @@
##              main      #423   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           15        15           
  Lines         1555      1614   +59     
=========================================
+ Hits          1555      1614   +59

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

lee1043 · 2024-04-04T21:52:26Z

@tomvothecoder sure, I will test it out and review. Thank you for the update!

lee1043 · 2024-04-04T21:53:14Z

@tomvothecoder Can this be considered for v0.7.0?

tomvothecoder

My PR self-review

tomvothecoder · 2024-04-04T21:52:47Z

xcdat/temporal.py

+            warnings.warn(
+                "The `season_config` argument 'drop_incomplete_djf' is being "
+                "deprecated. Please use 'drop_incomplete_seasons' instead.",
+                DeprecationWarning,
+            )


TODO: Need to a specify a specific version that we will deprecate drop_incomplete_djf. Probably v0.8.0 or v0.9.0.

xcdat/temporal.py

tomvothecoder · 2024-04-04T21:55:48Z

xcdat/temporal.py

-        if len(input_months) != len(predefined_months):
-            raise ValueError(
-                "Exactly 12 months were not passed in the list of custom seasons."
-            )


Removed requirements for all 12 months to be included in a custom season

tomvothecoder · 2024-04-04T22:15:10Z

@tomvothecoder Can this be considered for v0.7.0?

This PR still needs thorough review before I'm confident in merging it. I'll probably tag Steve at some point.

The release after v0.7.0 is more realistic and reasonable. We can always initiate a new release for this feature whenever it is merged.

lee1043 · 2024-04-04T22:16:44Z

@tomvothecoder Can this be considered for v0.7.0?

This PR still needs thorough review before I'm confident in merging it. I'll probably tag Steve at some point.

The release after v0.7.0 is more realistic and reasonable. We can always initiate a new release for this feature whenever it is merged.

@tomvothecoder no problem. Thank you for consideration.

lee1043 · 2024-04-04T23:21:39Z

@tomvothecoder it looks like when custom season go beyond calendar year (Nov, Dec, Jan) there is error as follows.

import os
import xcdat as xc

input_data = os.path.join(
    "/p/css03/esgf_publish/CMIP6/CMIP/AWI/AWI-CM-1-1-MR/historical/r1i1p1f1/Amon/psl/gn/v20181218/",
    "psl_Amon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_201301-201312.nc")

ds = xc.open_mfdataset(input_data)

# Example of custom seasons in a three month format:
custom_seasons = [
    ['Dec', 'Jan'],
]

season_config = {'custom_seasons': custom_seasons, 'dec_mode': 'DJF', 'drop_incomplete_djf': True}

ds.temporal.group_average("psl", "season", season_config=season_config)

CPU times: user 448 ms, sys: 32.8 ms, total: 481 ms
Wall time: 471 ms
xarray.Dataset
Dimensions:
lat: 192bnds: 2lon: 384time: 4
Coordinates:
lat
(lat)
float64
-89.28 -88.36 ... 88.36 89.28
lon
(lon)
float64
0.0 0.9375 1.875 ... 358.1 359.1
time
(time)
object
2013-02-01 00:00:00 ... 2013-11-...
Data variables:
lat_bnds
(lat, bnds)
float64
dask.array<chunksize=(192, 2), meta=np.ndarray>
lon_bnds
(lon, bnds)
float64
dask.array<chunksize=(384, 2), meta=np.ndarray>
psl
(time, lat, lon)
float64
dask.array<chunksize=(1, 192, 384), meta=np.ndarray>
Indexes: (3)
Attributes: (44)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xarray/core/dataarray.py:862, in DataArray._getitem_coord(self, key)
    861 try:
--> 862     var = self._coords[key]
    863 except KeyError:

KeyError: 'time.year'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
<timed exec> in ?()

~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/temporal.py in ?(self, data_var, freq, weighted, keep_weights, season_config)
    400         }
    401         """
    402         self._set_data_var_attrs(data_var)
    403 
--> 404         return self._averager(
    405             data_var,
    406             "group_average",
    407             freq,

~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/temporal.py in ?(self, data_var, mode, freq, weighted, keep_weights, reference_period, season_config)
    870 
    871         if self._mode == "average":
    872             dv_avg = self._average(ds, data_var)
    873         elif self._mode in ["group_average", "climatology", "departures"]:
--> 874             dv_avg = self._group_average(ds, data_var)
    875 
    876         # The original time dimension is dropped from the dataset because
    877         # it becomes obsolete after the data variable is averaged. When the

~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/temporal.py in ?(self, ds, data_var)
   1295         dv = _get_data_var(ds, data_var)
   1296 
   1297         # Label the time coordinates for grouping weights and the data variable
   1298         # values.
-> 1299         self._labeled_time = self._label_time_coords(dv[self.dim])
   1300 
   1301         if self._weighted:
   1302             time_bounds = ds.bounds.get_bounds("T", var_key=data_var)

~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/temporal.py in ?(self, time_coords)
   1470         >>>       dtype='datetime64[ns]')
   1471         >>> Coordinates:
   1472         >>> * time     (time) datetime64[ns] 2000-01-01T00:00:00 ... 2000-04-01T00:00:00
   1473         """
-> 1474         df_dt_components: pd.DataFrame = self._get_df_dt_components(
   1475             time_coords, drop_obsolete_cols=True
   1476         )
   1477         dt_objects = self._convert_df_to_dt(df_dt_components)

~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xcdat/temporal.py in ?(self, time_coords, drop_obsolete_cols)
   1534 
   1535         # Use the TIME_GROUPS dictionary to determine which components
   1536         # are needed to form the labeled time coordinates.
   1537         for component in TIME_GROUPS[self._mode][self._freq]:
-> 1538             df[component] = time_coords[f"{self.dim}.{component}"].values
   1539 
   1540         # The season frequency requires additional datetime components for
   1541         # processing, which are later removed before time coordinates are

~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xarray/core/dataarray.py in ?(self, key)
    869     def __getitem__(self, key: Any) -> Self:
    870         if isinstance(key, str):
--> 871             return self._getitem_coord(key)
    872         else:
    873             # xarray-style array indexing
    874             return self.isel(indexers=self._item_key_to_dict(key))

~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xarray/core/dataarray.py in ?(self, key)
    861         try:
    862             var = self._coords[key]
    863         except KeyError:
    864             dim_sizes = dict(zip(self.dims, self.shape))
--> 865             _, key, var = _get_virtual_variable(self._coords, key, dim_sizes)
    866 
    867         return self._replace_maybe_drop_dims(var, name=key)

~/.conda/envs/xcdat_dev_20240401/lib/python3.11/site-packages/xarray/core/dataset.py in ?(variables, key, dim_sizes)
    212     if _contains_datetime_like_objects(ref_var):
    213         ref_var = DataArray(ref_var)
    214         data = getattr(ref_var.dt, var_name).data
    215     else:
--> 216         data = getattr(ref_var, var_name).data
    217     virtual_var = Variable(ref_var.dims, data)
    218 
    219     return ref_name, var_name, virtual_var

AttributeError: 'IndexVariable' object has no attribute 'year'
CPU times: user 240 ms, sys: 13.2 ms, total: 253 ms
Wall time: 249 ms
xarray.Dataset
Dimensions:
lat: 192bnds: 2lon: 384time: 1
Coordinates:
lat
(lat)
float64
-89.28 -88.36 ... 88.36 89.28
lon
(lon)
float64
0.0 0.9375 1.875 ... 358.1 359.1
time
(time)
object
2013-02-01 00:00:00
Data variables:
lat_bnds
(lat, bnds)
float64
dask.array<chunksize=(192, 2), meta=np.ndarray>
lon_bnds
(lon, bnds)
float64
dask.array<chunksize=(384, 2), meta=np.ndarray>
psl
(time, lat, lon)
float64
dask.array<chunksize=(1, 192, 384), meta=np.ndarray>
Indexes: (3)
Attributes: (44)
CPU times: user 106 ms, sys: 6.96 ms, total: 113 ms
Wall time: 110 ms
xarray.Dataset
Dimensions:
lat: 192bnds: 2lon: 384time: 1
Coordinates:
lat
(lat)
float64
-89.28 -88.36 ... 88.36 89.28
lon
(lon)
float64
0.0 0.9375 1.875 ... 358.1 359.1
time
(time)
object
2013-02-01 00:00:00
Data variables:
lat_bnds
(lat, bnds)
float64
dask.array<chunksize=(192, 2), meta=np.ndarray>
lon_bnds
(lon, bnds)
float64
dask.array<chunksize=(384, 2), meta=np.ndarray>
psl
(time, lat, lon)
float64
dask.array<chunksize=(1, 192, 384), meta=np.ndarray>
Indexes: (3)
Attributes: (44)

tomvothecoder · 2024-04-08T16:05:01Z

@lee1043 Thanks for trying to this out and providing an example script! I'll debug the stack trace.

lee1043 · 2024-10-10T17:36:54Z

@tomvothecoder I will give another round of test for this PR in this week. Sorry, it slipped from my list and thank you for the reminder!

lee1043 · 2024-10-11T01:17:29Z

I used 2 years of monthly time series data to test this PR. From the test results below, I believe the code is working as expected.

1. First sanity check: compare xcdat default vs custom season

import os
import xarray as xr
import xcdat as xc
import matplotlib.pyplot as plt

# input data is from 
# "/p/css03/esgf_publish/CMIP6/CMIP/AWI/AWI-CM-1-1-MR/historical/r1i1p1f1/Amon/psl/gn/v20181218/"

input_data = [
    "psl_Amon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_201201-201212.nc",
    "psl_Amon_AWI-CM-1-1-MR_historical_r1i1p1f1_gn_201301-201312.nc"
]

ds = xc.open_mfdataset(input_data)

# Sanity check

def compare_xcdat_default_vs_custom(custom_season, custom_drop_incomplete_seasons=True, ax=None, title=None):
    
    if custom_season == "DJF":
        custom_seasons = [['Dec', 'Jan', 'Feb']]
        index = 0
    elif custom_season == "MAM":
        custom_seasons = [['Mar', 'Apr', 'May']]
        index = 1
    elif custom_season == "JJA":
        custom_seasons = [['Jun', 'Jul', 'Aug']]
        index = 2
    elif custom_season == "SON":
        custom_seasons = [['Sep', 'Oct', 'Nov']]
        index = 3
        
    # season from default method
    ds_default_season = ds.temporal.climatology(
        "psl",
        freq="season",
        weighted=True,
        season_config={"dec_mode": "DJF", "drop_incomplete_djf": True},
    )

    da_default_season = ds_default_season['psl'][index]
    #print(custom_season, 'default', ds_default_season.sizes, da_default_season.shape)
    
    # season from custom season
    season_config = {'custom_seasons': custom_seasons, 'dec_mode': 'DJF', 'drop_incomplete_seasons': custom_drop_incomplete_seasons}
    ds_custom_season = ds.temporal.group_average("psl", "season", season_config=season_config)
    # get climatology
    if len(ds_custom_season.time) > 1:
        ds_custom_season = ds_custom_season.bounds.add_missing_bounds()
        ds_custom_season = ds_custom_season.temporal.average("psl")
    da_custom_season = ds_custom_season["psl"].squeeze()
    
    #print(custom_season, 'custom', ds_custom_season.sizes, da_custom_season.shape)
    
    # plot
    (da_default_season - da_custom_season).plot(ax=ax)
    
    if ax is not None and title is not None:
        ax.set_title(title)

fig, ax = plt.subplots(2, 4, figsize=(12, 5))

seasons = ["DJF", "MAM", "JJA", "SON"]

for i, season in enumerate(seasons):
    compare_xcdat_default_vs_custom(season, custom_drop_incomplete_seasons=True, ax=ax[0,i], title=season)
    compare_xcdat_default_vs_custom(season, custom_drop_incomplete_seasons=False, ax=ax[1,i], title=season)

fig.suptitle("xcdat default - custom \n upper: custom_drop_incomplete_seasons=True \n upper: custom_drop_incomplete_seasons=False")

fig.tight_layout()

The result is consistent to that is expected. DJF is one time step with drop_incomplete_seasons=True so no difference between default and custom approach (upper left). It became different when incomplete season is not dropped from custom, while it was dropped from default (lower left). Other seasons are not very much influenced by the parameter. so noise like differences are shown.

2. Compare to other tools

custom_season = "JJAS"
custom_seasons = [['Jun', 'Jul', 'Aug', 'Sep']]

#
# xcdat
#
ds = xc.open_mfdataset(input_data)

season_config = {'custom_seasons': custom_seasons, 'dec_mode': 'DJF', 'drop_incomplete_seasons': True}

ds_cs_xcdat = ds.temporal.group_average("psl", "season", season_config=season_config)
# get climatology
if len(ds_cs_xcdat.time) > 1:
    ds_cs_xcdat = ds_cs_xcdat.bounds.add_missing_bounds()
    ds_cs_xcdat = ds_cs_xcdat.temporal.average("psl")
da_cs_xcdat = ds_cs_xcdat["psl"].squeeze()

#
# CDAT
# 

# "cdscan -x test.xml *.nc" conducted as pre-process

import cdms2
import cdutil

f = cdms2.open('test.xml')
d = f('psl')
custom_season_class = cdutil.times.Seasons(custom_season)
d_cs = custom_season_class.climatology(d)

# 
# PMP (xcdat and xarray method)
# 
from pcmdi_metrics.utils import custom_season_average

# xcdat method
ds_cs_pmp_xcdat_yearly = custom_season_average(ds, "psl", season=custom_season, method="xcdat")
ds_cs_pmp_xcdat_yearly = ds_cs_pmp_xcdat_yearly.bounds.add_missing_bounds()
ds_cs_pmp_xcdat = ds_cs_pmp_xcdat_yearly.temporal.average("psl")
da_cs_pmp_xcdat = ds_cs_pmp_xcdat['psl']

# xarray method
ds_cs_pmp_xarray_yearly = custom_season_average(ds, "psl", season=custom_season, method="xarray")
ds_cs_pmp_xarray = ds_cs_pmp_xarray_yearly.mean(dim=["year"])
da_cs_pmp_xarray = ds_cs_pmp_xarray['psl']

# 
# Compare results
# 
fig, ax = plt.subplots(2, 4, figsize=(12, 5))

da_cs_xcdat.plot(ax=ax[0,0])
da_cs_cdat.plot(ax=ax[0,1])
da_cs_pmp_xcdat.plot(ax=ax[0,2])
da_cs_pmp_xarray.plot(ax=ax[0,3])


(da_cs_xcdat.to_numpy() - da_cs_cdat).plot(ax=ax[1,0])
(da_cs_cdat - da_cs_pmp_xcdat.to_numpy()).plot(ax=ax[1,1])
(da_cs_xcdat - da_cs_pmp_xcdat).plot(ax=ax[1,2])
(da_cs_pmp_xcdat - da_cs_pmp_xarray).plot(ax=ax[1,3])

ax[0,0].set_title("xcdat")
ax[0,1].set_title("cdat")
ax[0,2].set_title("pmp (xc)")
ax[0,3].set_title("pmp (xr)")

ax[1,0].set_title("xcdat - cdat")
ax[1,1].set_title("cdat - pmp (xc)")
ax[1,2].set_title("xcdat - pmp (xc)")
ax[1,3].set_title("pmp (xc) - pmp (xr)")

fig.suptitle(custom_season)

fig.tight_layout()

No difference between xcdat, cdat, and pmp (xcdat) results, while pmp (xarray) is inconsistent maybe because it does not consider propor temporal weighting.

lee1043 · 2024-10-11T01:24:23Z

@tomvothecoder if you plan to switch drop_incomplete_djf to drop_incomplete_seasons in this PR, do you plan to apply that to other functions as well? That would makes sense if there is a time series starting from July ending in June after years, and want to get JJA temporal average. But this can be a breaking change if drop_incomplete_djf is being used in one's code and they update xcdat version. I wonder if the following could be a reasonable transition plan: mapping drop_incomplete_djf to drop_incomplete_seasons during the transition, and print a deprecation warning sign if drop_incomplete_djf is being used.

tomvothecoder · 2024-10-11T20:27:10Z

@tomvothecoder if you plan to switch drop_incomplete_djf to drop_incomplete_seasons in this PR, do you plan to apply that to other functions as well? That would makes sense if there is a time series starting from July ending in June after years, and want to get JJA temporal average. But this can be a breaking change if drop_incomplete_djf is being used in one's code and they update xcdat version.

The drop_incomplete_seasons functionality applies to all of the group averaging APIs (group_average, climatology, and departures).

I wonder if the following could be a reasonable transition plan: mapping drop_incomplete_djf to drop_incomplete_seasons during the transition, and print a deprecation warning sign if drop_incomplete_djf is being used.

I commented in this thread above that users will who use drop_incomplete_djf will receive a DeprecationWarning to use drop_incomplete_seasons instead.

In this PR right now, drop_incomplete_djf is mapped to drop_incomplete_seasons. However, I don't think we should map these configs because they behave differently. drop_incomplete_djf removes the leading and trailing incomplete DJF seasons, while drop_incomplete_seasons will drop all incomplete seasons. It is better to keep the behaviors of these configs separate to not break any existing code that depends on the current drop_incomplete_djf behavior. Users will have time to transition over to drop_incomplete_seasons before drop_incomplete_djf is deprecated.

I will update this PR to restore the original drop_incomplete_djf functionality to have it alongside drop_incomplete_seasons.

tomvothecoder · 2024-10-23T21:59:24Z

From our meeting on 10/23:

Near-term solution: move forward with this PR and merge it when ready
Long-term solution: consider refactoring the underlying logic with Xarray #9524 in the future
- Scope out effort in refactoring xCDAT temporal backend to move away from Pandas to Xarray (need multi variable grouping)
- Experiment with this Xarray PR now, see if it fits our requirements for custom seasons

tomvothecoder · 2024-10-24T18:03:48Z

@xCDAT/core-developers For the final time coordinate output, the current logic maps the custom season to its middle month. For example, year 2000 and "NDJFM" will be represented by cftime(2000, 1, 1, 0...) after seasonal group averaging.

How useful would it be to keep the custom seasons as auxiliary coords? For example, (2000, NDJFM). I think it would be helpful to have since it explicitly shows what the custom seasons are.

In Xarray, grouping by season will result in "season" coords. I'm not sure if these auxiliary coords are CF-compliant, although probably not a big deal. For example,

  <xarray.DataArray 'season' (season: 2)> Size: 32B
  array(['DJFM', 'AMJ'], dtype='<U4')
  Coordinates:
    * season   (season) <U4 32B 'DJFM' 'AMJ'

Alternatively, the user can just reference the attributes of the data variable to see the custom seasons.

{'operation': 'temporal_avg',
 'mode': 'group_average',
 'freq': 'season',
 'weighted': 'True',
 'drop_incomplete_seasons': 'True',
 'custom_seasons': ['NovDecJanFebMar']}

Let me know your thoughts. This can also be a future enhancement or something to include if we decide to refactor later on.

- Remove logic for requiring all 12 months to be used

- Add conditional that determines whether subsetting time coordinates is necessary with custom seasons - Update docstrings for `season_config` - Add tests

- Months are also shifted in the `_preprocess_dataset()` method now. Before months were being shifted twice, once when dropping incomplete seasons or DJF, and a second time when labeling time coordinates.

tomvothecoder · 2024-11-12T21:05:33Z

In commit 0b6852f (#423), I refactored the logic for shifting months to use Xarray/NumPy instead of Pandas. We'll gradually shift away from the Pandas back-end for storing and manipulating Datetime components (used to group time coordinates) towards Xarray/Numpy in #217 since xarray >=2024.09.0 now supports grouping by multiple variables.

I will do a final walk through at the next xCDAT meeting (11/20) before merging.

tomvothecoder added the type: enhancement New enhancement request label Mar 3, 2023

tomvothecoder self-assigned this Mar 3, 2023

tomvothecoder force-pushed the feature/416-custom-season-span branch from 96c5eca to fa087b7 Compare March 6, 2023 20:31

tomvothecoder added this to the FY24Q1 (v0.7.0) milestone Sep 27, 2023

tomvothecoder modified the milestones: FY24Q1 (10/01/23 - 12/31/23), FY24Q2 (01/01/24 - 03/31/24) Oct 11, 2023

tomvothecoder mentioned this pull request Dec 4, 2023

Grouper object design doc pydata/xarray#8510

Merged

tomvothecoder modified the milestone: FY24Q2 (01/01/24 - 03/31/24) Dec 19, 2023

tomvothecoder removed this from the FY24Q2 (01/01/24 - 03/31/24) milestone Jan 16, 2024

tomvothecoder force-pushed the feature/416-custom-season-span branch from c11e505 to dc0c325 Compare April 4, 2024 20:12

tomvothecoder requested a review from lee1043 April 4, 2024 21:50

tomvothecoder marked this pull request as ready for review April 4, 2024 21:50

tomvothecoder commented Apr 4, 2024

View reviewed changes

tomvothecoder mentioned this pull request Apr 15, 2024

[Bug]: temporal.group_average with custom seasons produces wrong result when season cross year boundary for custom seasons. #642

Open

tomvothecoder added this to the FY24Q4 (07/01/24 - 09/30/24) milestone Jun 20, 2024

tomvothecoder added the priority: soon Should be addressed soon. label Oct 10, 2024

lee1043 self-requested a review October 11, 2024 01:18

lee1043 approved these changes Oct 11, 2024

View reviewed changes

tomvothecoder force-pushed the feature/416-custom-season-span branch from 4131525 to 4c3e8dd Compare October 23, 2024 21:55

tomvothecoder mentioned this pull request Nov 12, 2024

[Refactor]: Consider using flox and xr.resample() to improve temporal averaging grouping logic #217

Open

tomvothecoder added 16 commits November 12, 2024 13:02

Add baseline unit tests

7716efc

- Remove logic for requiring all 12 months to be used

Add support for shifting months in seasons spanning calendar year

ade02ba

Add explaination of code for spanning months

2a87cf0

Replace drop_incomplete_djf with drop_incomplete_seasons

6b4ec65

Fix _drop_incomplete_seasons() adding dims to variables

fbad335

- Add conditional that determines whether subsetting time coordinates is necessary with custom seasons - Update docstrings for `season_config` - Add tests

Add deprecation warning for "drop_incomplete_djf"

4a896ba

Fix tests

9f496e5

Add test to cover missing line in _drop_incomplete_season

7cb6910

Add clear error message when no complete seasons are found

1cf4364

Update deprecation version for drop_incomplete_djf

a050884

Update static time dim ref to CF interpreted dim key

5ace7a8

Add drop_incomplete_djf back

25e31f7

Move deprecation warning to inside DJF conditional

2b47972

Update comment about .where concatenating time din

05496ed

Refactor logic for shifting months to use Xarray instead of Pandas

0b6852f

- Months are also shifted in the `_preprocess_dataset()` method now. Before months were being shifted twice, once when dropping incomplete seasons or DJF, and a second time when labeling time coordinates.

Add todo comments

8d156c2

tomvothecoder force-pushed the feature/416-custom-season-span branch from 8f9af92 to 8d156c2 Compare November 12, 2024 21:02

tomvothecoder mentioned this pull request Nov 14, 2024

Initial prototype to refactor custom_seasons API with Xarray SeasonGrouper() API [WIP] #715

Draft

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for custom seasons spanning calendar years #423

Add support for custom seasons spanning calendar years #423

tomvothecoder commented Mar 3, 2023 •

edited

Loading

tomvothecoder commented Mar 6, 2023

tomvothecoder commented Apr 4, 2024

codecov bot commented Apr 4, 2024 •

edited

Loading

lee1043 commented Apr 4, 2024

lee1043 commented Apr 4, 2024

tomvothecoder left a comment •

edited

Loading

tomvothecoder Apr 4, 2024

tomvothecoder Apr 4, 2024

tomvothecoder commented Apr 4, 2024 •

edited

Loading

lee1043 commented Apr 4, 2024

lee1043 commented Apr 4, 2024

tomvothecoder commented Apr 8, 2024

lee1043 commented Oct 10, 2024

lee1043 commented Oct 11, 2024

lee1043 commented Oct 11, 2024

tomvothecoder commented Oct 11, 2024 •

edited

Loading

tomvothecoder commented Oct 23, 2024

tomvothecoder commented Oct 24, 2024 •

edited

Loading

tomvothecoder commented Nov 12, 2024

Add support for custom seasons spanning calendar years #423

Are you sure you want to change the base?

Add support for custom seasons spanning calendar years #423

Conversation

tomvothecoder commented Mar 3, 2023 • edited Loading

Description

TODO:

Checklist

Additional Context

tomvothecoder commented Mar 6, 2023

tomvothecoder commented Apr 4, 2024

codecov bot commented Apr 4, 2024 • edited Loading

Codecov Report

lee1043 commented Apr 4, 2024

lee1043 commented Apr 4, 2024

tomvothecoder left a comment • edited Loading

Choose a reason for hiding this comment

tomvothecoder Apr 4, 2024

Choose a reason for hiding this comment

tomvothecoder Apr 4, 2024

Choose a reason for hiding this comment

tomvothecoder commented Apr 4, 2024 • edited Loading

lee1043 commented Apr 4, 2024

lee1043 commented Apr 4, 2024

tomvothecoder commented Apr 8, 2024

lee1043 commented Oct 10, 2024

lee1043 commented Oct 11, 2024

1. First sanity check: compare xcdat default vs custom season

2. Compare to other tools

lee1043 commented Oct 11, 2024

tomvothecoder commented Oct 11, 2024 • edited Loading

tomvothecoder commented Oct 23, 2024

tomvothecoder commented Oct 24, 2024 • edited Loading

tomvothecoder commented Nov 12, 2024

tomvothecoder commented Mar 3, 2023 •

edited

Loading

codecov bot commented Apr 4, 2024 •

edited

Loading

tomvothecoder left a comment •

edited

Loading

tomvothecoder commented Apr 4, 2024 •

edited

Loading

tomvothecoder commented Oct 11, 2024 •

edited

Loading

tomvothecoder commented Oct 24, 2024 •

edited

Loading