
CDAT Migration Phase 2: Refactor arm_diags set #842

Open · wants to merge 34 commits into base: cdat-migration-fy24

Conversation

@chengzhuzhang (Contributor) commented Aug 26, 2024

Description

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have added tests that prove my fix is effective or that my feature works
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@chengzhuzhang (Contributor, Author):

Updates:
Refactoring of all sub-sets (annual_cycle, diurnal_cycle(_zt), convection_onset, aerosol_activation) is complete.

Todo:
  • Fix mypy issues
  • Investigate a potential performance bottleneck from deriving variables using an xarray dataset as input

@chengzhuzhang marked this pull request as ready for review September 9, 2024 17:25
@tomvothecoder (Collaborator) commented Sep 9, 2024

> Potential performance bottleneck from deriving variables using xarray dataset as input

Thank you for identifying the two formula computations as potential bottlenecks. I confirmed that these computations with Xarray are indeed much slower than with CDAT.

Solution

I ran a performance benchmark and found the solution to the slowness: we need to call .load(scheduler="sync") in _get_dataset_with_source_vars() to speed up the computation. I pushed commit ef44fc6 (#842) with this fix.

Benchmark Results

The first runtime is with the current code and the second is with .load(). I also ran e3sm_diags with commit ef44fc6 (#842) and confirmed a significant runtime improvement, similar to the benchmarks below. Performance is now on par with CDAT.

"""
Results
----------
1. Elapsed time (Xarray non-chunked): 6.540755605790764 seconds
2. Elapsed time (Xarray non-chunked with .load()): 0.17097265785560012 seconds
3. Elapsed time (Xarray chunked): 0.1452920027077198 seconds
4. Elapsed time (numpy .values): 6.418793010059744 seconds
5. Elapsed time (numpy .data): 7.334999438840896 seconds
"""
Elapsed time (CDAT main branch, single runtime): 0.12261438369750977 seconds
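
For context, a minimal sketch of how such a timing comparison could be set up (illustrative only; the file glob, variable name, and formula are placeholder assumptions, not the actual benchmark script):

import time

import xarray as xr

# Hypothetical input files; open_mfdataset returns lazy, Dask-backed arrays.
ds_lazy = xr.open_mfdataset("PRECT_*.nc")

def time_formula(ds: xr.Dataset, label: str) -> None:
    # Stand-in for a derived-variable formula (e.g., a unit conversion).
    start = time.perf_counter()
    result = ds["PRECT"] * 3600.0 * 24.0
    _ = result.values  # force computation of the (possibly lazy) result
    print(f"Elapsed time ({label}): {time.perf_counter() - start} seconds")

time_formula(ds_lazy, "lazy, no .load()")
time_formula(ds_lazy.load(scheduler="sync"), "after .load(scheduler='sync')")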

@chengzhuzhang (Contributor, Author):

> The first runtime is with the current code and the second is with .load(). I also ran e3sm_diags with commit ef44fc6 (#842)

@tomvothecoder thank you for performing the timing test. Interestingly, when I tested your commit by running a configuration that covers each subset, it took 24 minutes. Without .load(), the total time is about 3 minutes, which I consider at least on par with the CDAT code. In that case, maybe we should drop the .load() change?

@tomvothecoder (Collaborator):

> @tomvothecoder thank you for performing the timing test. Interestingly, when I tested your commit by running a configuration that covers each subset, it took 24 minutes. Without .load(), the total time is about 3 minutes, which I consider at least on par with the CDAT code. In that case, maybe we should drop the .load() change?

I would think that loading the derived-variables dataset into memory shouldn't slow down performance unless the datasets are extremely large (in which case we should use Dask chunking).

I only benchmarked performance for the formula computations. I will benchmark a complete run to verify your findings and determine if we should revert the commit or not.

Side note:

It could be that the logic I implemented already stores the dataset in memory, since it merges multiple xr.Dataset objects opened via open_dataset() (which uses numpy arrays) instead of using open_mfdataset() (which uses Dask arrays). If that is the case, though, I don't see how .load() would improve the speed of the formula computations.

for var in vars_to_get:
    ds = self._get_time_series_dataset_obj(var)
    datasets.append(ds)

ds = xr.merge(datasets)
ds = squeeze_time_dim(ds)
ds.load(scheduler="sync")

return ds
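
One way to check whether the merged dataset is already numpy-backed (in which case .load() is not deferring any computation) or still Dask-backed is sketched below; this is an illustrative helper, not part of the e3sm_diags codebase:

import dask.array as da
import xarray as xr

def is_dask_backed(ds: xr.Dataset) -> bool:
    """Return True if any data variable in the dataset is backed by a Dask array."""
    return any(isinstance(var.data, da.Array) for var in ds.data_vars.values())

# Example: if this prints False, the merged dataset is already in memory
# (numpy-backed), so .load() would not be speeding up the formula computations.
# print(is_dask_backed(ds))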

@tomvothecoder (Collaborator):

RE: #842 (comment)
I found that adding .load() in _get_dataset_with_source_vars() adds 2-4 minutes of runtime to a complete arm_diags run.

I reverted this change and will now address the pre-commit issues.

# Commit: 58361c49-4b1b-11ec-9b3b-9c5c8e2f5e4e (no .load())
# run_set function took 281.78 seconds to complete.
# run_set function took 332.79 seconds to complete.
# Commit: ef44fc6ffd538a5e257b097b99f7a1a79b79bc3b (with .load())
# run_set function took 472.81 seconds to complete.
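
The "run_set function took ... seconds" lines come from wall-clock timing of the driver; a minimal sketch of such a timing decorator is below (illustrative only, not necessarily the instrumentation used to produce the numbers above):

import functools
import time

def timed(func):
    """Print how long the wrapped function takes, mirroring the log lines above."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} function took {elapsed:.2f} seconds to complete.")
        return result
    return wrapper

@timed
def run_set():
    time.sleep(0.1)  # placeholder for the actual arm_diags run

run_set()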

@tomvothecoder (Collaborator) commented Sep 13, 2024

Hey @chengzhuzhang, I fixed the pre-commit issues in 2c248fb (#842). I also performed some initial code cleanup since I would probably be doing that later anyway. Refer to the commit message and my review comments for more information.

I re-ran all sets and they completed successfully with these changes, although I noticed some of the diagnostic sets weren't done yet.

@tomvothecoder (Collaborator) left a comment:

Initial review comment and some questions about the FIXME comments.

(Inline review comments on e3sm_diags/driver/arm_diags_driver.py)
@chengzhuzhang (Contributor, Author):

Thank you @tomvothecoder. I will do another pass to see if there are other sets that need to be refactored. Thanks a lot for fixing and cleaning up this branch.

@chengzhuzhang (Contributor, Author):

@tomvothecoder I ran the .png regression tests. All figures are produced and the results are as expected, as noted in the notebook.
Performance-wise, the refactored code is similar to the original: wall time is ~10 minutes with 4 workers for the full arm_diags set.

@tomvothecoder (Collaborator):

Hey @chengzhuzhang, I noticed you have several other PRs you're working on right now. I'm happy to help finish up refactoring the last diag function in this PR. Let me know.

@chengzhuzhang (Contributor, Author):

> Hey @chengzhuzhang, I noticed you have several other PRs you're working on right now. I'm happy to help finish up refactoring the last diag function in this PR. Let me know.

@tomvothecoder it would be great if you could help finish this last diag so that I'm not holding back the progress to merge!

@tomvothecoder (Collaborator):

> @tomvothecoder it would be great if you could help finish this last diag so that I'm not holding back the progress to merge!

I found that no configuration actually runs "annual_cycle_aerosol" (arm_diags_model_vs_obs.cfg, arm_diags_model_vs_model.cfg). Unless we expect to run this diagnostic in the future, I think we can delete _run_diag_annual_cycle_aerosol instead of refactoring it.

@chengzhuzhang (Contributor, Author):

@tomvothecoder good catch. I think for now we can just create an issue to log this problem, and we can add the code back at a later time.

@@ -152,6 +152,7 @@ def climo(dataset: xr.Dataset, var_key: str, freq: ClimoFreq):
# averaging.
dims = [dim for dim in dv.dims if dim != time_coords.name]
coords = {k: v for k, v in dv.coords.items() if k in dims}
climo = climo.squeeze(axis=0)
@tomvothecoder (Collaborator) commented Sep 27, 2024:

Fixes a bug when ncycle == 1 where the climo variable's time axis is not squeezed, which causes rebuilding the dv_climo DataArray to fail with the errors below (a minimal reproduction sketch follows the failure list):

FAILED tests/e3sm_diags/driver/utils/test_climo_xr.py::TestClimo::test_returns_annual_cycle_climatology - ValueError: different number of dimensions on data and dims: 3 vs 2
FAILED tests/e3sm_diags/driver/utils/test_climo_xr.py::TestClimo::test_returns_DJF_season_climatology - ValueError: different number of dimensions on data and dims: 3 vs 2
FAILED tests/e3sm_diags/driver/utils/test_climo_xr.py::TestClimo::test_returns_MAM_season_climatology - ValueError: different number of dimensions on data and dims: 3 vs 2
FAILED tests/e3sm_diags/driver/utils/test_climo_xr.py::TestClimo::test_returns_JJA_season_climatology - ValueError: different number of dimensions on data and dims: 3 vs 2
FAILED tests/e3sm_diags/driver/utils/test_climo_xr.py::TestClimo::test_returns_SON_season_climatology - ValueError: different number of dimensions on data and dims: 3 vs 2
FAILED tests/e3sm_diags/driver/utils/test_climo_xr.py::TestClimo::test_returns_jan_climatology - ValueError: different number of dimensions on data and dims: 3 vs 2
FAILED tests/e3sm_diags/driver/utils/test_climo_xr.py::TestClimo::test_returns_climatology_for_derived_variable - ValueError: different number of dimensions on data and dims: 3 vs 2
FAILED tests/e3sm_diags/driver/utils/test_dataset_xr.py::TestGetClimoDataset::test_returns_climo_dataset_using_climo_of_time_series_files - ValueError: different number of dimensions on data and dims: 3 vs 2
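
A minimal, self-contained reproduction of that failure mode (illustrative only; the shapes and dim names are placeholders, not the actual climo_xr.py code):

import numpy as np
import xarray as xr

# When ncycle == 1, the computed climatology keeps a leading time axis of
# length 1, so rebuilding a DataArray with only the remaining dims fails.
climo = np.ones((1, 2, 3))  # (time, lat, lon) with a single cycle
dims = ["lat", "lon"]

try:
    xr.DataArray(climo, dims=dims)
except ValueError as err:
    print(err)  # different number of dimensions on data and dims: 3 vs 2

# Squeezing the leading axis makes the data match the expected dims.
dv_climo = xr.DataArray(climo.squeeze(axis=0), dims=dims)
print(dv_climo.dims)  # ('lat', 'lon')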

@tomvothecoder (Collaborator) commented Sep 27, 2024

  • Fix the failing integration test
  • Address the remaining FIXME and TODO items in arm_diags_driver.py

Regression test results

The .png regression tests show all plots are identical except the following, which have missing data (white spaces) compared to main:

  • The CLOUD test-variable plots below have white spaces on the dev branch. This is not a concern: the dev branch is doing the right thing by masking closer to the surface at the 1000 mb pressure level.
Comparing:
  * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/arm_diags/armdiags-CLOUD-ANNUALCYCLE-nsac1-test.png
  * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags/armdiags-CLOUD-ANNUALCYCLE-nsac1-test.png
  * Difference path: /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags_diff/armdiags-CLOUD-ANNUALCYCLE-nsac1-test.png
Comparing:
  * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/arm_diags/armdiags-CLOUD-ANNUALCYCLE-sgpc1-test.png
  * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags/armdiags-CLOUD-ANNUALCYCLE-sgpc1-test.png
  * Difference path: /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags_diff/armdiags-CLOUD-ANNUALCYCLE-sgpc1-test.png
Comparing:
  * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/arm_diags/armdiags-CLOUD-ANNUALCYCLE-twpc1-test.png
  * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags/armdiags-CLOUD-ANNUALCYCLE-twpc1-test.png
  * Difference path: /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags_diff/armdiags-CLOUD-ANNUALCYCLE-twpc1-test.png
Comparing:
  * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/arm_diags/armdiags-CLOUD-ANNUALCYCLE-twpc2-test.png
  * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags/armdiags-CLOUD-ANNUALCYCLE-twpc2-test.png
  * Difference path: /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags_diff/armdiags-CLOUD-ANNUALCYCLE-twpc2-test.png
Comparing:
  * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/main/arm_diags/armdiags-CLOUD-ANNUALCYCLE-twpc3-test.png
  * /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags/armdiags-CLOUD-ANNUALCYCLE-twpc3-test.png
  * Difference path: /global/cfs/cdirs/e3sm/www/cdat-migration-fy24/667-arm_diags-final/arm_diags_diff/armdiags-CLOUD-ANNUALCYCLE-twpc3-test.png
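
For reference, a pixel-level comparison like the one above could be sketched as follows (illustrative only; this is not the actual regression-test tooling, and the paths are placeholders):

import numpy as np
from PIL import Image, ImageChops

def diff_pngs(path_a: str, path_b: str, diff_path: str) -> float:
    """Save a difference image and return the fraction of pixels that differ."""
    img_a = Image.open(path_a).convert("RGB")
    img_b = Image.open(path_b).convert("RGB")
    diff = ImageChops.difference(img_a, img_b)
    diff.save(diff_path)
    arr = np.asarray(diff)
    return float((arr.sum(axis=-1) > 0).mean())

# Example with placeholder paths:
# frac = diff_pngs("main/armdiags-CLOUD-ANNUALCYCLE-nsac1-test.png",
#                  "dev/armdiags-CLOUD-ANNUALCYCLE-nsac1-test.png",
#                  "diff/armdiags-CLOUD-ANNUALCYCLE-nsac1-test.png")
# print(f"{frac:.2%} of pixels differ")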

@tomvothecoder (Collaborator) left a comment:

Remaining TODO and FIXME comments in arm_diags_driver.py

(Inline review comments on e3sm_diags/driver/arm_diags_driver.py)
@tomvothecoder changed the title from "CDAT Migration Phase 2: Refactor arm_diags set (try 2)" to "CDAT Migration Phase 2: Refactor arm_diags set" on Sep 30, 2024
chengzhuzhang and others added 28 commits October 1, 2024 10:49
- Refactor `arm_diags_driver.py` - rename functions, reorder functions, rename variables for clarity, replace `.format` with f-strings, update docstrings, add `_select_point()`
- Refactor `climo_xr.py` - extract `_get_cycle_for_freq()`, remove commented out code
- Add `time_interval` to `ARMDiagsParameter`
- Rename functions to denote private and reorder based on call in `arm_diags_driver.py`
- Add typestrings and annotations
- Separate logically related blocks of code with comments
- Add `_save_plots()` function to replace repeated I/O across functions
- This behavior mimics the co flag found in the CDAT codebase
Comment on lines +1406 to +1445
def _exclude_sub_monthly_coord_spanning_year(
    self, ds_subset: xr.Dataset
) -> xr.Dataset:
    """Exclude the last time coordinate for sub-monthly data if it extends
    into the next year.

    Excluding end time coordinates that extend into the next year is
    necessary because downstream operations such as the annual cycle
    climatology should consist of data for full years for accurate
    calculations.

    For example, if the time slice is ("0001-01-01", "0002-01-01") and
    the last time coordinate is:

      * "0002-01-01" -> exclude
      * "0001-12-31" -> don't exclude

    Parameters
    ----------
    ds_subset : xr.Dataset
        The subsetted dataset.

    Returns
    -------
    xr.Dataset
        The dataset with the last time coordinate excluded if necessary.

    Notes
    -----
    This function replicates the CDAT cdms2 "co" slice flag (close, open).
    """
    time_dim = xc.get_dim_keys(ds_subset, axis="T")
    time_values = ds_subset[time_dim]
    last_time_year = time_values[-1].dt.year.item()
    second_last_time_year = time_values[-2].dt.year.item()

    if self.is_sub_monthly and last_time_year > second_last_time_year:
        ds_subset = ds_subset.isel(time=slice(0, -1))

    return ds_subset


Replicates the "co" slice flag for sub-monthly data
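
A toy illustration of that end-point behavior (illustrative only; the standalone helper and sample data below are not the actual e3sm_diags method):

import numpy as np
import pandas as pd
import xarray as xr

# Daily (sub-monthly) data whose last time step spills into the next year.
time = pd.date_range("2000-12-01", periods=32, freq="D")
ds = xr.Dataset(
    {"PRECT": ("time", np.arange(32, dtype=float))},
    coords={"time": time},
)

def exclude_last_if_next_year(ds: xr.Dataset) -> xr.Dataset:
    """Drop the final time step when it falls in the year after the previous one."""
    time_values = ds["time"]
    if time_values[-1].dt.year.item() > time_values[-2].dt.year.item():
        ds = ds.isel(time=slice(0, -1))
    return ds

ds_co = exclude_last_if_next_year(ds)
print(ds["time"].size, "->", ds_co["time"].size)  # 32 -> 31; the 2001-01-01 point is dropped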

Labels: cdat-migration-fy24 (CDAT Migration FY24 Task)
2 participants