You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This issue arises from the existing parallel computing guide that is meant to provide some general guidance on parallel computing with dask/xarray/xcdat.
It would be helpful to create some specific documentation on how to parallelize in the xcdat context. The ideas would be to show more xcdat-oriented practical examples (without the complications of the dask cluster stuff). I imagine it would do things like this:
Download a dataset that is a few GBs to disk (e.g., a piControl file via wget)
Show how chunking in time-versus-space affects performance for a given operation (e.g., spatial averaging probably needs time-chunks, but temporal averaging might do fine with space chunks?)
Walk through how you might decide on a chunk size (e.g., this is a 4 GB dataset with 100 years or 1200 timepoints, so breaking it into decades [120 months], would give me pretty manageable 400 MB chunks to work on with 5 workers).
Maybe show a dask.delayed (or joblib) example, e.g.,
First download a number of netcdfs (e.g., a CMIP historical tas simulation broken into ~12 files)
Do a glob to get the file list
Create a function that opens a file, computes the spatial average, returns the spatial average
run results = dask.delayed(...)
Use xr.concat to combine the results into one dataset
Compare dask.delayed to serial performance
Talk about some xCDAT-specific parallelization considerations (e.g., the FAQs in the existing parallel notebook)
The text was updated successfully, but these errors were encountered:
Describe your documentation update
This issue arises from the existing parallel computing guide that is meant to provide some general guidance on parallel computing with dask/xarray/xcdat.
It would be helpful to create some specific documentation on how to parallelize in the xcdat context. The ideas would be to show more xcdat-oriented practical examples (without the complications of the dask cluster stuff). I imagine it would do things like this:
dask.delayed
(or joblib) example, e.g.,results = dask.delayed(...)
xr.concat
to combine the results into one datasetdask.delayed
to serial performanceThe text was updated successfully, but these errors were encountered: