The procedure to use Dask on Snellius: run the Dask script and monitor the running processes in the Dask dashboard.
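A minimal sketch of starting a client and getting the dashboard URL (worker counts and the script body are placeholders):

```python
# Minimal sketch: start a local Dask cluster, print the dashboard URL,
# then run the actual computation. Worker/thread counts are placeholders.
from dask.distributed import Client

if __name__ == "__main__":
    client = Client(n_workers=4, threads_per_worker=1)
    print(client.dashboard_link)  # open this URL to watch tasks and memory
    # ... run the Dask/xarray computation here ...
```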
Things that might be helpful when using Dask:
Put the reprojection and other preprocessing into a separate script; this makes your main script run faster and keeps it cleaner.
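For example, a separate preprocessing script could look roughly like this (assuming rioxarray is used for the reprojection; file names and the target CRS are placeholders):

```python
# preprocess.py -- hypothetical standalone script: reproject once and save an
# intermediate file so the main Dask script can skip this step.
import rioxarray  # assumption: reprojection is done with rioxarray

da = rioxarray.open_rasterio("input.tif")
da_reprojected = da.rio.reproject("EPSG:4326")   # target CRS is a placeholder
da_reprojected.rio.to_raster("input_reprojected.tif")
```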
Chunk the data by space and time when loading it, and make sure every step keeps the same chunk sizes. It is better to chunk the data as early as possible.
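A sketch of chunking at load time (file name, dimension names, and chunk sizes are placeholders):

```python
import xarray as xr

# Chunk by time and space as early as possible, at load time.
ds = xr.open_dataset("input.nc", chunks={"time": 1, "y": 1024, "x": 1024})

# If a later step changes the chunking, rechunk back explicitly so every step
# works with the same chunk sizes.
ds = ds.chunk({"time": 1, "y": 1024, "x": 1024})
```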
If you apply a trained machine-learning or deep-learning model, make sure the model is not too big. My first trained model was 15 GB because I did not set max_depth when training the Random Forest; map_blocks() could not handle it and I got unexpected errors. My updated model is 245 MB, and I pass the model path to the map_blocks() function. If I instead load the model outside map_blocks() and pass the model object in, the unmanaged memory becomes extremely high: loading the model once outside map_blocks() is faster, but it triggers the "unmanaged memory too high" warning. So the model has to be loaded inside map_blocks(), which means it is loaded once for every chunk.
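A sketch of this pattern, passing the model path into map_blocks() and loading the model once per chunk (joblib, the feature layout, and the file names are assumptions, not the exact code used here):

```python
import joblib
import xarray as xr

# Chunked input; dimension names and chunk sizes are placeholders.
da = xr.open_dataarray("features.nc", chunks={"time": 1, "y": 1024, "x": 1024})

def predict_block(block, model_path):
    # Load the model inside the block function: it is read once per chunk,
    # but the large model object never enters the task graph.
    model = joblib.load(model_path)
    features = block.values.reshape(-1, 1)            # placeholder feature layout
    pred = model.predict(features).reshape(block.shape)
    return xr.DataArray(pred, coords=block.coords, dims=block.dims)

result = da.map_blocks(
    predict_block,
    kwargs={"model_path": "rf_model.joblib"},  # pass the path, not the model object
    template=da,                               # output has the same shape/chunks as the input
)
```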
When exporting to netCDF, use the netCDF4 format, not netCDF3, and use xarray to export.
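For example (here `result` is a small dummy array standing in for the real output):

```python
import numpy as np
import xarray as xr

# `result` stands in for the DataArray produced by the computation above.
result = xr.DataArray(np.zeros((2, 3)), dims=("y", "x"), name="prediction")
result.to_netcdf("output.nc", engine="netcdf4", format="NETCDF4")
```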
Start with Client(n_workers=4, threads_per_worker=1). More workers and more threads might make your script run faster, but if your data is very large and you set too many workers or threads_per_worker, the dashboard page may crash. I am still experimenting with this point.
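A conservative starting point, assuming a per-worker memory limit (the value is a placeholder), scaled up gradually while watching the dashboard:

```python
from dask.distributed import Client

# Start conservatively; increase n_workers/threads_per_worker step by step
# while watching the dashboard. The memory limit per worker is a placeholder.
client = Client(n_workers=4, threads_per_worker=1, memory_limit="16GB")
```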