Virtualizarr + Coiled Serverless Example Notebook #233

Merged Aug 29, 2024 (4 commits)

@norlandrhagen (Collaborator) commented Aug 27, 2024

Inspired by @thodson-usgs's Lithops example, I've created a VirtualiZarr example that uses Coiled serverless functions.

  • 1TB virtual dataset from 924 NetCDF files
  • 9 minutes and ~$0.24 of cloud cost on Coiled

Would love some feedback if anyone has thoughts.

  • Changes are documented in docs/releases.rst

@norlandrhagen norlandrhagen added the usage example Real world use case examples label Aug 27, 2024
@norlandrhagen norlandrhagen changed the title terraclimate_coiled ex Virtualizarr + Coiled Serverless Example Notebook Aug 27, 2024
@TomNicholas (Collaborator) commented

Awesome!!

I'm a bit unclear where the in-memory datasets live at each point in the computation when using Coiled functions. You do the reference generation on a bunch of separate instances, but in order to run combine_by_coords they all have to be on the same instance. At what point does that transfer occur?

@norlandrhagen (Collaborator, Author) commented Aug 27, 2024

> I'm a bit unclear where the in-memory datasets live at each point in the computation when using Coiled functions. You do the reference generation on a bunch of separate instances, but in order to run combine_by_coords they all have to be on the same instance. At what point does that transfer occur?

I'm pretty sure the .map returns a generator that yields all the virtual datasets back to my local laptop, so that's where the transfer happens.
Coiled also lets you run notebooks in the cloud. I tried running all the reference generation serverless functions from a Coiled cloud notebook, and that worked great too. That might be a good option for a larger dataset, but since the manifest arrays are so memory efficient, I didn't see any issues doing the reduce on a laptop.
I also tried launching the reference generation serverless functions from a larger serverless function (meant to be the reduce machine), and it was kinda wacky.
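
To make the data flow above a bit more concrete, here is a rough, standard-library-only analogy of the map-then-reduce pattern. It is not the notebook's code: `ThreadPoolExecutor.map` stands in for the Coiled `.map` call, and `make_reference`, `combine`, and the file names are hypothetical placeholders for the per-file reference generation and for combine_by_coords. The only thing it tries to show is that the map step hands back a lazy iterator, and that the results end up on the calling machine when that iterator is consumed.

```python
from concurrent.futures import ThreadPoolExecutor


def make_reference(path):
    # Hypothetical stand-in for per-file reference generation.
    # In the notebook this step runs on a separate cloud instance and
    # returns a small virtual dataset; here it is just a plain dict.
    return {"path": path, "n_chunks": 1}


def combine(refs):
    # Hypothetical stand-in for the reduce step (combine_by_coords in
    # the thread); it needs every per-file result in one place.
    return {
        "paths": sorted(r["path"] for r in refs),
        "n_chunks": sum(r["n_chunks"] for r in refs),
    }


# Made-up file names, only for illustration.
paths = [f"file_{i:03d}.nc" for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    # Map phase: each call runs on a worker, results come back lazily.
    results = pool.map(make_reference, paths)
    # Consuming the iterator is where the results land on the caller
    # (the laptop, in the setup described above).
    refs = list(results)

# Reduce phase: runs on the caller, using the gathered results.
combined = combine(refs)
```

If the per-file results ever got too big to gather on one machine, this is the step that would need to move to a dedicated reduce worker.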

It would be nice to find the limits to where this becomes a problem! At that point, maybe the lithops map-reduce executor or beam would be a better option.

@norlandrhagen norlandrhagen merged commit 708d168 into main Aug 29, 2024
8 checks passed