Saving the groups generated from groupby operation #5674

Closed
digital-idiot opened this issue Aug 4, 2021 · 2 comments

digital-idiot commented Aug 4, 2021

Problem
Groupby is an expensive operation, so I want to store my dataset on disk already split into the groups produced by a groupby operation. My use case is concerned with the groups themselves: for example, I want to take advantage of lazy loading and load only selected groups into memory for processing.

Preferred Solution

  • An additional parameter to pass the groups to cache when writing the dataset to disk
    Or
  • A separate function to write the dataset as a collection of groups to file.

Alternatives considered
Treating each group as a separate dataset and writing each one to its own file. This is not suitable when the number of groups is large and each group is very small.

Additional context
It would also be great if the groupby operation natively supported grouping by multiple coordinates.
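
To illustrate what grouping by multiple coordinates means here, the following is a minimal sketch in plain NumPy, not an xarray API. The arrays `region`, `year`, and `values` are hypothetical stand-ins for two coordinates and one data variable along a shared dimension. Each coordinate is encoded as integer codes, and the codes are fused into a single key that can then be grouped on.

```python
import numpy as np

# Two hypothetical coordinates defined along the same dimension.
region = np.array(["a", "b", "a", "b", "a"])
year = np.array([2020, 2020, 2021, 2020, 2020])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Encode each coordinate as integer codes, then fuse the codes into one key.
region_ids, region_codes = np.unique(region, return_inverse=True)
year_ids, year_codes = np.unique(year, return_inverse=True)
combined = region_codes * len(year_ids) + year_codes

# Group values by the fused key and map each key back to its label pair.
groups = {}
for key in np.unique(combined):
    r, y = divmod(int(key), len(year_ids))
    groups[(str(region_ids[r]), int(year_ids[y]))] = values[combined == key]
```

Only label pairs that actually occur in the data end up as keys of `groups`, so empty combinations cost nothing.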

@max-sixty
Collaborator

Thanks for the suggestion. This didn't get traction and we're trying to keep issues < 1000, so I'll close, but feel free to suggest again, possibly with some motivating examples.

@max-sixty closed this as not planned on Sep 18, 2024
@dcherian
Contributor

Funnily enough, the shuffle_by op in #9320 is one way to accomplish this. Chunk boundaries will line up with group boundaries, so with the right data format this will work well.
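
The idea of lining up storage boundaries with group boundaries can be sketched without xarray or dask. The example below is a plain NumPy illustration under stated assumptions, not the implementation in #9320: `labels`, `values`, the file name, and the `load_group` helper are all hypothetical. The data is reordered so each group is contiguous, written once, and a small offset index lets a single group be read back lazily through a memory map.

```python
import os
import tempfile

import numpy as np

# Toy data: one value per sample plus a group label per sample (both hypothetical).
labels = np.array([2, 0, 1, 0, 2, 1, 0])
values = np.array([10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0])

# Reorder ("shuffle") so that the members of each group are contiguous.
order = np.argsort(labels, kind="stable")
sorted_values = values[order]

# Record where each group starts and how long it is in the sorted layout.
group_ids, starts, counts = np.unique(
    labels[order], return_index=True, return_counts=True
)

# Persist the sorted values once; the offsets are a small index kept alongside.
path = os.path.join(tempfile.mkdtemp(), "values_sorted.npy")
np.save(path, sorted_values)


def load_group(group_id):
    """Read a single group from disk without loading the other groups."""
    i = int(np.searchsorted(group_ids, group_id))
    if i == len(group_ids) or group_ids[i] != group_id:
        raise KeyError(group_id)
    on_disk = np.load(path, mmap_mode="r")  # memory-mapped, so reads are lazy
    return np.array(on_disk[starts[i] : starts[i] + counts[i]])
```

Calling `load_group(1)` touches only the slice of the file that holds group `1`. A chunked format would play the same role at larger scale, provided each chunk covers whole groups.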
