Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing pickle errors to allow for multiprocessing #37

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

dmklee
Copy link

@dmklee dmklee commented Mar 6, 2023

When training on multiple GPU's using ddp, there is an error with pickling escnn models. The solution was pointed out QUVA-Lab/e2cnn/pull/69 for the e2cnn library so I made similar modifications to escnn. This was discussed in issue #16.

I made a basic test file, test_pickling.py and I have fixed the problem for C(n) and D(n) on R2. I am happy to extend it to other groups if you want. The pickling error occurs because escnn uses local functions which cannot be pickled. The fix is to convert local functions to the __call__ methods of a class.

@codingS3b
Copy link

This definitely helped me when using escnn together with pytorch lightning which, when saving checkpoints, also relied on pickling. Using your fork I was finally able to reload my trained lightning models, which failed before. Thanks!

@Gabri95
Copy link
Collaborator

Gabri95 commented Jul 17, 2023

Hey!

Sorry for not having merged your PR yet, but I am not fully convinced that this is the best way to go at the moment.
In particular, by following this strategy, I should replace most functions to deal with subgroups and representations with new classes. This seems an overhead and it would introduce some redundancy.
Moreover, using lambda expression won't be possible anymore.

However, at the end, one needs to only pickle classes from escnn.nn, not from the other subpackages.
Objects like groups, irreps and gspaces don't really have a state and can just be regenerated from some ids.
That means that, theoretically, they don't need to be completely pickled, but one needs to only store these ids.

I had already implemented a similar trick for groups, that's what the static _generator() method and the _keys() property are for. You can see them used for caching purpose here.

Now, the question for me is how to integrate this with pickling.
I read here that one can customize how a class is being pickled via the __getstate__() and __setstate__() methods, but I need to understand this a little better to make sure it actually solves all the problems here before refactoring everything.

Please, feel free to comment on this or to give me some advice if you have some experience with this.

Thanks,
Gabriele

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants