Continuing training from a wrapped checkpoint does not work. #151

Open
Taikakim opened this issue Oct 2, 2024 · 1 comment

Taikakim commented Oct 2, 2024

When I try to resume training from a wrapped checkpoint, I get this error: RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory. The only values I changed in the config are LR and inv_gamma.

The unwrap script fails with the same error: RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

The checkpoints are on my Drive and I'm running in Colab, but I don't recall this being a problem before.

Taikakim commented Oct 3, 2024

Ah, problem solved: the upload from Drive was interrupted before the checkpoint had been fully transferred :/ (I saved the checkpoint from inside a cell, and apparently, even if the file shows up in the right place in Colab, Colab does not finish uploads in the background unless it's synced and I disconnect the VM.)
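
For anyone hitting the same error: "failed finding central directory" usually points to a truncated file, since checkpoints written by recent versions of torch.save are zip archives and the central directory sits at the end of the file. A minimal sketch of a pre-flight check using only the Python standard library follows. The function name `checkpoint_looks_complete` is hypothetical and not part of this repository, and the check assumes the checkpoint uses the zip-based format (a legacy non-zip checkpoint would be reported as incomplete).

```python
import os
import zipfile


def checkpoint_looks_complete(path: str) -> bool:
    """Return True if `path` is a non-empty, fully readable zip archive.

    Hypothetical helper: assumes the checkpoint was saved in the zip-based
    torch.save format. Legacy (non-zip) checkpoints will return False.
    """
    # A missing or zero-byte file is the most obvious sign of a failed sync.
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        return False

    # is_zipfile() looks for the end-of-central-directory record, which is
    # exactly what is missing when the transfer stopped partway through.
    if not zipfile.is_zipfile(path):
        return False

    # testzip() reads every member and returns the name of the first one
    # with a bad CRC or header, or None if all members are intact.
    with zipfile.ZipFile(path) as archive:
        return archive.testzip() is None
```

Calling this on the checkpoint path before resuming or running the unwrap script should return False for a partially transferred file. In Colab, calling `drive.flush_and_unmount()` from `google.colab` before disconnecting the VM is probably the safer way to make sure pending writes reach Drive, though I have not verified that it covers this exact case.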
