[Bugfix] check if model checkpoint exists before loading #22

Open
wants to merge 1 commit into
base: main
from

@mmtftr commented Dec 18, 2024

Description

With the current trainer code, it is impossible to train a model from scratch. The code tries to load a model checkpoint even when none exists, and there is no way to disable this, because `args.model_file` is set by the code even when it is not passed explicitly.
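To illustrate why the `if args.model_file:` guard can never be used to skip loading, here is a minimal sketch. It assumes, hypothetically, that the value comes from an `argparse` option with a non-empty default; the flag name `--model-file` and the default filename are placeholders, and the real trainer may fill in `args.model_file` some other way. Once any non-empty string is present, the guard is truthy whether or not the user passed the flag.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag and default; stands in for however the trainer
    # populates args.model_file when the user does not pass it.
    parser.add_argument("--model-file", type=str, default="models/checkpoint.pt")
    return parser


def should_load(argv: list) -> bool:
    args = build_parser().parse_args(argv)
    # Mirrors the guard in main.py: any non-empty string is truthy.
    return bool(args.model_file)


if __name__ == "__main__":
    # Flag omitted: the default still makes the guard truthy.
    print(should_load([]))  # True
    # Flag given explicitly: also truthy.
    print(should_load(["--model-file", "other.pt"]))  # True
```

Because the guard only tests truthiness, it cannot distinguish "the user asked for a checkpoint" from "a default was filled in", which is why an existence check on the resolved path is needed.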

Specifically, /bin/main.py:518 neither checks that the model file exists nor implements the `--load-model` argument check that the code comment refers to:

protnote/bin/main.py, lines 519 to 530 in 2d69e44:

```python
# Load the model weights if --load-model argument is provided (using the DATA_PATH directory as the root)
# TODO: Process model loading in the get_setup function
if args.model_file:
    load_model(
        trainer=Trainer,
        checkpoint_path=os.path.join(config["DATA_PATH"], args.model_file),
        rank=rank,
        from_checkpoint=args.from_checkpoint,
    )
    logger.info(
        f"Loading model checkpoing from {os.path.join(config['DATA_PATH'], args.model_file)}. If training, will continue from epoch {Trainer.epoch+1}.\n"
    )
```

This PR updates the training logic to check whether the checkpoint path exists before loading it.

PS: It may also be useful to update the docs and to add an `else` clause that logs something along the lines of `INFO: Could not find checkpoint {path}, training from scratch`.
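A minimal sketch of the proposed behavior, combining the existence check with the suggested `else` log. `maybe_load_checkpoint` and its `load_fn` callback are hypothetical names standing in for the repository's `load_model(...)` call (whose real signature takes `trainer`, `checkpoint_path`, `rank` and `from_checkpoint`); only `os.path.join`, `os.path.exists` and the standard `logging` module are real APIs here. This is not the PR's actual diff.

```python
import logging
import os
import tempfile
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def maybe_load_checkpoint(
    data_path: str,
    model_file: Optional[str],
    load_fn: Callable[[str], None],
) -> bool:
    """Load the checkpoint only if it exists; return True if it was loaded."""
    if not model_file:
        logger.info("No checkpoint given, training from scratch")
        return False
    # Resolve the path relative to DATA_PATH, as main.py does.
    checkpoint_path = os.path.join(data_path, model_file)
    if os.path.exists(checkpoint_path):
        load_fn(checkpoint_path)
        logger.info("Loaded model checkpoint from %s", checkpoint_path)
        return True
    # Suggested else branch: fall back to training from scratch instead of failing.
    logger.info("Could not find checkpoint %s, training from scratch", checkpoint_path)
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    with tempfile.TemporaryDirectory() as tmp:
        loaded = []
        # The file does not exist yet, so nothing is loaded.
        print(maybe_load_checkpoint(tmp, "model.pt", loaded.append))  # False
        # Create an empty placeholder file, then try again.
        open(os.path.join(tmp, "model.pt"), "wb").close()
        print(maybe_load_checkpoint(tmp, "model.pt", loaded.append))  # True
```

Passing the loader in as a callback keeps the existence check testable without a real model or trainer; in main.py the same branching would wrap the existing `load_model(...)` call directly.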

@mmtftr mmtftr changed the title [Bugifx] check if model checkpoint exists before loading [Bugfix] check if model checkpoint exists before loading Dec 18, 2024