[Bugfix] check if model checkpoint exists before loading #22

mmtftr · 2024-12-18T18:49:33Z

Description

It's impossible to train a model from scratch with the current trainer code. The code attempts to load the model checkpoint even when the file doesn't exist, and there is no way to disable this: args.model_file is populated by the code even when it is not passed explicitly, so the `if args.model_file:` guard always passes.

Specifically, in /bin/main.py:518 the code neither checks that the model file exists nor implements the --load-model argument check that the comment refers to:

protnote/bin/main.py, lines 519 to 530 in 2d69e44:

    # Load the model weights if --load-model argument is provided (using the DATA_PATH directory as the root)
    # TODO: Process model loading in the get_setup function
    if args.model_file:
        load_model(
            trainer=Trainer,
            checkpoint_path=os.path.join(config["DATA_PATH"], args.model_file),
            rank=rank,
            from_checkpoint=args.from_checkpoint,
        )
        logger.info(
            f"Loading model checkpoing from {os.path.join(config['DATA_PATH'], args.model_file)}. If training, will continue from epoch {Trainer.epoch+1}.\n"
        )
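To illustrate why the `if args.model_file:` guard can never be skipped, here is a minimal sketch assuming the value is filled in through an argparse default. The flag names and the default "hypothetical_checkpoint.pt" are assumptions for illustration only; the actual mechanism in main.py may differ.

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical: a non-empty default means model_file is "set" even when the flag is omitted.
parser.add_argument("--model-file", default="hypothetical_checkpoint.pt")
parser.add_argument("--from-checkpoint", action="store_true")

args = parser.parse_args([])  # no flags passed on the command line
print(args.model_file)        # hypothetical_checkpoint.pt
print(bool(args.model_file))  # True, so the `if args.model_file:` branch always runs
```

Because the guard only tests truthiness, a missing file is only discovered inside load_model, which is why an explicit existence check is needed.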

This PR updates the training logic to check whether the checkpoint path exists before loading it.

PS: It may also be useful to update the docs and add an else clause that logs something along the lines of `INFO: Could not find checkpoint {path}, training from scratch`.
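A minimal sketch of the guarded loading logic, including the suggested else-branch log message. The helper name `maybe_load_checkpoint` and the `load_fn` callback are hypothetical (this is not the diff in ca5e5d3); in main.py, `load_fn` would wrap the existing `load_model(trainer=..., checkpoint_path=..., rank=..., from_checkpoint=...)` call, e.g. via a lambda.

```python
import logging
import os
import tempfile
from typing import Callable, Optional

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def maybe_load_checkpoint(
    data_path: str,
    model_file: Optional[str],
    load_fn: Callable[[str], None],
) -> bool:
    """Hypothetical helper: load a checkpoint only if one was given AND it exists.

    Returns True if a checkpoint was loaded, False if training starts from scratch.
    """
    if not model_file:
        logger.info("No checkpoint specified, training from scratch")
        return False

    checkpoint_path = os.path.join(data_path, model_file)
    if not os.path.exists(checkpoint_path):
        # The else branch suggested in the PR description.
        logger.info("Could not find checkpoint %s, training from scratch", checkpoint_path)
        return False

    load_fn(checkpoint_path)  # e.g. lambda p: load_model(trainer=Trainer, checkpoint_path=p, ...)
    logger.info("Loading model checkpoint from %s", checkpoint_path)
    return True


if __name__ == "__main__":
    # Usage demo with a temporary directory standing in for config["DATA_PATH"].
    loaded = []
    with tempfile.TemporaryDirectory() as data_dir:
        maybe_load_checkpoint(data_dir, "missing.pt", loaded.append)  # logs and skips
        open(os.path.join(data_dir, "model.pt"), "w").close()
        maybe_load_checkpoint(data_dir, "model.pt", loaded.append)    # loads
    print(len(loaded))  # 1
```

Passing the loader in as a callback keeps the existence check testable without constructing a Trainer.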

Commit ca5e5d3: fix: check if model checkpoint exists before loading

mmtftr changed the title ~~[Bugifx] check if model checkpoint exists before loading~~ [Bugfix] check if model checkpoint exists before loading Dec 18, 2024


mmtftr commented Dec 18, 2024

Description