
[Question]Callback collected model does not have same reward as training verbose[custom gym environment] #1170

Open
hotpotking-lol opened this issue Aug 15, 2022 · 1 comment


@hotpotking-lol

Models saved periodically do not match the reward shown in the training window

I have a question about checking my training results. I am using a custom gym environment and the PPO algorithm from SB3.

During training, I save the model periodically to see how it is evolving, and I set verbose=1 to keep track of the training progress. However, when I load the models I saved periodically, they do not achieve the same reward as at the time they were saved.

For example, I saved "model_1" at timesteps=10,000 using a custom callback function. At that moment, the training window showed ep_rew_mean=366 at timesteps=10,000. However, when I test "model_1" on its own, its reward is 200. During testing, I call model.predict(obs, deterministic=True). I wonder why this happens, and whether it is caused by my callback function.
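Roughly how I test a saved model (a minimal sketch, not my exact script; the file name "Tempmodel1" and the make_env() factory for my custom environment are placeholders):

import numpy as np
from stable_baselines3 import PPO

# Load one of the periodically saved models (file name is a placeholder)
model = PPO.load("Tempmodel1")
env = make_env()  # hypothetical factory that builds the custom gym environment

episode_rewards = []
for _ in range(10):
    obs = env.reset()
    done, total = False, 0.0
    while not done:
        # Greedy (deterministic) actions, as in my test script
        action, _states = model.predict(obs, deterministic=True)
        obs, reward, done, info = env.step(action)
        total += reward
    episode_rewards.append(total)

print("mean episode reward:", np.mean(episode_rewards))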

Moreover, my final model also does not reach the same reward as shown in the training window.

Here is the code for my custom callback:

import os

from stable_baselines3.common.callbacks import BaseCallback


class SaveOnModelCallback(BaseCallback):
    """
    Callback for saving the model periodically (the check is done every
    ``check_freq`` steps). In practice, ``CheckpointCallback`` or
    ``EvalCallback`` is recommended instead.

    :param check_freq: (int) Save the model every ``check_freq`` calls to ``_on_step``.
    :param log_dir: (str) Path to the folder where the models will be saved.
    :param verbose: (int)
    """
    def __init__(self, check_freq: int, log_dir: str, verbose=1):
        super().__init__(verbose)
        self.check_freq = check_freq
        self.log_dir = log_dir
        self.save_path = os.path.join(log_dir, "best_model")

    def _init_callback(self) -> None:
        # Create the save folder if needed
        if self.save_path is not None:
            os.makedirs(self.save_path, exist_ok=True)

    def _on_step(self) -> bool:
        if self.n_calls % self.check_freq == 0:
            count = self.n_calls // self.check_freq
            # Save the current model as Tempmodel<count>.zip inside save_path
            path = os.path.join(self.save_path, f"Tempmodel{count}")
            print(f"Num timesteps: {self.num_timesteps}")
            print(f"Saving model to {path}.zip")
            self.model.save(path)
        return True
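For completeness, a sketch of how the callback is attached to training (CustomEnv and the hyperparameters are placeholders for my actual setup):

import os

from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor

log_dir = "./logs/"
os.makedirs(log_dir, exist_ok=True)

env = Monitor(CustomEnv(), log_dir)  # CustomEnv stands in for the custom gym environment

model = PPO("MlpPolicy", env, verbose=1)
callback = SaveOnModelCallback(check_freq=10_000, log_dir=log_dir)
model.learn(total_timesteps=100_000, callback=callback)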
@hotpotking-lol hotpotking-lol changed the title Callback collected model does not have same reward as training verbose [Question]Callback collected model does not have same reward as training verbose[custom gym environment] Aug 16, 2022
@araffin
Collaborator

araffin commented Aug 16, 2022

Hello,
this is the SB2 repo, not SB3.
Anyway, please provide a minimal code example to reproduce the issue; you can also search for similar issues in this repo.
During training, the stochastic policy is used and the return is averaged over 100 episodes (to be in the same setting, you should evaluate over at least 100 episodes with the stochastic policy; also take a look at the variance).
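A minimal sketch of that evaluation setting, using SB3's evaluate_policy helper (model and env are assumed to be the trained agent and its environment):

from stable_baselines3.common.evaluation import evaluate_policy

# Evaluate with the stochastic policy over 100 episodes, which matches the
# setting of the ep_rew_mean value reported during training
mean_reward, std_reward = evaluate_policy(
    model, env, n_eval_episodes=100, deterministic=False
)
print(f"mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")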
