
[Question]Callback collected model does not have same reward as training verbose[custom gym environment] #1170

Open
hotpotking-lol opened this issue Aug 15, 2022 · 1 comment


@hotpotking-lol

Models saved periodically do not match the reward shown in the training window

I have a question about checking my training results. I am using a custom gym environment and the PPO algorithm from SB3.

During training, I save the model periodically to see how it is evolving, and I set verbose=1 to keep track of the training progress. However, when I load the models I saved periodically, they do not achieve the same reward as at the time they were saved.

For example, I saved "model_1" at timesteps=10,000 using a custom callback function. At that moment, the training window showed ep_rew_mean=366 at timesteps=10,000. However, when I test "model_1" on its own, its reward is 200. During testing, I call model.predict(obs, deterministic=True). I wonder why this happens, and whether it is caused by my callback function.
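Roughly how I test a saved model (a minimal sketch, not my exact script; the file name "Tempmodel1" and the make_env() factory for my custom environment are placeholders):

import numpy as np
from stable_baselines3 import PPO

# Load one of the periodically saved models (file name is a placeholder)
model = PPO.load("Tempmodel1")
env = make_env()  # hypothetical factory that builds the custom gym environment

episode_rewards = []
for _ in range(10):
    obs = env.reset()
    done, total = False, 0.0
    while not done:
        # Greedy (deterministic) actions, as in my test script
        action, _states = model.predict(obs, deterministic=True)
        obs, reward, done, info = env.step(action)
        total += reward
    episode_rewards.append(total)

print("mean episode reward:", np.mean(episode_rewards))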

Moreover, my final model also does not reach the same reward as shown in the training window.

Here is the code for my custom callback:

import os

from stable_baselines3.common.callbacks import BaseCallback


class SaveOnModelCallback(BaseCallback):
    """
    Callback for saving the model periodically (the check is done every
    ``check_freq`` steps). In practice, ``CheckpointCallback`` or
    ``EvalCallback`` is recommended instead.

    :param check_freq: (int) Save the model every ``check_freq`` calls to ``_on_step``.
    :param log_dir: (str) Path to the folder where the models will be saved.
    :param verbose: (int)
    """
    def __init__(self, check_freq: int, log_dir: str, verbose=1):
        super().__init__(verbose)
        self.check_freq = check_freq
        self.log_dir = log_dir
        self.save_path = os.path.join(log_dir, "best_model")

    def _init_callback(self) -> None:
        # Create the save folder if needed
        if self.save_path is not None:
            os.makedirs(self.save_path, exist_ok=True)

    def _on_step(self) -> bool:
        if self.n_calls % self.check_freq == 0:
            count = self.n_calls // self.check_freq
            # Save the current model as Tempmodel<count>.zip inside save_path
            path = os.path.join(self.save_path, f"Tempmodel{count}")
            print(f"Num timesteps: {self.num_timesteps}")
            print(f"Saving model to {path}.zip")
            self.model.save(path)
        return True
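For completeness, a sketch of how the callback is attached to training (CustomEnv and the hyperparameters are placeholders for my actual setup):

import os

from stable_baselines3 import PPO
from stable_baselines3.common.monitor import Monitor

log_dir = "./logs/"
os.makedirs(log_dir, exist_ok=True)

env = Monitor(CustomEnv(), log_dir)  # CustomEnv stands in for the custom gym environment

model = PPO("MlpPolicy", env, verbose=1)
callback = SaveOnModelCallback(check_freq=10_000, log_dir=log_dir)
model.learn(total_timesteps=100_000, callback=callback)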
@hotpotking-lol hotpotking-lol changed the title Callback collected model does not have same reward as training verbose [Question]Callback collected model does not have same reward as training verbose[custom gym environment] Aug 16, 2022
@araffin
Collaborator

araffin commented Aug 16, 2022

Hello,
this is the SB2 repo, not SB3.
Anyway, please provide a minimal code example to reproduce the issue; you can also search for similar issues in this repo.
During training, the stochastic policy is used and the return is averaged over 100 episodes (to be in the same setting, you should evaluate over at least 100 episodes with the stochastic policy; also take a look at the variance).
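A minimal sketch of that evaluation setting, using SB3's evaluate_policy helper (model and env are assumed to be the trained agent and its environment):

from stable_baselines3.common.evaluation import evaluate_policy

# Evaluate with the stochastic policy over 100 episodes, which matches the
# setting of the ep_rew_mean value reported during training
mean_reward, std_reward = evaluate_policy(
    model, env, n_eval_episodes=100, deterministic=False
)
print(f"mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")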
