Hi! I have two questions.

First, in ppo.py you have:

self.policy = self.loss = -self.policy_loss + self.value_loss - self.entropy_loss

and you said "Reduce sum over all sub-policies (where only the active sub-policy will be non-zero due to previous filtering)". But won't the loss be a list? How can a list of losses be backpropagated through

self.train_step = self.optimizer.minimize(self.loss, var_list=policy_params)

Second, you first compute '_create_sub_policy', and in that part the loss is reduce-meaned and finally becomes a scalar. After filtering, won't all the sub-policy modules output the same value? Does it really work?
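To make my reading of the masking concrete, here is a minimal NumPy sketch of how I understand the "only the active sub-policy is non-zero" filtering is supposed to work; the arrays and names here are made up for illustration, not taken from the repo:

```python
import numpy as np

# Hypothetical example: each sub-policy produces a scalar loss,
# and a one-hot "active" mask zeroes out the inactive ones, so
# summing the per-sub-policy losses yields a single scalar equal
# to the active sub-policy's loss.
sub_policy_losses = np.array([0.7, 1.2, 0.4])  # one scalar per sub-policy
active_mask = np.array([0.0, 1.0, 0.0])        # one-hot: sub-policy 1 is active

masked = sub_policy_losses * active_mask       # inactive entries become 0.0
total_loss = masked.sum()                      # scalar: only the active loss survives
print(total_loss)
```

If the filtering works this way, the quantity handed to optimizer.minimize is a scalar, not a list, which is the part I am unsure about.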