Hello dragen, you clearly put a lot of care into writing this code, making the most of tf2's rich set of functions and classes.
In the PPO example, the main loop of the main() function contains:
if done:
    if len(agent.buffer) >= batch_size:
        agent.optimize()
The logic here is slightly off: if an episode finishes (done) while the buffer still holds fewer than batch_size transitions, this condition leaves those transitions in the buffer, and they carry over into the next episode's interactions. When agent.optimize() eventually runs, the discounted-returns sequence it computes then spans an episode boundary and no longer satisfies the Markov property. Compare morvan's PPO, where the condition used is done or len(agent.buffer) >= batch_size.
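A minimal sketch of the corrected loop, for illustration only: the PPOAgent stub, its act/store methods, and the placeholder environment below are all assumptions, not the repository's actual tf2 implementation; only the names agent.buffer, agent.optimize(), batch_size, and done come from the example above.

```python
import random

class PPOAgent:
    """Hypothetical stub standing in for the real tf2 PPO agent."""
    def __init__(self):
        self.buffer = []

    def act(self, state):
        return random.randint(0, 1)      # placeholder policy

    def store(self, transition):
        self.buffer.append(transition)

    def optimize(self):
        # The real code would compute discounted returns over self.buffer
        # here; clearing the buffer afterwards keeps every returns sequence
        # inside a single episode segment.
        self.buffer.clear()

def main(max_episodes=10, max_steps=200, batch_size=32):
    agent = PPOAgent()
    for _ in range(max_episodes):
        state = 0                        # placeholder for env.reset()
        for t in range(max_steps):
            action = agent.act(state)
            # Placeholder transition standing in for env.step(action).
            next_state, reward, done = state + 1, 1.0, t == max_steps - 1
            agent.store((state, action, reward))
            state = next_state
            # Key point: flush on episode end OR on a full buffer, so data
            # from one episode never leaks into the next update's returns.
            if done or len(agent.buffer) >= batch_size:
                agent.optimize()
            if done:
                break

main()
```

With this condition, every call to optimize() sees a buffer that ends either at a terminal state or at a cut point inside a single episode, so the discounted returns stay well-defined.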
Thank you; I have learned a great deal from your algorithm examples.