code #13
base: master
Conversation
Interesting.
First, could you please rename the file to 09_?_nes_env_name.py? This is a local/random search algorithm, so it belongs in the 09 series ('10_xx' is reserved for actor-critic methods).
However, I'm not so sure about using the low-level API for Pong. It could have been done in a single line with the high-level API. If @hunkim is okay with it, then I guess it's okay.
I left comments only on the bipedal example, but the corresponding issues in the Pong example should be fixed as well.
import gym
import numpy as np
import cPickle as pickle
Are you using Python 2? cPickle was renamed to pickle in Python 3, so there is no need to import cPickle.
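If Python 2 compatibility is still wanted, a version-agnostic import is a common pattern (a minimal sketch; the pickled payload below is just an example):

```python
try:
    import cPickle as pickle  # Python 2: C-accelerated pickle module
except ImportError:
    import pickle  # Python 3: cPickle was merged into the standard pickle

# Round-trip check to show either module behaves the same way.
blob = pickle.dumps({"weights": [1, 2, 3]})
restored = pickle.loads(blob)
```

On Python 3 the first import fails and the standard pickle (which is C-accelerated there anyway) is used instead.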
env = gym.make('BipedalWalker-v2')
np.random.seed(10)

hl_size = 100
Can you add a comment for each hyperparameter?
Also, please follow standard naming conventions: aver_reward should be renamed to something like avg_reward.
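For example, the hyperparameter block could look like this (hl_size = 100 is from the diff; the other values are illustrative placeholders, not the PR's actual settings):

```python
hl_size = 100      # hidden-layer size of the policy network
npop = 50          # population size: perturbations sampled per update (illustrative)
sigma = 0.1        # standard deviation of the parameter noise (illustrative)
alpha = 0.03       # learning rate for the parameter update (illustrative)
avg_reward = None  # running average of episode reward (renamed from aver_reward)
```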
model[k] = model[k] + alpha/(npop*sigma) * np.dot(N[k].transpose(1, 2, 0), A)

cur_reward = f(model)
aver_reward = aver_reward * 0.9 + cur_reward * 0.1 if aver_reward != None else cur_reward
Why do we need aver_reward? Why did you use an EMA?
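For context, the line in the diff is an exponential moving average: it smooths the noisy per-episode reward into a readable running average. A minimal sketch of that update, factored into a function (the 0.9 decay is the value used in the diff; the function name is ours):

```python
def update_avg_reward(avg_reward, cur_reward, decay=0.9):
    """Exponential moving average of episode rewards.

    Smooths noisy per-episode rewards; the first episode simply
    initializes the average (avg_reward starts as None).
    """
    if avg_reward is None:
        return cur_reward
    return avg_reward * decay + cur_reward * (1.0 - decay)

avg = None
for r in [10.0, 0.0, 0.0]:  # a noisy reward sequence
    avg = update_avg_reward(avg, r)
# avg decays 10.0 -> 9.0 -> 8.1
```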
"high-level API" preferred.
Added two evolution strategy implementations
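For reference, the parameter update quoted above (model[k] = model[k] + alpha/(npop*sigma) * np.dot(...)) follows the natural-evolution-strategies recipe: sample Gaussian perturbations of the parameters, evaluate each candidate, standardize the rewards, and step along the reward-weighted noise. A self-contained sketch on a toy quadratic objective (the objective, dimensionality, and hyperparameter values are illustrative, not the PR's):

```python
import numpy as np

np.random.seed(10)

def f(w):
    # Toy stand-in for the episode-reward function: higher near `target`.
    target = np.array([0.5, -0.3, 0.8])
    return -np.sum((w - target) ** 2)

npop, sigma, alpha = 50, 0.1, 0.03  # population size, noise scale, learning rate
w = np.zeros(3)
for _ in range(300):
    N = np.random.randn(npop, 3)                 # one perturbation per population member
    R = np.array([f(w + sigma * n) for n in N])  # reward of each perturbed candidate
    A = (R - R.mean()) / (R.std() + 1e-8)        # standardized rewards
    w = w + alpha / (npop * sigma) * N.T.dot(A)  # NES gradient-ascent step
```

After a few hundred updates, w drifts toward the toy target without ever computing an analytic gradient, which is the whole point of the ES approach in this PR.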