Using the default agent parameters but with spec.update set to 'sarsa', the model simply does not converge to the optimal solution.
```js
// agent parameter spec to play with (this gets eval()'d on Agent reset)
var spec = {};
spec.update = 'sarsa'; // 'qlearn' or 'sarsa'
spec.gamma = 0.9; // discount factor, [0, 1)
spec.epsilon = 0.2; // initial epsilon for epsilon-greedy policy, [0, 1)
spec.alpha = 0.1; // value function learning rate
spec.lambda = 0.1; // eligibility trace decay, [0,1). 0 = no eligibility traces
spec.replacing_traces = true; // use replacing or accumulating traces
spec.planN = 0; // number of planning steps per iteration. 0 = no planning
spec.smooth_policy_update = true; // non-standard, updates policy smoothly to follow max_a Q
spec.beta = 0.1; // learning rate for smooth policy update
```
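For context, the relevant difference between the two settings is what the TD target bootstraps from. The sketch below is illustrative only (the `tdTarget` helper and the tabular `Q[state][action]` indexing are hypothetical, not the reinforcejs API): SARSA is on-policy, so with a fixed epsilon of 0.2 it learns the value of the noisy epsilon-greedy policy rather than the greedy one, which may explain why it doesn't settle on the optimal solution here.

```js
// Hypothetical sketch of the core TD-target difference, assuming a
// tabular value function Q indexed as Q[state][action].
function tdTarget(spec, Q, r, sNext, aNext) {
  if (spec.update === 'sarsa') {
    // on-policy: bootstrap from the action the agent will actually take
    // under its epsilon-greedy behavior policy
    return r + spec.gamma * Q[sNext][aNext];
  } else {
    // off-policy ('qlearn'): bootstrap from the greedy (max-value) action,
    // regardless of what the behavior policy does next
    return r + spec.gamma * Math.max.apply(null, Q[sNext]);
  }
}
```

If the on-policy behavior is the cause, annealing epsilon toward 0 over training would be one thing to try; that is a standard remedy rather than something specific to this codebase.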