You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi! I'm learning DDP method recently and also upvoted your brilliant implementation. It seems like you are using the MPC version of ilqr? I change it into normal version but it does not converged any more.... Could you please exaplain a little bit for me.
But anyway thanks for your implementation!!
The following is my code which is modified from your code!
import gym
import env
from autograd import grad, jacobian
import autograd.numpy as np
Hi! I'm learning DDP method recently and also upvoted your brilliant implementation. It seems like you are using the MPC version of ilqr? I change it into normal version but it does not converged any more.... Could you please exaplain a little bit for me.
But anyway thanks for your implementation!!
The following is my code which is modified from your code!
import gym
import env
from autograd import grad, jacobian
import autograd.numpy as np
class ILqr:
def init(self, next_state, running_cost, final_cost,
umax, state_dim, pred_time=500):
self.pred_time = pred_time
self.umax = umax
self.v = [0.0 for _ in range(pred_time + 1)]
self.v_x = [np.zeros(state_dim) for _ in range(pred_time + 1)]
self.v_xx = [np.zeros((state_dim, state_dim)) for _ in range(pred_time + 1)]
self.f = next_state
self.lf = final_cost
self.lf_x = grad(self.lf)
self.lf_xx = jacobian(self.lf_x)
self.l_x = grad(running_cost, 0)
self.l_u = grad(running_cost, 1)
self.l_xx = jacobian(self.l_x, 0)
self.l_uu = jacobian(self.l_u, 1)
self.l_ux = jacobian(self.l_u, 0)
self.f_x = jacobian(self.f, 0)
self.f_u = jacobian(self.f, 1)
env = gym.make('CartPoleContinuous-v0').env
obs = env.reset()
ilqr = ILqr(lambda x, u: env._state_eq(x, u),
lambda x, u: 0.5 * np.sum(np.square(u)),
lambda x: 0.5 * (np.square(1.0 - np.cos(x[2])) + np.square(x[1]) + np.square(x[3])),
env.max_force,
env.observation_space.shape[0])
u_seq = [np.zeros(1) for _ in range(ilqr.pred_time)]
x_seq = [obs.copy()]
for t in range(ilqr.pred_time):
x_seq.append(env._state_eq(x_seq[-1], u_seq[t]))
cnt = 0
while True:
env.reset()
The text was updated successfully, but these errors were encountered: