I ran into some difficulties reproducing the MuJoCo half-cheetah results reported in the paper, and I have some questions about your implementation. My local environment is Python 3.7.11, torch 1.10.0+cu113, gym 0.26.2. My questions are:
(1) I trained a model-free RL agent for 400,000 steps with the same settings as the example in README.md. The paper reports an average reward of about 1500-2000 and a max reward of about 2200, but my model-free results fall between 2500 and 3000. Why would the model-free results be better than those in the paper? (Results are shown in the following figure.)
(2) I also trained a model-based latent-ODE RL agent for 61,000 steps (still running), again with the same settings as the example in README.md. However, its average results do not seem better than the model-free experiment at the same number of steps. Could the MuJoCo version or some other package versions influence this result? (Results are shown in the following figure.)
(3) I found that the MSE loss of the latent ODE decreases over training, but the dt loss does not (it stays around 2.165 for the entire run). Is this expected behavior? I also found that the time gap is determined by `np.clip(round(amp*np.cos(20*np.pi*np.linalg.norm(state[8:-1]))+amp+1), self.min_t, self.max_t)`; does this expression have some physical meaning or pattern?
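For concreteness, here is a minimal, self-contained sketch of how I read that time-gap rule. The values of `amp`, `min_t`, and `max_t` below are illustrative placeholders (the repository defines its own), and the observation vector is just random noise standing in for a half-cheetah state:

```python
import numpy as np

# Illustrative placeholders; the repository sets its own amp / min_t / max_t.
amp, min_t, max_t = 2.0, 1, 5

# Stand-in for a half-cheetah observation (17-dimensional in gym).
state = np.random.randn(17)

# The norm of a slice of the state drives a cosine oscillation; the result is
# shifted up by amp + 1, rounded, and clipped into [min_t, max_t], so dt is an
# integer time gap that oscillates rapidly with the magnitude of that slice.
dt = int(np.clip(round(amp * np.cos(20 * np.pi * np.linalg.norm(state[8:-1]))
                       + amp + 1),
                 min_t, max_t))
print(dt)  # integer time gap in [min_t, max_t]
```

Because of the `20 * np.pi` factor, small changes in the state norm swing the cosine through full periods, which is why I am asking whether the rule encodes some physical pattern or is essentially a pseudo-random gap schedule.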
Training logs of the latent ODE:
Epoch 1 | training MSE = 0.201491 | test MSE = 0.375520 | training dt loss = 2.164400 | test dt loss = 2.166912 | time = 23.816061 s
Epoch 2 | training MSE = 0.140499 | test MSE = 0.291101 | training dt loss = 2.165074 | test dt loss = 2.166906 | time = 22.703627 s
Epoch 3 | training MSE = 0.111174 | test MSE = 0.237388 | training dt loss = 2.164855 | test dt loss = 2.165762 | time = 20.589840 s
Epoch 4 | training MSE = 0.097770 | test MSE = 0.223056 | training dt loss = 2.164089 | test dt loss = 2.166288 | time = 20.890777 s
Epoch 5 | training MSE = 0.091927 | test MSE = 0.223834 | training dt loss = 2.163729 | test dt loss = 2.165926 | time = 21.758636 s
Epoch 6 | training MSE = 0.087607 | test MSE = 0.219942 | training dt loss = 2.163412 | test dt loss = 2.166388 | time = 18.831989 s
Epoch 7 | training MSE = 0.084743 | test MSE = 0.204372 | training dt loss = 2.163763 | test dt loss = 2.165238 | time = 20.200457 s
Epoch 8 | training MSE = 0.083336 | test MSE = 0.205035 | training dt loss = 2.163808 | test dt loss = 2.165916 | time = 19.223901 s
Epoch 9 | training MSE = 0.083146 | test MSE = 0.211704 | training dt loss = 2.163765 | test dt loss = 2.165511 | time = 19.203694 s
Epoch 10 | training MSE = 0.081754 | test MSE = 0.212139 | training dt loss = 2.163377 | test dt loss = 2.165225 | time = 19.532426 s
Epoch 11 | training MSE = 0.080134 | test MSE = 0.228265 | training dt loss = 2.163585 | test dt loss = 2.166573 | time = 20.249373 s
Epoch 12 | training MSE = 0.080521 | test MSE = 0.203695 | training dt loss = 2.162982 | test dt loss = 2.165596 | time = 20.280165 s
Epoch 13 | training MSE = 0.080686 | test MSE = 0.211850 | training dt loss = 2.162555 | test dt loss = 2.165789 | time = 20.470568 s
Epoch 14 | training MSE = 0.079521 | test MSE = 0.220026 | training dt loss = 2.162912 | test dt loss = 2.165811 | time = 17.431301 s
Epoch 15 | training MSE = 0.078815 | test MSE = 0.181571 | training dt loss = 2.163148 | test dt loss = 2.165812 | time = 19.203955 s
Epoch 16 | training MSE = 0.078544 | test MSE = 0.209845 | training dt loss = 2.163012 | test dt loss = 2.165332 | time = 19.719091 s
Epoch 17 | training MSE = 0.077576 | test MSE = 0.209810 | training dt loss = 2.163904 | test dt loss = 2.165824 | time = 20.133925 s
Epoch 18 | training MSE = 0.077081 | test MSE = 0.193259 | training dt loss = 2.163013 | test dt loss = 2.165472 | time = 23.004033 s
Epoch 19 | training MSE = 0.078475 | test MSE = 0.218727 | training dt loss = 2.162930 | test dt loss = 2.165444 | time = 19.103691 s
Epoch 20 | training MSE = 0.081168 | test MSE = 0.224107 | training dt loss = 2.163754 | test dt loss = 2.165966 | time = 20.924851 s