I ran into some difficulties reproducing the MuJoCo half-cheetah results reported in the paper, and I have some questions about your implementation. My local environment is Python 3.7.11, torch 1.10.0+cu113, gym 0.26.2. My questions are:
(1) I trained a model-free RL agent for 400,000 steps with the same settings as the example in README.md. The paper reports an average reward of about 1500-2000 and a max reward of about 2200, but my model-free results fall between 2500 and 3000. Why would the model-free results be better than those in the paper? (Results are shown in the following figure.)
(2) I also trained a model-based latent-ODE RL agent for 61,000 steps (still running), again with the same settings as the example in README.md. However, its average results do not seem better than the model-free experiment at the same number of steps. Could the MuJoCo version or some other package versions influence this result? (Results are shown in the following figure.)
(3) I found that the MSE loss of the latent ODE decreases over training, but the dt loss does not (it stays around 2.165 for the entire run). Is this expected behavior? I also found that the time gap is determined by `np.clip(round(amp*np.cos(20*np.pi*np.linalg.norm(state[8:-1]))+amp+1), self.min_t, self.max_t)`; does this expression have some physical meaning or pattern?
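For concreteness, here is a minimal, self-contained sketch of how I read that time-gap rule. The values of `amp`, `min_t`, and `max_t` below are illustrative placeholders (the repository defines its own), and the observation vector is just random noise standing in for a half-cheetah state:

```python
import numpy as np

# Illustrative placeholders; the repository sets its own amp / min_t / max_t.
amp, min_t, max_t = 2.0, 1, 5

# Stand-in for a half-cheetah observation (17-dimensional in gym).
state = np.random.randn(17)

# The norm of a slice of the state drives a cosine oscillation; the result is
# shifted up by amp + 1, rounded, and clipped into [min_t, max_t], so dt is an
# integer time gap that oscillates rapidly with the magnitude of that slice.
dt = int(np.clip(round(amp * np.cos(20 * np.pi * np.linalg.norm(state[8:-1]))
                       + amp + 1),
                 min_t, max_t))
print(dt)  # integer time gap in [min_t, max_t]
```

Because of the `20 * np.pi` factor, small changes in the state norm swing the cosine through full periods, which is why I am asking whether the rule encodes some physical pattern or is essentially a pseudo-random gap schedule.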
Training logs of the latent ODE:
Epoch 1 | training MSE = 0.201491 | test MSE = 0.375520 | training dt loss = 2.164400 | test dt loss = 2.166912 | time = 23.816061 s
Epoch 2 | training MSE = 0.140499 | test MSE = 0.291101 | training dt loss = 2.165074 | test dt loss = 2.166906 | time = 22.703627 s
Epoch 3 | training MSE = 0.111174 | test MSE = 0.237388 | training dt loss = 2.164855 | test dt loss = 2.165762 | time = 20.589840 s
Epoch 4 | training MSE = 0.097770 | test MSE = 0.223056 | training dt loss = 2.164089 | test dt loss = 2.166288 | time = 20.890777 s
Epoch 5 | training MSE = 0.091927 | test MSE = 0.223834 | training dt loss = 2.163729 | test dt loss = 2.165926 | time = 21.758636 s
Epoch 6 | training MSE = 0.087607 | test MSE = 0.219942 | training dt loss = 2.163412 | test dt loss = 2.166388 | time = 18.831989 s
Epoch 7 | training MSE = 0.084743 | test MSE = 0.204372 | training dt loss = 2.163763 | test dt loss = 2.165238 | time = 20.200457 s
Epoch 8 | training MSE = 0.083336 | test MSE = 0.205035 | training dt loss = 2.163808 | test dt loss = 2.165916 | time = 19.223901 s
Epoch 9 | training MSE = 0.083146 | test MSE = 0.211704 | training dt loss = 2.163765 | test dt loss = 2.165511 | time = 19.203694 s
Epoch 10 | training MSE = 0.081754 | test MSE = 0.212139 | training dt loss = 2.163377 | test dt loss = 2.165225 | time = 19.532426 s
Epoch 11 | training MSE = 0.080134 | test MSE = 0.228265 | training dt loss = 2.163585 | test dt loss = 2.166573 | time = 20.249373 s
Epoch 12 | training MSE = 0.080521 | test MSE = 0.203695 | training dt loss = 2.162982 | test dt loss = 2.165596 | time = 20.280165 s
Epoch 13 | training MSE = 0.080686 | test MSE = 0.211850 | training dt loss = 2.162555 | test dt loss = 2.165789 | time = 20.470568 s
Epoch 14 | training MSE = 0.079521 | test MSE = 0.220026 | training dt loss = 2.162912 | test dt loss = 2.165811 | time = 17.431301 s
Epoch 15 | training MSE = 0.078815 | test MSE = 0.181571 | training dt loss = 2.163148 | test dt loss = 2.165812 | time = 19.203955 s
Epoch 16 | training MSE = 0.078544 | test MSE = 0.209845 | training dt loss = 2.163012 | test dt loss = 2.165332 | time = 19.719091 s
Epoch 17 | training MSE = 0.077576 | test MSE = 0.209810 | training dt loss = 2.163904 | test dt loss = 2.165824 | time = 20.133925 s
Epoch 18 | training MSE = 0.077081 | test MSE = 0.193259 | training dt loss = 2.163013 | test dt loss = 2.165472 | time = 23.004033 s
Epoch 19 | training MSE = 0.078475 | test MSE = 0.218727 | training dt loss = 2.162930 | test dt loss = 2.165444 | time = 19.103691 s
Epoch 20 | training MSE = 0.081168 | test MSE = 0.224107 | training dt loss = 2.163754 | test dt loss = 2.165966 | time = 20.924851 s