
Can't reproduce Mujoco half-cheetah results #2

Open

wutianyuan1 opened this issue Feb 7, 2023 · 0 comments

wutianyuan1 commented Feb 7, 2023

I have had some difficulty reproducing the Mujoco half-cheetah results shown in the paper, and I have a few questions about your implementation. My local environment is Python 3.7.11, torch 1.10.0+cu113, gym 0.26.2. My questions are:
(1) I trained a model-free RL agent for 400,000 steps, using the same settings as the example in README.md. The paper reports an average reward of about 1500-2000 and a maximum reward of about 2200, but my model-free results fall between 2500 and 3000. I wonder why my model-free results are better than those in the paper (results shown in the figure below).
(2) I also trained a model-based latent-ODE RL agent for 61,000 steps (still running), again using the same settings as the example in README.md. However, its average results do not appear better than the model-free experiment at the same number of steps. I wonder whether the Mujoco version or other package versions could influence this result (results shown in the figure below).
(3) I found that the MSE loss of the latent ODE decreases over time, but the dt loss does not (it stays around 2.165 during the entire training process). Is this expected behaviour? Also, I found that the time gap is determined by np.clip(round(amp*np.cos(20*np.pi*np.linalg.norm(state[8:-1]))+amp+1), self.min_t, self.max_t); does this expression have some physical meaning or pattern? (A minimal sketch of how I read this mapping is shown below.)
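For reference, here is a minimal standalone sketch of how I understand that time-gap rule. The values of amp, min_t, and max_t are placeholders (the real ones come from the environment wrapper in this repository), and the dummy 17-dimensional observation is just an assumption matching the standard Gym HalfCheetah observation size:

```python
import numpy as np

# Placeholder values -- the actual ones are defined in the repo's environment wrapper.
amp, min_t, max_t = 2, 1, 5

def time_gap(state: np.ndarray) -> int:
    """Reproduce the quoted dt rule: a cosine of the norm of state[8:-1],
    shifted into roughly [1, 2*amp + 1] and clipped to [min_t, max_t]."""
    vel_norm = np.linalg.norm(state[8:-1])            # norm of the velocity-like slice of the state
    raw = amp * np.cos(20 * np.pi * vel_norm) + amp + 1
    return int(np.clip(round(raw), min_t, max_t))

# Because of the 20*pi factor, dt oscillates rapidly as the state norm changes.
for norm in np.linspace(0.0, 0.2, 5):
    fake_state = np.zeros(17)                         # dummy observation, HalfCheetah-sized
    fake_state[8] = norm                              # single nonzero entry, so its slice norm equals `norm`
    print(f"norm={norm:.2f} -> dt={time_gap(fake_state)}")
```

If this matches your intent, it looks like dt is essentially a fast oscillation of the velocity norm rather than something tied to the dynamics, which is why I am asking about its physical meaning.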

Training logs of the latent ODE:

Epoch 1 | training MSE = 0.201491 | test MSE = 0.375520 | training dt loss = 2.164400 | test dt loss = 2.166912 | time = 23.816061 s
Epoch 2 | training MSE = 0.140499 | test MSE = 0.291101 | training dt loss = 2.165074 | test dt loss = 2.166906 | time = 22.703627 s
Epoch 3 | training MSE = 0.111174 | test MSE = 0.237388 | training dt loss = 2.164855 | test dt loss = 2.165762 | time = 20.589840 s
Epoch 4 | training MSE = 0.097770 | test MSE = 0.223056 | training dt loss = 2.164089 | test dt loss = 2.166288 | time = 20.890777 s
Epoch 5 | training MSE = 0.091927 | test MSE = 0.223834 | training dt loss = 2.163729 | test dt loss = 2.165926 | time = 21.758636 s
Epoch 6 | training MSE = 0.087607 | test MSE = 0.219942 | training dt loss = 2.163412 | test dt loss = 2.166388 | time = 18.831989 s
Epoch 7 | training MSE = 0.084743 | test MSE = 0.204372 | training dt loss = 2.163763 | test dt loss = 2.165238 | time = 20.200457 s
Epoch 8 | training MSE = 0.083336 | test MSE = 0.205035 | training dt loss = 2.163808 | test dt loss = 2.165916 | time = 19.223901 s
Epoch 9 | training MSE = 0.083146 | test MSE = 0.211704 | training dt loss = 2.163765 | test dt loss = 2.165511 | time = 19.203694 s
Epoch 10 | training MSE = 0.081754 | test MSE = 0.212139 | training dt loss = 2.163377 | test dt loss = 2.165225 | time = 19.532426 s
Epoch 11 | training MSE = 0.080134 | test MSE = 0.228265 | training dt loss = 2.163585 | test dt loss = 2.166573 | time = 20.249373 s
Epoch 12 | training MSE = 0.080521 | test MSE = 0.203695 | training dt loss = 2.162982 | test dt loss = 2.165596 | time = 20.280165 s
Epoch 13 | training MSE = 0.080686 | test MSE = 0.211850 | training dt loss = 2.162555 | test dt loss = 2.165789 | time = 20.470568 s
Epoch 14 | training MSE = 0.079521 | test MSE = 0.220026 | training dt loss = 2.162912 | test dt loss = 2.165811 | time = 17.431301 s
Epoch 15 | training MSE = 0.078815 | test MSE = 0.181571 | training dt loss = 2.163148 | test dt loss = 2.165812 | time = 19.203955 s
Epoch 16 | training MSE = 0.078544 | test MSE = 0.209845 | training dt loss = 2.163012 | test dt loss = 2.165332 | time = 19.719091 s
Epoch 17 | training MSE = 0.077576 | test MSE = 0.209810 | training dt loss = 2.163904 | test dt loss = 2.165824 | time = 20.133925 s
Epoch 18 | training MSE = 0.077081 | test MSE = 0.193259 | training dt loss = 2.163013 | test dt loss = 2.165472 | time = 23.004033 s
Epoch 19 | training MSE = 0.078475 | test MSE = 0.218727 | training dt loss = 2.162930 | test dt loss = 2.165444 | time = 19.103691 s
Epoch 20 | training MSE = 0.081168 | test MSE = 0.224107 | training dt loss = 2.163754 | test dt loss = 2.165966 | time = 20.924851 s