Do you also have an LSTM implementation? #1
Comments
Hi, I'm glad someone found it useful. Unfortunately, I haven't had time to implement it yet. I definitely will one day, but I'm not sure when I'll find the time for it.
I just finished implementing it. It's still a massive mess with lots of hackery, so I won't bother you with it, but I might clean it up and let you know if you'd like :) I really like how clear every function is in your code. You make me want to improve my own coding.
Heh, everything emerges from mess :) Yes, sure, I'd be happy to see your take on it. It's always nice to have some reference during coding, especially with ML, where the devil is in the details.
I will :) I'm cleaning it up while I'm figuring out how to connect the models to Java through ONNX/TensorFlow/Keras. I also changed some of the algorithm in my version. For example, I'm normalizing the curiosity rewards, and instead of using .exp() on the difference of the logs I'm using an approximation that doesn't explode. I also simplified some of the hyperparameters. I'm getting full solves of Pendulum in a bit less than 20 epochs, so the renders all end up sticking up in the air like yours. Btw, your TensorBoard logs are super useful! Because of them I realised that Tanhs are preferred in the agent model because they are slower than ReLUs, allowing the ICM to keep up. They're also probably less prone to jumping to conclusions, which makes them more stable. Also, I'm not using the "recurrent" parameter yet since it makes saving the hidden states tricky while maintaining compatibility with the run_[...].py files, but I guess I'll figure that out after further cleaning.
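For reference, here's roughly what I mean by the reward normalization and the non-exploding ratio. This is just a minimal sketch with placeholder names (`normalize_curiosity`, `stable_ratio`, `icm_reward`, `logp_new`, `logp_old`), not the actual code from either repo, and the clamp-before-exp trick is only one possible way to keep the ratio bounded:

```python
import torch

def normalize_curiosity(icm_reward: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Standardize the intrinsic (curiosity) reward per batch so its scale
    # stays comparable to the extrinsic reward.
    return (icm_reward - icm_reward.mean()) / (icm_reward.std() + eps)

def stable_ratio(logp_new: torch.Tensor, logp_old: torch.Tensor, clip: float = 10.0) -> torch.Tensor:
    # Instead of exp(logp_new - logp_old) directly, clamp the log-difference
    # first so a badly mismatched policy early in training can't blow the
    # PPO ratio up to huge values.
    return torch.exp(torch.clamp(logp_new - logp_old, -clip, clip))
```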
Hi, I'm also interested in a (stateful) LSTM implementation. So far I have changed some of your code to use a stateful LSTM and removed the multi-env setup to run my env in a single process (felt easier to work with). ICM now runs on each episode separately (instead of your [n_env, batch_size, n_features] it's [batch_size, n_timesteps, n_features]), and later it's concatenated to [n_env_episodes, batch_size, n_timesteps, n_features] as PPO training input. But I have problems with diverging losses and rewards (see my post here ). So now I'm curious whether my approach with the LSTM is correct.
The divergence persists even after reworking it to use batches in all the places the models are used (ICM for the reward and loss, PPO for getting the old policies and for training).
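To make the per-episode batching concrete, this is a rough sketch of what I mean, under my own assumptions (the class name `EpisodeICM` and `build_ppo_input` are placeholders, episodes are assumed padded to the same length, and the LSTM is only the feature extractor, not the full ICM forward/inverse model):

```python
import torch
import torch.nn as nn

class EpisodeICM(nn.Module):
    # Placeholder sketch of an ICM feature extractor with an LSTM that
    # consumes one episode at a time as [batch_size, n_timesteps, n_features].
    def __init__(self, n_features: int, hidden_size: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)

    def forward(self, obs, hidden=None):
        # obs: [batch_size, n_timesteps, n_features]
        out, hidden = self.lstm(obs, hidden)
        return out, hidden

def build_ppo_input(episodes, icm):
    # episodes: list of tensors, each [batch_size, n_timesteps, n_features],
    # assumed padded to the same n_timesteps so they can be stacked.
    feats = []
    for ep in episodes:
        # Reset the hidden state at every episode boundary; the LSTM rolls
        # it forward over the episode's timesteps internally.
        out, _ = icm(ep, hidden=None)
        feats.append(out)
    # -> [n_env_episodes, batch_size, n_timesteps, hidden_size] for PPO training.
    return torch.stack(feats, dim=0)
```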
I really love this implementation, and I see that LSTM is still on the TODO list. Have you made any progress on this in the last two months, or should I just do it myself?