
NGU implementation #4

Open
TakieddineSOUALHI opened this issue Feb 18, 2023 · 2 comments

Comments

@TakieddineSOUALHI

Hi, first of all, thank you for providing these implementations to the community.

I have a few questions about your NGU implementation. The original work uses two networks to calculate the exploration rewards: a randomly initialized, fixed network (as in RND) and an embedding network. The idea of the embedding network is to represent states in the episodic memory and use those representations later to calculate the intrinsic rewards. Also, the embedding network is trained at each iteration on state-action pairs (s, a) with batches sampled from the replay buffer.
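
For reference, this is roughly how I understand the embedding training in the original paper (an inverse-dynamics loss that predicts the action between two consecutive observations); the class and variable names here are mine, not from your repo:

```python
# Rough sketch of the embedding network described in the NGU paper (my reading,
# not this repo's code): the encoder f(s) produces the embeddings stored in
# episodic memory, and it is trained with an inverse-dynamics head that
# predicts the action taken between consecutive observations, using batches
# sampled from the replay buffer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NGUEmbedding(nn.Module):
    def __init__(self, obs_dim: int, action_dim: int, embed_dim: int = 32):
        super().__init__()
        # f(s): maps an observation to the embedding stored in episodic memory.
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )
        # g(f(s_t), f(s_{t+1})): predicts the action that caused the transition.
        self.inverse_head = nn.Sequential(
            nn.Linear(2 * embed_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.encoder(obs)

    def inverse_loss(self, obs: torch.Tensor, next_obs: torch.Tensor,
                     actions: torch.Tensor) -> torch.Tensor:
        # Classification loss over discrete actions, as in the NGU paper.
        logits = self.inverse_head(
            torch.cat([self.encoder(obs), self.encoder(next_obs)], dim=-1))
        return F.cross_entropy(logits, actions)
```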

My questions are:

  • How does this implementation handle the episodic memory and the training of the embedding network? If I understood your implementation correctly, you assume that the buffer (either replay or rollout) is the episodic memory and use it to embed states.
  • Meanwhile, the embedding network is used to calculate the intrinsic rewards, while a predictor network is the one trained and used for the RND rewards. I didn't understand this part very well. Could you elaborate on this point, please?
@yuanmingqi
Collaborator

Hi! The key insight of NGU is to combine episodic state novelty and life-long novelty:

  • Episodic state novelty for maximizing intra-episode exploration;
  • Life-long state novelty for maximizing inter-episode exploration.

For the RND part, we follow the original design of RND. But for the episodic part, we use a random and fixed encoder to generate representations, which is inspired by:

Seo Y, Chen L, Shin J, et al. State Entropy Maximization with Random Encoders for Efficient Exploration. In: International Conference on Machine Learning. PMLR, 2021: 9443-9454.

  • Since we only need the representations to perform the pseudo-count, and the memory is erased at the end of each episode, the embedding method may not be that critical.
  • A fixed encoder can provide fixed representations and maintain a stable reward space;
  • It is more efficient and easier to train (see the sketch after this list).
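
For illustration, here is a minimal sketch of the episodic bonus with a random, fixed encoder and a k-NN pseudo-count, optionally modulated by an RND-style life-long factor. The names, network sizes, and constants are illustrative and not our exact implementation:

```python
# Sketch (assumed, not this repo's exact code): embeddings of the states
# visited in the current episode form the episodic memory, and the bonus is an
# inverse k-NN pseudo-count over that memory.
import torch
import torch.nn as nn

class RandomEncoder(nn.Module):
    def __init__(self, obs_dim: int, embed_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, embed_dim))
        # Never trained: a frozen encoder keeps the reward space stable.
        for p in self.net.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

@torch.no_grad()
def episodic_bonus(embedding: torch.Tensor, memory: list,
                   k: int = 10, eps: float = 1e-3) -> float:
    # Kernel pseudo-count over the k nearest embeddings in episodic memory.
    if not memory:
        return 1.0
    mem = torch.stack(memory)                               # (N, embed_dim)
    dists = torch.cdist(embedding.unsqueeze(0), mem).squeeze(0)
    knn = torch.topk(dists, k=min(k, len(memory)), largest=False).values
    # Normalize distances by their mean (a simplification of the running
    # average used in the NGU paper), then apply the inverse kernel.
    kernel = eps / (knn ** 2 / (knn.mean() ** 2 + 1e-8) + eps)
    return 1.0 / (kernel.sum().sqrt() + 1e-3).item()

# Usage sketch: reset `memory = []` at the start of each episode, then per step:
#   e = encoder(obs); r_episodic = episodic_bonus(e, memory); memory.append(e)
#   r_intrinsic = r_episodic * clamp(alpha_rnd, 1, L)  # RND life-long modulation
```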

Anyway, you can follow the original implementation or create a new one, depending on your task.

@yuanmingqi
Collaborator

yuanmingqi commented Feb 29, 2024

Hello! We've published a big update that provides more reasonable implementations of these intrinsic rewards.

If you have any other questions, please don't hesitate to ask here.
