Reinforcejs VS ConvNetjs #8

Open

functionsoft opened this issue Oct 23, 2015 · 11 comments

@functionsoft
No description provided.

@functionsoft (Author)

Hi,

I'm looking at
http://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo.html

and comparing the agent there with the one at

http://cs.stanford.edu/people/karpathy/reinforcejs/waterworld.html

They are acting in very similar environments, but have different AI implementations.

My question is: which of the two versions is the more advanced and complete AI agent?

What are the differences between the neural network implementations, and which agent is more intelligent?

Thanks,

Mike

@karpathy (Owner)

Hi, both of those agents use the same algorithm, DQN, but yes, the implementations differ at the level of details. I'd use the REINFORCEjs one; it's more recent and complete.
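For reference, the REINFORCEjs DQN agent is configured through a spec object described in the project README; a minimal usage sketch follows, with illustrative (not demo-exact) parameter values:

```js
// Sketch of the documented REINFORCEjs DQN API. The environment only has
// to expose two size queries; the state itself is passed to act() as an
// array of floats. The values below are illustrative, not the demo's.
var env = {
  getNumStates: function() { return 8; },      // size of the state vector
  getMaxNumActions: function() { return 4; }   // number of discrete actions
};

var spec = {
  update: 'qlearn',                 // 'qlearn' or 'sarsa'
  gamma: 0.9,                       // discount factor
  epsilon: 0.2,                     // epsilon-greedy exploration rate
  alpha: 0.01,                      // value function learning rate
  experience_add_every: 5,          // record a tuple every N time steps
  experience_size: 10000,           // replay memory capacity
  learning_steps_per_iteration: 20, // replays per learn() call
  tderror_clamp: 1.0,               // clamp TD error for robustness
  num_hidden_units: 100             // hidden layer size
};

var agent = new RL.DQNAgent(env, spec);

// main loop: act, observe reward, learn
// var action = agent.act(state);   // state: array of env.getNumStates() floats
// agent.learn(reward);
```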

@functionsoft (Author)

Hi,

Thanks for getting back to me. I’m glad you said that, because that’s the library I chose out of the two to work with and understand.

In the learn function of the DQNAgent there is a comment regarding replay memory and prioritized sweeping. How could this be implemented simply with the current code? I assume it involves marking each experience in memory with a value that represents a good experience versus a bad one, so that the best memories are replayed?

Also, what type of neural network is implemented in this agent? Is it a simple multilayer perceptron? Would the agent benefit from more hidden layers?

Any ideas or suggestions greatly appreciated.

Thanks and Regards,

Mike
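As far as one can tell from dqn.js, the value network is indeed a minimal multilayer perceptron: a single tanh hidden layer feeding a linear output with one unit per action. A sketch of that forward pass, with names only approximating the library's internals:

```js
// Approximate shape of the Q-network forward pass in REINFORCEjs's dqn.js:
// one tanh hidden layer followed by a linear readout, one output per action.
// W1/b1/W2/b2 are the net's weight matrices; s is the state column vector.
function forwardQ(net, s, needsBackprop) {
  var G = new R.Graph(needsBackprop);              // autodiff tape (from recurrentjs)
  var h = G.tanh(G.add(G.mul(net.W1, s), net.b1)); // hidden layer activations
  var q = G.add(G.mul(net.W2, h), net.b2);         // one Q-value per action
  return q;                                        // argmax over q.w picks the greedy action
}
```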

@mryellow commented Nov 2, 2015

> prioritized sweeping: how could this be implemented simply with the current code? I assume it involves marking each experience in memory with a value that represents a good experience versus a bad one, so that the best memories are replayed?

I've seen it done that way in a paper somewhere (can't find it); they added an extra property to the experience objects with a value that was then used to prune experiences.
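One way that could look on top of the REINFORCEjs buffer is sketched below; the `priority` property, and using the absolute TD error as its value, are assumptions for illustration, not library code:

```js
// Hypothetical sketch: tag each stored experience tuple with a priority
// (here the absolute TD error from when it was first learned), then evict
// the lowest-priority tuple, rather than the oldest, once the buffer is full.
var tuple = [s0, a0, r0, s1, a1];    // same tuple layout REINFORCEjs stores
tuple.priority = Math.abs(tderror);  // assumed extra property, not in the library

this.exp.push(tuple);
if (this.exp.length > this.experience_size) {
  var worst = 0;                     // linear scan for the least useful memory
  for (var i = 1; i < this.exp.length; i++) {
    if (this.exp[i].priority < this.exp[worst].priority) worst = i;
  }
  this.exp.splice(worst, 1);         // prune it
}
```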

@nosyndicate

Hi mryellow, I'm very interested in the prioritized-sweeping-with-experience-replay paper you mention. Can you recall anything related to it that I could use to Google it?

@mryellow commented Nov 3, 2015

Not sure I have it saved here; I think it may have been an incomplete draft, and not that interesting otherwise.

They were using REINFORCEjs and had modified this bit:

`this.learnFromTuple(e[0], e[1], e[2], e[3], e[4])`

to include some kind of crude threshold on a score. I believe it was effectively only looking for actions with a non-zero reward.

One bit that sticks in my head is that they used a Greek letter, rho or psi or something, in their in-line comments, properly encoded rather than written as LaTeX or a simple substitute character.
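A crude threshold of that sort could be sketched on top of the stock replay loop in DQNAgent.learn(); treating the stored reward `e[2]` as the score is an assumption about what that draft did:

```js
// Hypothetical variant of the replay loop in DQNAgent.learn(): only replay
// tuples whose stored reward clears a threshold, i.e. skip zero-reward steps.
for (var k = 0; k < this.learning_steps_per_iteration; k++) {
  var ri = randi(0, this.exp.length);  // randi: REINFORCEjs's random-int helper
  var e = this.exp[ri];                // e = [s0, a0, r0, s1, a1]
  if (Math.abs(e[2]) > 0) {            // crude score threshold on the reward
    this.learnFromTuple(e[0], e[1], e[2], e[3], e[4]);
  }
}
```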

@mryellow commented Nov 3, 2015

Google: `"this.learnFromTuple(e[0], e[1], e[2], e[3], e[4], e[5])"`

On Learning Coordination Among Soccer Agents

http://robocup.csu.edu.cn/web/wp-content/uploads/2012/12/data/pdfs/robio12-116.pdf

@mryellow commented Nov 3, 2015

Hang on, that's the only result, but it's not the one, although I've seen this paper before. And I don't think it passed the score in; it checked it before firing learnFromTuple. So that's a wild goose chase, sorry for the noise.

@nosyndicate

Thanks, mryellow

@andrewcz

There is a new paper by DeepMind on deep reinforcement learning in continuous spaces: "Continuous control with deep reinforcement learning". Are there plans to add this in code form? Many thanks, Andrew.

@NullVoxPopuli

I'm also curious about DeepMind's learnings :D
