Reinforcejs VS ConvNetjs #8

functionsoft · 2015-10-23T16:26:07Z

No description provided.

functionsoft · 2015-10-23T16:29:26Z

Hi,

I'm looking at
http://cs.stanford.edu/people/karpathy/convnetjs/demo/rldemo.html

and comparing the agent there with the one at

http://cs.stanford.edu/people/karpathy/reinforcejs/waterworld.html

They act in very similar environments but have different AI implementations.

My question is, which is the more advanced and complete AI agent between the two versions?

What are the differences in the neural network implementations, and which is the more intelligent agent?

Thanks,

Mike

karpathy · 2015-10-23T17:28:34Z

Hi, both of those agents are using the same algorithm: DQN, but yes the
implementation is different on the level of details. I'd use the
REINFORCEjs one, it's more recent and complete.
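
For readers new to DQN, the core quantity both agents rely on is the one-step Q-learning target. The following is a minimal sketch of that arithmetic only, not code from ConvNetJS or REINFORCEjs; the function names `tdTarget` and `tdError` and the sample numbers are hypothetical, and the sign of the error is just one common convention.

```javascript
// Hypothetical helpers for illustration; not part of either library.

// One-step Q-learning target: r + gamma * max_a' Q(s', a').
function tdTarget(reward, nextQValues, gamma) {
  // Best value reachable from the next state under the current estimates.
  const maxNextQ = Math.max(...nextQValues);
  return reward + gamma * maxNextQ;
}

// Difference between the current estimate for the taken action and the target.
function tdError(qTaken, reward, nextQValues, gamma) {
  return qTaken - tdTarget(reward, nextQValues, gamma);
}

// Example: reward 1.0, three actions available in the next state, gamma 0.9.
const target = tdTarget(1.0, [0.5, 2.0, -1.0], 0.9);
const error = tdError(2.0, 1.0, [0.5, 2.0, -1.0], 0.9);
console.log(target, error);
```

In a DQN the Q-values come from a neural network, and the network weights are nudged to shrink this error for the action that was actually taken.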

functionsoft · 2015-10-24T07:42:12Z

Hi,

Thanks for getting back to me. I’m glad you said that, because that’s the library I chose out of the two to work with and understand.

In the learn function of the DQNAgent there is a comment about replay memory that mentions priority sweeps. How could this be implemented simply with the current code? I assume it involves marking each experience in memory with some value that represents good experience vs bad experience, so that the best memories are played back?

Also, what type of neural network is implemented in this agent? Is it a simple multilayer perceptron? Would the agent benefit from more hidden layers?

Any ideas or suggestions greatly appreciated.

Thanks and Regards,

Mike
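
One way the idea in this question could be approached is to store a priority next to each experience and sample in proportion to it, instead of sampling uniformly. The sketch below is only an illustration under that assumption. `PrioritizedReplay` and all of its methods are hypothetical names that do not exist in REINFORCEjs, and using the absolute TD error as the priority is an assumed choice.

```javascript
// Hypothetical sketch; none of these names come from REINFORCEjs.
// Replay memory whose entries carry a priority (for example |TD error|)
// and are sampled with probability proportional to that priority.
class PrioritizedReplay {
  constructor(capacity, rng = Math.random) {
    this.capacity = capacity;
    this.items = [];   // each item: { tuple, priority }
    this.next = 0;     // ring-buffer write index
    this.rng = rng;    // injectable so that sampling can be made deterministic
  }

  add(tuple, priority) {
    // A small floor keeps every experience replayable.
    const entry = { tuple, priority: Math.abs(priority) + 1e-3 };
    if (this.items.length < this.capacity) {
      this.items.push(entry);
    } else {
      this.items[this.next] = entry; // overwrite the oldest slot
    }
    this.next = (this.next + 1) % this.capacity;
  }

  // Returns an index into this.items, or -1 if the memory is empty.
  sampleIndex() {
    if (this.items.length === 0) return -1;
    const total = this.items.reduce((sum, e) => sum + e.priority, 0);
    let remaining = this.rng() * total;
    for (let i = 0; i < this.items.length; i++) {
      remaining -= this.items[i].priority;
      if (remaining <= 0) return i;
    }
    return this.items.length - 1; // guard against rounding
  }

  // Refresh the priority after an experience has been replayed.
  updatePriority(index, priority) {
    this.items[index].priority = Math.abs(priority) + 1e-3;
  }
}

const memory = new PrioritizedReplay(1000);
memory.add(["s0", 0, 0.0, "s1", 1], 0.01); // unsurprising transition
memory.add(["s1", 1, 1.0, "s2", 0], 2.5);  // surprising transition
const picked = memory.sampleIndex();       // usually 1, occasionally 0
console.log(memory.items[picked].tuple);
```

The linear scan in `sampleIndex` costs O(n) per sample, which is acceptable for a few thousand experiences; a sum tree would be the usual replacement for larger memories.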

mryellow · 2015-11-02T19:29:31Z

> priority sweeps, how could this be simply implemented with the current code? I assume it involves marking the experience memory with some value that represents good experience vs bad experience? So that the best memories are played back?

Seen it done that way in a paper somewhere (can't find it), they added an extra property to the experience objects with a value which was then used to prune experiences.

nosyndicate · 2015-11-03T04:11:14Z

Hi mryellow, I am very interested in the paper on prioritized sweeping with experience replay that you mentioned. Can you recall anything related to it that I could use to google it?

mryellow · 2015-11-03T04:27:54Z

Not sure I have it saved here, think it may have been an incomplete draft, and not that interesting otherwise.

They were using ReinforceJS, had modified this bit

reinforcejs/lib/rl.js

Line 1091 in 0b9315a

this.learnFromTuple(e[0], e[1], e[2], e[3], e[4])

to include some kind of crude threshold on a score. Believe it was effectively only really looking for actions with a non-zero reward.
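
A rough illustration of the kind of threshold described above, as a sketch only: the score is assumed to be kept as a sixth element of each stored tuple and checked before the learning call is fired. The function `replayAboveThreshold`, the `minScore` parameter, and the sixth tuple element are hypothetical; only the five-argument `learnFromTuple` call shape is taken from the line quoted above.

```javascript
// Hypothetical sketch; not the modification from the paper and not library code.
// Replays stored tuples but skips any whose score (assumed to be stored as a
// sixth element, e[5]) falls below a threshold.
function replayAboveThreshold(experiences, minScore, learnFromTuple) {
  let replayed = 0;
  for (const e of experiences) {
    const score = e[5];
    // With a tiny threshold this amounts to skipping zero-reward experiences.
    if (Math.abs(score) < minScore) continue;
    learnFromTuple(e[0], e[1], e[2], e[3], e[4]);
    replayed += 1;
  }
  return replayed;
}

// Stand-in for the agent's learning call, used here only to show the flow.
const seenRewards = [];
const fakeLearn = (s0, a0, r0, s1, a1) => { seenRewards.push(r0); };

const stored = [
  ["s0", 0, 0.0, "s1", 1, 0.0],  // zero reward, skipped
  ["s1", 1, 1.0, "s2", 0, 1.0],  // replayed
  ["s2", 0, -0.5, "s3", 1, -0.5] // replayed, since the magnitude is used
];
const count = replayAboveThreshold(stored, 1e-6, fakeLearn);
console.log(count, seenRewards);
```

A hard threshold like this discards neutral experiences entirely, which may bias learning in environments where most useful transitions carry no immediate reward.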

One bit that sticks in my head is that they were using a Greek letter, Rho or Psi or something, and had inline comments showing it properly encoded rather than as LaTeX or a substitute plain character.

mryellow · 2015-11-03T04:29:31Z

google: "this.learnFromTuple(e[0], e[1], e[2], e[3], e[4], e[5])"

On Learning Coordination Among Soccer Agents

http://robocup.csu.edu.cn/web/wp-content/uploads/2012/12/data/pdfs/robio12-116.pdf

mryellow · 2015-11-03T04:32:42Z

Hang on, that's the only result, but it's not the one, although I've seen this paper before... I don't think it passed in the score, but rather checked it before firing learnFromTuple... So that's a wild goose chase, sorry for the noise.

nosyndicate · 2015-11-03T18:46:51Z

Thanks, mryellow

andrewcz · 2015-12-29T23:24:21Z

There is a new paper by DeepMind on deep reinforcement learning in continuous spaces: "Continuous control with deep reinforcement learning". Are there plans to add this in code form? Many thanks, Andrew.

NullVoxPopuli · 2020-07-08T13:21:17Z

I'm also curious about the deepmind's learnings :D
