
number of LSTM blocks and cells #1

Open
xy0806 opened this issue May 6, 2016 · 6 comments

@xy0806

xy0806 commented May 6, 2016

Dear Yaseen,
thanks for your clean code.
As you know, there are the concepts of an 'LSTM block' and an 'LSTM cell'. But in a lot of LSTM example code, including yours, no attention seems to be paid to this difference: the code only ever creates cells, never blocks.
After reading and thinking about this problem, I came to the conclusion that an LSTM with m blocks of n cells each and an LSTM with one block of m*n cells are actually the same.
So what do you think about this, and could you give me any hints on the issue?

Thanks,
Xin Yang

@uyaseen
Owner

uyaseen commented May 6, 2016

Hi Yang,

Glad to know that you found the code helpful.

The distinction between cells and blocks has eroded over time; most modern LSTM architectures have one cell per block, which in my opinion is simpler. Regarding your question of why not much attention is paid to this difference, I am afraid I don't have a very clear answer, but I would say:

-> Do we have any empirical evidence that an LSTM with multiple cells per block works better than the "one cell per block" architecture? I am not aware of any such evidence, and without it people will prefer the less cumbersome model. The same applies to peephole connections: some people don't use them because they don't find them very helpful.
-> Why not increase the memory (capacity) of a cell instead of adding more cells to a block? I would prefer increasing the memory; it's simpler and more interpretable, and to me the two look equivalent anyway, since the cells within a block share the same gates (see the sketch after this list).
-> Simple is always better (GRUs are a simpler version of LSTMs with almost equivalent performance on many tasks, which is why they are very popular these days).
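
For concreteness, here is a minimal numpy sketch of one step of a single LSTM block that holds several cells; it is not from this repo, and all names are made up. The input/forget/output gates are computed once per block and shared by every cell inside it, while each cell keeps its own state:

```python
# Minimal illustrative sketch (not from this repo).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_block_step(x, h_prev, c_prev, W_gates, W_cand):
    """One step of a single LSTM block containing several cells.

    x      : (n_in,)                   input at this time step
    h_prev : (n_cells,)                previous block output
    c_prev : (n_cells,)                previous cell states, one per cell
    W_gates: (3, n_in + n_cells)       one weight row per shared gate
    W_cand : (n_cells, n_in + n_cells) one weight row per cell
    """
    z = np.concatenate([x, h_prev])
    i, f, o = sigmoid(W_gates @ z)  # scalar gates, shared by every cell
    g = np.tanh(W_cand @ z)         # one candidate value per cell
    c = f * c_prev + i * g          # scalar gates broadcast over all cells
    h = o * np.tanh(c)
    return h, c

# quick smoke test with random weights
rng = np.random.RandomState(0)
n_in, n_cells = 4, 3
h, c = lstm_block_step(rng.randn(n_in), np.zeros(n_cells), np.zeros(n_cells),
                       rng.randn(3, n_in + n_cells),
                       rng.randn(n_cells, n_in + n_cells))
```

In this sketch, m blocks of n cells carry m independent gate sets, while one block of m*n cells carries a single gate set; that is the only structural difference between the two layouts.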

I hope the explanation above helps a bit; [1] explains the differences between various LSTM architectures.

[1] LSTM: A Search Space Odyssey, http://arxiv.org/pdf/1503.04069v1.pdf

@xy0806
Author

xy0806 commented May 7, 2016

Dear Yaseen,

Thanks for the quick and informative reply.
I think I need to ask one more key question, which is closely related to my thoughts and really confuses me: if I want to implement an LSTM in which each block contains multiple cells, how should I modify your code? Could you show me how to create such multi-cell blocks?

Thanks,
Xin Yang
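
A hypothetical sketch of what such a multi-cell step function might look like in Theano follows; it is not code from this repository, and all names and shapes are assumptions. The idea is that the step function passed to theano.scan keeps an (n_blocks, n_cells) cell-state matrix and computes one input/forget/output gate per block, broadcasting it over that block's cells:

```python
# Hypothetical sketch, not from uyaseen/theano-recurrence.
import theano.tensor as T

def multi_cell_lstm_step(x_t, h_prev, c_prev,
                         W_g, U_g, b_g,   # gate weights: one i/f/o triple per block
                         W_c, U_c, b_c,   # candidate weights: one row per cell
                         n_blocks, n_cells):
    # pre-activations for all gates, laid out as [i..., f..., o...]: (3 * n_blocks,)
    pre = T.dot(x_t, W_g) + T.dot(h_prev, U_g) + b_g
    i, f, o = [T.nnet.sigmoid(pre[k * n_blocks:(k + 1) * n_blocks]).dimshuffle(0, 'x')
               for k in range(3)]                      # each (n_blocks, 1)
    # one candidate value per cell: (n_blocks, n_cells)
    g = T.tanh(T.dot(x_t, W_c) + T.dot(h_prev, U_c) + b_c).reshape((n_blocks, n_cells))
    c_t = f * c_prev + i * g            # per-block gates broadcast over that block's cells
    h_t = (o * T.tanh(c_t)).flatten()   # (n_blocks * n_cells,) block outputs
    return h_t, c_t
```

With n_cells = 1 this reduces to the usual one-cell-per-block recurrence; theano.scan would carry both h and the (n_blocks, n_cells) cell-state matrix through outputs_info.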

@uyaseen
Owner

uyaseen commented May 7, 2016

I have to look at a few papers again to make sure I don't miss anything, but these days I am travelling and don't even have access to my laptop, so you will have to wait at least one week for a reply (I am sorry it cannot be earlier than that :/).

@xy0806
Author

xy0806 commented May 7, 2016

OK, I can wait for that. I can play with the simplest one in the meantime. ^_^

Best

@son20112074

Thank you, Yaseen and Xin Yang. This was also my problem, and now I understand it better.

@DongGuangchang

Dear Yaseen,
I, a rookie in deep learning, encountered some difficulties when debugging your program on recurrent neural networks.
First, I get an error when running sample.py: ERROR (theano.sandbox.cuda): nvcc compiler not found on $PATH. Check your nvcc installation and try again.
data size: 49388, vocab size: 75
train(..)
load_data(..)
[Train] # of rows: 987
... transferring data to the GPU
... building the model
Traceback (most recent call last):
  File "F:/DL-File/RNN/theano-recurrence-b9b8a82410be005d5a3121345e8d62c5ca547982/train.py", line 145, in <module>
    n_h=100, use_existing_model=True, n_epochs=600)
  File "F:/DL-File/RNN/theano-recurrence-b9b8a82410be005d5a3121345e8d62c5ca547982/train.py", line 50, in train
    rec_params = pkl.load(f)
EOFError
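
(The nvcc message only means Theano cannot find the CUDA compiler and falls back to the CPU; the actual crash is the EOFError: with use_existing_model=True, train.py unpickles a saved model, and pkl.load raises EOFError when the checkpoint file exists but is empty or truncated. A minimal guard, assuming hypothetical model_path and use_existing_model names rather than the repo's exact code:)

```python
import os
import cPickle as pkl  # assuming Python 2 and the repo's `pkl` alias

def load_params_if_valid(model_path, use_existing_model):
    # Only unpickle when a non-empty checkpoint actually exists;
    # pkl.load on an empty file raises EOFError.
    if use_existing_model and os.path.isfile(model_path) \
            and os.path.getsize(model_path) > 0:
        with open(model_path, 'rb') as f:
            return pkl.load(f)
    return None  # caller falls back to fresh random initialization
```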


Second, I get an error when running train.py:
ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (outputs_info in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 2) has dtype int32, while the result of the inner function (fn) has dtype int64. This can happen if the inner function of scan results in an upcast or downcast.
I hope you can help explain the reasons for these errors. Thank you very much!
Thanks,
Liang Dong
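
(Regarding the scan ValueError: theano.scan requires each recurrent output of the step function to have the same dtype as its initial state in outputs_info. If the initial state is int32 while an integer upcast inside the step function produces int64, compilation fails. A hedged sketch of the two usual fixes, with made-up variable names:)

```python
import theano.tensor as T

n_h = 100  # hidden size, as in the train.py call above

# Fix 1: create the initial state with the dtype the step function returns.
init_state = T.zeros((n_h,), dtype='int64')  # instead of int32

# Fix 2: cast the step function's result back to the initial state's dtype,
# e.g. as the last line of the function passed to theano.scan:
#     return T.cast(next_state, 'int32')
```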
