What's the logic of performing normalization at each time step in CTC? #8

Open
ghost opened this issue Dec 9, 2015 · 2 comments

ghost commented Dec 9, 2015

Hi, I noticed that in ctc.py, when computing the forward/backward variables, a normalization is performed at each time step (e.g., lines 32 and 54). I can't figure out the logic behind this: what does the forward variable alphas[s, t] mean after the normalization? And if I want to compute the conditional probability p(seq|params), what is the equivalent? I believe this probability cannot be computed as alphas[L-1, T-1] + alphas[L-2, T-1] after the normalization.

zxie (Collaborator) commented Dec 13, 2015

Sorry for the delay; replying in case it's still useful.

As you may have already seen, the normalization is just there to prevent underflow, as described right before Section 4.2 of http://www.cs.toronto.edu/~graves/icml_2006.pdf.

The log conditional probability can still be computed as described in that section (lines 55 and 83 in the code).
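
To make the rescaling concrete, here is a minimal sketch of the rescaled forward pass (an illustration written for this thread, not the repo's actual ctc.py; the function and variable names are made up, and the recursion follows Graves et al., 2006, Section 4.1):

```python
import numpy as np

def forward_log_prob(probs, seq, blank=0):
    """Rescaled CTC forward pass; returns log p(seq | params).

    probs: (num_labels, T) softmax outputs per frame.
    seq:   non-empty target label sequence, without blanks.
    Illustration only -- a sketch of the rescaling trick, not the
    repo's actual code.
    """
    T = probs.shape[1]
    # Interleave blanks: l' = [blank, l1, blank, l2, ..., blank]
    L = 2 * len(seq) + 1
    labels = np.full(L, blank, dtype=int)
    labels[1::2] = seq

    alphas = np.zeros((L, T))
    alphas[0, 0] = probs[blank, 0]
    alphas[1, 0] = probs[labels[1], 0]
    c = alphas[:, 0].sum()
    alphas[:, 0] /= c              # rescale so the column sums to 1
    log_prob = np.log(c)           # keep the normalizer in log space

    for t in range(1, T):
        for s in range(L):
            a = alphas[s, t - 1]
            if s > 0:
                a += alphas[s - 1, t - 1]
            # the skip transition is allowed unless the current label
            # is blank or repeats the label two positions back
            if s > 1 and labels[s] != blank and labels[s] != labels[s - 2]:
                a += alphas[s - 2, t - 1]
            alphas[s, t] = a * probs[labels[s], t]
        c = alphas[:, t].sum()
        alphas[:, t] /= c
        log_prob += np.log(c)

    # After rescaling, alphas[L-1, T-1] + alphas[L-2, T-1] is only the
    # fraction of mass sitting in the two valid end states; the full
    # probability is recovered by adding back the accumulated normalizers.
    return log_prob + np.log(alphas[-1, -1] + alphas[-2, -1])
```

So after normalization each column of alphas sums to one and behaves like a per-frame distribution over states, and p(seq|params) is recoverable from the running product of the per-step normalizers, which is exactly why it's accumulated as a sum of logs.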

ghost (Author) commented Dec 14, 2015

@zxie 👍 Thank you. Following the paper you mentioned, I found the derivation in "A tutorial on hidden Markov models and selected applications in speech recognition" by Rabiner, 1989.
The reason I asked is that in Alex Graves's Ph.D. dissertation, Section 7.3.1, he mentions: "A good way to avoid this is to work in the log scale... Note that rescaling the variables at every timestep (Rabiner, 1989) is less robust, and can fail for very long sequences."
I've tested several CTC implementations: @skaae's (https://github.com/skaae/Lasagne-CTC), @mohammadpz's (https://github.com/mohammadpz/CTC-Connectionist-Temporal-Classification), my own, and yours. It turns out your implementation produces the best estimate of the probability p(l|x). The first three implementations all work in log scale. Do you have any comment on this?
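
For reference, here is what the log-scale variant of the same forward recursion looks like (my own sketch, assuming scipy.special.logsumexp is available; this is not code from any of the repositories above, and the names are made up):

```python
import numpy as np
from scipy.special import logsumexp  # any numerically stable log-sum-exp would do

def forward_log_prob_logspace(log_probs, seq, blank=0):
    """CTC forward pass entirely in log scale; returns log p(seq | params).

    log_probs: (num_labels, T) log-softmax outputs per frame.
    seq:       non-empty target label sequence, without blanks.
    A sketch of the log-scale alternative discussed above.
    """
    T = log_probs.shape[1]
    L = 2 * len(seq) + 1
    labels = np.full(L, blank, dtype=int)
    labels[1::2] = seq

    log_alphas = np.full((L, T), -np.inf)
    log_alphas[0, 0] = log_probs[blank, 0]
    log_alphas[1, 0] = log_probs[labels[1], 0]

    for t in range(1, T):
        for s in range(L):
            terms = [log_alphas[s, t - 1]]
            if s > 0:
                terms.append(log_alphas[s - 1, t - 1])
            if s > 1 and labels[s] != blank and labels[s] != labels[s - 2]:
                terms.append(log_alphas[s - 2, t - 1])
            # log-sum-exp replaces the plain additions of the rescaled version
            log_alphas[s, t] = logsumexp(terms) + log_probs[labels[s], t]

    # p(seq|x) = alpha_T(L-1) + alpha_T(L-2), summed in log space
    return logsumexp([log_alphas[-1, -1], log_alphas[-2, -1]])
```

Both routes compute the same quantity up to floating-point error; the difference Graves points to is robustness for very long sequences, where per-timestep rescaling can still lose precision while log-sum-exp keeps every term in a safe range.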
