Implementation of CTC in pure theano with custom gradient #108
Conversation
Sorry for the late reply, and thanks for the heads up on the mailing list. Looks cool at first glance! Not quite sure if this belongs in …
The model from the paper and the data pre-processing part are not overly complicated at first sight, but the prediction algorithm (prefix search) might require some work. I'll try to look into it this weekend.
What about a toy example that uses a less complex prediction method in the end (e.g., just sampling)?
It seems there are some precision issues on real-world data (TIMIT speech). I need to investigate that first. When I get it to work reliably, I think I will run the model with a simple prediction scheme (greedy) for the demo.
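For reference, greedy (best-path) decoding is just a per-frame argmax followed by collapsing repeats and dropping blanks; a minimal sketch in NumPy, assuming the blank class sits at index 0 and the input is a (time, classes) array (both assumptions, not taken from this PR):

```python
import numpy as np

BLANK = 0  # assumed index of the CTC blank class

def greedy_decode(frame_probs):
    """Greedy (best-path) CTC decoding over a (time, classes) array."""
    best_path = np.argmax(frame_probs, axis=1)
    decoded, prev = [], None
    for label in best_path:
        # collapse consecutive repeats first, then drop blanks
        if label != prev and label != BLANK:
            decoded.append(int(label))
        prev = label
    return decoded

# e.g. best path [1, 1, 0, 1, 2, 2] decodes to [1, 1, 2]: the blank
# separates the two 1s, so they are kept as distinct symbols.
```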
The CTC loss function now takes predictions in log space (before softmax) to avoid precision issues.
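As an illustration of that trick, a numerically stable log-softmax is usually written with the max-shift pattern; a sketch using Theano's tensor API (not necessarily the exact code in this PR):

```python
import theano.tensor as T

def log_softmax(x):
    # Shifting by the row maximum keeps exp() in range, so this equals
    # log(softmax(x)) without ever taking the log of an underflowed zero.
    xdev = x - x.max(axis=-1, keepdims=True)
    return xdev - T.log(T.sum(T.exp(xdev), axis=-1, keepdims=True))
```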
The latest commit should fix most precision issues, but there is still some divergence: the final output layer values (before softmax) explode at some point during training. This happens before any useful output is obtained; the network just learns to predict the blank class all the time. I have added a TensorFlow implementation for the sake of comparison, and a test notebook to compare the TF implementation of CTC with mine. The loss values of my implementation seem correct, but the gradients are a bit off. I was not able to track down the reason any further. If anyone is interested in getting CTC in pure Theano, some help would be very welcome ;-)
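One way to pin down such gradient discrepancies without TensorFlow is Theano's built-in finite-difference check; a sketch, where `ctc_cost` is a hypothetical stand-in for the loss function under test:

```python
import numpy as np
import theano
from theano.gradient import verify_grad

rng = np.random.RandomState(42)

# Hypothetical test point: pre-softmax outputs of shape (time, classes)
# and a fixed label sequence. `ctc_cost` is a placeholder name for the
# pure-Theano CTC loss being checked.
linear_out = rng.randn(20, 5).astype(theano.config.floatX)
targets = np.array([1, 2, 1], dtype='int32')

# verify_grad compares theano.grad of the cost against a numeric
# finite-difference estimate and raises if they disagree.
verify_grad(lambda x: ctc_cost(x, targets), [linear_out], rng=rng)
```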
This (predicting blanks) seems to be a common effect: …
So maybe this is a good sign ;)
And the TF implementation works well with the same dataset? PS: Looking at your notebook, when you call …
Ok, the damn error is fixed now; both the loss and the gradient are now in line with TensorFlow's implementation. Thanks for reading through this, I have corrected the pickle line. I will let training run for a long time to see if it goes past predicting blanks all the time, since this is the expected behaviour. I'm now waiting for some help from the Theano people, because the binary variables I use in some places seem to break graph optimization when the target device is a GPU.
Great! Bad luck -- I think Theano would have optimized the log-sum-exp expression by itself, but I'm not sure if optimization breaks depending on …
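For context, the handwritten version discussed here is usually the classic max-shift formulation; a sketch in Theano (not necessarily the code in this PR):

```python
import theano.tensor as T

def logsumexp(x, axis=None):
    # Stable log(sum(exp(x))): subtracting the max keeps exp() in range,
    # so the result stays finite even with graph optimizations disabled.
    # Note: the reduced axis is kept as size 1 in the output.
    xmax = x.max(axis=axis, keepdims=True)
    return T.log(T.sum(T.exp(x - xmax), axis=axis, keepdims=True)) + xmax
```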
Any progress on this? Do you need some advice? If you don't need those variables for advanced indexing, you may get away with simply casting them to floatX.
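The suggested cast would look something like this (a sketch with made-up variable names; in Theano, comparison ops return int8 tensors, which is presumably what trips up the GPU optimizer here):

```python
import theano
import theano.tensor as T

labels = T.imatrix('labels')
# Comparisons yield int8 ('binary') tensors...
same_as_next = T.eq(labels[:, :-1], labels[:, 1:])
# ...so if they are only used in arithmetic (not as integer indices),
# casting to floatX sidesteps the GPU optimization problem.
same_as_next = T.cast(same_as_next, theano.config.floatX)
```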
I wanted something that would remain robust even with optimizations off, especially since I was debugging it myself ;-), so a handwritten logsumexp is safer when properly implemented. As for the optimization errors, I have opened a discussion on the theano-users mailing list, but activity is a bit low right now. It's actually not too serious, because it only triggers a warning with the default .theanorc settings. I think the CTC part is done, but the results of the experimental demo are not good. There must be an issue with the model, the parameters, or the data -- some difference between this code and the paper. If somebody familiar with CTC-trained models could have a look, that would be great. Meanwhile, I will give it a try when I have some time.
Unfortunately, this comes a bit late, as Theano has recently merged a PR adding bindings to warp-ctc (Theano/Theano#5949). But I wanted to finish this anyway, so here it is :-).
This implementation is: …
I think it can still be useful to anyone who wants to modify the original cost function, and it runs without any extra dependencies on any platform where Theano already runs.
Notes: