Fine Tuning #33
I'm working on simplifying and documenting how to perform fine tuning. I would say to use https://github.com/daanzu/kaldi-active-grammar/releases/download/v1.4.0/kaldi_model_daanzu_20200328_1ep-mediumlm.zip. I've had more success with the procedure in
Thanks, I got it working as well. I modified that one, added i-vectors, and used the DNN alignment instead of GMM. However, my WER is very high, almost 100% wrong (the graph and language model are correct, because when I switched in the original final.mdl the WER dropped below 20%). Do you have the parameters for fine tuning your roughly 30 hours of examples: number of epochs, initial learning rate, final learning rate, minibatch size?
I am planning on doing much more experimentation on this soon, but I think I had most success with parameters like this:
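For reference, the knobs discussed here (epochs, learning rates, minibatch size) map onto flags of Kaldi's `steps/nnet3/chain/train.py`. A hedged sketch of such an invocation; the flag names are Kaldi's, but the values and paths below are placeholders, not a recommendation:

```shell
# Illustrative only: placeholder values and paths for a chain-model
# fine-tuning run starting from an existing input model.
steps/nnet3/chain/train.py \
  --trainer.input-model exp/chain/finetune/input.mdl \
  --trainer.num-epochs 5 \
  --trainer.optimization.num-jobs-initial 1 \
  --trainer.optimization.num-jobs-final 1 \
  --trainer.optimization.initial-effective-lrate 0.0005 \
  --trainer.optimization.final-effective-lrate 0.00002 \
  --trainer.optimization.minibatch-size 128 \
  --feat-dir data/finetune_hires \
  --tree-dir exp/nnet3_chain/tree_sp \
  --dir exp/chain/finetune
```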
Thank you, works beautifully. Dropped WER from 38% to 30% with only 1 hour of training data and an independent test set.
Great! Thanks for the WER% info.
@daanzu Can you please tell me where I can find the following files: Also, I have symlinked the following files. Was that expected?
@yondu22 Can you please elaborate on how you did this? What changes did you make in the script?
I used this script
@vasudev-hv This is a nnet3 chain model, not tri3. The other files can be generated from the ones included in the download.
I am currently also trying to set up a training pipeline. While I recently managed to get

**Training Data**

Around 9000 utterances from my day-to-day use, recorded and labeled by KAG within the

**Language model/Decoding Graph**

My language model I build using the original The only thing remaining from here to build a decoding graph is the grammar (

**Verification of the Decoding Graph**

I can successfully pair the original

**Getting**
@JohnDoe02 Wow, great detailed write-up! Thanks for posting it. As stated earlier, I had more success adapting the

Sorry for being so slow to finish cleaning up my version. I will at least get the basic script posted ASAP. And I hope to get a nice Docker image put together to make it relatively easy.
@daanzu I am very much looking forward to having a look at your training script. Don't polish too hard, anything helps!
https://gist.github.com/daanzu/d29e18abb9e21ccf1cddc8c3e28054ff It's not pretty, but maybe it can be of use until I have something better. Regarding training file length, including some amount of prose dictation of reasonable length can definitely be a big help. I think it is still helpful to include short commands in the training, plus it is so easy to collect a large set of them through normal use, but they have weaknesses.
Thanks for posting! I will try to get it to run as well. At first sight, I find it rather interesting that you are using
Also, no need to ramp up the tolerances, i.e., you are using

Very interesting. I'll investigate.
It's been a while since I started experimenting, and I can't recall exactly how I ended up with this. I think the
I have experimented some more and prepared a very clean data set, with about 1h of dictation, 1h of command-like speech, and 1h from day-to-day use. However, this did not bring much improvement. For my new (more complex) data set I got a reference value for the WER of

Next, I focused on getting your script to run as well. Interestingly, I ran into pretty much the same problems as with the

With this setup, my results were similar to those with the

I investigated some further and found out that Kaldi uses the phone sequences within the alignment files at the training stage to calculate a special 4-gram phone language model (cf. "The denominator FST" at http://www.kaldi-asr.org/doc/chain.html). Apparently, this phone language model is recalculated from scratch as the first prerequisite for the actual training (

@daanzu Would you mind uploading your original alignment files (I guess their total size should be on the order of 1G), so that I can check whether their presence is the proper fix for the issue?
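For context, that phone LM can also be estimated by hand from the alignment files. A sketch assuming the alignments live in `exp/nnet3_chain/tree_sp` (paths are illustrative); this mirrors what `steps/nnet3/chain/train.py` does internally with Kaldi's `ali-to-phones` and `chain-est-phone-lm`:

```shell
# Convert alignments to phone sequences, then estimate the phone LM
# from which the denominator FST is later built. Paths illustrative.
ali_dir=exp/nnet3_chain/tree_sp
ali-to-phones "$ali_dir/final.mdl" \
  "ark:gunzip -c $ali_dir/ali.*.gz |" ark:- |
  chain-est-phone-lm --num-extra-lm-states=2000 \
    ark:- exp/chain/finetune/phone_lm.fst
```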
@JohnDoe02 I am a bit puzzled by your experience, but I will try to find time to look into it more. Which alignment files are you looking for? I am fairly sure I ran my fine-tuning experiments on an export very similar to the published package.
From looking at your script, the tree directory is defined in the beginning as

As far as I can tell from my experiments, these files must have been present from the start when you ran your fine-tuning script (i.e., they were not generated during execution). While
@JohnDoe02 Ah, I didn't realize that was there and a dependency. Good find! Attached is (I think) the tree directory for the most recent models. It will be quite interesting to see your results and comparison. https://github.com/daanzu/kaldi-active-grammar/releases/download/v1.8.0/tree_sp.zip
Nice, thanks for uploading! The scripts are now happy, no more missing files. The only remaining confusion is the

In any case, I will run some experiments and report back how things go!
So first of all, let's start with the good news. I'm finally able to obtain models which perform better on the test set than the input model (16 WER vs. 20 WER is the best I got so far). Indeed, when plugging it into KAG, I am finally understood when saying

Regarding your alignment files, for which I had really hoped they would be the magic missing piece in the puzzle (I was dreaming of error rates jumping down to ~3-5 WER -- I am so naive ;) ), this turned out not to be the case. Using your alignment files, the error drops to 20-30 WER after training, compared to ~20 WER before training, i.e., I end up with a slightly less performant model. Using your files, I was not able to outperform the input model yet. For the result mentioned in the beginning, I used my alignment hack. The reason the error is so much lower than before is that there was a bug in the hack (:D). There is an innocent-looking file in the tree dir,

For the moment I am again focusing on gathering more data (currently, I have about 5h).
I believe I have found the reason and correct fix for the

@daanzu Could you please confirm that you have such a file with such content within your original src dir? Judging from your script, it should be within
@JohnDoe02 Good find! Yes, my source directory has a
I commented out the lines, deleted ${chain_opts[@]}, and went with the defaults of 3, 5, 5. It seemed to work well for me; let me know your experience and whether I should change anything.
Against all expectations, my results degraded (a lot!) by using the

For now, I have switched strategy and am training from scratch. My first results look very promising, with error rates around

The downside is of course that training takes much longer and becomes impossible without a GPU. Also, I have not yet been able to make one of my from-scratch-trained models work with KAG.
@JohnDoe02 Surprising! FWIW, I trained my first 100%-personal model with just ~4h of audio, and was astounded at how decent the results were, considering. And building up a corpus isn't too hard during general use with my "action corrected" command.
@JohnDoe02 I would double-check your work; we all got better results.
@daanzu Yes, I am astounded as well. This works out-of-the-box much better than anticipated. @yondu22 Good point with the lowercase vs. uppercase. Obviously I ran into this problem, too. Fixed that one about 5 weeks ago. As mentioned above, for now I am simply training from scratch. My first results are nothing short of amazing, and while I am still facing an integration issue regarding dictation (cf. #39), spelling is already working much better. However, I am still interested in getting the transfer script running as well. One thing I am curious about is whether aligning with a GMM would make a difference. Also, there is run_tdnn_wsj_rm_1a.sh, which at least for the RM corpus seems to have given better results than the 1c variant we are using.
@JohnDoe02 I'm skeptical that aligning with a GMM model would help, but if you want to give it a try, I could upload the GMM model that was used in the training of my published model. Experiments are always interesting!
@daanzu Sounds great! I will give it a shot.
@JohnDoe02 Here's the GMM model. Apologies for the delay!
Hi @daanzu, I am trying to fine-tune the Indian English model with my wake-word data. I am not able to understand where to give the path for my dataset as well as the model downloaded from the VOSK website. It would really be helpful if you could clarify the path issue in the script you provided.
@Ashutosh1995 I just pushed an updated version that is a bit more explicit about the inputs. However, the model you are using very well may not include everything necessary, especially to make fine tuning easy. https://gist.github.com/daanzu/d29e18abb9e21ccf1cddc8c3e28054ff
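If it helps anyone else setting up the inputs: the training side consumes a standard Kaldi data directory. A minimal sketch with hypothetical utterance IDs, speaker IDs, and audio paths (adapt to your own recordings):

```shell
# Build a minimal Kaldi data directory; IDs and wav paths are made up.
mkdir -p data/finetune

# Transcripts: <utt-id> <transcript>
cat > data/finetune/text <<'EOF'
spk1-utt1 hello world
spk1-utt2 open the browser
EOF

# Audio: <utt-id> <path-to-wav>
cat > data/finetune/wav.scp <<'EOF'
spk1-utt1 /path/to/audio/utt1.wav
spk1-utt2 /path/to/audio/utt2.wav
EOF

# Speaker map: <utt-id> <speaker-id>
cat > data/finetune/utt2spk <<'EOF'
spk1-utt1 spk1
spk1-utt2 spk1
EOF

# spk2utt is derived from utt2spk (e.g. utils/utt2spk_to_spk2utt.pl);
# utils/validate_data_dir.sh catches sorting and consistency errors.
```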
@daanzu I got the path information. There was one more query I had: how much data is required to perform fine-tuning for a task like wake-word detection on the model file you have provided?
@Ashutosh1995 Any amount of training data should be helpful, but of course, the more the better. I haven't yet tested as thoroughly and rigorously as I would like. https://github.com/daanzu/kaldi-active-grammar/blob/master/docs/models.md
Hi @daanzu, splice_opts is not present in https://github.com/daanzu/kaldi-active-grammar/releases/download/v1.4.0/kaldi_model_daanzu_20200328_1ep-mediumlm.zip. Could you please provide that?
@Ashutosh1995
Hi @daanzu, I reached Stage 9, but there I get an exception saying: Exception: Expected exp/nnet3_chain/tree_sp/ali.1.gz to exist. I couldn't find the .gz file in the source model folder. Could you please help in this regard? I am a bit of a novice to Kaldi, hence still figuring things out.
@Ashutosh1995 See #33 (comment) above, and use that file.
Hi @daanzu, I have followed the steps of the file you linked in this comment. When I replaced only the final.mdl file obtained after fine-tuning on your model vosk-model-en-us-daanzu-20200905-lgraph, I did not get any results when using it to test my validation set with the Android demo of vosk-api (completely empty results). With the original model I get excellent results. Do I need to replace any other file in the Android model in order to get results from the fine-tuned model? (I've searched and did not find which files we should change in the original models.) Or is my data just not enough, and I'm getting no errors because of that? Thanks for your awesome work, and I hope you can help me with this.
Hi @daanzu, I need help fine-tuning the aspire model ("vosk-model-en-us-aspire-0.2"). The model has the following structure

Any help, please?
Hi @daanzu, I am fine-tuning the aspire model, and I am also stuck at stage 9, without the tree_sp for aspire. Can you please upload that for aspire as well? Thank you.
The size of final.mdl downloaded from https://github.com/daanzu/kaldi-active-grammar/releases/download/v1.4.0/vosk-model-en-us-daanzu-20200328.zip is about 70M, but when I train from scratch using the vosk-api-master/training/local/chain/run_tdnn.sh model parameters, my final.mdl ends up at about 20M.
I used the target lexicon.txt to generate phones.txt, L.fst, and so on, but in the steps/nnet3/align_lats.sh stage there is an error: "ERROR (compile-train-graphs[5.5]:GetArc():context-fst.cc:177) ContextFst: CreateArc, invalid ilabel supplied [confusion about phone list or disambig symbols?]: 335"
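That error usually means the phone symbol table used to build the new L.fst does not match the one the model was trained with (phone IDs or disambiguation symbols differ). A quick sanity check, with illustrative paths:

```shell
# Compare two phone symbol tables byte-for-byte; any difference means
# the L.fst must be rebuilt against the model's own phones.txt.
check_phones() {
  if cmp -s "$1" "$2"; then
    echo "phone tables match"
  else
    echo "phone tables differ"
  fi
}

# Paths below are placeholders for the model's and the new lang dir's tables.
check_phones exp/src_model/phones.txt data/lang/phones.txt
```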
Can the L.fst and phones.txt be generated from the download?
Is it possible to fine-tune this model using this method? vosk-model-tl-ph-generic-0.6
I am trying to fine-tune the model vosk-model-small-en-us-0.15 using a noisy dataset (LibriSpeech clean augmented with noisy data). I tried following the above steps, but I wanted to know if anyone has tried this before, and whether the performance is similar to other VOSK models. Also, while changing the hyperparameters for tuning, can we use a grid search here in Kaldi to understand which parameters give better performance, or is only empirical experimentation (changing the parameters by hand) possible?
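Kaldi has no built-in grid search; the usual approach is to script the sweep yourself, one training run per hyperparameter combination. A dry-run sketch (the script name and flags are hypothetical stand-ins for whatever fine-tuning script you use):

```shell
# Enumerate hyperparameter combinations and print the command each run
# would launch; replace echo with the real call to actually execute.
n=0
for lr in 0.0005 0.001 0.002; do
  for epochs in 2 4; do
    echo "local/run_finetune.sh --initial-lrate $lr --num-epochs $epochs --dir exp/ft_lr${lr}_ep${epochs}"
    n=$((n + 1))
  done
done
echo "planned $n runs"
```

Comparing WER per run directory afterwards then tells you which combination is worth a full training.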
Do you have the procedure for fine tuning the model with our own data?
Would we use this model
https://github.com/daanzu/kaldi-active-grammar/releases/download/v1.4.0/kaldi_model_daanzu_20200328_1ep-mediumlm.zip
or
https://github.com/daanzu/kaldi-active-grammar/releases/download/v1.4.0/vosk-model-en-us-daanzu-20200328.zip
then apply this script and update the paths?
https://github.com/kaldi-asr/kaldi/blob/master/egs/rm/s5/local/chain/tuning/run_tdnn_wsj_rm_1c.sh
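For the path updates, the transfer-learning script sets its source model and data locations in variables near the top. A hedged sketch of the kind of edits involved; the variable names below are assumptions for illustration, so check the actual script before relying on them:

```shell
# Hypothetical re-pointing of the script's inputs (names are assumptions):
src_mdl=exp/daanzu_model/final.mdl   # final.mdl from one of the zips above
train_data_dir=data/finetune         # your own labeled audio
lang_dir=data/lang                   # lexicon/phone set matching the model
```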