Training Error when run tedlium recipe #213

yuexianghubit · 2019-05-22T01:22:30Z

Hi Alls:

I already install essen and I am trying to run the tedlium recipe, but I got the error:

train-ctc-parallel --report-step=1000 --num-sequence=20 --frame-limit=25000 --learn-rate=0.00004 --momentum=0.9 --verbose=1 'ark,s,cs:apply-cmvn --norm-vars=true --utt2spk=ark:data/train_tr95/utt2spk scp:data/train_tr95/cmvn.scp scp:exp/train_phn_l5_c320/train.scp ark:- | add-deltas ark:- ark:- |' 'ark:gunzip -c exp/train_phn_l5_c320/labels.tr.gz|' exp/train_phn_l5_c320/nnet/nnet.iter0 exp/train_phn_l5_c320/nnet/nnet.iter1
WARNING (train-ctc-parallel:SelectGpuId():cuda-device.cc:150) Suggestion: use 'nvidia-smi -c 1' to set compute exclusive mode
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:262) Selecting from 4 GPUs
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(0): GeForce GTX 1080 Ti free:189M, used:10985M, total:11175M, free/total:0.0169513
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(1): GeForce GTX 1080 Ti free:11015M, used:163M, total:11178M, free/total:0.985418
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(2): GeForce GTX 1080 Ti free:11015M, used:163M, total:11178M, free/total:0.985418
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:277) cudaSetDevice(3): GeForce GTX 1080 Ti free:11015M, used:163M, total:11178M, free/total:0.985418
LOG (train-ctc-parallel:SelectGpuIdAuto():cuda-device.cc:310) Selected device: 1 (automatically)
LOG (train-ctc-parallel:FinalizeActiveGpu():cuda-device.cc:194) The active GPU is [1]: GeForce GTX 1080 Ti free:10983M, used:195M, total:11178M, free/total:0.982556 version 6.1
LOG (train-ctc-parallel:PrintMemoryUsage():cuda-device.cc:334) Memory used: 0 bytes.
LOG (train-ctc-parallel:DisableCaching():cuda-device.cc:731) Disabling caching of GPU memory.

ERROR (train-ctc-parallel:ExpectToken():io-funcs.cc:197) Expected token "<ForwardDropoutFactor>", got instead "<DropFactor>".
ERROR (train-ctc-parallel:ExpectToken():io-funcs.cc:197) Expected token "<ForwardDropoutFactor>", got instead "<DropFactor>".

[stack trace: ]
eesen::KaldiGetStackTraceabi:cxx11
eesen::KaldiErrorMessage::~KaldiErrorMessage()
eesen::ExpectToken(std::istream&, bool, char const*)
eesen::BiLstm::ReadData(std::istream&, bool)
eesen::Layer::Read(std::istream&, bool, bool)
.
.
.
eesen::Net::Read(std::istream&, bool)
eesen::Net::Read(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)
train-ctc-parallel(main+0xbb1) [0x434345]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0) [0x7fc1fec83830]
train-ctc-parallel(_start+0x29) [0x4320f9]

How to fix it?
Thank you!

The text was updated successfully, but these errors were encountered:

fmetze · 2019-05-23T03:30:29Z

The error is probably caused by an inconsistency between your conf/*.proto file and the actual model. It seems that the prototype file has been generated with a prototype in the librispeech recipe (or https://github.com/jb1999/eesen), while the actual code is srvk's Eesen?

Eesen's standard acoustic model does not contain the ForwardDropFactor, but jb1999's Eesen does.

yuexianghubit · 2019-05-23T10:45:13Z

I installed srvk's Eesen, not jb1999's Eesen.
Today i tried the librispeech recipe, the acoutsic model training process can run normally, the nnet proto is below:
<Nnet>
<BiLstmParallel> <InputDim> 360 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0 <ForwardDropoutFactor> 0.2 <ForwardSequenceDropout> T <RecurrentDropoutFactor> 0.2 <RecurrentSequenceDropout> T <NoMemLossDropout> T <TwiddleForward> T
<AffineTransform> <InputDim> 640 <OutputDim> 44 <ParamRange> 0.1
<Softmax> <InputDim> 44 <OutputDim> 44
</Nnet>

but when i ran the tedlium recipe , the acoustic model training got the error i sent before. And the nnet proto now is:
<Nnet>
<BiLstmParallel> <InputDim> 120 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0
<BiLstmParallel> <InputDim> 640 <CellDim> 640 <ParamRange> 0.1 <LearnRateCoef> 1.0 <MaxGrad> 50.0 <FgateBias> 1.0
<AffineTransform> <InputDim> 640 <OutputDim> 78 <ParamRange> 0.1
<Softmax> <InputDim> 78 <OutputDim> 78
</Nnet>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Training Error when run tedlium recipe #213

Training Error when run tedlium recipe #213

yuexianghubit commented May 22, 2019 •

edited

Loading

fmetze commented May 23, 2019

yuexianghubit commented May 23, 2019

Training Error when run tedlium recipe #213

Training Error when run tedlium recipe #213

Comments

yuexianghubit commented May 22, 2019 • edited Loading

fmetze commented May 23, 2019

yuexianghubit commented May 23, 2019

yuexianghubit commented May 22, 2019 •

edited

Loading