Hi,
I am playing around with this model and it is great so far, but I'd like to experiment a bit with fine-tuning on a small portion of data. A real-world use case might be improving recognition of words that are not a standard part of the language, e.g. technical terms, local dialects, or slang. I'd like to verify my steps so far and maybe ask a thing or two.
This is what I am doing: first I run stages 1-4, which pretty much just creates the dump folder and does some validation.
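In case it helps to make that step concrete, this is roughly how I drive it (just a sketch; it assumes the standard ESPnet2 recipe layout where run.sh forwards --stage/--stop_stage to asr.sh, and the recipe directory is only a placeholder; running `./run.sh --stage 1 --stop_stage 4` directly works just as well):

```python
# Rough sketch of running only the data-preparation stages of the recipe.
# Assumes run.sh forwards --stage/--stop_stage to asr.sh, as in standard
# ESPnet2 recipes; the recipe directory below is a placeholder.
import subprocess

subprocess.run(
    ["./run.sh", "--stage", "1", "--stop_stage", "4"],
    cwd="egs2/my_corpus/asr1",  # hypothetical recipe directory
    check=True,
)
```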
Then I take the bpe.model file and the asr_stats folder from the files on Hugging Face. What I am missing here is the tokens.txt file, but I reconstructed it from config.yaml in the asr_train folder, because the token list is also stored there. I then continue from stage 10, loading the .pth file with the --pretrained_model parameter. I also need to adapt the training config a bit, since e.g. a warmup of several thousand steps does not make sense in this case.
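For reference, this is roughly how I rebuilt tokens.txt (a minimal sketch, assuming the training config stores the vocabulary under a `token_list` key, as the ESPnet2 config.yaml I downloaded does; the paths are only placeholders):

```python
# Rebuild tokens.txt from the token_list stored in the training config.
# Paths are placeholders; point them at the downloaded config.yaml and at
# wherever the recipe expects the token list to live.
import yaml

with open("exp/asr_train/config.yaml", encoding="utf-8") as f:
    config = yaml.safe_load(f)

with open("tokens.txt", "w", encoding="utf-8") as f:
    for token in config["token_list"]:
        f.write(f"{token}\n")
```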
First question: is this approach valid, or am I missing something?
Then another thing: in this scenario it makes sense to me to freeze most of the layers. I haven't found any similar example though, and I am struggling to understand the meaning of some layers. Any advice on which layers to keep unfrozen? Would it be just the embed layers on both the encoder and decoder, maybe the 1-2 highest encoder and decoder blocks, or also something more (maybe criterion_att and ctc)?
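For concreteness, something along these lines is what I had in mind (just a sketch in plain PyTorch; the parameter-name prefixes are guesses based on typical ESPnet2 model layouts, and I believe ESPnet2 also has a freeze_param training option that could do the same thing):

```python
import torch

def freeze_except(model: torch.nn.Module, keep_prefixes) -> None:
    """Freeze every parameter whose name does not start with one of keep_prefixes."""
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in keep_prefixes)

# `asr_model` would be the model loaded from the .pth checkpoint; the prefixes
# below are illustrative, check model.named_parameters() for the real names.
# freeze_except(asr_model, ("encoder.embed", "decoder.embed",
#                           "encoder.encoders.11", "decoder.decoders.5", "ctc."))
```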
Thanks.