Hi,
I am playing around with this model and it is great so far, but I'd like to experiment a bit with fine-tuning on a small portion of data. A real-world use case might be improving recognition of words that are not a standard part of the language, e.g. technical terms, local dialects, or slang. I'd like to verify my steps so far and maybe ask a thing or two.
This is what I am doing: first I run stages 1-4, which pretty much just creates the dump folder and does some validation.
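In case it helps to make that step concrete, this is roughly how I drive it (just a sketch; it assumes the standard ESPnet2 recipe layout where run.sh forwards --stage/--stop_stage to asr.sh, and the recipe directory is only a placeholder; running `./run.sh --stage 1 --stop_stage 4` directly works just as well):

```python
# Rough sketch of running only the data-preparation stages of the recipe.
# Assumes run.sh forwards --stage/--stop_stage to asr.sh, as in standard
# ESPnet2 recipes; the recipe directory below is a placeholder.
import subprocess

subprocess.run(
    ["./run.sh", "--stage", "1", "--stop_stage", "4"],
    cwd="egs2/my_corpus/asr1",  # hypothetical recipe directory
    check=True,
)
```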
Then I take the bpe.model file and the asr_stats folder from the files on Hugging Face. What I am missing here is the tokens.txt file, but I reconstructed it from config.yaml in the asr_train folder, because the token list is also stored there. I then continue from stage 10, loading the .pth file with the --pretrained_model parameter. I also need to adapt the training config a bit, since e.g. a warmup of several thousand steps does not make sense in this case.
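For reference, this is roughly how I rebuilt tokens.txt (a minimal sketch, assuming the training config stores the vocabulary under a `token_list` key, as the ESPnet2 config.yaml I downloaded does; the paths are only placeholders):

```python
# Rebuild tokens.txt from the token_list stored in the training config.
# Paths are placeholders; point them at the downloaded config.yaml and at
# wherever the recipe expects the token list to live.
import yaml

with open("exp/asr_train/config.yaml", encoding="utf-8") as f:
    config = yaml.safe_load(f)

with open("tokens.txt", "w", encoding="utf-8") as f:
    for token in config["token_list"]:
        f.write(f"{token}\n")
```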
First question: is this approach valid, or am I missing something?
Then another thing: in this scenario it makes sense to me to freeze most of the layers. I haven't found any similar example though, and I am struggling to understand the meaning of some layers. Any advice on which layers to keep unfrozen? Would it be just the embed layers on both the encoder and decoder, maybe the 1-2 highest encoder and decoder blocks, or also something more (maybe criterion_att and ctc)?
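For concreteness, something along these lines is what I had in mind (just a sketch in plain PyTorch; the parameter-name prefixes are guesses based on typical ESPnet2 model layouts, and I believe ESPnet2 also has a freeze_param training option that could do the same thing):

```python
import torch

def freeze_except(model: torch.nn.Module, keep_prefixes) -> None:
    """Freeze every parameter whose name does not start with one of keep_prefixes."""
    for name, param in model.named_parameters():
        param.requires_grad = any(name.startswith(p) for p in keep_prefixes)

# `asr_model` would be the model loaded from the .pth checkpoint; the prefixes
# below are illustrative, check model.named_parameters() for the real names.
# freeze_except(asr_model, ("encoder.embed", "decoder.embed",
#                           "encoder.encoders.11", "decoder.decoders.5", "ctc."))
```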
Thanks.