Fine-tune Model to calculate CTC loss in Inference part #2991
-
In the inference part, the model used is a fine-tuned stt-en-citrinet model. Now I want to move one step back and only calculate the loss (CTC loss). Is there a function for that, and how do I do it?
-
If you have the ground truth labels, you can follow the implementation of training_step() in EncDecCTCModel and see how we call forward() and then pass the logits to the loss function.

Normally you could use model.transcribe() with logprobs=True to get the logits to pass to the loss function; however, that doesn't provide the length of the actual encoded audio segment, which CTC loss requires. You can approximate it as the acoustic length after preprocessing // model stride (the stride depends on each model) and pass that to the CTC loss.

We will look into more useful ways of storing this information and providing it to users via transcribe(), but this approach should work in the meantime.

Needless to say, the ground truth labels will need to undergo the exact same preprocessing - basically tokenization and detokenization. It would make things easier to follow the data setup methods and then use the created dataloader directly.
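A minimal sketch of the length-approximation approach, written in plain PyTorch so it runs standalone: the random log-probs stand in for what model.transcribe(..., logprobs=True) would return, and the stride of 8 (Citrinet's downsampling factor), the vocabulary size, and all shapes are illustrative assumptions, not values from any real checkpoint.

```python
import torch

torch.manual_seed(0)

# Stand-ins for transcribe(..., logprobs=True) output on a CTC model.
batch, vocab_size, stride = 2, 29, 8            # 28 tokens + 1 CTC blank (assumed)
feat_len = torch.tensor([128, 96])              # acoustic frames after preprocessing

# Approximate the encoded length: preprocessed length // model stride.
enc_len = torch.div(feat_len, stride, rounding_mode="floor")

# Fake log-probabilities with shape (batch, time, vocab).
max_t = int(enc_len.max())
log_probs = torch.randn(batch, max_t, vocab_size).log_softmax(dim=-1)

# Ground-truth token ids; in practice these come from the model's
# tokenizer so they match the training-time preprocessing exactly.
targets = torch.randint(low=0, high=vocab_size - 1, size=(batch, 10))
target_len = torch.tensor([10, 7])

# torch.nn.CTCLoss expects (T, B, C); NeMo CTC models use the last
# vocabulary index as the blank symbol.
ctc = torch.nn.CTCLoss(blank=vocab_size - 1, zero_infinity=True)
loss = ctc(log_probs.transpose(0, 1), targets, enc_len, target_len)
print(float(loss))
```

In real use you would replace the random tensors with the actual log-probs and the per-file preprocessed lengths, and tokenize your reference transcripts with the model's own tokenizer before passing them as targets.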