I found that 54 % of phrases in my test set were misaligned.
Here, "misaligned" means that a fragment was:
- cut off too early at its end,
- cut too late at the boundary with the previous fragment, or
- shifted altogether.
I used speech recognition to check whether the aligned parts matched the text: aeneas -> cue list -> cut audio -> run speech recognition on each clip -> compare the recognized text with the corresponding cue-list entry using a similarity algorithm.
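A minimal sketch of the comparison step, assuming the aeneas JSON sync map layout (a `fragments` list whose entries carry `id`, `begin`, `end` and `lines`) and a hypothetical `transcripts` dict that maps fragment IDs to the text returned by whatever speech recognizer is used. `difflib.SequenceMatcher` stands in for "a similarity algorithm", and the 0.8 threshold is an arbitrary assumption.

```python
import difflib
import json
import re


def normalize(text):
    """Lowercase and drop punctuation so that only the words are compared."""
    return " ".join(re.findall(r"\w+", text.lower()))


def similarity(expected, recognized):
    """Similarity ratio in [0, 1] between the normalized texts."""
    return difflib.SequenceMatcher(None, normalize(expected), normalize(recognized)).ratio()


def check_alignment(syncmap_json, transcripts, threshold=0.8):
    """Return (fragment_id, score, ok) for every fragment in a JSON sync map.

    `transcripts` is hypothetical: fragment id -> text recognized from that clip.
    """
    results = []
    for frag in json.loads(syncmap_json)["fragments"]:
        expected = " ".join(frag["lines"])
        recognized = transcripts.get(frag["id"], "")
        score = similarity(expected, recognized)
        results.append((frag["id"], score, score >= threshold))
    return results


# Toy sync map and recognizer output, made up for illustration.
syncmap = json.dumps({"fragments": [
    {"id": "f000001", "begin": "0.000", "end": "2.480", "lines": ["Hello, world."]},
    {"id": "f000002", "begin": "2.480", "end": "5.120", "lines": ["This is a test."]},
]})
asr = {"f000001": "hello world", "f000002": "world this is a"}

results = check_alignment(syncmap, asr)
misaligned = sum(1 for _, _, ok in results if not ok)
print(f"{misaligned / len(results):.0%} misaligned")  # 50% misaligned
```

The second fragment "hears" the tail of its neighbour and loses its own ending, so its score drops to about 0.62 and it is flagged; the threshold decides how much recognition noise is tolerated before a fragment counts as misaligned.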
Having more than half of the phrases misaligned is discouraging.
So far I have not found many options for influencing the result: apart from the language selection and the input text type, the CLI parameters seem to have only a cosmetic effect.
I would like to improve the alignment and to get a confidence value for each fragment, so that I can quickly review or discard doubtful ones.
The pipeline already contains everything required to compute such a confidence value. More parameters for fine-grained control would also be important.
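To illustrate what a per-fragment confidence value would enable, here is a sketch of a triage step that sorts fragments into accept, review and discard buckets. The scores are assumed to come from an external check such as the speech-recognition comparison described earlier, not from aeneas; the function name `triage` and the 0.9/0.6 cut-offs are hypothetical.

```python
def triage(scored, accept=0.9, review=0.6):
    """Split (fragment_id, score) pairs into accept / review / discard buckets.

    Scores >= accept are kept, scores in [review, accept) go to manual review,
    and anything below review is discarded.
    """
    if review > accept:
        raise ValueError("review threshold must not exceed accept threshold")
    buckets = {"accept": [], "review": [], "discard": []}
    for frag_id, score in scored:
        if score >= accept:
            buckets["accept"].append((frag_id, score))
        elif score >= review:
            buckets["review"].append((frag_id, score))
        else:
            buckets["discard"].append((frag_id, score))
    # Put the most doubtful fragments first for manual review.
    buckets["review"].sort(key=lambda pair: pair[1])
    return buckets


# Made-up scores for illustration.
scores = [("f000001", 0.97), ("f000002", 0.74), ("f000003", 0.31), ("f000004", 0.62)]
buckets = triage(scores)
for name, items in buckets.items():
    print(name, [frag_id for frag_id, _ in items])
```

Only the middle bucket needs a human; the other two can be handled automatically.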
What I miss most is a threshold parameter, in decibels, that separates pauses from speech; this would eliminate premature cuts for good.
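A decibel threshold for pauses could work roughly like this: split the audio into short frames, compute each frame's RMS level in dBFS, and treat runs of frames below the threshold that last at least a minimum duration as pauses. This is a hypothetical sketch, not an existing aeneas option; `find_pauses`, `threshold_db`, `frame_ms` and `min_pause_ms` are made-up names, and the function expects mono float samples in [-1.0, 1.0].

```python
import math


def find_pauses(samples, sample_rate, threshold_db=-40.0, frame_ms=20, min_pause_ms=200):
    """Return (start_s, end_s) spans whose frame level stays below threshold_db (dBFS)."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    min_frames = max(1, math.ceil(min_pause_ms / frame_ms))
    n_frames = len(samples) // frame_len  # a trailing partial frame is ignored
    pauses = []
    run_start = None  # index of the first quiet frame in the current run
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        level_db = 20 * math.log10(rms) if rms > 0 else -math.inf
        if level_db < threshold_db:
            if run_start is None:
                run_start = i
        else:
            # A loud frame ends the quiet run; keep it only if it was long enough.
            if run_start is not None and i - run_start >= min_frames:
                pauses.append((run_start * frame_len / sample_rate, i * frame_len / sample_rate))
            run_start = None
    if run_start is not None and n_frames - run_start >= min_frames:
        pauses.append((run_start * frame_len / sample_rate, n_frames * frame_len / sample_rate))
    return pauses


# 0.5 s tone, 0.3 s silence, 0.5 s tone at a toy sample rate of 1000 Hz.
sr = 1000
tone = [0.5 * math.sin(2 * math.pi * 50 * n / sr) for n in range(500)]
audio = tone + [0.0] * 300 + tone
pauses = find_pauses(audio, sr)
print(pauses)  # [(0.5, 0.8)]
```

Raising `min_pause_ms` should keep short dips inside words, such as the closure of a stop consonant, from being treated as pauses, which is the kind of premature cut such a threshold is meant to prevent.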
I used the web app for aligning.