The `transformers` version of the Whisper tokenizer has an `EnglishTextNormalizer` (https://github.com/huggingface/transformers/blob/d9deddb4c18410a14952537a91099319ecedb869/src/transformers/models/whisper/tokenization_whisper.py#L529) that is initialized with the contents of this file. There's also a `BasicTextNormalizer` and some additional stuff.

These normalizers are not applied during regular use of the tokenizer; they are enabled by passing custom flags to `decode`. This usually happens during quality evaluation, as explained in this PR, or as seen in the Open ASR leaderboard, which contains a hardcoded version of the English normalization file.

It'd be interesting to add these features as opt-in capabilities, but they are really not required until we want to run evaluations in Swift. Opening this issue for future reference.
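As a rough illustration of what an opt-in port could look like, here is a minimal Swift sketch of a basic normalizer. It loosely follows the steps of the Python `BasicTextNormalizer` as we read it: lowercase the text, drop anything inside `<...>`, `[...]` or `(...)`, apply NFKC normalization, replace Unicode marks, symbols and punctuation with spaces, and collapse whitespace. The function name `basicNormalize` is hypothetical; it is not part of any existing Swift package, and its output is not guaranteed to match the Python implementation character for character.

```swift
import Foundation

// Hypothetical sketch: `basicNormalize` is an illustrative name, not an existing API.
// Loosely modeled on the steps of Whisper's Python `BasicTextNormalizer`.
func basicNormalize(_ text: String) -> String {
    // 1. Lowercase the input.
    var s = text.lowercased()

    // 2. Remove text enclosed in <...> or [...], then text enclosed in (...).
    s = s.replacingOccurrences(of: "[<\\[][^>\\]]*[>\\]]", with: "", options: .regularExpression)
    s = s.replacingOccurrences(of: "\\(([^)]+?)\\)", with: "", options: .regularExpression)

    // 3. NFKC-normalize, then replace marks (M*), symbols (S*) and punctuation (P*) with spaces.
    s = s.precomposedStringWithCompatibilityMapping
    var scalars = String.UnicodeScalarView()
    for scalar in s.unicodeScalars {
        switch scalar.properties.generalCategory {
        case .nonspacingMark, .spacingMark, .enclosingMark,
             .mathSymbol, .currencySymbol, .modifierSymbol, .otherSymbol,
             .connectorPunctuation, .dashPunctuation, .openPunctuation, .closePunctuation,
             .initialPunctuation, .finalPunctuation, .otherPunctuation:
            scalars.append(" ")
        default:
            scalars.append(scalar)
        }
    }
    s = String(scalars).lowercased()

    // 4. Collapse runs of whitespace into a single space and trim both ends
    //    (the trimming is a choice made for this sketch).
    s = s.replacingOccurrences(of: "\\s+", with: " ", options: .regularExpression)
    return s.trimmingCharacters(in: .whitespaces)
}

print(basicNormalize("Hello, World! [music] (laughs) It's 5 o'clock."))
// prints "hello world it s 5 o clock"
```

Because the sketch trims leading and trailing whitespace, it may differ slightly from the Python version. In an evaluation setting, the same normalizer would be applied to both the reference and the predicted transcripts before comparing them (e.g. when computing word error rate), while regular decoding would leave text untouched, keeping the feature opt-in.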
h/t @ZachNagengast for his help diving into this.