Small issue regarding error message ISO 639-2 codes and `Echogarden.recognize` #49

rbozan · 2024-04-28T13:44:01Z

There's a small issue regarding the error message when supplying ISO 639-2 codes to Echogarden.recognize as such:

  const result = await Echogarden.recognize(input, {
    whisper: {
      model: 'small'
    },
    language: 'spa'
  });

This returns:

Transcode with command-line ffmpeg.. 5.3ms
Crop using voice activity detection.. 3.5ms
Prepare for recognition.. 0.3ms
Language specified: Spanish (spa)
Load whisper module.. 0.3ms
The language Spanish is not supported by the Whisper engine.

While supplying 'es' works fine:

Transcode with command-line ffmpeg.. 5.5ms
Crop using voice activity detection.. 10.2ms
Prepare for recognition.. 2.0ms
Language specified: Spanish (es)
Load whisper module.. 13.5ms
Load tokenizer data.. 72.6ms
Create encoder inference session for model 'small'.. 732.2ms
(--etcetera--)

So I have to supply ISO 639-1 language codes, not ISO 639-2. But the message indicates that Spanish is not supported at all.

rotemdan · 2024-04-28T13:57:13Z

Thanks,

Two-letter language codes are used throughout all synthesis and recognition operations, I believe.

The error message makes it look like the language isn't supported. I should change it to first check whether the language code format is supported, though adding support for three-letter codes right away could be a more thorough solution.
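As a rough sketch of the "check the format first" idea, the check could separate an unrecognized code format from a language that is really unsupported. `checkLanguageCode`, the `LanguageCheck` type, and the message wording are hypothetical and not taken from the Echogarden codebase; the sketch also assumes that looking only at the primary subtag is enough.

```typescript
// Hypothetical helper, not Echogarden's actual code.
type LanguageCheck =
	| { ok: true }
	| { ok: false; reason: 'unsupported-format' | 'unsupported-language'; message: string }

function checkLanguageCode(languageCode: string, supportedCodes: string[]): LanguageCheck {
	// Only the primary subtag is inspected (assumption): 'pt-br' -> 'pt'
	const [primary = ''] = languageCode.trim().toLowerCase().split(/[-_]/)

	// Anything that is not exactly two letters is reported as a format problem,
	// so 'spa' no longer looks like "Spanish is not supported"
	if (!/^[a-z]{2}$/.test(primary)) {
		return {
			ok: false,
			reason: 'unsupported-format',
			message: `The language code '${languageCode}' is not a two-letter ISO 639-1 code.`,
		}
	}

	if (!supportedCodes.includes(primary)) {
		return {
			ok: false,
			reason: 'unsupported-language',
			message: `The language code '${languageCode}' is not supported by this engine.`,
		}
	}

	return { ok: true }
}
```

With this split, `'spa'` would yield the `unsupported-format` reason, while a well-formed but unavailable code would still yield `unsupported-language`.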

Currently, the error message itself actually does correctly parse spa as Spanish, because of this method:

export function languageCodeToName(languageCode: string) {
	const languageNames = new Intl.DisplayNames(['en'], { type: 'language' })

	let translatedLanguageName: string | undefined

	try {
		translatedLanguageName = languageNames.of(languageCode)
	} catch (e) {
		// `of()` throws a RangeError for structurally invalid codes; fall through to 'Unknown'
	}

	return translatedLanguageName || 'Unknown'
}

This translation makes it look like it understands what the language is, but it's currently only used for the error message itself.

Also, adding support for full language names like french has been on my task list for a while.

I'll need to find some way to normalize all language codes or names to the two letter ISO 639-1 ones, and their extensions, like pt-br.
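One possible shape for that normalization is sketched below. `normalizeLanguageCode` and both lookup tables are hypothetical and deliberately tiny; a real implementation would need complete ISO 639-2 and language-name tables, and might handle region subtags differently.

```typescript
// Illustrative, incomplete tables (assumptions, not Echogarden data).
const iso6392ToIso6391 = new Map<string, string>([
	['spa', 'es'], ['eng', 'en'], ['fra', 'fr'], ['fre', 'fr'],
	['deu', 'de'], ['ger', 'de'], ['por', 'pt'],
])

const englishNameToIso6391 = new Map<string, string>([
	['spanish', 'es'], ['english', 'en'], ['french', 'fr'],
	['german', 'de'], ['portuguese', 'pt'],
])

// Returns a lowercase two-letter code, keeping any extension such as '-br',
// or undefined when the input cannot be mapped.
function normalizeLanguageCode(input: string): string | undefined {
	const [primary = '', ...rest] = input.trim().toLowerCase().split(/[-_]/)

	let base: string | undefined

	if (/^[a-z]{2}$/.test(primary)) {
		// Already two letters: pass through unchanged
		base = primary
	} else {
		// Try three-letter codes first, then full English names
		base = iso6392ToIso6391.get(primary) ?? englishNameToIso6391.get(primary)
	}

	if (base === undefined) {
		return undefined
	}

	return [base, ...rest].join('-')
}

console.log(normalizeLanguageCode('spa'))    // 'es'
console.log(normalizeLanguageCode('French')) // 'fr'
console.log(normalizeLanguageCode('por-BR')) // 'pt-br'
```

Returning `undefined` for unmapped input leaves the caller free to report a format error rather than claiming the language itself is unsupported.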

rotemdan added the enhancement New feature or request label Jul 4, 2024
