any information on the data format? #11
Comments
The format is identical to the Texas Instruments format, as used on their speech chips. Compression basically works like this:
Keep an eye out for the bit order within a byte - the format is bit oriented, and byte-based formats use inconsistent ordering. The only easily available software that already encodes recorded speech is QBox Pro - it's old, and I've never got it running properly. It's floating around the web in various places. A modern open source way of generating this format would be awesome, and would be welcomed by many communities. I identified two shortcuts you might want to investigate if you're interested in rule-based text to speech:
Bear in mind that English-to-phoneme-to-coefficient mapping is likely to take at least 8K of code and data - quite a chunk of Arduino code space.
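To make the bit-order warning concrete, here is a minimal sketch of a reader for a bit-oriented stream. The class name `BitReader` and the `lsb_first` flag are hypothetical, introduced only for illustration; which ordering a given data file uses is not stated here and has to be checked against that file's source.

```python
class BitReader:
    """Reads fields of arbitrary width from a byte string, one bit at a time."""

    def __init__(self, data: bytes, lsb_first: bool = True):
        self.data = data
        # Assumption to verify per data source: whether the first bit of the
        # stream sits in the least or the most significant bit of each byte.
        self.lsb_first = lsb_first
        self.pos = 0  # absolute bit position in the stream

    def read(self, width: int) -> int:
        """Return the next `width` bits; the first bit read is the field's MSB."""
        value = 0
        for _ in range(width):
            byte = self.data[self.pos // 8]
            offset = self.pos % 8
            shift = offset if self.lsb_first else 7 - offset
            value = (value << 1) | ((byte >> shift) & 1)
            self.pos += 1
        return value


# The same byte yields different fields depending on the assumed order.
msb = BitReader(bytes([0xD0]), lsb_first=False)
lsb = BitReader(bytes([0xD0]), lsb_first=True)
print(msb.read(4), lsb.read(4))  # 13 0
```

If decoded values look like noise, flipping `lsb_first` is a cheap first thing to try.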
Just curious if anything has changed on on-the-fly tts? I'm blind, I grew up with an Apple II E computer. The computer had an Echo II from Street Electronics installed. This is an expansion card based on the same chip. A program, Textalker, read changes on the screen. There is an emulator of this in action on several disk images at https://bluegrasspals.com/blindapple/. The point is that Textalker generated rule-based speech on the fly. I'm still learning how things work, but it may also be helpful. As I said, I'm blind, so I'm very interested in finding a tts library like this. Thanks for reading.
@jscuster Walter
i'm trying to write a dynamic synthesizer and have enough linguistic knowledge about phonemes, transcription, lpc, etc. i would either use a little neural net to translate the text directly into the target coefficients or do it by rules.
before doing so, i need to understand what i would output (in terms of coefficients, energy and repetitions, and possibly more?).
however, i tried to understand the file format from the code. it seems to be compressed somehow, since voiced (ten) and unvoiced (four) sounds get different numbers of coefficients etc. i also don't quite get how to encode repetitions and energy.
would be great to have more info. thanks!
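For anyone with the same question, below is a rough sketch of how a variable-length frame in the TI LPC style could be split into fields. The field widths, the special energy codes and the function name `parse_frames` are assumptions based on my recollection of the TMS5220-family layout, not something confirmed in this thread (other chips in the family reportedly use a narrower pitch field), so check them against the data sheet. The decoded values are indices into the chip's lookup tables, not the coefficients themselves. The input is a string of '0'/'1' characters that is already in stream order, so byte-level bit order is left out.

```python
# Assumed field widths in bits; verify against the data sheet for your chip.
ENERGY_BITS, REPEAT_BITS, PITCH_BITS = 4, 1, 6
K_BITS = (5, 5, 4, 4, 4, 4, 4, 3, 3, 3)  # K1..K10
SILENCE, STOP = 0x0, 0xF                  # assumed special energy codes


def parse_frames(bits: str) -> list[dict]:
    """Split a string of '0'/'1' characters into frames of table indices."""
    pos = 0

    def take(width: int) -> int:
        nonlocal pos
        field = bits[pos:pos + width]
        if len(field) < width:
            raise ValueError("bit stream ended in the middle of a frame")
        pos += width
        return int(field, 2)

    frames = []
    while True:
        energy = take(ENERGY_BITS)
        if energy == STOP:        # end of utterance, nothing follows
            frames.append({"type": "stop"})
            return frames
        if energy == SILENCE:     # silent frame carries only the energy field
            frames.append({"type": "silence"})
            continue
        repeat = take(REPEAT_BITS)
        pitch = take(PITCH_BITS)
        if repeat:                # reuse previous coefficients, new energy/pitch
            frames.append({"type": "repeat", "energy": energy, "pitch": pitch})
            continue
        voiced = pitch != 0       # pitch index 0 marks an unvoiced frame
        k = [take(w) for w in K_BITS[:10 if voiced else 4]]
        frames.append({"type": "voiced" if voiced else "unvoiced",
                       "energy": energy, "pitch": pitch, "k": k})
```

Under these assumptions a voiced frame takes 50 bits, an unvoiced frame 29, a repeat frame 11 and a silent or stop frame 4, which is where the "ten versus four coefficients" difference in the code comes from. An encoder would emit the same fields in the same order.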