Vague output for audio #25
Comments
Hi @lixinghe1999, our model is mainly trained on natural sounds such as birds chirping, dogs barking, and trains passing, so it struggles to distinguish human speech. Here are two solutions to enhance it:
Thank you for your rapid reply. However, it still outputs meaningless results for other sounds, such as musical instruments. Can you give me some hints on how to solve this? I believe retraining should not be necessary. Could the audio duration be the cause? Since the IMU duration is fixed to 2 seconds, I also fixed the audio duration to 2 seconds.
It may also be related to the sampling length. We sample 1024 frames in total. Lines 81 to 86 in 913638c
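To make the sampling-length point concrete, here is a minimal sketch of how a 2-second clip compares with a 1024-frame input, and one common workaround (repeating the clip's features along the time axis). The 10 ms frame shift, the 128 mel bins, and the helper names `frames_for_duration` and `tile_to_frames` are assumptions for illustration only; they are not taken from the repository's code, and the referenced lines may handle short clips differently.

```python
import numpy as np


def frames_for_duration(duration_s, frame_shift_ms=10.0):
    """Approximate number of feature frames for a clip.

    frame_shift_ms=10.0 is an assumed value, not read from the repository.
    """
    return int(duration_s * 1000 / frame_shift_ms)


def tile_to_frames(features, target_frames=1024):
    """Repeat a (time, n_bins) feature matrix along time, then truncate.

    This is one possible way to fill a fixed-length input from a short clip;
    zero-padding is another. Neither is confirmed as the repository's approach.
    """
    n = features.shape[0]
    if n == 0:
        raise ValueError("features must contain at least one frame")
    reps = -(-target_frames // n)  # ceiling division
    return np.tile(features, (reps, 1))[:target_frames]


# A 2-second clip under the assumed 10 ms shift yields far fewer than 1024 frames.
short_frames = frames_for_duration(2.0)
clip = np.random.rand(short_frames, 128)  # 128 mel bins is an assumption
filled = tile_to_frames(clip, target_frames=1024)
print(short_frames, filled.shape)
```

Under these assumptions the clip covers only about a fifth of the expected input, so most of the window would otherwise be padding; whether that explains the vague outputs would need to be checked against the actual preprocessing code.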
I slightly modified the audio eval code to run on my dataset; however, the outputs are vague even when the audio is speech.
They all look like the ones below:
I attach my code below