The README needs a hello_world example #275
Comments
Hi! Thanks for pointing this out, as pocketsphinx_continuous.exe is, quite clearly, gone. And it was never useful for building applications in the first place. I will fix this documentation as soon as possible, for now it will just be removed to avoid further confusion :) |
In any case the preferred way to use the library will be through Python. Java is just as bad as C in my opinion ;-) |
The Python way is certainly the way things are going, but having the C
version of an API means us Java people can write the wrapper fairly easily
using JNI and the lib.so and get the speed advantages of C. The useful bits
for key phrase spotting from my perspective are a recogniser.run() method,
and then a "call-back" mechanism (listeners in Java) that is registered
with the object(?) doing the running. Another call-back function for
silence (non speech) would be good, and, ultimately, a "soundex"
<https://en.wikipedia.org/wiki/Soundex> tokenizer for out of vocabulary? -
I tried to convince the Kaldi people of this but to no effect. Confusing
"dog" with "god" is forgivable and merely needs clarification (or pragmatic
disambiguation); confusing "dog" with "bus" is not sensible to us humans.
|
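The listener idea in the comment above can be sketched in a few lines. This is only an illustration of the shape being asked for (register call-backs, then push audio in), not an existing PocketSphinx API: `PushRecognizer`, `on_phrase`, `on_silence`, `write` and `toy_recognize` are all hypothetical names, and the "recognizer" is a stand-in that treats all-zero audio as silence.

```python
from typing import Callable, List, Optional


class PushRecognizer:
    """Hypothetical push-style front end: the caller writes audio, listeners get events."""

    def __init__(self, recognize: Callable[[bytes], Optional[str]]) -> None:
        # `recognize` returns recognized text, or None when the chunk is non-speech.
        self._recognize = recognize
        self._phrase_listeners: List[Callable[[str], None]] = []
        self._silence_listeners: List[Callable[[], None]] = []

    def on_phrase(self, listener: Callable[[str], None]) -> None:
        """Register a call-back invoked with the text of each recognized phrase."""
        self._phrase_listeners.append(listener)

    def on_silence(self, listener: Callable[[], None]) -> None:
        """Register a call-back invoked whenever a chunk contains no speech."""
        self._silence_listeners.append(listener)

    def write(self, audio: bytes) -> None:
        """Push one chunk of raw audio and notify the matching listeners."""
        text = self._recognize(audio)
        if text is None:
            for silence_listener in self._silence_listeners:
                silence_listener()
        else:
            for phrase_listener in self._phrase_listeners:
                phrase_listener(text)


def toy_recognize(audio: bytes) -> Optional[str]:
    # Stand-in for a real decoder: all-zero audio counts as silence.
    return "hello" if any(audio) else None


events = []
recognizer = PushRecognizer(toy_recognize)
recognizer.on_phrase(lambda text: events.append(("phrase", text)))
recognizer.on_silence(lambda: events.append(("silence", None)))
recognizer.write(b"\x01\x02")  # non-zero bytes: treated as speech
recognizer.write(b"\x00\x00")  # all zeros: treated as silence
print(events)  # [('phrase', 'hello'), ('silence', None)]
```

Because the caller pushes bytes in, where the audio comes from (microphone, file, socket) stays outside the recognizer entirely.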
Ah, good to know. The C API certainly won't go away... my plan is to integrate the WebRTC VAD code since it's the standard and its licence is compatible. The problem is that pocketsphinx_continuous existed as example code which never really worked well, but worked enough that people tried to build things with it, and then instead of doing live ASR correctly, it was decided to just keep hacking on the existing toy code. Coqui has a lot of good examples of, in my opinion, the right way to do streaming ASR: https://github.com/coqui-ai/STT-examples.
For Java is it preferable to use SWIG or just JNI directly? I removed the SWIG code because with SWIG it was too difficult to make a good Python API, and other languages like Ruby weren't actually using it. Originally the SWIG wrapper was just there to support Java on Android. I certainly won't support anything Java as I'm already spending too much of my time on PocketSphinx which I consider to be obsolete in general...
Another long-standing problem is that the API isn't really designed correctly for callbacks. This is one of the reasons why I removed the audio code, as it was based around the thoroughly obsolete assumption that one gets audio by opening /dev/audio and doing blocking read() calls on it. |
Took a quick look; Coqui looks good to me (Java bit is deprecated). Java
tried (and failed) to standardize how to connect to a mic. Today we seem to
do it per OS, and just have our application check for the OS and grab the
right executable (CMake on this Linux box). The key, I think, is to have the
API for C be the same on each OS. The API may not want to be the same for
each application programming language (my son is using PureData as I type.
Ouch.), but can I suggest that for Python, C#, Java etc. the API exposes the C
API and then extends it in a Python/C#/Java kind of way, the key being to
maintain and expose the C API. My favourite example of this is the pigpio
package <http://abyz.me.uk/rpi/pigpio/> for the Pis. Joan looks after the C
interface; others port the C to their preferred language. Documenting the
C interface is key, however... if I could figure out how Doxygen is meant to
work :-/ WebRTC looks heavyweight to me: "firebase", "ICE candidates" and
"rooms" are all on the opening page of the intro. At the C level, what about
registering a listener and then writing byte arrays to
pocketSphinx_continuous? Leave it to us to figure out how to get a
microphone to produce raw bytes in the code and writing them. A good
feature of packages I have used is when they say "implement X. Instructions
for this are available at Y. You can test your X by using this code...
When that is working, connect your X to our Z by doing this ..."
JNI is my preferred way of working - it is plodding, not exciting, and
strict, but it never breaks. You have to do things in the right order
(write the Java first), and get the .so file in the right place, and the
classpath needs to have the wrapper. Done. Both of those can be hard but
when it doesn't work it usually boils down to one of those classic issues.
|
Actually now that I think of it the preferred option for the microphone on Unix and possibly also Windows is just to popen() sox, as it is nearly always there, usually works, and can do various other things too. |
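As a rough sketch of the "popen() sox" approach in Python: start sox as a child process and read fixed-size frames of raw audio from its standard output. The helper names (`sox_command`, `read_frames`, `microphone_frames`) and the 16 kHz rate and 960-byte frame size are assumptions for illustration; the sox options request quiet mode, mono 16-bit signed samples from the default input device, written as raw data to stdout, and may need adjusting per platform.

```python
import io
import subprocess
from typing import BinaryIO, Iterator, List


def sox_command(sample_rate: int = 16000) -> List[str]:
    """Build a sox command line that records from the default device to stdout."""
    return [
        "sox", "-q",                 # quiet: no progress display
        "-r", str(sample_rate),      # sample rate in Hz
        "-c", "1",                   # mono
        "-b", "16",                  # 16 bits per sample
        "-e", "signed-integer",      # signed PCM
        "-d",                        # input: default audio device
        "-t", "raw", "-",            # output: headerless raw data on stdout
    ]


def read_frames(stream: BinaryIO, frame_bytes: int) -> Iterator[bytes]:
    """Yield complete frames of `frame_bytes` bytes; drop a short trailing frame."""
    while True:
        frame = stream.read(frame_bytes)
        if len(frame) < frame_bytes:
            return
        yield frame


def microphone_frames(sample_rate: int = 16000, frame_bytes: int = 960) -> Iterator[bytes]:
    """Yield raw audio frames from a sox child process (requires sox on PATH)."""
    with subprocess.Popen(sox_command(sample_rate), stdout=subprocess.PIPE) as sox:
        yield from read_frames(sox.stdout, frame_bytes)


# The frame reader works on any binary stream, so it can be tried without a microphone:
print(list(read_frames(io.BytesIO(b"abcdefg"), 3)))  # [b'abc', b'def']
```

Keeping the frame reader separate from the process launch means sox can be swapped for any executable that writes the same raw format to stdout.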
David, that is a great idea, as people can modify sox as needed, or replace
it with a similarly behaving executable.
If I understood you to say you plan to work on streaming with WebRTC, this
has been the best option for years:
https://www.npmjs.com/package/audio-recorder-polyfill
Best regards,
Jim
|
Vis a vis instructions, an example on how to run pocketsphinx.exe in "live" mode (presumably a microphone though I have no idea why the word microphone doesn't seem to appear anywhere in the code or documentation) would be useful including the command line parameters necessary to specify the lm and hmm... The huge number of command line switches are rather daunting too. The bare minimum (language model and ancillary files) would be helpful. Thanks |
Most of the command line switches are not useful to you, and I think this is mentioned in the documentation, but I will mention it quite a lot louder :-) Microphone input is not an easy thing, and a lot of trouble came from giving people the impression that it was. The Python module makes everything quite simple in any case:
|
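The Python snippet that followed the comment above did not survive in this copy of the thread. The following is not that snippet, only a minimal sketch of what live decoding with the Python module can look like, assuming the `Endpointer` and `Decoder` classes of the PocketSphinx 5 Python package behave as in its bundled live example (`process`, `in_speech`, `frame_bytes`, `sample_rate`, `start_utt`, `process_raw`, `end_utt`, `hyp`). The `transcribe` and `main` helpers, and the choice to read raw audio from standard input, are assumptions for illustration.

```python
import sys
from typing import Iterable, Iterator


def transcribe(frames: Iterable[bytes], endpointer, decoder) -> Iterator[str]:
    """Yield one hypothesis string per utterance found in `frames`.

    `endpointer` and `decoder` are expected to look like pocketsphinx's
    Endpointer and Decoder objects (an assumption, see the text above).
    """
    for frame in frames:
        was_in_speech = endpointer.in_speech
        speech = endpointer.process(frame)   # None while no speech is detected
        if speech is None:
            continue
        if not was_in_speech:
            decoder.start_utt()              # first speech frame of an utterance
        decoder.process_raw(speech)
        if not endpointer.in_speech:         # the endpointer just saw the end
            decoder.end_utt()
            hyp = decoder.hyp()
            if hyp is not None:
                yield hyp.hypstr


def main() -> None:
    """Decode raw 16-bit mono audio arriving on standard input."""
    # Third-party import kept local so that `transcribe` can be used without it.
    from pocketsphinx import Decoder, Endpointer

    endpointer = Endpointer()
    decoder = Decoder(samprate=endpointer.sample_rate)
    # Read fixed-size frames until the input ends.
    frames = iter(lambda: sys.stdin.buffer.read(endpointer.frame_bytes), b"")
    for text in transcribe(frames, endpointer, decoder):
        print(text)
```

To try it, call `main()` at the end of the script and pipe in raw 16-bit mono audio at the endpointer's sample rate, for example from sox recording the default input device. A short final frame at end of input is not handled here.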
And as mentioned in the other issue, ask yourself the question: do I really want a command-line executable written in C that does live speech recognition from a microphone, on Windows? Please let me know if this is actually a useful thing. I suspect it isn't. |
Well, you already know my opinion :-) although I might be the only one on the planet who does... LOL. Cheers! |
Actually you're not the only one! But what you need, if I'm not mistaken, is what It seems that The original |
The example using PortAudio can be seen here: https://github.com/cmusphinx/pocketsphinx/blob/live_examples/examples/live_portaudio.c |
Thanks again! I will check out portaudio and see if I can use that instead. The phrase "quite simple to implement" is a very nice thing to see 👍 ! And I look forward to the example for ad_win32.c although it's probably what I use now. And, yes, pocketsphinx_continuous is where I got the guts of my code for our app... |
The ad_win32.c code actually has a number of problems and can be simplified for the new live speech API... particularly if it doesn't have to be fit into an existing framework. This is one of the reasons I removed the libsphinxad library: PortAudio or OpenAL do a better job of being a cross-platform library, so if you are targeting a particular platform it's probably better to go straight to the platform's API. |
It turns out sox is not as platform independent as one would wish, even on
Linux. David, we will need, on the front page, a link to setting up a
microphone on a list of platforms. I can contribute the Raspberry Pi code
and instructions. - shared interface for your code of course.
|
There are now examples for PortAudio, PulseAudio, and Win32 wave input, see #319. I will, however, leave this issue open as we can always use more examples! |
The instructions at:
https://cmusphinx.github.io/wiki/tutorialpocketsphinx
are now past their use-by date. The README.md is fine on Linux, but for those of us who know what a lib file is, could we have a hello_ps.c please? Using C again reminds me why we all switched to Java back in the dark ages...