The README needs a hello_world example #275

petercwallis · 2022-08-08T16:23:13Z

The instructions at:
https://cmusphinx.github.io/wiki/tutorialpocketsphinx
are now past their use-by date. The README.md is fine on Linux, but for those of us who know what a lib file is, could we have a hello_ps.c please? Using C again reminds me why we all switched to Java back in the dark ages...

dhdaines · 2022-08-09T11:33:17Z

Hi! Thanks for pointing this out, as pocketsphinx_continuous.exe is, quite clearly, gone. And it was never useful for building applications in the first place.

I will fix this documentation as soon as possible; for now it will just be removed to avoid further confusion :)

dhdaines · 2022-08-09T11:34:04Z

In any case the preferred way to use the library will be through Python. Java is just as bad as C in my opinion ;-)

petercwallis · 2022-08-09T13:24:53Z

The Python way is certainly the way things are going, but having the C version of an API means us Java people can write the wrapper fairly easily using JNI and the lib.so and get the speed advantages of C. The useful bits for key phrase spotting, from my perspective, are a recogniser.run() method and a "call-back" mechanism (listeners in Java) that is registered with the object(?) doing the running. Another call-back function for silence (non-speech) would be good and, ultimately, a "soundex <https://en.wikipedia.org/wiki/Soundex>" tokenizer for out-of-vocabulary words? - I tried to convince the Kaldi people of this but to no effect. Confusing "dog" with "god" is forgivable and just needs clarification (or pragmatic disambiguation); confusing "dog" with "bus" is not sensible to us humans.

dhdaines · 2022-08-09T14:22:11Z

Ah, good to know. The C API certainly won't go away... my plan is to integrate the WebRTC VAD code since it's the standard and its licence is compatible. The problem is that pocketsphinx_continuous existed as example code which never really worked well, but worked enough that people tried to build things with it, and then instead of doing live ASR correctly, it was decided to just keep hacking on the existing toy code.

Coqui has a lot of good examples of, in my opinion, the right way to do streaming ASR: https://github.com/coqui-ai/STT-examples.

For Java is it preferable to use SWIG or just JNI directly? I removed the SWIG code because with SWIG it was too difficult to make a good Python API, and other languages like Ruby weren't actually using it. Originally the SWIG wrapper was just there to support Java on Android. I certainly won't support anything Java as I'm already spending too much of my time on PocketSphinx which I consider to be obsolete in general...

Another long-standing problem is that the API isn't really designed correctly for callbacks. This is one of the reasons why I removed the audio code, as it was based around the thoroughly obsolete assumption that one gets audio by opening /dev/audio and doing blocking read() calls on it.

petercwallis · 2022-08-09T15:54:26Z

Took a quick look; Coqui looks good to me (the Java bit is deprecated). Java tried (and failed) to standardize how to connect to a mic. Today we seem to do it per OS, and just have our application check for the OS and grab the right executable (CMake on this Linux box).

The key, I think, is to have the C API be the same on each OS. The API may not want to be the same for each application programming language (my son is using PureData as I type. Ouch.), but can I suggest that for Python, C#, Java etc. the API exposes the C API and then extends it in a Python/C#/Java kind of way, the key being to maintain and expose the C API. My favourite example of this is the pigpio package <http://abyz.me.uk/rpi/pigpio/> for the Pis. Joan looks after the C interface; others port the C to their preferred language. Documenting the C interface is key, however ... if I could figure out how doxygen is meant to work :-/

WebRTC looks heavyweight to me: "firebase", "ICE candidates" and "rooms" all on the opening page of the intro. At the C level, what about registering a listener and then writing byte arrays to pocketsphinx_continuous? Leave it to us to figure out how to get a microphone to produce raw bytes and write them. A good feature of packages I have used is when they say "implement X. Instructions for this are available at Y. You can test your X by using this code... When that is working, connect your X to our Z by doing this ..."

JNI is my preferred way of working - it is plodding, not exciting, and strict, but it never breaks. You have to do things in the right order (write the Java first), get the .so file in the right place, and make sure the classpath has the wrapper. Done. Both of those can be hard, but when it doesn't work it usually boils down to one of those classic issues.
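
For readers trying to picture the "register a listener, then write byte arrays" shape suggested above, here is a minimal sketch in Python. It is not the PocketSphinx API: the class name `FrameListenerFeed`, its methods, and the plain RMS energy threshold are all hypothetical stand-ins (a real front end would use a proper VAD and a decoder). It assumes 16-bit signed mono PCM in native byte order.

```python
import array
import math


class FrameListenerFeed:
    """Hypothetical push-style front end: the caller writes raw bytes and
    registered listeners are told when speech starts and stops."""

    def __init__(self, frame_samples=320, threshold=500.0):
        self.frame_bytes = frame_samples * 2  # two bytes per 16-bit sample
        self.threshold = threshold            # RMS level treated as speech (assumed value)
        self.listeners = []
        self._pending = b""
        self._in_speech = False

    def add_listener(self, callback):
        """Register callback(event); event is "speech" or "silence"."""
        self.listeners.append(callback)

    def write(self, data):
        """Accept any number of bytes and process whole frames as they fill."""
        self._pending += data
        while len(self._pending) >= self.frame_bytes:
            frame = self._pending[:self.frame_bytes]
            self._pending = self._pending[self.frame_bytes:]
            self._process(frame)

    def _process(self, frame):
        samples = array.array("h", frame)  # native-endian 16-bit signed samples
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))
        speaking = rms >= self.threshold
        # Notify listeners only when the state changes, not on every frame.
        if speaking != self._in_speech:
            self._in_speech = speaking
            for callback in self.listeners:
                callback("speech" if speaking else "silence")
```

The caller owns the audio source entirely: anything that can produce raw bytes (a microphone library, a file, a pipe) can call `write()`, which is the separation of concerns argued for above.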

dhdaines · 2022-08-09T20:22:59Z

Actually, now that I think of it, the preferred option for the microphone on Unix, and possibly also Windows, is just to popen() sox, as it is nearly always there, usually works, and can do various other things too.
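
A rough sketch of the "popen() sox" idea, written in Python with `subprocess` rather than C's `popen()`. It assumes sox is installed and can open the default audio device on your platform; the 16 kHz rate, the chunk size, and the helper names `sox_command`, `read_chunks` and `record` are illustrative choices, not part of PocketSphinx.

```python
import subprocess

SAMPLE_RATE = 16000  # assumed; use the rate your acoustic model expects


def sox_command(sample_rate=SAMPLE_RATE):
    """Build a sox command that records from the default device and
    writes headerless 16-bit signed mono PCM to stdout."""
    return [
        "sox", "-q",
        "-r", str(sample_rate),
        "-c", "1",
        "-b", "16",
        "-e", "signed-integer",
        "-d",              # input: default audio device
        "-t", "raw", "-",  # output: raw samples on stdout
    ]


def read_chunks(stream, chunk_bytes):
    """Yield fixed-size chunks from a binary stream until EOF.
    A short final chunk is dropped."""
    while True:
        chunk = stream.read(chunk_bytes)
        if len(chunk) < chunk_bytes:
            break
        yield chunk


def record(handle_chunk, chunk_bytes=2048):
    """Run sox and pass each chunk of raw audio to handle_chunk."""
    with subprocess.Popen(sox_command(), stdout=subprocess.PIPE) as proc:
        try:
            for chunk in read_chunks(proc.stdout, chunk_bytes):
                handle_chunk(chunk)
        finally:
            proc.terminate()
```

Calling `record(my_handler)` blocks and hands each 2048-byte chunk to `my_handler` until sox exits or the handler raises. Because the audio arrives as plain bytes on a pipe, sox can be swapped for any executable that writes the same raw format.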

jsalsman · 2022-10-11T08:24:25Z

David, that is a great idea, as people can modify sox as needed, or replace it with a similarly behaving executable. If I understood you to say you plan to work on streaming with WebRTC, this has been the best option for years: https://www.npmjs.com/package/audio-recorder-polyfill

Best regards, Jim

smbika007 · 2022-10-19T14:54:47Z

Vis-à-vis instructions, an example of how to run pocketsphinx.exe in "live" mode (presumably from a microphone, though I have no idea why the word microphone doesn't seem to appear anywhere in the code or documentation) would be useful, including the command line parameters necessary to specify the lm and hmm...

The huge number of command line switches is rather daunting too. The bare minimum (language model and ancillary files) would be helpful.

Thanks

dhdaines · 2022-10-19T16:50:57Z

Most of the command line switches are not useful to you, and I think this is mentioned in the documentation, but I will mention it quite a lot louder :-)

Microphone input is not an easy thing, and a lot of trouble came from giving people the impression that it was. The Python module makes everything quite simple in any case:

from pocketsphinx import LiveSpeech
for phrase in LiveSpeech():
    print(phrase)

dhdaines · 2022-10-19T16:52:04Z

And as mentioned in the other issue, ask yourself the question: do I really want a command-line executable written in C that does live speech recognition from a microphone, on Windows?

Please let me know if this is actually a useful thing. I suspect it isn't.

smbika007 · 2022-10-20T19:18:58Z

> And as mentioned in the other issue, ask yourself the question: do I really want a command-line executable written in C that does live speech recognition from a microphone, on Windows?
>
> Please let me know if this is actually a useful thing. I suspect it isn't.

Well, you already know my opinion :-) although I might be the only one on the planet who does... LOL

Cheers!

dhdaines · 2022-10-20T19:45:32Z

Actually you're not the only one! But what you need, if I'm not mistaken, is what pocketsphinx_continuous was originally intended to be: example code which you can incorporate into your application.

It seems that sox doesn't do microphone input on Windows, either. PortAudio is a pretty good solution, and actually quite simple to implement. You can either include portaudio_static.lib and portaudio.h in your project directly, or you can "install" it somewhere, then set the CMAKE_PREFIX_PATH environment variable for your CMake build to point to that location. I'll publish some instructions on https://cmusphinx.github.io/ shortly. Perhaps we can add it as a git submodule (it isn't very big).

The original ad_win32.c code can also be used - it uses the oldest and most awful of the many awful (they are all awful) Windows audio APIs but doesn't require any external dependencies. I'll put together an example of it as well.

dhdaines · 2022-10-20T19:46:20Z

The example using PortAudio can be seen here: https://github.com/cmusphinx/pocketsphinx/blob/live_examples/examples/live_portaudio.c

smbika007 · 2022-10-20T20:51:49Z

> Actually you're not the only one! But what you need, if I'm not mistaken, is what pocketsphinx_continuous was originally intended to be: example code which you can incorporate into your application.
>
> It seems that sox doesn't do microphone input on Windows, either. PortAudio is a pretty good solution, and actually quite simple to implement. You can either include portaudio_static.lib and portaudio.h into your project directly, or you can "install" it somewhere, then set the CMAKE_PREFIX_PATH environment variable for your CMake build to point to that location. I'll publish some instructions on https://cmusphinx.github.io/ shortly. Perhaps we can add it as a git submodule (it isn't very big)
>
> The original ad_win32.c code can also be used - it uses the oldest and most awful of the many awful (they are all awful) Windows audio APIs but doesn't require any external dependencies. I'll put together an example of it as well.

Thanks again! I will check out portaudio and see if I can use that instead. The phrase "quite simple to implement" is a very nice thing to see 👍 ! And I look forward to the example for ad_win32.c although it's probably what I use now. And, yes, pocketsphinx_continuous is where I got the guts of my code for our app...

dhdaines · 2022-10-20T20:57:32Z

The ad_win32.c code actually has a number of problems and can be simplified for the new live speech API... particularly if it doesn't have to be fit into an existing framework. This is one of the reasons I removed the libsphinxad library: PortAudio or OpenAL do a better job of being a cross-platform library, and if you are targeting a particular platform it's probably better to go straight to the platform's API.

petercwallis · 2022-10-21T07:35:56Z

It turns out sox is not as platform independent as one would wish, even on *nux. David, we will need, on the front page, a link to setting up a microphone on a list of platforms. I can contribute the Raspberry Pi code and instructions, with a shared interface to your code of course.

dhdaines · 2022-10-22T00:56:35Z

There are now examples for PortAudio, PulseAudio, and Win32 wave input; see #319.

I will however leave this issue open as we can always use more examples!
