For my Lab Day project, I worked on implementing speech recognition in a fully offline mode. The goal was to detect a wakeword followed by one of a limited set of words for which we’ve trained detection models. We therefore started experimenting with the Snowboy hotword detector, which supports all of those features. Snowboy’s hotword library contains thousands of existing words, most of which were contributed by users.
In addition, creating a new hotword is quick and seamless: one simply records the word three times and tests it, and the model file for the hotword is ready for download.
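The overall flow, a wakeword followed by one of a small set of commands, can be sketched as a small state machine that sits downstream of the detector. This is a minimal sketch under stated assumptions, not Snowboy code: it assumes the detector reports which model fired as a positive 1-based index, with index 1 being the wakeword and each command trained as its own model. The command table, the `WakewordGate` class, and the five-second window are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical model layout: index 1 is the wakeword, 2..N are command models.
# The 1-based "which model fired" convention is an assumption for illustration.
WAKEWORD_INDEX = 1
COMMANDS = {2: "close garage door", 3: "open garage door", 4: "turn on lights"}


@dataclass
class WakewordGate:
    """Accept a command only if it arrives within `window_s` seconds of the wakeword."""

    window_s: float = 5.0
    clock: Callable[[], float] = time.monotonic
    _armed_at: Optional[float] = field(default=None, init=False)

    def on_detection(self, index: int) -> Optional[str]:
        now = self.clock()
        if index == WAKEWORD_INDEX:
            self._armed_at = now  # (re)open the command window
            return None
        if self._armed_at is None or now - self._armed_at > self.window_s:
            self._armed_at = None  # never armed, or window expired: ignore it
            return None
        command = COMMANDS.get(index)
        if command is not None:
            self._armed_at = None  # one command per wakeword
        return command


if __name__ == "__main__":
    t = [0.0]  # fake clock so the demo is deterministic
    gate = WakewordGate(window_s=5.0, clock=lambda: t[0])
    print(gate.on_detection(2))  # None: command heard before any wakeword
    gate.on_detection(1)         # wakeword at t = 0 s
    t[0] = 2.0
    print(gate.on_detection(2))  # close garage door
    t[0] = 3.0
    print(gate.on_detection(3))  # None: the window was used by the previous command
```

Closing the window after each accepted command keeps a single wakeword from triggering several actions in a row, which seems prudent for something like a garage door.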
Snowboy then uses these model files to listen continuously for the hotword, comparing the microphone input against each model. For our use case, we can create a wakeword and also treat the finite list of commands as “hot phrases,” for example “close garage door.” In this way, the system knows in advance exactly which phrases to expect and how the user will say them. This is a computational benefit, but it also places a constraint on the user: if the user says “close the garage door” instead of “close garage door,” for example, the command may not be detected. Users may want some freedom in how they phrase a request rather than having to memorize the exact wording of each command.
A speech-to-text engine would allow this flexibility, but since the commands are relatively simple, it would add unnecessary overhead. The time and space costs of Snowboy versus other software packages therefore need to be examined further. One limitation is that to build a general model for a word, the Snowboy web portal needs at least 500 samples from different user accounts, each saying the word three times. We are interested in determining whether we can train generalized models by following Snowboy’s guidelines for offline training, and will be researching that further.
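To make the phrasing trade-off concrete, the flexible alternative could look roughly like the sketch below. It assumes some offline speech-to-text engine has already produced a transcript (no particular engine is implied), then normalizes the text and fuzzy-matches it against the canonical commands using Python’s standard `difflib`. The command list, filler-word set, and 0.8 cutoff are illustrative assumptions, not tuned values.

```python
import difflib
import re
from typing import Optional

# Canonical commands: the exact phrasings a hot-phrase model would be trained on.
COMMANDS = ["close garage door", "open garage door", "turn on lights", "turn off lights"]

# Hypothetical filler words that carry no meaning for these commands;
# a real system would tune this list to its own vocabulary.
FILLER = {"the", "a", "please", "my", "can", "you"}


def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and drop filler words."""
    words = re.findall(r"[a-z']+", text.lower())
    return " ".join(w for w in words if w not in FILLER)


def match_command(transcript: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a free-form transcript to the closest canonical command, or None."""
    normalized = normalize(transcript)
    # get_close_matches ranks candidates by SequenceMatcher ratio and
    # returns at most n of them whose score is >= cutoff.
    matches = difflib.get_close_matches(normalized, COMMANDS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(match_command("Please close the garage door."))  # close garage door
    print(match_command("make me a sandwich"))              # None
```

Character-level matching like this tolerates small variations (an extra “the,” a plural) but not reordered phrasings such as “turn the lights on,” and a low cutoff risks mapping unrelated speech onto a real action, so the threshold would need care in practice.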