Recently I have been busy building a personal assistant that I plan to fit in my car. Right now I am in experimentation mode, focused on speech capabilities, so let me start with a description of my journey so far.
First let me show off a little bit with these videos that I created while experimenting.
Here is another one demonstrating grammar-based speech recognition.
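For anyone wondering what grammar-based recognition looks like in code, here is a minimal sketch using the pocketsphinx Python package from my Linux experiments (more on those below). It is only an illustration, not the exact code behind the video: the LiveSpeech helper and the commands.gram grammar file are assumptions, and the grammar itself is just a toy.

```python
from pocketsphinx import LiveSpeech

# commands.gram is a small JSGF grammar, for example:
#   #JSGF V1.0;
#   grammar commands;
#   public <command> = lizzie (read | send) email | lizzie play music;
#
# Pointing the decoder at a grammar instead of a full language model
# constrains what it can recognize, which is what keeps accuracy high
# for a fixed set of commands. lm=False disables the default language
# model so the grammar is the active search.
speech = LiveSpeech(lm=False, jsgf='commands.gram')

for phrase in speech:
    print('Heard:', phrase)
```

On Windows 10 IoT Core the equivalent is a grammar constraint (for example SRGS) on the UWP speech recognizer, but the principle is the same: limit the recognizer to the commands you care about.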
Lizzie (read: Lizzie Hearts) is the name of the personal assistant, and I plan to use a Raspberry Pi to accommodate her. I am using a Raspberry Pi 3, which is quite powerful considering its form factor and size. I started by installing Windows 10 IoT Core because I have been working primarily with Microsoft Windows technologies for a while, and the platform is quite comfortable both in terms of fundamentals and programming technologies. After experimenting for a while I moved to Linux (I tried several variants, including Raspbian, CentOS, Ubuntu Core and Linux Mint) and played with PocketSphinx and Sphinx 4 for offline speech recognition. While getting speech recognition running with these was easy, the accuracy, and most importantly dictation mode, was lacking (unless you are willing to build a large language model with a very limited set of tools). So I came back to Windows 10 IoT Core and restarted my experiments with a clear set of goals and functionality. The features I am working on right now are:
- Emails
  - Announce the count of unread emails.
  - Search emails by keyword or sender.
  - Read emails aloud.
  - Send/reply to emails.
  - Accept/reject meeting invites.
- Calendar (from Google Calendar)
  - Read the schedule for today or any other day (past or future).
  - Create an appointment.
  - Create and send meeting invites.
  - Cancel/reschedule any appointment.
  - Cancel/reschedule any meeting and send updates.
- Navigation (using Google Maps and GPS from the connected phone, by running a service on the phone as a web server)
  - Distance/time calculation (using the Distance Matrix API from the Google Maps APIs; see the sketch after this list).
  - Show places on maps (using the Google Maps Geocoding API and the Google Maps Embed API/Google Static Maps API).
  - Directions (using the Directions API; I still need to decide the approach, but most probably I would read the GPS data from the phone at a predefined interval, call the API from JavaScript, and then update the markers and directions on the map application).
  - Log the path to a file and then plot it on the map using the Roads API.
- Show vehicle statistics using an ELM327 (OBD-II Bluetooth adapter)
  - Speed
  - RPM
  - Fuel pressure
  - Engine load
  - Engine temperature
  - Intake air temperature
  - Throttle position
  - Engine oil temperature
  - Engine run-time
- Music player
  - Play music
  - Playlist support
  - Music search
  - Play streaming radio
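Since the navigation items above lean heavily on the Google Maps web services, here is a rough sketch of the distance/time calculation against the Distance Matrix API, using Python and the requests library purely for illustration. The API key and the origin/destination strings are placeholders; in the real flow they would come from the recognized command.

```python
import requests

API_KEY = 'YOUR_GOOGLE_MAPS_API_KEY'  # placeholder

def distance_and_time(origin, destination):
    """Return the driving distance and duration between two places as text."""
    resp = requests.get(
        'https://maps.googleapis.com/maps/api/distancematrix/json',
        params={'origins': origin, 'destinations': destination, 'key': API_KEY},
    )
    # The response has one row per origin and one element per destination.
    element = resp.json()['rows'][0]['elements'][0]
    if element['status'] != 'OK':
        return None, None
    return element['distance']['text'], element['duration']['text']

distance, duration = distance_and_time('New Delhi', 'Agra')  # placeholder places
print(distance, '- about', duration, 'by road')
```

The same pattern of building a URL and parsing the JSON applies to the Geocoding, Directions and Roads APIs, which is why I am not too worried about that part of the list.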
Well, this is a long list, and I am really excited to explore the unknown. I have already experimented with the speech recognition and synthesis parts, so the next step is to experiment with vehicle statistics. I already have a very simple PoC (proof of concept) working against a simulator, and the next step is to formalize and organize the bits. I have also worked on the other items, like mail, calendar, music and maps, so they should be relatively easy to integrate with the speech-controlled framework.
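To give an idea of the vehicle statistics part, this is roughly the kind of data I am after, sketched with the python-OBD library talking to an ELM327 adapter. It is only an illustration of the OBD-II side, not my PoC code; the library choice and the port setup are assumptions.

```python
import obd

# Connect to the ELM327 adapter. python-OBD scans serial ports, so on the
# Pi the Bluetooth adapter would first be bound to an rfcomm device.
connection = obd.OBD()

# The PIDs from the list above and their python-OBD commands
# (engine temperature is read as coolant temperature).
readings = {
    'Speed': obd.commands.SPEED,
    'RPM': obd.commands.RPM,
    'Fuel pressure': obd.commands.FUEL_PRESSURE,
    'Engine load': obd.commands.ENGINE_LOAD,
    'Engine (coolant) temperature': obd.commands.COOLANT_TEMP,
    'Intake air temperature': obd.commands.INTAKE_TEMP,
    'Throttle position': obd.commands.THROTTLE_POS,
    'Engine oil temperature': obd.commands.OIL_TEMP,
    'Engine run-time': obd.commands.RUN_TIME,
}

for name, cmd in readings.items():
    response = connection.query(cmd)
    if not response.is_null():
        print(name, ':', response.value)
```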
One last note: for NLP (natural language processing), that is, getting intents and parameters out of natural language commands, I am using the Api.Ai cloud platform. I thought of writing my own, but decided to use Api.Ai instead: the speech recognition still happens locally on the device, only the recognized text is sent to the server for processing, and the results so far are amazing (see the first video in case you have any doubts), so why re-invent the wheel?
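To show how little plumbing that takes, here is a rough sketch of the round trip using Api.Ai's v1 query endpoint and the Python requests library. The client access token and session id are placeholders, and in the real flow the query text comes straight from the on-device recognizer.

```python
import requests

CLIENT_ACCESS_TOKEN = 'YOUR_API_AI_CLIENT_ACCESS_TOKEN'  # placeholder

def get_intent(text):
    """Send locally recognized text to Api.Ai and return the intent name and parameters."""
    resp = requests.get(
        'https://api.api.ai/v1/query',
        params={
            'v': '20150910',                 # API protocol version
            'query': text,                   # text from the on-device recognizer
            'lang': 'en',
            'sessionId': 'lizzie-session',   # placeholder session id
        },
        headers={'Authorization': 'Bearer ' + CLIENT_ACCESS_TOKEN},
    )
    result = resp.json()['result']
    return result['metadata'].get('intentName'), result.get('parameters', {})

intent, parameters = get_intent('what is on my calendar tomorrow')
print(intent, parameters)
```

The response also carries a fulfillment "speech" field, which can be fed straight into the speech synthesizer for the reply.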
Keep watching this space for updates, hoping to be back soon.
Until next time…