In the previous post I described my experiments around building an intelligent artificial personal assistant – Lizzie. The pseudo-intelligent agents around us today (Siri, Cortana or Google Now) are all great feats of engineering, given that they reside on small devices like mobile phones and can do powerful things like natural language processing (acting on the commands we give is relatively easy, of course, because computers have been doing that for a long time). Speech recognition is a big deal in itself, and today small machines like a Raspberry Pi or a mobile phone can do it in a speaker-agnostic manner. Natural language processing of the recognized speech is the next step, where the machine needs to identify the following (a sketch of what a fully parsed utterance might look like follows the list):
- Context: Dialog without context is meaningless. Imagine you are talking about dinner with your wife and she suddenly starts talking about jewelry; wouldn't that be annoying?
- Intent: Once a context is identified, the next step is to find the intent: what you want to happen within that context. For example, if you are talking about hotel booking, what would you like to do? Search hotels, make a booking, check room availability at a particular hotel, cancel a booking you have already made, and so on.
- Entities: Identifying the intent is not enough; actionable information is required to act on it. Continuing the previous example, if you want to book a hotel, the agent needs details like dates, place, room type (or maybe your budget), number of people, etc. Extracting entities in a meaningful way is the next challenge, and it can be quite difficult. For example, how do you distinguish between Park Avenue and Tenth Avenue? One is a brand while the other is an address, and the first can be an address as well depending on the neighborhood. This is where data and statistical modeling come into the picture (more on this below).
- Followup: After-action followups give the agent a way to keep learning and growing, besides making it feel more natural, because that is how humans act. For example, when a waiter in a restaurant serves you, they also follow up later to see if you need anything else or how the food was. Followups also provide data for the future (for example, they can be used to study patterns in what a user likes), and once enough data has been collected the agent can start automatically performing some of the actions it has learned from followups (for example, ordering or not ordering popcorn when booking movie tickets).
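To make these pieces concrete, here is a minimal sketch in Python of what a fully parsed utterance might look like once context, intent and entities have been extracted. The structure and field names are hypothetical (cloud services return similar JSON, but the exact shape varies by service), and the dispatch function only hints at how an agent might act on it and follow up:

```python
# Hypothetical parsed utterance: the shape below is illustrative only;
# real services return similar JSON with service-specific field names.
parsed = {
    "text": "book me a room in Paris for two people from May 3 to May 5",
    "context": "hotel_booking",      # the conversation we are in
    "intent": "make_booking",        # what the user wants to happen
    "entities": {                    # actionable details for the intent
        "place": "Paris",
        "guests": 2,
        "check_in": "2017-05-03",
        "check_out": "2017-05-05",
    },
    "confidence": 0.92,              # statistical models are probabilistic
}

def handle(parsed):
    """Dispatch on the detected intent; a real agent would also follow up."""
    if parsed["intent"] == "make_booking":
        e = parsed["entities"]
        print(f"Booking a room in {e['place']} for {e['guests']} guests")
        # Followup: confirm the action and learn preferences for next time.
        print("Anything else, like breakfast included?")

handle(parsed)
```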
Doing NLP (natural language processing) on small machines like phones or a Raspberry Pi is not possible today because:
- The NLP techniques we have today are based on statistical models that require huge data sets to calculate the various probabilities and come up with a result set.
- The number of calculations involved in the previous step could not be done in reasonable time on these devices (you would not want the agent acting on your command a month later).
Cloud NLP services like api.ai, wit.ai or the Google Cloud Natural Language API are the practical solution to the NLP challenge on these small devices. Since speech recognition is done locally on the device, we only need to send the recognized text to the cloud for processing. Data transfer rates and internet bandwidth are not a problem today, so it is possible to build an agent like Lizzie on a Raspberry Pi. A sketch of what such a call might look like follows.
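As an illustration, here is a minimal Python sketch of sending locally recognized text to wit.ai's /message endpoint from a Raspberry Pi. It assumes you have a wit.ai app and a server access token (the token below is a placeholder), and the exact shape of the returned JSON depends on the API version:

```python
# Minimal sketch: ship recognized speech text to wit.ai for parsing.
# WIT_TOKEN is a placeholder; the response shape varies by API version.
import requests

WIT_TOKEN = "YOUR_SERVER_ACCESS_TOKEN"  # from the wit.ai app console

def parse_utterance(text):
    """Send recognized text to wit.ai and return the parsed JSON."""
    resp = requests.get(
        "https://api.wit.ai/message",
        params={"q": text},
        headers={"Authorization": "Bearer " + WIT_TOKEN},
    )
    resp.raise_for_status()
    return resp.json()  # contains the detected intent and entities

if __name__ == "__main__":
    print(parse_utterance("book me a hotel room in Paris for two"))
```

The heavy statistical work happens server-side; the device only pays the cost of one small HTTP round trip per utterance, which is exactly why this split works on a Raspberry Pi.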
I will come back soon with details about an experiment I did talking to a car's ECU from a Raspberry Pi over a Bluetooth serial connection.