A couple of days ago Google published its 2017 summary of its voice-first solutions: Google Home (hardware) and Google Assistant (software). It seems that a new way of interacting with technology is knocking on our door. With “Google Home usage increased 9X this holiday season over last year’s” and one Google Home Mini sold every second since its premiere, it’s clear that voice interfaces are slowly leaving the early-adoption stage and have begun to settle for good in our homes and minds.
But what is so revolutionary about VUIs (voice user interfaces), and what are the real benefits of having voice-controlled devices around?
Smart home speakers, assistant platforms and cross-device solutions let you talk to your smartwatch and see the result on your TV or car’s dashboard. Personal assistants and VUIs are slowly appearing around us, and it’s quite likely that they will make our lives much easier. Because I strongly believe that natural language will be the next human-machine interface, I decided to start a new series of blog posts and an open-source project showing how to create a new kind of app: conversation-oriented, device-independent assistant skills that give us freedom in the platform or hardware we use.
Apps that bring the most natural interface for humans: voice.
This post is part of a series about building a personal assistant app designed with voice as the primary user interface. More posts in the series:
There is an observation in the field of AI called Moravec’s paradox, which says that activities like abstract thinking and reasoning, or skills classified as “hard” (engineering, maths or art), are far easier for a machine to handle than unconscious sensory or motor activities.
It’s much easier to build specialized computers that mimic adult human experts (professional chess or Go players, artists such as painters or musicians) than to build a machine with the skills of a one-year-old child: learning how to move around, recognizing faces and voices, or paying attention to interesting things. Easy problems are hard and require enormous computational resources; hard problems are easy and require very little computation.
Researchers look for an explanation in the theory of evolution: our unconscious skills were developed and optimized by natural selection over millions of years. The “newer” a skill is (like abstract thinking, which appeared “only” hundreds of thousands of years ago), the less time nature has had to adapt our brains to handle it.
Moravec’s paradox is not easy to interpret. Some say it describes a future where machines will take jobs that require specialist skills, leaving people to serve an army of robotic chiefs and analysts. Others argue that the paradox guarantees AI will always need the assistance of people. Or, perhaps more accurately, that people will use AI to improve the skills that nature hasn’t developed as highly.
One thing Moravec’s paradox certainly shows: the fact that we have built computers that beat humans at Go or chess doesn’t mean General Artificial Intelligence is just around the corner. Yes, we are one step closer. But as long as AGI means a “full copy of human intelligence” to us, the remaining steps will only get harder.
As technical people, we usually see AI solutions as a bunch of really smart algorithms operating on statistical models and doing nonlinear computations: something extremely abstract, with its roots in programming languages.
But, as the term “neural network” suggests, many of those solutions are inspired by biology, primarily by the biological brain.
Some time ago, DeepMind researchers published the paper “Neuroscience-Inspired Artificial Intelligence”, in which they highlighted AI techniques that come, directly or indirectly, from neuroscience. I will try to sum it up, but if you would like to read the full version, it can be found at this link:
One of many definitions describes AI as a hypothetical intelligence created not by nature but artificially, through an engineering process. One of its goals is to create human-level General Artificial Intelligence. Many people debate whether such an intelligence is even possible, but one thing proves that it is: the human brain.
It seems natural to use neuroscience as a guide or inspiration for new types of architectures and algorithms. Biological computation very often works better than mathematical and logic-based methods, especially when it comes to cognitive functions.
Moreover, if current, still far-from-ideal AI techniques turn out to be at the core of how the brain works, it’s quite likely that the engineering effort will pay off at some point in the future.
Finally, neuroscience can also serve as a good validation of existing AI solutions.
Voice interfaces are slowly showing up in our lives. In most cases, they replace the complexity of using mobile devices. But there are also new features that make sense only when they are used with our voice alone.
For a quick summary of where Google is with Google Assistant, take a look at this video from the last Google Developers Day (starting at 33:00):
On May 17–19, at Shoreline Amphitheatre, Google held the 9th edition of one of the biggest developer conferences in the world: I/O17. This year I also had the great pleasure of spending that time in Mountain View with thousands of software engineers and techies from all over the globe. In this post I have collected all the important announcements and experiences from this amazing three-day festival, just in case you would like to know what’s coming in the world of new technology.
TL;DR: If you are reading this on a mobile device and want a really short recap of the Google I/O 2017 keynote, I encourage you to check out my Medium series:
Do you remember the key announcements from I/O16? Probably the most important one was the transition from mobile-first to AI-first.
And here we are. One year later we have a voice-controlled device, Google Home; Google Assistant is a fact; and Machine Learning is everywhere, including our mobile devices, cars and smartwatches (literally doing some computation on the device, without accessing the cloud). Again, during I/O17 Google’s CEO Sundar Pichai spent a large part of the keynote showing the progress of the AI revolution.
What does it bring us? Even more intelligence in Google Photos, like best-shot selection or sharing suggestions (hey, your friend is in this photo, maybe you want to send it to them?). There will also be a completely new level of visual understanding: Google Lens. The camera will be able not only to give more context (what kind of flower we are looking at) but also to take actions (automatically apply a scanned Wi-Fi password or book cinema tickets from a photo of a poster). Do you still receive paper bills? Soon the whole payment process could be as quick as taking a photo; Google Lens and Google Assistant (together with partner integrations) will do the rest for you.
More AI use cases: Gmail/Inbox smart replies (did you know that 12% of responses on mobile devices are sent this way?) and AutoML, neural networks that build other neural networks, so we can spend our time thinking about what to achieve, not how. Even more cloud AI services, partnerships and benefits for the best researchers. And a new TPU, Google’s hardware for doing cloud Machine Learning even faster.
These and many other AI initiatives finally have a new home on the Internet: Google.ai.
The number of announcements and talks related to voice and text interfaces at I/O17 shows that Google Assistant is not just an app but a completely new ecosystem that can work across all of your devices. It is available on Google Home, smartphones (including iPhones!), smartwatches and TVs, and soon also in cars and any other piece of hardware (thanks to Android Things).
Finally, besides just talking, you can communicate with Assistant by typing (so you can use it outside, in crowded places, without looking weird 😉). And the response will come in the best possible format: if words are not enough, Assistant will use one of your screens (phone, TV, tablet) to display the information. So you can ask Google Home for directions to your favourite store and the map will be sent to your phone or car!
There is of course much more: hands-free calling, multilingual support, Bluetooth, and proactive notifications (Assistant will be able to come back to you with information at the best possible time, e.g. when your flight is delayed or a traffic jam may make you late for a scheduled meeting).
But probably the biggest announcement here is that Google Assistant now supports transactions and payments 💸.
Why is this so important? The numbers show that most people really don’t like input fields of any kind: the biggest drop-offs happen while entering payment details or during the sign-up process. And it makes sense; we would like to order food or send some money to friends without filling in yet another form (we have already done it so many times in the past!). This is where Assistant can help, especially for first-time users. With a simple permission request, it can provide all the needed data (user or payment details) to our app or platform, so the user doesn’t have to deal with it anymore.
Even if we still don’t know whether it’ll be Oreo or Oatmeal Cookie, a couple of great things are coming to our mobile devices with Android O. Full picture-in-picture support will let you keep a video call going while looking at your calendar or notes. There is even more intelligence, to improve battery life, security and user experience. With Smart Selection your device understands the meaning of text, so it can suggest Maps for a selected address or the dialer for a phone number. Who knows, maybe once this API is opened up you will be able to send money to a friend just by selecting their name?
O comes with another great initiative: Android Go. With the fast-growing global mobile market, especially in emerging countries, we need a system that works well on slower devices and limited data connections. Go is an adaptation of Android O that meets those requirements, helping make our apps truly global.
Side note: if you are a web developer, you may also find it interesting how Twitter built its Lite app (<1MB) as a Progressive Web App.
Lots of great announcements for developers: official support for Kotlin (with no plans to deprecate Java at any point in the future!), and Android Studio 3.0, focused on speed improvements (including the build process) with better debugging features to make our work even easier. And if you are just starting to build a world-class app, don’t miss Architecture Components: guidelines and solutions for making your code robust and scalable.
And Instant Apps, the lightweight native apps that don’t require installation, announced at I/O16, are now available for everyone to develop. Keep an eye on them: early partners report noticeable (double-digit) growth in purchases or content views.
For those who want to bring intelligence directly into their apps, TensorFlow Lite and the Neural Networks API are coming soon: an entire on-device Machine Learning stack that aims to make using pre-trained models as easy as possible.
Google is also investing a lot of resources in Augmented and Virtual Reality. Besides new Daydream-enabled devices, more apps and VR headsets, there were two really interesting announcements:
VPS (Visual Positioning System): Google’s extension of GPS for really precise indoor location. Thanks to Tango we’ll be able to determine our location inside buildings; with the addition of voice guidance, it could be extremely helpful for visually impaired people navigating the world.
AR in education, the most exciting announcement: Tango will be actively used in education! After the successful adoption of Cardboard (more than 2 million students could take part in virtual trips), Google announced a continuation of its program: Expeditions AR (formerly Expeditions).
This time, thanks to Tango-enabled devices, students will be able to gather around virtual objects placed in 3D space. If you want to see something very similar, I highly encourage you to check out one of the latest Tango integrations: the ArtScience Museum in Singapore.
There is of course much, much more. During I/O17 we could watch two keynotes (the main one and the developer keynote) and attend more than 150 sessions. We could see, hear or even try new YouTube 360 videos, new Firebase updates (Crashlytics as the official crash reporter, new performance analysis tools), Cloud for IoT, Android Auto and new smartwatches running Android Wear 2.0.
If you would like to see literally all the announcements from the three days of I/O17, Google published a blog post covering everything in one place:
And if you want to immerse yourself in the details, videos from all 156 sessions are available now on YouTube!
Open source your code!
Besides all these fascinating announcements, there is one particular takeaway I’d like to share with you. More than a year ago Google open-sourced TensorFlow, sharing years of its experience with everyone to make Machine Learning easier. Since then, people around the world have built many projects that solve real problems without full knowledge of the theory behind TensorFlow or ML. But they have great ideas (check out this inspiring video!). By giving them the proper tools, we can make all these ideas happen.
That’s why you and your dev team should share your code: it’s very likely that somewhere out there is someone you could help make the world a better place. We, the tech team behind Azimo, do it.
Some time ago I published an unofficial Google Actions SDK written in Java. The source code and documentation can be found on GitHub: Google-Actions-Java-SDK. The library can also be downloaded from Bintray jCenter: