This post was published in Chatbots Magazine: Your first Google Assistant skill.
Smart home speakers, assistant platforms, and cross-device solutions let you talk to your smartwatch and see the result on your TV or your car's dashboard. Personal assistants and VUIs are slowly appearing around us, and it's pretty likely that they will make our lives much easier.
Because of my great faith that natural language will be the next human-machine interface, I decided to start a new series of blog posts, backed by open source code, showing how to create a new kind of app: conversation-oriented, device-independent assistant skills that give us freedom in the platform or hardware we use, and bring the most natural interface for humans: voice.
This post is part of a series about building a personal assistant app designed for voice as the primary user interface. More posts in the series:
- This post
- Personalize Google Assistant skill with user data
- Surface capabilities in Google Assistant skills
WaterLog assistant skill
In this post we'll start with the simplest implementation of an assistant skill. WaterLog is an app which lets us track daily water intake by talking or writing in natural language directly to Google Assistant. The first version of the app will be able to log how many liters or milliliters of water we have drunk during the day.
For the sake of simplicity we'll skip the theory behind VUI design and focus only on the technical aspects of building a fully working implementation.
Here are the scenarios of possible conversations (happy paths):
New user
User: Ok Google, Talk to WaterLog
WaterLog: Hey! Welcome to Water Log. Do you know that you should drink about 3 liters of water each day to stay healthy? How much did you drink so far?
User: I drunk 500ml of water
WaterLog: Ok, I've added 500ml of water to your daily log. In sum you have drunk 500ml today. Let me know when you drink more! See you later.
Returning user
User: Ok Google, Talk to WaterLog
WaterLog: Hey! You have drunk 500ml today. How much water should I add now?
User: 100ml
WaterLog: Ok, I've added 100ml of water to your daily log. In sum you have drunk 600ml today. Let me know when you drink more! See you later.
Returning user asking for logged water
User: Ok Google, Ask WaterLog how much water have I drunk today?
WaterLog: In sum you have drunk 600ml today. Let me know when you drink more! See you later.
In case you would like to test this skill on your device, it's available live in the Google Assistant directory, or at this website:
https://assistant.google.com/services/a/id/12872514ba525cc6/
Getting started
The app is extremely simple, but even this kind of project still requires tying some pieces together to make it work. While we have a lot of freedom when it comes to platform selection (we could build our app in many different languages and host it on any cloud solution like Google Cloud Platform or Amazon Web Services), at the beginning we'll go with the most recommended tech stack:
- Firebase Cloud Functions and Realtime Database for the app backend,
- Dialogflow for conversation definitions and natural language understanding,
- JavaScript/Node.js for the app code (at the moment this is the only language supported by Firebase Cloud Functions),
- Google Actions SDK for Google Assistant integration (in the future we may give other platforms a try, like Amazon Alexa or the Facebook Messenger Platform).
I won't write a detailed explanation of how to connect all of those together. The Google Actions website has a really good step-by-step guide:
https://developers.google.com/actions/dialogflow/first-app
In short:
- Start with a new project in the Actions on Google console.
- When it's done, you will be asked to pick a tool or platform to build the assistant skill. Like I said, it'll be Dialogflow. If you do it right, your apps (Actions and Dialogflow) should be connected. You can check this in the Dialogflow agent settings (see the Google Project property):
Dialogflow agent
The first big piece of our assistant app is the conversational agent, which in our case is built on the Dialogflow platform. Its most important role is to understand what the user says to our app and convert a natural language sentence into actions and properties which can be handled by our code. And this is exactly what Dialogflow Intents do.
According to the documentation:
An intent represents a mapping between what a user says and what action should be taken by your software.
Let's start defining our intents. Here is the list of sentences we would like to handle:
Default Fallback Intent
The only one which we leave untouched for now. As the name says, this intent is triggered if a user's input is not matched by any of the regular intents or enabled domains (see the documentation). It's worth mentioning that this intent isn't even passed into our application code; it's handled entirely by the Dialogflow platform.
welcome_user
The intent used to greet our user. It's triggered whenever the user asks for our app (e.g. Ok Google, talk to WaterLog) without any additional intention.
Config:
- Action name: input.welcome
- Events: WELCOME, GOOGLE_ASSISTANT_WELCOME (events are additional mappings which allow an intent to be invoked by an event name instead of a user query).
- Fulfillment: ✅ Use webhook (the welcome_user intent will be passed to our backend).
log_water
The intent used to save how much water the user would like to log during the conversation. There are a couple of cases which we would like to handle in the same way. Let's list some of them:
- Ok Google, Talk to WaterLog to log 1 liter of water - the intent is triggered immediately when the user invokes our action. In this case the welcome intent is skipped. More about assistant invocation can be found in the Google Actions documentation.
- Log 500ml of water - said in the middle of a conversation, when the app is waiting for the user's input.
- 500ml - usually as an answer to the assistant's question:
WaterLog: …how much water did you drink today?
User: 500ml
To handle similar cases we need to provide example utterances that users might say. These examples are then used by Dialogflow's machine learning to teach our agent to understand user input. The more examples we provide, the smarter our agent becomes.
Additionally, we need to annotate the fragments of our examples which need to be handled in a special way, so that e.g. our app knows that the utterance:
I have drunk 500ml of water
contains the number and unit of volume of water that has been drunk. All we have to do is select the fragment and pick the correct entity (there are plenty of built-in entities, see the documentation).
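For example, if we annotate the 500ml fragment with the built-in @sys.unit-volume entity, Dialogflow will hand our webhook something roughly like the object below (the parameter name water_volume is my assumption here; the exact name depends on how you configure the intent):

```javascript
// Parameters extracted by Dialogflow from "I have drunk 500ml of water"
// (assuming a parameter named water_volume backed by the built-in @sys.unit-volume entity):
const parameters = {
    water_volume: {
        amount: 500,
        unit: 'ml'
    }
};
```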
Config:
- Action name: log_water
- User says: (there should be many more examples, especially in more complex apps)
- Fulfillment: ✅ Use webhook
- Google Assistant: ✅ End conversation (pick this to let Google Assistant know that the conversation should be finished here).
get_logged_water
The intent used to tell the user how much water he or she has drunk on the current day. Similarly to log_water, there are different ways to invoke this intent:
- Ok Google, ask WaterLog how much water did I drink today? - called instead of the welcome intent when the action is invoked with the question right away,
- How much did I drink? - asked in the middle of a conversation with our app.
Config:
- Action name: get_logged_water
- User says:
- Fulfillment: ✅ Use webhook
- Google Assistant: ✅ End conversation
And that's it for the Dialogflow configuration for now. If you would like to see the full config, you can download it from the repository (the WaterLog.zip file) and import it into your agent.
The code
If you followed the Actions on Google guide (Build fulfillment), you should already have a basic code structure, a fulfillment deployed to Firebase Cloud Functions, and a connection with the Dialogflow agent through the fulfillment config.
Now let's build the code for the WaterLog app. The repository with the final implementation is available on GitHub:
https://github.com/frogermcs/WaterLog-assistant-app
Basically what we have to do is handle all the intents defined in the Dialogflow app. We'll define their action names in the functions/assistant-actions.js file.
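Roughly, it can look like this (a sketch; I'm assuming the constant names, so check the repository for the actual file):

```javascript
// functions/assistant-actions.js - a sketch; the constant names are illustrative.
'use strict';

// Action names must match the "Action name" fields configured in the Dialogflow agent.
module.exports = {
    ACTION_WELCOME: 'input.welcome',
    ACTION_LOG_WATER: 'log_water',
    ACTION_GET_LOGGED_WATER: 'get_logged_water'
};
```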
The core of our app is the index.js file, which is also the implementation of the HTTP trigger for our Firebase Cloud Function (an endpoint, in short).
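In outline it can look like this (a sketch assuming the actions-on-google v1 client library with its DialogflowApp class; the Conversation and WaterLog constructors are assumptions based on the repository's class names):

```javascript
// index.js - a sketch of the Cloud Function endpoint; everything except the
// firebase-functions, firebase-admin and actions-on-google APIs is an assumption.
'use strict';

const functions = require('firebase-functions');
const admin = require('firebase-admin');
const DialogflowApp = require('actions-on-google').DialogflowApp;

const actions = require('./assistant-actions');
const Conversation = require('./conversation');
const WaterLog = require('./water-log');

admin.initializeApp(functions.config().firebase);

exports.waterLog = functions.https.onRequest((request, response) => {
    const dialogflowApp = new DialogflowApp({request: request, response: response});
    const waterLog = new WaterLog(admin.database());
    const conversation = new Conversation(dialogflowApp, waterLog);

    // Map every Dialogflow action onto the function that fulfills it.
    const actionMap = new Map();
    actionMap.set(actions.ACTION_WELCOME, () => conversation.actionWelcomeUser());
    actionMap.set(actions.ACTION_LOG_WATER, () => conversation.actionLogWater());
    actionMap.set(actions.ACTION_GET_LOGGED_WATER, () => conversation.actionGetLoggedWater());

    dialogflowApp.handleRequest(actionMap);
});
```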
In our Cloud Function we define a mapping of intents onto the functions which need to be called as the fulfillment for the conversation.
As an example, let's see conversation.actionLogWater() (the fulfillment for the log_water intent).
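A sketch of how it can be implemented (the argument name, the WaterLog helper methods and the promise-based flow are assumptions; see conversation.js in the repository for the real code):

```javascript
// conversation.js (fragment) - a sketch of actionLogWater(); helper names are assumptions.
actionLogWater() {
    // 1. Read the value extracted by Dialogflow, e.g. {amount: 500, unit: 'ml'}.
    const loggedWater = this.dialogflowApp.getArgument('water_volume');

    // 2. Save it to the Firebase Realtime Database through the WaterLog helper.
    //    (this.userId is assumed to be resolved elsewhere; the repository uses
    //    a UserManager helper for anonymous users.)
    return this.waterLog.saveLoggedWater(this.userId, loggedWater)
        // 3. Read today's sum and respond; tell() closes the conversation.
        .then(() => this.waterLog.getLoggedWaterSumForUser(this.userId))
        .then(totalMl => {
            this.dialogflowApp.tell(
                `Ok, I've added ${loggedWater.amount}${loggedWater.unit} of water to your daily log. ` +
                `In sum you have drunk ${totalMl}ml today. Let me know when you drink more! See you later.`
            );
        });
}
```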
Here is what happens:
- The app gets an argument from the utterance, extracted by Dialogflow. For the input Log 500ml of water we'll get the object {"amount": 500, "unit": "ml"}.
- Through the WaterLog implementation, the app saves this data to the Firebase Realtime Database.
- At the end the app gets the sum of logged water and passes it as a response to the dialogflowApp object. The tell() function responds to the user and closes the conversation (docs).
See the full code of the Conversation class: conversation.js.
The rest of the code doesn't do anything much more interesting. The Conversation class is responsible for handling user input. WaterLog saves and retrieves data about logged water from the Firebase Realtime Database. UserManager adds some helpers for (anonymous) user handling.
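Just to give a feeling of what WaterLog does, a write to the Realtime Database could look roughly like this (the database path layout and the method name are my assumptions, not the repository's exact code):

```javascript
// water-log.js (fragment) - a sketch; path layout and method names are assumptions.
saveLoggedWater(userId, loggedWater) {
    // Normalize to milliliters before writing.
    const amountMl = loggedWater.unit === 'L' ? loggedWater.amount * 1000 : loggedWater.amount;
    const todayKey = new Date().toISOString().slice(0, 10); // e.g. '2017-11-20'
    const ref = this.firebaseDatabase.ref(`loggedWater/${userId}/${todayKey}`);

    // Atomically add the new amount to whatever has been logged so far today.
    return ref.transaction(currentMl => (currentMl || 0) + amountMl);
}
```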
Unit testing
While this paragraph isn't directly connected with assistant apps or voice interfaces, I believe it's still extremely important for every kind of software we build. Just imagine that every time you change something in the code, you need to deploy the function and start a conversation with your app. In the WaterLog app it was relatively simple (but it still took at least tens of deployments). In bigger apps it will be critical to have unit tests; they speed up development time by an order of magnitude.
All unit tests for our classes can be found under the functions/test/ directory. The tests in this project aren't extremely sophisticated (they use the sinon.js and chai libraries without any additional extensions), but they still helped a lot with getting to production in a relatively short time.
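For illustration, a test for the log_water fulfillment could look roughly like this (mocha + chai + sinon; the constructor signature and the stubbed method names are assumptions):

```javascript
// functions/test/conversation.test.js - a sketch; stubbed method names are assumptions.
const chai = require('chai');
const sinon = require('sinon');
const expect = chai.expect;

const Conversation = require('../conversation');

describe('Conversation', () => {
    it('responds with the daily sum and ends the conversation after logging water', () => {
        // Stub the DialogflowApp and WaterLog dependencies.
        const dialogflowAppMock = {
            getArgument: sinon.stub().returns({amount: 500, unit: 'ml'}),
            tell: sinon.spy()
        };
        const waterLogMock = {
            saveLoggedWater: sinon.stub().resolves(),
            getLoggedWaterSumForUser: sinon.stub().resolves(500)
        };

        const conversation = new Conversation(dialogflowAppMock, waterLogMock);

        return conversation.actionLogWater().then(() => {
            expect(dialogflowAppMock.tell.calledOnce).to.equal(true);
        });
    });
});
```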
Here is the output from $ npm test:
Source code
The full source code of the WaterLog app with:
- Firebase Cloud Functions
- Dialogflow agent configuration
- Assets required for app distribution
can be found on GitHub:
https://github.com/frogermcs/WaterLog-assistant-app/
Thanks for reading!
Further reading
- Guides at Actions on Google
- Ido Green's publications on Medium
- Building Google Assistant app in Java