Surface Capabilities in Google Assistant Skills
Adjust your conversation to audio and screen surfaces

This post was published in Chatbots Magazine: Surface Capabilities in Google Assistant Skills.

This post is part of a series about building a personal assistant app, designed with voice as the primary user interface. More posts in the series:

  1. Your first Google Assistant skill
  2. Personalize Google Assistant skill with user data
  3. This post

In previous posts we built a Google Assistant skill which lets us track daily water intake by voice or by text written in natural language (post 1). The skill can be customised with the user’s personal data – it can call us by name and can use our timezone to define at what time our day starts (post 2).

https://assistant.google.com/services/a/id/12872514ba525cc6/

Surface capabilities

Google Assistant skills are not only Voice User Interfaces — our app can operate on a variety of surfaces like Google Home (audio-only) or mobile devices (audio + display).
Today we’ll customise our app‘s user experience for both cases. But before we do, let’s take a look at what our options are when it comes to surface capabilities support.

According to the official documentation, there are a couple of ways to handle different surfaces:

App-level surface capabilities

We can define upfront which surfaces are supported by our app. To do this, open your Actions on Google developer project, then go to: Overview -> Surface capabilities.

If users try to invoke the app on an unsupported surface, they receive a “device is unsupported” error.

Runtime-level surface capabilities

We can also deal with specific surfaces at runtime. There are a couple of ways to do this:

  • Response branching — show a response adjusted to the current surface, so you can reply with a simplified message on Google Home or with rich card(s) including additional text and links on a mobile device.
  • Conversation branching — the entire conversation can look different depending on the current surface. This can be useful when we want to provide a simplified flow for Google Home (e.g. repeat the last transaction) or a completely custom one for a mobile device (e.g. find the cheapest flight to the destination and buy it).
  • Multi-surface conversations — sometimes we will need to move the user from one surface to another. For example, a user who asks Google Home for directions would probably like to see a map, so it would be reasonable to do the transition from audio to display (a minimal sketch of this hand-off follows after the documentation link below).

All surface capabilities and examples are well described in the official documentation:

https://developers.google.com/actions/assistant/surface-capabilities
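As a taste of the last option, here is a minimal sketch of a multi-surface hand-off, assuming the actions-on-google v1 client library (DialogflowApp) used throughout this series. The action, texts and function name are illustrative and not part of WaterLog:

```javascript
// Minimal sketch of a multi-surface hand-off (actions-on-google v1 style).
// The handler receives a DialogflowApp instance, as in the rest of this series.
function showDirections(dialogflowApp) {
  // Check whether the user has another device with a screen available.
  const screenAvailable = dialogflowApp.hasAvailableSurfaceCapabilities(
    dialogflowApp.SurfaceCapabilities.SCREEN_OUTPUT
  );

  if (screenAvailable) {
    // Ask the user to move the conversation to a device with a display.
    dialogflowApp.askForNewSurface(
      'To show you the map',             // context read out to the user
      'Directions to your destination',  // notification title on the new device
      [dialogflowApp.SurfaceCapabilities.SCREEN_OUTPUT]
    );
  } else {
    dialogflowApp.tell('Head north for about 500 metres.');
  }
}
```

If the user agrees, the conversation continues on the new device through the actions.intent.NEW_SURFACE event, which the documentation above describes in detail.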

Response branching in WaterLog

Now we’ll implement simple response branching in the WaterLog app. We’ll add some facts about how much water we should drink during the day. Example scenario:

Audio-only surface

WaterLog: 
User: How much water should I drink?
WaterLog: According to The Telegraph, in climates such as the UK, we should be drinking around 1–2 litres of water. In hotter climates, the body will usually need more.
//End of conversation

Audio+display surface

WaterLog: 
User: How much water should I drink?
WaterLog: Here are some facts I found about drinking water:
// continue with rich card:

Dialogflow Agent

We’ll start with the Dialogflow agent update. This time we need to add one new intent:

facts_drinking_water

This intent is fired when the user asks how much water they should drink, e.g.:

  • How much water should I drink during the day? — asked during the conversation with WaterLog
  • Ask WaterLog why should I drink water? — asked from the main context of Google Assistant.

— Config —
Action name: facts_drinking_water
User says: e.g. "How much water should I drink during the day?"
Fulfillment: ✅ Use webhook
Google Assistant: ✅ End conversation

That’s it. Nothing new compared to the previous posts. If you would like to see the full Dialogflow config, you can download it from the repository (WaterLog.zip file, tag: v0.2.1) and import it into your agent.

The code

Now let’s add some code. assistant-actions.js and index.js have a new mapping to handle:
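The exact code lives in the repository; a minimal sketch of the mapping in index.js could look like this (the exported function name and the module shape are illustrative):

```javascript
// index.js: sketch of the new intent mapping (see the repository for the exact code).
const functions = require('firebase-functions');
const DialogflowApp = require('actions-on-google').DialogflowApp;
const assistantActions = require('./assistant-actions'); // illustrative module shape

// Action name defined in the facts_drinking_water intent in Dialogflow
const ACTION_FACTS_DRINKING_WATER = 'facts_drinking_water';

exports.waterLog = functions.https.onRequest((request, response) => {
  const dialogflowApp = new DialogflowApp({request, response});

  const actionMap = new Map();
  // ...mappings from the previous posts stay untouched...
  actionMap.set(ACTION_FACTS_DRINKING_WATER,
    () => assistantActions.getFactForDrinkingWater(dialogflowApp));

  dialogflowApp.handleRequest(actionMap);
});
```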

The method getFactForDrinkingWater() is pretty straightforward:
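Here is a rough sketch of it (respondWithFactsCard() and respondWithSimpleFact() are illustrative helper names, not necessarily the ones used in the repository):

```javascript
// assistant-actions.js: sketch of getFactForDrinkingWater().
function getFactForDrinkingWater(dialogflowApp) {
  // Detect whether the current surface has a screen attached.
  const hasScreen = dialogflowApp.hasSurfaceCapability(
    dialogflowApp.SurfaceCapabilities.SCREEN_OUTPUT
  );

  if (hasScreen) {
    respondWithFactsCard(dialogflowApp);   // rich response, built below
  } else {
    respondWithSimpleFact(dialogflowApp);  // simplified, audio-only response
  }
}

module.exports = {getFactForDrinkingWater};
```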

Inside the method we can see how to detect what kind of surface the user is currently using. According to the documentation, we are able to check whether a screen is available (dialogflowApp.SurfaceCapabilities.SCREEN_OUTPUT) or audio (dialogflowApp.SurfaceCapabilities.AUDIO_OUTPUT).

Now that we have discovered the user’s surface, we can build a rich (display) or simplified (audio) response:
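A minimal sketch of both responses, using the v1 RichResponse and BasicCard builders (in this sketch the helpers sit next to getFactForDrinkingWater(); the texts come from the example conversation above, while the button and image URLs are placeholders):

```javascript
// Simplified response for audio-only surfaces (e.g. Google Home)
function respondWithSimpleFact(dialogflowApp) {
  dialogflowApp.tell(
    'According to The Telegraph, in climates such as the UK, we should be ' +
    'drinking around 1-2 litres of water. In hotter climates, the body will ' +
    'usually need more.'
  );
}

// Rich response for surfaces with a screen (e.g. a mobile device)
function respondWithFactsCard(dialogflowApp) {
  dialogflowApp.tell(
    dialogflowApp.buildRichResponse()
      .addSimpleResponse('Here are some facts I found about drinking water:')
      .addBasicCard(
        dialogflowApp.buildBasicCard(
          'In climates such as the UK, we should be drinking around ' +
          '1-2 litres of water. In hotter climates, the body will usually need more.')
          .setTitle('How much water should you drink per day?')
          .addButton('Read more', 'https://example.com/article')      // placeholder URL
          .setImage('https://example.com/water.jpg', 'Glass of water') // placeholder image
      )
  );
}
```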

In our case, for the rich response we build a Basic Card with a title, text, button link, and image. But there are also other options like List Selector, Carousel Selector, and Suggestion Chips. You can see all of them in the documentation.

Suggestion Chips

We could end this post here, but if we know when the user is on a screen surface, why not use this knowledge somewhere else?
Let’s move back to our Conversation.js code:
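A rough sketch of the change (the greeting text and function name are illustrative; the suggestion chips are the point here):

```javascript
// Conversation.js: sketch of greeting the user with suggestion chips on screens.
const GREETING_USER_SUGGESTION_CHIPS = ['100ml', '200ml', '500ml', '1L'];

function greetUser(dialogflowApp, userName) {
  const greeting = `Hey ${userName}! How much water did you drink today?`;

  if (dialogflowApp.hasSurfaceCapability(
        dialogflowApp.SurfaceCapabilities.SCREEN_OUTPUT)) {
    // On a screen we can attach tappable suggestions to the question.
    dialogflowApp.ask(
      dialogflowApp.buildRichResponse()
        .addSimpleResponse(greeting)
        .addSuggestions(GREETING_USER_SUGGESTION_CHIPS)
    );
  } else {
    // On audio-only surfaces the question stays as plain speech.
    dialogflowApp.ask(greeting);
  }
}
```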

Now, thanks to GREETING_USER_SUGGESTION_CHIPS: ['100ml', '200ml', '500ml', '1L'], we can greet the user with action suggestions:

[Screenshot: WaterLog greeting with 100ml / 200ml / 500ml / 1L suggestion chips]

And that’s it. Conversation with WaterLog is even easier now! 🙂

Unit tests
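The new branching logic gets test coverage too. A minimal sketch of such a test, assuming a mocha/chai/sinon setup (an assumption; the real tests live in the repository) and the helper sketches above:

```javascript
// assistant-actions.test.js: sketch of a test for the surface branching.
const chai = require('chai');
const sinon = require('sinon');
const expect = chai.expect;
const assistantActions = require('./assistant-actions');

describe('getFactForDrinkingWater', () => {
  it('responds with a rich card when a screen is available', () => {
    // Fake DialogflowApp exposing only what the handler needs.
    const dialogflowAppMock = {
      SurfaceCapabilities: {SCREEN_OUTPUT: 'actions.capability.SCREEN_OUTPUT'},
      hasSurfaceCapability: sinon.stub().returns(true),
      buildRichResponse: sinon.stub().returns({
        addSimpleResponse: sinon.stub().returnsThis(),
        addBasicCard: sinon.stub().returnsThis()
      }),
      buildBasicCard: sinon.stub().returns({
        setTitle: sinon.stub().returnsThis(),
        addButton: sinon.stub().returnsThis(),
        setImage: sinon.stub().returnsThis()
      }),
      tell: sinon.spy()
    };

    assistantActions.getFactForDrinkingWater(dialogflowAppMock);

    expect(dialogflowAppMock.hasSurfaceCapability.calledWith(
      dialogflowAppMock.SurfaceCapabilities.SCREEN_OUTPUT)).to.equal(true);
    expect(dialogflowAppMock.tell.calledOnce).to.equal(true);
  });
});
```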

$ npm test

All clear 👍

Source code

The full source code of the WaterLog app with:

  • Firebase Cloud Functions
  • Dialogflow agent configuration
  • Assets required for app distribution

can be found on GitHub:

https://github.com/frogermcs/WaterLog-assistant-app/

The code described in this post can be found under release/tag v0.2.1.

Thanks for reading! 😊

Author: Mirek Stanek

Head of Mobile Development at Azimo. Artificial Intelligence adept 🤖. I dream big and do the code ✨.
