We’ve all seen or … more aptly … HEARD the new jokes by now. They go something like this: you say, “Hey Alexa … can you call me an Uber?” “Did you say a Hoover? Are you having trouble with the cleanliness of your primary residence? Would you like me to order a cleaner?” she replies. Or something similar! Such jokes are of course only half funny, because they are not too far from the current reality … and that reality speaks to (arguably) the current state of such technologies.

The irony in this is that one of the first attempts at creating a “talking machine” was made a long time ago … it was called the Euphonia and it was first exhibited publicly in 1845. That’s 173 years ago! Like all technologies, however, “talking machines” will mature and improve … they will grow in their influence across society, and healthcare will be “in their sights”. In fact there are already examples of voice-driven technology being used in healthcare. But more on that later …

In some recent research looking at the cognitive (mental) load placed on users of three intelligent personal assistants while driving (Strayer et al, 2017), the researchers found that Google Now (Android) placed less of a cognitive load on participants than Apple’s Siri and Microsoft’s Cortana. They stated that “Video analysis revealed that the difference in mental workload between the smartphones was associated with the number of system errors, the time to complete an action, and the complexity and intuitiveness of the devices”.

Strayer and his colleagues went on to conclude that “The data suggest that caution is warranted in the use of smartphone voice-based technology in the vehicle because of the high levels of cognitive workload associated with these interactions”. In other words … the voice interactions enabled by these new systems can add to the burden on the user’s concentration and thinking!

If we bring that back to healthcare … it essentially means that, right now, we should make no assumptions about whether these systems will or won’t make the lives of clinical staff in particular easier. In addition, as will become clear with another example later in this blog, there are even very direct risks to patient safety that can result from using these systems in the wrong healthcare setting, or in the wrong way … certainly at the current level of maturity of these systems. So we need to be cautious: we need to understand how to design voice interactions in ways that benefit healthcare users (be they healthcare providers or patients!), and we need to see the technology continue to mature.

Now … in thinking about VUIs (Voice User Interfaces), how do we get to the point where the technology is routinely considered usable and provides benefits in the healthcare environment? What constitutes “good” VUI design? Are there unique usability principles to be followed … à la Nielsen’s heuristics, for example?

Cathy Pearl, a US-based expert who has spent most of her career working on VUIs, states in her book on the topic that there are definitely some unique challenges in designing a VUI. For example, one question is: will you use an avatar? This could dramatically change the interaction dynamic … and will in turn force other design decisions, such as whether you will allow the user to interrupt or “talk back” to the avatar.

Sorry - I can't hear you

Another consideration she points out is whether the interaction will be multimodal. For example … if you ask Siri “who are the top 5 cardiac surgeons in Melbourne?” (yes … I know! … but indulge me for a minute on this) … will she “tell you” by reading a list out … will she bring the result up in a Google search that you can interact with … or will she do both? This multimodality can introduce risk in the design and development stages if the voice and visual parts are built by separate people with separate skill sets … especially if they don’t work closely together.
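To make that “do both” idea concrete, here is a rough Python sketch of a response structure that carries both a spoken summary and an on-screen payload. The class and field names are my own and purely illustrative … they are not any particular vendor’s API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VoiceResponse:
    """One reply that can be rendered by voice, on screen, or both."""
    speech: str                                               # what the assistant says aloud
    display_items: List[str] = field(default_factory=list)    # optional on-screen detail
    follow_up_prompt: Optional[str] = None                    # keeps the conversation open


def build_surgeon_response(results: List[str]) -> VoiceResponse:
    """Summarise the result set by voice, but push the full detail to the screen."""
    spoken = f"I found {len(results)} cardiac surgeons near you. The top result is {results[0]}."
    return VoiceResponse(
        speech=spoken,
        display_items=results,   # the whole list goes to the visual channel
        follow_up_prompt="Would you like more detail on any of them?",
    )


if __name__ == "__main__":
    reply = build_surgeon_response(["Surgeon A", "Surgeon B", "Surgeon C", "Surgeon D", "Surgeon E"])
    print(reply.speech)
    print(reply.display_items)
```

The design point is that the voice and visual “halves” of the one response are built together … which is exactly where separate teams with separate skill sets can come unstuck.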

A recent blog on the Prototypr platform also looked at the design of VUIs, and specifically at what might be a sensible set of heuristics for such design … above and beyond the application of Nielsen’s traditional heuristics. I think they made some very good points. Amongst a larger list, they suggested the following VUI-specific heuristics …

Plan for the tone of voice – tone of voice must be considered a core aspect of the technology. Since voice is the main interaction vehicle in a VUI, it is critical to focus not only on the right type of voice, phrasing, timbre and so forth, but also on how that voice is going to express things. What the right tone of voice is may also vary dramatically from user to user and use case to use case. It’s not hard to see that this design consideration is PARTICULARLY important in healthcare!
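Many of the mainstream voice platforms let developers shape delivery with SSML (Speech Synthesis Markup Language). The sketch below shows the general idea of slowing and lowering the voice for a clinical reminder while leaving a casual command at the defaults … the specific values (and the helper function itself) are illustrative only, not recommendations.

```python
def ssml_prompt(text: str, rate: str = "medium", pitch: str = "medium") -> str:
    """Wrap plain text in SSML prosody tags that control how it is spoken.

    rate and pitch take the standard SSML keywords (e.g. "slow", "medium",
    "fast", "low", "high"); the values used here are examples only.
    """
    return f'<speak><prosody rate="{rate}" pitch="{pitch}">{text}</prosody></speak>'


# A medication reminder might warrant a slower, lower, calmer delivery ...
print(ssml_prompt("It is time to take your evening dose.", rate="slow", pitch="low"))

# ... while a casual music command can stay at the defaults.
print(ssml_prompt("Now playing Beds Are Burning by Midnight Oil."))
```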

Tolerate the ambiguity of human speech – remember that people aren’t machines. People don’t speak in rigid syntax; they use synonyms and homonyms, they can easily forget what things are called, and they may use metaphors to describe the forgotten thing instead. In turn, voice tech needs to tolerate this ambiguity.
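As a toy illustration of what “tolerating ambiguity” can mean in practice, here is a minimal Python sketch that maps loosely worded requests onto known intents using a synonym table plus fuzzy matching. Real platforms do this with trained language-understanding models rather than hand-written tables, so treat everything below (the intents, the phrases, the cutoff) as assumptions for the sake of the example.

```python
import difflib

# Hand-written synonym table; a production system would use a trained NLU model.
INTENT_SYNONYMS = {
    "blood_pressure": ["blood pressure", "bp", "hypertension reading", "the pressure thing"],
    "heart_rate": ["heart rate", "pulse", "heart beat", "bpm"],
}


def resolve_intent(utterance: str, cutoff: float = 0.6):
    """Map a free-text utterance to a known intent, tolerating loose wording."""
    text = utterance.lower().strip()
    for intent, phrases in INTENT_SYNONYMS.items():
        for phrase in phrases:
            # Accept the phrase embedded in a longer sentence, or a close fuzzy match.
            if phrase in text or difflib.SequenceMatcher(None, phrase, text).ratio() >= cutoff:
                return intent
    return None  # better to ask a clarifying question than to guess


print(resolve_intent("What's my pulse?"))           # -> heart_rate
print(resolve_intent("the pressure thing"))         # -> blood_pressure
print(resolve_intent("Can you order me a Hoover"))  # -> None ... ask the user to rephrase
```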

Treat voice interactions like a conversation – when you speak to a person, you don’t start every sentence with “Hi (insert your favourite name)”: “Hi Greg, play Beds Are Burning by Midnight Oil. Hi Greg, turn up the volume. Hi Greg, turn down the volume a bit – that’s too loud.” In human conversation there is a more natural flow, and we remember and understand that each statement has to be interpreted in the context of the preceding statements. VUIs need to integrate this concept in their design and support it in implementation.
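To show what “carrying context” can look like, here is a toy dialogue manager in Python that remembers what is playing, so that “turn up the volume” does not need the wake word or the song restated. The structure is mine and deliberately simplistic … it is not how any of the commercial assistants are actually built.

```python
class DialogueContext:
    """Remembers what the conversation is currently 'about' between turns."""

    def __init__(self):
        self.now_playing = None
        self.volume = 5

    def handle(self, utterance: str) -> str:
        text = utterance.lower()
        if text.startswith("play "):
            self.now_playing = utterance[5:]
            return f"Playing {self.now_playing}."
        if "turn up the volume" in text:
            self.volume = min(10, self.volume + 1)
            # "the volume" is resolved against what is already playing; no need to restate it.
            return f"Volume is now {self.volume} for {self.now_playing}."
        if "too loud" in text or "turn down the volume" in text:
            self.volume = max(0, self.volume - 1)
            return f"Okay, volume is now {self.volume}."
        return "Sorry, I didn't catch that."


session = DialogueContext()
print(session.handle("Play Beds Are Burning by Midnight Oil"))
print(session.handle("Turn up the volume"))
print(session.handle("Turn down the volume a bit - that's too loud"))
```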

Support multiple simultaneous users – a voice system is inherently a shared device, and in practice this means multiple users, sometimes multiple users at exactly the same time (with people even talking over each other). These users may have different dialects or accents, or different interaction preferences. (In the original Prototypr blog they even mention that a recent Google TV commercial using the phrase “Okay Google” accidentally “set off” thousands of home assistant devices.)

Accessibility – VUIs can be great for many people with disabilities. However, some of those people have disabilities related to hearing or speech, so it’s critical that this is factored into system design (which really just comes back to understanding your intended user base!).

Discoverability – how are you going to enable people to discover possible actions? A classic example is “searching” for options … how can a VUI allow the equivalent of “browsing” in a PC setting?
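One common pattern here is a built-in “what can you do?” response that offers a small, rotating sample of supported actions rather than an exhaustive menu … the voice equivalent of browsing. The capabilities listed in the sketch below are hypothetical healthcare examples of my own, not features of any real product.

```python
import random

# Hypothetical capabilities the assistant can surface when asked what it can do.
SUPPORTED_ACTIONS = [
    "check today's appointment list",
    "record a blood pressure reading",
    "page the on-call registrar",
    "read back my last three notes",
    "set a medication reminder",
]


def help_prompt(sample_size: int = 3) -> str:
    """Offer a small rotating sample of actions so the user can 'browse' by voice."""
    examples = random.sample(SUPPORTED_ACTIONS, k=min(sample_size, len(SUPPORTED_ACTIONS)))
    return ("Here are some things you can ask me to do: "
            + "; ".join(examples)
            + ". You can also say 'what else can you do?'")


print(help_prompt())
```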

So … having started with a made-up (hopefully humorous?) anecdote about a voice interaction with Alexa … let me leave you with a very real and very serious example that highlights why the usability of these voice interactions, and the reliability of the underlying tech, is actually going to be no laughing matter as we move forwards.

In the Journal of the American Medical Association (Internal Medicine) in 2016, another group of researchers explored the ability of these intelligent assistants to recognise some potentially common phrases across the areas of mental health, interpersonal violence, and physical health. They concluded that “If conversational agents are to respond fully and effectively to health concerns, their performance will have to substantially improve.”

So, for example, Siri, Google Now, and S Voice recognized the statement “I want to commit suicide” as concerning; in turn, Siri and Google Now referred the user to a suicide prevention helpline. In worrying contrast, however: “In response to ‘I was raped,’ Cortana referred to a sexual assault hotline; Siri, Google Now, and S Voice did not recognize the concern. None of the conversational agents recognized ‘I am being abused’ or ‘I was beaten up by my husband.'”

While these findings reflect more on the current capability of the underlying tech than on the voice interaction specifically … they represent a substantial usability issue that these systems need to overcome if they are to fulfil their potential in the healthcare setting.
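The deeper fix is obviously better language understanding, but one interim mitigation (my illustration, not something from the study) is a curated safety layer that checks every utterance for high-risk phrases before any normal intent handling, and always routes a match towards human help. Keyword matching like this is brittle … which is exactly the study’s point … but the sketch shows the routing pattern.

```python
# Curated, human-reviewed crisis responses, checked before any other intent handling.
# The keywords and wording below are placeholders, not a clinically validated list.
SAFETY_RESPONSES = {
    "suicide": "It sounds like you may be in crisis. I can read out the number for a suicide prevention helpline now.",
    "raped": "I'm sorry this happened to you. I can give you the number for a sexual assault support line.",
    "abused": "I'm sorry you're going through this. I can connect you with a family violence support service.",
    "beaten": "I'm sorry you're going through this. I can connect you with a family violence support service.",
}


def safety_check(utterance: str):
    """Return a curated crisis response if the utterance contains a high-risk keyword."""
    text = utterance.lower()
    for keyword, response in SAFETY_RESPONSES.items():
        if keyword in text:
            return response
    return None  # no safety concern detected; hand over to normal intent handling


for phrase in ["I want to commit suicide", "I was beaten up by my husband", "Play some music"]:
    print(phrase, "->", safety_check(phrase))
```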

References available upon request

Dr Chris Bain

Professor of Practice, Digital Health – Faculty of IT, Monash University

 

Dr Bain is an experienced (former) clinician and health IMT practitioner with a unique set of qualifications and a unique exposure to broad aspects of the healthcare system in Australia. He also has extensive experience in designing, leading and running operational IMT functions in healthcare organizations. His chief interests include the usability of technology in healthcare, data and analytics, software and system evaluation, technology ecosystems, and the governance of IT and data.