Now we're talking: Multimodal voice and the future of digital CX
Collectively, we've been using voice all wrong for some time now.
That's according to Tobias Dengel, president of WillowTree, a TELUS International Company, in reference to the use of voice in the digital customer experience (CX) context. It's an important topic for the 20-year digital media veteran: Dengel recently wrote a book on the subject titled The Sound of the Future: The Coming Age of Voice Technology and describes his mission as helping brands create transformative digital products that are engaging for users.
The reason why voice hasn't transformed our relationship with machines and devices? "We haven't really understood what works well with voice and what doesn't," claims Dengel.
It's a fascinating phenomenon. Verbal communication comes naturally to us in everyday life, yet there's a disconnect when we try to speak to technology. It feels like it should be so simple.
Despite the imperfections, there's clearly customer demand for effective voice implementations. According to Insider Intelligence, more than 123 million adults in the U.S. turned to voice assistants at least once per month in 2022, and over 48% of adults in the U.S. will use the technology on at least a monthly basis over the next three years.
To improve those interactions and meet customer expectations, the future of voice needs to be multimodal.
What is multimodal voice?
Let's approach this literally.
Multi means more than one. Modal refers to a mode or a form. Multimodal is an adjective used to describe something as having multiple modes.
Bringing it into the digital CX context, multimodal voice is a type of interaction that involves voice and at least one other input or output method.
An example application of multimodal voice
Let's take the simple example of ordering two pizzas for your family.
Instead of spending several minutes going through all of the menu options on a mobile app to get the toppings just right, it's much more efficient to construct your order verbally via your favorite pizza app. For example: "Two large pizzas, one plain with thin crust and extra cheese, and one supreme, thick crust with extra onion, no mushrooms and cut into squares."
What we don't want is for the app to read that order back to us for confirmation.
In the multimodal future, we will say these words into our apps, and the app will build the order visually as we speak. We will then verbally "approve," our pre-authenticated payment will go through, and the pizza will be on its way. In this future, we've taken a tedious process and condensed it into 15 seconds or less.
The benefits of multimodal voice
Speed is a stand-out benefit of a multimodal approach.
A seminal study conducted by Stanford shows that people speak three times faster than they type. But while we can speak more quickly than we can type, on the receiving end we can take in information more quickly by reading than by listening.
According to Dengel, one of the main problems with voice implementation to date has been the failure to take advantage of speed. "Voice," he says, "should be leveraged with other modalities wherever it can create maximum efficiency throughout an experience."
Think of multimodal voice as a type of search function. Giving voice prompts is a quick way to navigate an interface that eliminates the need to type out your query. It's a hybrid solution that enables customers to blend voice, text and images across a single interaction. And, thanks to artificial intelligence, there are indications that this technology is set to improve and play an increasingly important role in the delivery of high-quality customer experience interactions.
In addition to speed, multimodal voice experiences create opportunities for more accessible and personalized interactions. Customers with permanent, temporary or situational limitations who may struggle with traditional input methods (such as typing or navigating a visual interface) can lean on voice technology for help.
Multimodal voice benefits the broader user experience as well. When the approach is applied to digital user experience (UX) and user interfaces (UI), customers no longer have to manually search to find what they're looking for on apps or the web. If humans can speak to machines, and machines can respond with graphics and text, users can make better use of voice technology, and brands can provide satisfying and frictionless digital customer experiences.
Why multimodal voice has an important role in the future of CX
Minimizing customer effort is a fundamental principle in CX.
By adding efficiency and accessibility, multimodal voice can reduce the time and effort customers need to spend to get what they're looking for. Consider a customer support example: with multimodal voice, a customer could ask a question aloud to a chatbot equipped with speech-to-text technology and receive a detailed response, with relevant images and links, in a chat dialogue.
When implemented effectively, it's an approach that enables customers to interact with brands according to their own preferences. The key is to look at voice as an enhancement, not a replacement. Do not force voice on customers; offer it as an optional input method that enhances their experience.
And, critically, give customers the option to speak with a human agent when they feel the need to do so. Creating numerous barriers between your customers and your customer care team inadvertently raises customer effort and frustration. Instead, provide a painless way for customers to speak with an agent who will listen to them, empathize and respond with considered, comprehensive support.
Reduce customer effort with multimodal experiences
The case for voice is clear. In Dengel's own words, "We believe voice is the future."
If you're looking to step into that future and reduce customer effort with innovative voice implementations, connect with one of our experts today.