Is voice tech set to change the way we work?
In HBO’s satirical political comedy Veep, vice-president Selina Meyer is shadowed by an aide called Gary. He is ever present at her side to offer anything she might need at any given moment, from a tube of lipstick to a snack to a biographical nugget about the person she is about to meet.
Having a “body man” to remember when you last spoke to someone, what their spouse’s or child’s name is, or other vital context for your conversation is a privilege generally reserved for top politicians and business leaders. But advances in technology are raising hopes that in time, artificial intelligence-powered, voice-enabled assistants will make it possible for anyone in a company to have their own “digital Gary”: someone to manage their schedule, remind them of tasks, take notes and share information at the right moment.
Amy Webb is a futurist and founder of the Future Today Institute, which helps leaders look ahead and anticipate change. Also a professor of strategic foresight at New York University’s Stern School of Business, she recalls an early demonstration of a voice-powered app from a start-up called MindMeld.
“Imagine you are sitting in a meeting with colleagues talking about the yield curve inverting, but not everyone knows what the yield curve is. The app recognises who is talking and starts feeding a social media-style wall, populating it with the topics or concepts discussed in real time. It is like having Gary whispering in your ear,” she says.
MindMeld was acquired by Cisco Systems in 2017 and now offers an open-source “conversational AI” platform for building chatbots. The network equipment maker is among the Silicon Valley companies trying to bring voice-based AI into the workplace. The promise for enterprises is to make workers more productive and efficient by giving them easy access to information and quantifying the contents of phone calls and meetings.
“We look at these phone conversations and spoken words in business meetings — those are the last offline data sets — and the most important conversations you have are in person or over the phone,” says Craig Walker, a co-founder of Google Voice, the technology group’s digital telephone service.
Employers also see voice interfaces as an important recruitment tool to attract and retain younger workers who may already use smart speakers or a phone-based digital assistant.
“Millennials and other folks are used to a certain experience at home. I can use Siri or Google Assistant or an Amazon Echo device to change the thermostat in my bedroom, play music or see who is at my front door,” says Vinay Goel, chief digital product officer at JLL, the property services group. JLL has launched its own voice-AI assistant, called JiLL (see panel), to help employees arrange meetings, book conference rooms and find out other information about their offices using voice or text conversations.
Some of the biggest names in technology are also betting that enterprises will follow consumers in shifting to voice-powered interfaces. Amazon has rolled out Alexa for Business; customers include General Electric, which has built an app that allows workers at factories to locate parts by asking a smart speaker. Salesforce’s voice assistant Einstein can pull up data in a meeting and give users a customised daily “briefing” of meetings and tasks.
Gartner, the technology research group, estimates that by 2023 a quarter of worker interactions with software will be mediated by voice, up from less than 3 per cent this year. It also predicts 25 per cent of employees who do digital work will use a virtual assistant on a daily basis by 2021.
But for all the excitement over the possibilities for voice-based computer interactions in the office, the applications currently available are limited, and companies say it is too early to predict how widely voice technology will change the nature of work. “When we start talking about voice tech in the workplace, there are more questions than answers,” says Becky Linahon, director of marketing at TetraVX, a cloud communications provider.
Early adopters include customer service and sales staff who conduct scripted conversations over the phone. Walker has raised $120m from investors, including Google parent Alphabet and venture capital firm Andreessen Horowitz, for his phone software start-up, Dialpad. In addition to corporate phone systems and conference calling, Dialpad offers “voice intelligence” services aimed at support and sales representatives.
Using speech recognition, Dialpad provides transcripts of conversations within half a second. Its software can suggest answers to questions — what Walker describes as “surfacing relevant information into the call”. He adds that, for example, “we can also notice that you’ve been on the call for 10 minutes and no one has gone over the agenda, and suggest you do so”.
Dialpad also analyses sentiment, based on the language being used during a call, so that supervisors can identify which employees may need coaching. Walker says he hopes to expand Dialpad’s offerings to in-person meetings and to business areas such as finance, legal, human resources and recruiting.
He anticipates that in time the AI software will coach employees directly. “It would say, here was your opening pitch, your ask, your rapport building. It would grade each one and tell you how you could have been better, and pull up snippets from other calls as examples.”
The science of deep conversation
Voice technologies such as JLL’s app and Dialpad’s voice intelligence rely on branches of artificial intelligence, including natural language processing and machine learning, that enable computers to recognise spoken words, parse meaning and context, and identify relevant information to generate responses.
The complexity of making it possible for a computer to “converse” with a human “depends on the level of depth of conversations you want to enable”, says JLL’s Vinay Goel. “If you want to interact in a very specific way, you have a predictable set of sentences and voice constructs to work with. With more open-ended queries, things get complicated.”
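To make Goel’s distinction concrete, here is a minimal sketch in Python of how a “predictable set of sentences” might be handled once speech has been transcribed. It is an illustration only, not a description of how JiLL works: the intent names, the patterns and the `match_intent` function are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical intents and patterns, invented for illustration only.
# They loosely mirror the tasks the article mentions: booking meetings,
# reserving desks and checking cafeteria menus.
INTENT_PATTERNS = {
    "book_meeting": re.compile(r"\b(book|schedule|arrange)\b.*\bmeeting\b"),
    "reserve_desk": re.compile(r"\b(reserve|book)\b.*\bdesk\b"),
    "cafeteria_menu": re.compile(r"\b(cafeteria|canteen)\b.*\bmenu\b"),
}


def match_intent(utterance: str) -> Optional[str]:
    """Return the first intent whose pattern matches, or None if none does."""
    text = utterance.lower()
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.search(text):
            return intent
    # Open-ended queries fall through: no fixed pattern covers them.
    return None


if __name__ == "__main__":
    for phrase in (
        "Book a meeting with Sam at 3pm",
        "Please reserve a desk for tomorrow",
        "Why did the Chicago office feel so quiet last week?",
    ):
        print(phrase, "->", match_intent(phrase))
```

A narrow command matches one of a handful of patterns, while an open-ended question matches nothing. Closing that gap is where, in Goel’s words, “things get complicated”.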
Employees at JLL can use the current form of the JiLL app to book meetings, reserve desks and check cafeteria menus. JLL has emphasised the quality of features over quantity. “We want to make sure the use cases we’re solving for, we can solve really well,” says Goel.
One area JLL is working to improve is name recognition, which comes up when employees try to book meetings. “With ‘non-standard’ names, speech-to-text and text-to-speech [capabilities] are not great,” admits Goel.
The company does not monitor audio but does check logs of when people are unable to complete a task in the app. “We are looking for meta-level pointers to what is going wrong. We don’t optimise one name at a time,” Goel says.
Sceptics warn wider adoption of the technology may be hampered by privacy concerns. “[Privacy] is going to be one of the biggest pushbacks we’re going to get and why voice tech is not going to take off in the workplace the way some people might expect,” says Linahon of TetraVX.
According to Linahon, companies will have to consider the following sorts of questions. “Where is that data stored? Wouldn’t it be great at the end of a call to get a transcript, but what if there are personal details in that? Do I feel comfortable that you can now recognise my voice? It also boils down to, what are we willing to collect about our employees and what are we not?”
Companies will probably have to provide greater disclosure about when and how they are using voice technology. Think of those notifications saying customer service calls will be recorded and monitored for training — on every call and in every meeting.
But privacy concerns extend beyond disclosure, says Prof Webb. With recent revelations of how Amazon, Microsoft and Facebook, among others, have people listening to audio clips from their devices and services, companies may be even more wary about embracing voice apps and smart speakers.
“The problem is that no matter what the promise of the technology is on the enterprise side, now we have deeply seated in our minds: who is listening in?” she says. “Gaining consumer confidence is one thing, but getting past a Fortune 500 company’s risk and compliance department is a whole other thing.”
JLL’s Goel acknowledges there are hurdles to overcome before many of his customers will be willing to try out JiLL. “Many may not have an internal policy on deploying voice apps,” he says.
JLL is its own first customer. It is using the app at its Chicago headquarters and offices in Silicon Valley before starting pilots with clients such as Procter & Gamble, the consumer goods group, later this year. “We’re eating our own internal dog food,” Goel says.