A pair of Meta's glasses takes a photo when you say, “Hey, Meta, take a photo.” A miniature computer that clips onto your shirt, the Ai Pin, translates foreign languages into your native tongue. An artificially intelligent screen features a virtual assistant that you speak to via a microphone.
Last year, OpenAI updated its ChatGPT chatbot to respond with spoken words, and recently, Google introduced Gemini, a replacement for its voice assistant on Android phones.
Tech companies are betting on a renaissance of voice assistants, many years after most people decided that talking to computers wasn't cool.
Will it work this time? Maybe, but it might take a while.
Large swaths of people still have never used voice assistants like Amazon's Alexa, Apple's Siri, and Google's Assistant, and the vast majority of those who do said they never wanted to be seen talking to them in public, according to studies conducted in the last decade.
I also rarely use voice assistants, and in my recent experiment with Meta's glasses, which include a camera and speakers to provide information about the surrounding environment, I concluded that speaking to a computer in front of parents and their children at the zoo was still incredibly awkward.
I wondered if this would ever feel normal. Not so long ago, talking on the phone with Bluetooth headphones made people look weird, but now everyone does it. Will we ever see as many people walking around and talking to computers as in science fiction films?
I posed this question to design experts and researchers, and the consensus was clear: As new AI systems improve the ability of voice assistants to understand what we're saying and actually help us, we're likely to speak to devices more often in the near future, but we still have many years to go before we do it in public.
Here's what to know.
Why voice assistants are getting smarter
The new voice assistants are powered by generative artificial intelligence, which uses complex statistics and algorithms to guess which words belong together, similar to the phone's autocomplete feature. This makes them better able to use context to understand requests and follow-up questions than virtual assistants like Siri and Alexa, which could only answer a finite list of questions.
For example, if you say to ChatGPT, “What are some flights from San Francisco to New York next week?” – and follow up with “What's the weather like there?” and “What should I pack?” – the chatbot can answer these questions because it creates connections between words to understand the context of the conversation. (Last year the New York Times sued OpenAI and its partner Microsoft for using copyrighted news articles without permission to train chatbots.)
An older voice assistant like Siri, which reacts to a database of commands and questions it has been programmed to understand, would fail unless you used specific words, such as “What's the weather like in New York?” and “What should I pack for a trip to New York?”
The first conversation seems more fluid, like the way people talk to each other.
One of the main reasons people gave up on voice assistants like Siri and Alexa was that computers couldn't understand much of what they were asked, and it was difficult to figure out which questions worked.
Dimitra Vergyri, director of voice technology at SRI, the research lab behind the initial version of Siri before it was acquired by Apple, said generative AI addressed many of the problems that researchers have struggled with for years. The technology makes voice assistants able to understand spontaneous speech and respond with helpful answers, she said.
John Burkey, a former Apple engineer who worked on Siri in 2014 and has been an outspoken critic of the assistant, said he believes that because generative AI has made it easier for people to get help from computers, it is likely that many of us will be talking to voice assistants soon, and that when enough of us start doing it, it could become the norm.
“Siri was limited in size — it only knew a certain number of words,” he said. “You have better tools now.”
But it could be years before the new wave of AI assistants is widely adopted because they introduce new problems. Chatbots including ChatGPT, Google's Gemini, and Meta AI are prone to “hallucinations,” which is when they make things up because they can't find the correct answers. They have also made mistakes in basic tasks like counting and summarizing information from the web.
When voice assistants help and when they don't
Even if voice technology improves, talking is unlikely to replace traditional computer interactions with a keyboard, experts say.
People currently have good reasons to talk to computers in some situations where they are alone, such as setting a destination on the map while driving a car. In public, however, not only can talking to an assistant make you seem strange, but more often than not, it's impractical. When I wore Meta glasses at a grocery store and asked them to identify a piece of produce, an eavesdropping shopper cheekily replied, “That's a turnip.”
You also don't want to dictate a confidential work email within earshot of others on a train. Likewise, it would be reckless to ask a voice assistant to read text messages aloud in a coffee shop.
“Technology solves a problem,” said Ted Selker, a product design veteran who has worked at IBM and Xerox PARC. “When do we solve problems and when do we create them?”
Yet it's easy to imagine moments when talking to a computer helps you so much that you won't care how strange it may seem to others, said Carolina Milanesi, an analyst at Creative Strategies, a research firm.
As you walk to your next office meeting, it would be helpful to ask a voice assistant to brief you on the people you are about to meet. While walking along a trail, asking a voice assistant where to turn would be faster than stopping to view a map. While you're visiting a museum, it would be nice if a voice assistant could give a history lesson about the painting you're looking at. Some of these applications are already being developed with new AI technology.
As I was testing some of the latest voice-controlled products, I caught a glimpse of that future. For example, when I was recording a video of me baking a loaf of bread and wearing the Meta glasses, it was helpful to be able to say, “Hey, Meta, shoot a video,” because my hands were full. And asking Humane's Ai Pin to dictate my to-do list was more convenient than stopping to stare at my phone screen.
“As you go, that's the sweet spot,” said Chris Schmandt, who has worked on voice interfaces for decades at the Massachusetts Institute of Technology Media Lab.
When he was an early adopter of one of the first cell phones about 35 years ago, people stared at him as he wandered the MIT campus talking on the phone. Now this is normal.
I am convinced that the day will come when people will occasionally talk to computers when they are out and about, but this will happen very slowly.