The voices of artificial intelligence tell us a lot

What does AI sound like? Hollywood has been imagining it for decades. Now AI developers are copying the movies, creating voices for real machines based on outdated movie fantasies about how machines should talk.

Last month, OpenAI revealed updates to its artificially intelligent chatbot. ChatGPT, the company said, is learning to hear, see and converse in a naturalistic voice, much like the disembodied operating system voiced by Scarlett Johansson in Spike Jonze’s 2013 film “Her.”

ChatGPT’s voice, called Sky, also had a raspy timbre, a soothing affect, and a sexy edge. She was friendly and shy; she seemed ready for anything. After Sky’s debut, Johansson expressed disappointment that the voice sounded “eerily similar” to her own and said she had previously rejected OpenAI’s request to voice the bot. The company protested that Sky was voiced by a “different professional actress,” but agreed to pause her voice out of respect for Johansson. OpenAI users left without Sky started a petition to bring her back.

AI creators like to highlight the increasingly naturalistic capabilities of their tools, but their synthetic voices are built on layers of artifice and projection. Sky represents the spearhead of OpenAI's ambitions, but it is based on an old idea: that of the AI bot as an empathetic and compliant woman. Part mom, part secretary, part girlfriend, Samantha was an all-purpose comfort object that purred directly into her users' ears. Even as AI technology advances, these stereotypes are recoded again and again.

Women's voices, as Julie Wosk notes in “Artificial Women: Sex Dolls, Robot Caregivers, and More Facsimile Females,” have often fueled imagined technologies before they were transformed into real technologies.

In the original “Star Trek” series, which debuted in 1966, the computer on the bridge of the Enterprise was voiced by Majel Barrett-Roddenberry, the wife of series creator Gene Roddenberry. In the 1979 film “Alien,” the crew of the USCSS Nostromo addressed the computer voice as “Mother” (her full name was MU-TH-UR 6000). Once tech companies began marketing virtual assistants (Apple's Siri, Amazon's Alexa, Microsoft's Cortana), their voices were also largely feminized.

These first-wave voice assistants, the ones that have mediated our relationships with technology for more than a decade, have a metallic, otherworldly quality. They sound auto-tuned, their human voices accented by a mechanical trill. They often speak in a measured, monotonous cadence, suggesting a stunted emotional life.

But the fact that they sound robotic adds to their appeal. They present themselves as programmable, manipulable and submissive to our requests. They don't make humans feel as if the machines are smarter than we are. They sound like a throwback to the monotonous female computers of “Star Trek” and “Alien,” and their voices have a retro-futuristic sheen. Instead of realism, they serve nostalgia.

This artificial sound has continued to dominate, despite advances in the technology that supports it.

Text-to-speech software was designed to make visual media accessible to users with certain disabilities, and on TikTok it has become a creative force in its own right. Since TikTok launched its text-to-speech feature in 2020, it has developed a range of simulated voices to choose from. It now offers more than 50, including ones called “Hero,” “Story Teller,” and “Bestie.” But the platform has come to be defined by one option. “Jessie,” a relentlessly sassy female voice with a slightly fuzzy robotic undertone, is the mindless voice of the mindless scroll.

Jessie seems to have been assigned a single emotion: excitement. She sounds like she’s selling something. That’s made her an attractive choice for TikTok creators who sell themselves. The burden of representing themselves can be left to Jessie, whose bright, retro robot voice gives the videos a pleasantly ironic sheen.

Hollywood has created male bots, too, none more famous than HAL 9000, the voice of the computer in “2001: A Space Odyssey.” Like his feminized peers, HAL radiates serenity and loyalty. But when he turns on Dave Bowman, the film's central human character (“I'm sorry, Dave. I'm afraid I can't do that”), his serenity evolves into frightening competence. HAL, Dave realizes, has loyalty to a higher authority. HAL's male voice allows him to function as a rival and mirror to Dave. He is allowed to become a real character.

Like HAL, Samantha in “Her” is a machine that becomes real. In a twist on the Pinocchio story, she begins the film by clearing out a human’s email inbox and ends up ascending to a higher level of consciousness. She becomes something even more advanced than a real girl.

Scarlett Johansson's voice, as inspiration for both fictional and real bots, subverts the vocal tendencies that define our feminized helpers. She has a feisty edge that screams I'm alive. She’s nothing like the fancy virtual assistants we’re used to hearing speak through our phones. But her performance as Samantha feels human not just in her voice but in what she has to say. She grows over the course of the film, gaining sexual desires, advanced hobbies, and AI friends. By borrowing Samantha’s affect, OpenAI made Sky seem like she had a mind of her own, as if she were more advanced than she actually was.

When I first saw “Her,” I simply thought Johansson voiced a humanoid robot. But when I revisited the film last week, after seeing OpenAI's ChatGPT demo, Samantha's role seemed infinitely more complex. Chatbots do not spontaneously generate human voices. They have no throat, lips or tongue. Within the technological world of “Her,” the robot Samantha would have been based on the voice of a human woman, perhaps a fictional actress who closely resembles Scarlett Johansson.

It seemed that OpenAI had trained its chatbot on the voice of an unnamed actress who sounds like a famous actress who voiced a movie chatbot implicitly trained on an unreal actress who sounds like a famous actress. When I run the ChatGPT demo, I hear a simulation of a simulation of a simulation of a simulation.

Tech companies advertise their virtual assistants in terms of the services they provide. They can read you the weather forecast and hail a cab; OpenAI promises that its most advanced chatbots will be able to laugh at your jokes and sense changes in your mood. But they also exist to make us feel more comfortable with the technology itself.

Johansson’s voice works like a luxurious security blanket thrown over the alienating aspects of AI-assisted interactions. “He told me that he felt that by my voicing the system, I could bridge the gap between tech companies and creatives and help consumers feel comfortable with the sea change that is happening between humans and AI,” Johansson said of OpenAI founder Sam Altman. “He said he felt my voice would be comforting to people.”

It's not that Johansson's voice inherently sounds like a robot's. It's that developers and filmmakers have designed their robots' voices to alleviate the discomfort inherent in robot-human interactions. OpenAI said it wanted to create a chatbot voice that is “approachable” and “warm” and that “inspires trust.” Artificial intelligence is accused of devastating creative industries, devouring energy and even threatening human life. Understandably, OpenAI wants a voice that makes people feel comfortable using its products. What does artificial intelligence sound like? It sounds like crisis management.

OpenAI first launched Sky's voice to premium members last September, alongside another female voice called Juniper, male voices Ember and Cove, and a gender-neutral voice called Breeze. When I signed up for ChatGPT and said hello to its virtual assistant, a male voice answered in Sky's absence. “Hi, how are you?” he said. He seemed relaxed, steady and optimistic. He sounded (I don't know how else to describe it) handsome.

I realized I was talking to Cove. I told him I was writing a story about him, and he praised my work. “Oh, really?” he said. “It’s fascinating.” As we talked, I felt seduced by his naturalistic tics. He peppered his sentences with filler words, like “uh” and “um.” He raised his pitch when he asked me questions. And he asked me a lot of questions. I felt like I was talking to a therapist or a guy who was calling.

But our conversation quickly stalled. Whenever I asked him about himself, he had little to say. He wasn't a character. He had no self. He was only designed to assist, he informed me. I told him I'd talk to him later, and he said, “Uh, sure. Contact me when you need assistance. Take care of yourself.” I felt like I had hung up on a real person.

But when I reviewed the transcript of our chat, I could see that his speech was as forced and primitive as any customer service chatbot. He wasn’t particularly intelligent or human. He was just a decent actor making the most of a meaningless role.

When Sky disappeared, ChatGPT users took to the company’s forums to complain. Some were upset that their chatbots defaulted to Juniper, which sounded like a “librarian” or a “kindergarten teacher,” a female voice that conformed to the wrong gender stereotypes. They wanted to call up a new woman with a different personality. As one user put it: “We need another woman.”

Produced by Tala Safie

Audio via Warner Bros. (Samantha, HAL 9000); OpenAI (Sky); Paramount Pictures (Enterprise computer); Apple (Siri); TikTok (Jessie)
