OpenAI previews new audio tool that can read text, mimic voices
Adam Shahab
September 24, 2023 | 3 Minutes to read

Advancement in AI: OpenAI’s Voice Engine

OpenAI is sharing early results from a test for a feature that can read words aloud in a convincing human voice, highlighting a new frontier for artificial intelligence and raising the specter of deepfake risks.

The company is sharing early demos and use cases from a small-scale preview of the text-to-speech model, called Voice Engine, which it has shared with about 10 developers so far, a spokesperson said. OpenAI decided against a wider rollout of the feature, which it briefed reporters on earlier this month.

A spokesperson for OpenAI said the company decided to scale back the release after receiving feedback from stakeholders such as policymakers, industry experts, educators and creatives. The company had initially planned to release the tool to as many as 100 developers through an application process, according to the earlier press briefing.

“We recognize that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year,” the company wrote in a blog post Friday. “We are engaging with US and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build.”

Other AI technology has already been used to fake voices in some contexts. In January, a bogus but realistic-sounding phone call purporting to be from President Joe Biden encouraged people in New Hampshire not to vote in the primaries, an event that stoked AI fears ahead of critical global elections.

Unlike OpenAI’s previous efforts at generating audio content, Voice Engine can create speech that sounds like individual people, complete with their specific cadence and intonations. All the software needs is 15 seconds of recorded audio of a person speaking to recreate their voice.

During a demonstration of the tool, Bloomberg listened to a clip of OpenAI Chief Executive Officer Sam Altman briefly explaining the technology in a voice that sounded indistinguishable from his actual speech, but was entirely AI-generated.

“If you have the right audio setup, it’s basically a human-caliber voice,” said Jeff Harris, a product lead at OpenAI. “It’s a pretty impressive technical quality.” However, Harris said, “There’s obviously a lot of safety delicacy around the ability to really accurately mimic human speech.”

Before deciding whether to release the feature more broadly, OpenAI said it’s soliciting feedback from outside experts. “It’s important that people around the world understand where this technology is headed, whether we ultimately deploy it widely ourselves or not,” the company said in the blog post.

OpenAI also wrote that it hopes the preview of its software “motivates the need to bolster societal resilience” against the challenges brought about by more advanced AI technologies. For example, the company called on banks to phase out voice authentication as a security measure for accessing bank accounts and sensitive information. It’s also seeking public education about deceptive AI content and more development of techniques for detecting whether audio content is real or AI-generated.