Unlike OpenAI’s previous efforts at generating audio content, Voice Engine can create speech that sounds like individual people, complete with their specific cadence and intonations. All the software needs is 15 seconds of recorded audio of a person speaking to recreate their voice.
During a demonstration of the tool, Bloomberg listened to a clip of OpenAI Chief Executive Officer Sam Altman briefly explaining the technology in a voice that was indistinguishable from his actual speech but entirely AI-generated.
“If you have the right audio setup, it’s basically a human-caliber voice,” said Jeff Harris, a product lead at OpenAI. “It’s a pretty impressive technical quality.” However, Harris said, “There’s obviously a lot of safety delicacy around the ability to really accurately mimic human speech.”
Before deciding whether to release the feature more broadly, OpenAI said it’s soliciting feedback from outside experts. “It’s important that people around the world understand where this technology is headed, whether we ultimately deploy it widely ourselves or not,” the company said in the blog post.
OpenAI also wrote that it hopes the preview of its software “motivates the need to bolster societal resilience” against the challenges brought about by more advanced AI technologies. For example, the company called on banks to phase out voice authentication as a security measure for accessing bank accounts and sensitive information. It is also calling for public education about deceptive AI content and for further development of techniques to detect whether audio content is real or AI-generated.