OpenAI advances responsible voice cloning technology

Early applications of Voice Engine

The AI model behind Voice Engine is already used in several OpenAI products, including the voice features in ChatGPT and the text-to-speech API. OpenAI has not disclosed details of the training data beyond saying that it comes from a mix of licensed and publicly available sources.

Voice Engine differs from other voice cloning products in that it is not fine-tuned on user data. Instead, it combines a diffusion process and a transformer to generate speech from a short audio sample and text input. Because the audio sample is discarded after each request, this approach helps protect user privacy. Early applications include:

Translating content, like videos and podcasts, so creators and businesses can reach more people around the world, fluently and in their own voices. One early adopter is HeyGen, an AI visual storytelling platform that works with its enterprise customers to create custom, human-like avatars for a variety of content, from product marketing to sales demos.

[Audio samples: reference audio, followed by generated audio in French, Japanese, and Spanish]

Providing reading assistance to non-readers and children through natural-sounding, emotive voices representing a wider range of speakers than what’s possible with preset voices. Age of Learning, an education technology company dedicated to the academic success of children, has been using this to generate pre-scripted voice-over content.

[Audio samples: reference audio, followed by generated audio on physics, chemistry, and biology]

Reaching global communities by improving essential service delivery in remote settings. Dimagi is building tools for community health workers to provide a variety of essential services, such as counseling for breastfeeding mothers.

Supporting people who are non-verbal, through therapeutic applications for individuals with conditions that affect speech and educational enhancements for those with learning needs. Livox, an AI alternative communication app, powers Augmentative & Alternative Communication (AAC) devices that enable people with disabilities to communicate.

[Audio samples: reference audio, followed by generated audio in English and Portuguese]

Helping patients recover their voice, for those suffering from sudden or degenerative speech conditions. The Norman Prince Neurosciences Institute at Lifespan, a not-for-profit health system that serves as the primary teaching affiliate of Brown University’s medical school, is exploring uses of AI in clinical contexts. They’ve been piloting a program offering Voice Engine to individuals with oncologic or neurologic etiologies for speech impairment.

While voice cloning technology has existed for some time, OpenAI claims its approach offers higher-quality speech. Additionally, Voice Engine is priced competitively, offering synthetic voices at rates lower than some competitors.

Ethics and deepfakes

Voice Engine also raises concerns about its potential impact on the voice acting industry. OpenAI’s technology could commoditize voice work, potentially displacing human voice actors. Although OpenAI requires explicit consent from individuals whose voices are cloned, questions remain about the broader implications for the industry.

Despite advancements in voice cloning technology, ethical concerns persist. Instances of misuse, such as generating fake voices for malicious purposes, highlight the need for responsible deployment. OpenAI is addressing potential misuse by limiting access to Voice Engine and prioritizing socially beneficial use cases. It also encourages broader steps to prepare society for the technology, including:

Phasing out voice-based authentication as a security measure for accessing bank accounts and other sensitive information
Exploring policies to protect the use of individuals’ voices in AI
Educating the public in understanding the capabilities and limitations of AI technologies, including the possibility of deceptive AI content
Accelerating the development and adoption of techniques for tracking the origin of audiovisual content, so it’s always clear when you’re interacting with a real person or with an AI
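
To make the last item concrete, here is a minimal Python sketch of one way to track the origin of an audio clip: a signed provenance tag. It is purely illustrative and is not OpenAI's method. The key, the function names, and the "example-tts" generator label are all hypothetical. The generator signs a record containing the clip's hash, and anyone holding the key can later check that the record is authentic and matches the audio.

```python
import hashlib
import hmac
import json

# Hypothetical signing key held by the audio generator; not a real credential.
SECRET_KEY = b"demo-key-for-illustration-only"


def make_provenance_tag(audio_bytes: bytes, generator: str) -> dict:
    """Build a signed record stating that `audio_bytes` is AI-generated."""
    record = {
        "sha256": hashlib.sha256(audio_bytes).hexdigest(),
        "generator": generator,
        "ai_generated": True,
    }
    # Serialize with sorted keys so signing and verification see the same bytes.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return {"record": record, "signature": signature}


def verify_provenance_tag(audio_bytes: bytes, tag: dict) -> bool:
    """Return True only if the tag is authentic and matches this exact audio."""
    payload = json.dumps(tag["record"], sort_keys=True).encode("utf-8")
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag["signature"]):
        return False  # record was altered or signed with a different key
    return tag["record"]["sha256"] == hashlib.sha256(audio_bytes).hexdigest()


clip = b"\x00\x01fake-pcm-samples"  # stand-in for real audio data
tag = make_provenance_tag(clip, generator="example-tts")
print(verify_provenance_tag(clip, tag))            # True
print(verify_provenance_tag(clip + b"\x00", tag))  # False: audio was modified
```

A detached tag like this can simply be stripped from a file, which is one reason practical approaches tend to embed a watermark in the audio itself or rely on a shared content-credential standard.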

Looking ahead, OpenAI plans to evaluate the preview release of Voice Engine and consider its wider release based on feedback and safety considerations. The company remains committed to advancing voice technology responsibly, ensuring clarity between artificial and human voices.