OpenAI, a leading player in artificial intelligence, is making strides in refining voice cloning technology while emphasizing responsible use.
The company first developed Voice Engine in late 2022 and has used it to power the preset voices available in the text-to-speech API as well as ChatGPT Voice and Read Aloud. However, a public release date has yet to be announced, as OpenAI wants to ensure responsible deployment of the technology.
Jeff Harris, a member of OpenAI’s product team, emphasized the importance of understanding potential risks associated with the technology and implementing mitigating measures. He stated, “We want to make sure that everyone feels good about how it’s being deployed — that we understand the landscape of where this tech is dangerous and we have mitigations in place for that.”
Early applications of Voice Engine
The underlying AI model powering Voice Engine has been used in various OpenAI products, including the ChatGPT chatbot and the text-to-speech API. Although details about the training data are closely guarded, the data comes from a mix of licensed and publicly available sources.
Voice Engine differs from other voice cloning products in that it does not fine-tune on user data. Instead, it combines a diffusion process with a transformer to generate speech from small audio samples and text. Because the audio is discarded after each request, this approach helps protect privacy. A few early applications include:
- Translating content, like videos and podcasts, so creators and businesses can reach more people around the world, fluently and in their own voices. One early adopter of this is HeyGen, an AI visual storytelling platform that works with its enterprise customers to create custom, human-like avatars for a variety of content, from product marketing to sales demos.
- Providing reading assistance to non-readers and children through natural-sounding, emotive voices representing a wider range of speakers than what’s possible with preset voices. Age of Learning, an education technology company dedicated to the academic success of children, has been using this to generate pre-scripted voice-over content.
- Reaching global communities by improving essential service delivery in remote settings. Dimagi is building tools for community health workers to provide a variety of essential services, such as counseling for breastfeeding mothers.
- Supporting people who are non-verbal, for example through therapeutic applications for individuals with conditions that affect speech and educational enhancements for those with learning needs. Livox, an AI alternative communication app, powers Augmentative & Alternative Communication (AAC) devices that enable people with disabilities to communicate.
- Helping patients recover their voice, for those suffering from sudden or degenerative speech conditions. The Norman Prince Neurosciences Institute at Lifespan, a not-for-profit health system that serves as the primary teaching affiliate of Brown University’s medical school, is exploring uses of AI in clinical contexts. They’ve been piloting a program offering Voice Engine to individuals with oncologic or neurologic etiologies for speech impairment.
While voice cloning technology has existed for some time, OpenAI claims its approach offers higher-quality speech. Additionally, Voice Engine is priced competitively, offering synthetic voices at rates lower than some competitors.
Ethics and deepfakes
Voice Engine also raises concerns about its potential impact on the voice acting industry. OpenAI’s technology could commoditize voice work, potentially displacing human voice actors. Although OpenAI requires explicit consent from individuals whose voices are cloned, questions remain about the broader implications for the industry.
Despite advancements in voice cloning technology, ethical concerns persist. Instances of misuse, such as generating fake voices for malicious purposes, highlight the need for responsible deployment. OpenAI is taking steps to address potential misuse by limiting access to Voice Engine and prioritizing socially beneficial use cases. The company also encourages broader measures, such as:
- Phasing out voice-based authentication as a security measure for accessing bank accounts and other sensitive information
- Exploring policies to protect the use of individuals’ voices in AI
- Educating the public in understanding the capabilities and limitations of AI technologies, including the possibility of deceptive AI content
- Accelerating the development and adoption of techniques for tracking the origin of audiovisual content, so it’s always clear when you’re interacting with a real person or with an AI
Looking ahead, OpenAI plans to evaluate the preview release of Voice Engine and consider its wider release based on feedback and safety considerations. The company remains committed to advancing voice technology responsibly, ensuring clarity between artificial and human voices.