As world leaders scramble to tackle the menace of deepfakes in a global election year, Sam Altman-run OpenAI is trying to develop beneficial AI with a text-to-speech model called ‘Voice Engine’.
The AI model uses text input and a “single 15-second audio sample” to generate natural-sounding speech.
“It is notable that a small model with a single 15-second sample can create emotive and realistic voices,” OpenAI said.
The company admitted that generating speech that resembles people’s voices has serious risks, which are especially top of mind in an election year.
“We are engaging with US and international partners from across government, media, entertainment, education, civil society and beyond to ensure we are incorporating their feedback as we build,” said OpenAI.
The partners testing ‘Voice Engine’ have agreed to OpenAI’s usage policies, which prohibit the impersonation of another individual or organisation without consent or legal right.
“In addition, our terms with these partners require explicit and informed consent from the original speaker and we don’t allow developers to build ways for individual users to create their own voices,” the company said in a blog post.
Partners must also clearly disclose to their audience that the voices they’re hearing are AI-generated, the company added.
“Finally, we have implemented a set of safety measures, including watermarking to trace the origin of any audio generated by Voice Engine, as well as proactive monitoring of how it’s being used,” the company said.