Stephen Fry Issues Warning After AI Stole His Voice From ‘Harry Potter’ Audiobooks
Stephen Fry is warning others about the dangers of artificial intelligence after discovering that his voice was “stolen” from his narration of the Harry Potter audiobooks and used in a documentary without his permission. While Fry is best known as an actor, comedian, and broadcaster, he has also devoted much of his time to narration work. Most notably, he read and recorded all seven Harry Potter novels for their UK audiobook releases. He is a phenomenal narrator who often works voice acting and impressions into his soothing narration, and his Harry Potter recordings have earned him the most attention.
However, his narration work was recently used against him by AI. Recent advancements in AI have raised many concerns, chiefly about the malicious purposes the technology can serve. One major concern is AI’s ability to recreate the likeness of real people, both their image and their voice. There have already been many disturbing instances of women becoming victims of nonconsensual “deepfake” pornography. SAG-AFTRA is also currently on strike, in part over concerns that studios plan to use AI to generate and own actors’ likenesses.
Meanwhile, voice actors have increasingly been expressing concern over generative AI being used to emulate their voices without permission, whether for fan work or professional projects, especially because cloned voices have also been used to create and distribute nonconsensual deepfake pornography. Among those speaking out is Fry, who experienced firsthand having his voice stolen by AI.
Stephen Fry warns AI voice cloning is just the beginning
While appearing at the CogX Festival in London, Fry opened up about how AI cloned his voice without his permission. He explained that an AI system had used his Harry Potter audiobooks as a dataset to recreate his voice, which was then used in a historical documentary without his knowledge. Fry was quite shocked at how sophisticated the recreation was, stating, “What you heard was not the result of a mash-up, this is from a flexible artificial voice, where the words are modulated to fit the meaning of each sentence.”
Fry also revealed that his agents were shocked by the voice theft and hadn’t even been aware such a thing was possible. His account of just how realistic the voice emulation was is deeply concerning; he warned that his voice could’ve been used to make him “read anything” convincingly, whether “a call to storm parliament” or “hard porn.” He also issued an ominous warning about AI: “You ain’t seen nothing yet. This is audio. It won’t be long until full deepfake videos are just as convincing.”
He is correct that deepfake videos are approaching the point where it will be hard to differentiate real from fake without the aid of technology. Some very advanced videos may already be there, but there’s some small comfort in the fact that the use of deepfakes and CGI to recreate actors’ likenesses in film and TV has so far been fairly detectable. What’s startling is that Fry believes we’re already at this point with voice emulation, where you truly can’t tell fake from real. While detection technology exists, AI voice emulation opens up a world of terrifying possibilities, such as Fry’s suggestion that a leader’s or public figure’s voice could be recreated to issue dangerous orders to others.
Fry’s story is a scary reminder of the speed at which AI is developing, and a sign that SAG-AFTRA’s concerns are very valid, as the scenarios the union fears most are already happening. If AI continues advancing without regulation, we may soon live in a world where we’re constantly trying to decipher what’s real and what’s not.
(via Deadline, featured image: Matthew Eisman/Getty Images)