Humans can detect artificially generated speech about 73% of the time, a new study has found. That's the majority of the time, but it's not an overwhelming success — indicating that there may be plenty of opportunities for deepfake voice audio to scam you in the near future.
As part of the study, participants were trained to detect generated speech audio clips. They became slightly better, but were still far from perfect. Even with training, deepfake audio can fool the typical person.
The study even found similar results across the two languages it tested, English and Mandarin.
How the New Study Tackled Deepfake Speech
The study was conducted by researchers at University College London, who trained a text-to-speech AI on two datasets to generate 50 speech samples in the two different languages. Then, 529 participants listened to audio clips and attempted to determine which were fake and which were spoken by a real, flesh-and-blood human.
The results: 73% accuracy. In other words, roughly one in every four deepfake audio attempts can expect to work without raising any red flags for its target.
Kimberly Mai, the first author of the study, explained that this is bad news:
“In our study, we showed that training people to detect deepfakes is not necessarily a reliable way to help them to get better at it. Unfortunately, our experiments also show that, at the moment, automated detectors are not reliable either.”
Deepfake Speech Scams Are More Common Than You Think
Audio deepfakery might sound like Mission: Impossible spy technology, but one in four adults has already encountered an AI voice scam, according to a McAfee survey.
That same McAfee survey found that 10% were personally targeted, and another 15% knew someone who had been. At the same time, most respondents weren't confident they could spot a fake. As Tech.co senior writer Aaron Drapkin put it at the time:
The McAfee survey also found that 70% of people said they were “unsure” if they’d be able to tell the difference between an AI voice and a human one. Almost one-third (28%) of US respondents said they wouldn’t be able to tell the difference between a voicemail left by an “Artificial Imposter,” as McAfee puts it, and a loved one.
Of the victims in the US who lost money through AI voice cloning scams, 11% lost between $5,000–$15,000.
Now, the new study indicates that many of those people would still be suckered by the right audio clip.
Could You Still Identify Fake Audio Speech?
Look, the good news is that the average person can still figure out when speech is computer generated a full 73% of the time. And with the right training, you can slightly boost your average.
The best way to stay safe, however, is likely to apply a little analytical thought to the context around the audio itself. Are you being asked to reveal sensitive information? That points to a possible motive for a scam. Did you initiate the contact yourself? If not, be wary: a scammer will typically be the one to reach out first.
Hopefully, automated deepfake speech detectors will continue to improve as well, helping to take some of the burden off of our fallible human ears.