Voice Cloning and Audio Deepfakes, Fully Explained

How voice cloning technology works, real cases of audio manipulation in politics and fraud, and what defenses exist against synthetic voices.

How Voice Cloning Works

Modern voice cloning can work from very little audio: Microsoft reported that its VALL-E research model can imitate a speaker from a three-second sample. Commercial services like ElevenLabs and Resemble.AI, along with open-source tools like Tortoise TTS, analyze the unique characteristics of a voice (pitch, cadence, accent, breathing patterns) and build a model that can speak any text in that voice.
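To make "analyzing a voice characteristic" a little more concrete, here is a minimal sketch of measuring just one of them, pitch, using plain autocorrelation. It is an illustration only: real cloning systems learn far richer neural representations of a voice, and nothing here reflects how any of the named products work. The function name `estimate_pitch_hz`, its parameters, and the synthetic test tone are all assumptions made for this example.

```python
import math

def estimate_pitch_hz(samples, sample_rate, min_hz=60.0, max_hz=400.0):
    """Estimate the fundamental frequency of a signal via autocorrelation.

    Hypothetical helper for illustration: it tries every lag in a plausible
    speech range and keeps the one where the signal best matches a shifted
    copy of itself.
    """
    min_lag = int(sample_rate / max_hz)  # shortest period considered
    max_lag = int(sample_rate / min_hz)  # longest period considered
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

# Synthetic stand-in for a voiced sound: a 200 Hz tone plus one harmonic.
# (Assumed test data, not a real recording.)
SAMPLE_RATE = 8000
tone = [
    math.sin(2 * math.pi * 200 * i / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 400 * i / SAMPLE_RATE)
    for i in range(800)  # 0.1 seconds of audio
]

pitch = estimate_pitch_hz(tone, SAMPLE_RATE)
print(f"Estimated pitch: {pitch:.1f} Hz")
```

A single pitch number is nowhere near enough to reproduce a voice. The sketch only shows that characteristics such as pitch are measurable from a short clip, which is the starting point for the models described above.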

The technology has legitimate uses: dubbing films into other languages while preserving the original actor's voice, helping people who have lost their voices due to illness, and creating audiobook narration. But the same technology enables fraud, political manipulation, and harassment at a scale that was previously impossible.

In January 2024, days before the state's presidential primary, voters in New Hampshire received robocalls featuring a synthetic version of President Biden's voice telling them not to vote. In February 2024, the FCC responded with a declaratory ruling that calls made with AI-generated voices count as "artificial" under the Telephone Consumer Protection Act, the existing federal robocall law, so they are illegal without the recipient's prior consent.
