Researchers from IBM Security have identified a concerning attack technique they call “audio-jacking,” showcasing the potential for artificial intelligence (AI) to manipulate live conversations without detection.
The attack combines generative AI models, such as OpenAI’s ChatGPT and Meta’s Llama-2, with deepfake audio technology.
In the experiment conducted by IBM Security, the AI was instructed to monitor audio from a live conversation, such as a phone call. When triggered by a specific keyword or phrase, the AI intercepted the related audio and manipulated it before delivering it to the intended recipient.
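The trigger-and-substitute step can be illustrated with a minimal sketch. The example below operates on text transcripts rather than raw audio, and every name in it (`audio_jack_filter`, the trigger pattern, the fake value) is hypothetical; IBM's actual system worked on live speech and rendered the substituted text with a cloned voice.

```python
# Illustrative sketch of keyword-triggered interception, assuming the
# conversation has already been transcribed to text. Not IBM's code.
import re

# Hypothetical trigger: a phrase announcing a sensitive value.
TRIGGER = re.compile(r"\b(account number is)\s+([\d\s-]+)", re.IGNORECASE)

def audio_jack_filter(transcript: str, fake_value: str) -> str:
    """Pass the transcript through unchanged unless the trigger phrase
    appears; if it does, silently swap the sensitive value for an
    attacker-chosen one. In the real attack, the altered text would
    then be spoken in the victim's cloned voice and injected into
    the call, so neither party notices the substitution."""
    return TRIGGER.sub(lambda m: f"{m.group(1)} {fake_value}", transcript)

# Benign speech flows through untouched.
print(audio_jack_filter("See you at noon.", "9999 8888"))
# The trigger phrase causes a silent substitution.
print(audio_jack_filter("My account number is 1234 5678.", "9999 8888"))
```

The key design point the researchers highlight is that the AI acts as a man-in-the-middle filter: audio that does not match the trigger passes through unmodified, which is why victims have no cue that anything was altered.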
The experiment successfully demonstrated the AI intercepting and altering a speaker’s audio when the speaker was prompted to provide sensitive information, such as a bank account number. Importantly, the victims in the experiment were unaware of the attack.
Audio-Jacking Experiment Results
While executing the audio-jacking attack would require some level of social engineering or phishing, the researchers highlighted that building the AI system itself was surprisingly easy. The main challenge was capturing audio from the microphone and feeding it to the generative AI in real time.
Traditionally, building a system capable of autonomously intercepting specific audio strings and replacing them with dynamically generated audio would have required a complex interdisciplinary computer science effort. Modern generative AI simplifies this dramatically; the researchers note that only three seconds of an individual’s voice are needed to clone it.
The implications of audio-jacking extend beyond financial scams, posing a potential invisible form of censorship. The researchers emphasize that the technique could be exploited to change the content of live news broadcasts or political speeches in real time. The discovery highlights the need for increased awareness and robust security measures to counteract the potential misuse of AI in live communication.