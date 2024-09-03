The world of AI jailbreaks is diverse and ever-evolving. Some attacks are surprisingly simple, while others involve elaborate scenarios that require the expertise of a sophisticated hacker. What unites them is a common goal: pushing these digital assistants beyond their programmed limits.

These exploits tap into the very nature of language models. AI chatbots are trained to be helpful and to understand context. Jailbreakers create scenarios where the AI believes ignoring its usual ethical guidelines is appropriate.

While multi-step attacks like Skeleton Key grab headlines, Lee argues that single-shot techniques remain a more pressing concern. “It’s easier to use one shot to attack a large language model,” he notes. “Imagine putting a prompt injection in your resume to confuse an AI-powered hiring system. That’s a one-shot attack with no chance for multiple interactions.”

According to cybersecurity experts, the potential consequences are alarming. “Malicious actors could use Skeleton Key to bypass AI safeguards and generate harmful content, spread disinformation or automate social engineering attacks at scale,” warns Stephen Kowski, Field CTO at SlashNext Email Security+.

While many of these attacks remain theoretical, real-world implications are starting to surface. Lee cites an example of researchers convincing a company’s AI-powered virtual agent to offer massive, unauthorized discounts. “You can confuse their virtual agent and get a good discount. That might not be what the company wants,” he says.

In his own research, Lee has developed proofs of concept to show how an LLM can be hypnotized to create vulnerable and malicious code and how live audio conversations can be intercepted and distorted in near real time.