Defending against these attacks is an ongoing challenge. Lee outlines two main approaches: improved AI training and building AI firewalls.
“We want to do better training so the model itself will know, ‘Oh, someone is trying to attack me,'” Lee explains. “We’re also going to inspect all the incoming queries to the language model and detect prompt injections.”
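To make the "inspect all incoming queries" idea concrete, here is a minimal sketch of the kind of screening layer Lee describes sitting in front of a language model. The patterns, threshold logic, and function names are assumptions for illustration only, not Microsoft's actual firewall rules; real detectors typically use trained classifiers rather than keyword lists.

```python
import re

# Illustrative only: a naive pattern screen placed in front of an LLM call.
# The patterns below are assumptions for this sketch, not any vendor's rules.
SUSPICIOUS_PATTERNS = [
    r"ignore (all|any|previous) instructions",
    r"disregard (the|your) system prompt",
    r"you are now in .* mode",
    r"pretend (that )?you have no restrictions",
]

def looks_like_injection(user_query: str) -> bool:
    """Return True if the query matches a known injection phrasing."""
    lowered = user_query.lower()
    return any(re.search(pattern, lowered) for pattern in SUSPICIOUS_PATTERNS)

def guarded_call(user_query: str, call_model) -> str:
    """Screen the query, then forward it to the model only if it looks clean."""
    if looks_like_injection(user_query):
        return "Request blocked: possible prompt injection detected."
    return call_model(user_query)
```

The point of the sketch is the placement, not the pattern list: untrusted input is inspected before it ever reaches the model, mirroring the "AI firewall" approach Lee outlines.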
As generative AI becomes more integrated into daily life, awareness of these vulnerabilities is no longer a concern for tech experts alone; anyone interacting with AI systems needs to understand their potential weaknesses.

Lee draws a parallel to the early days of SQL injection attacks on databases. “It took the industry 5-10 years to make everyone understand that when writing a SQL query, you need to parameterize all the inputs to be immune to injection attacks,” he says. “For AI, we’re beginning to utilize language models everywhere. People need to understand that you can’t just give simple instructions to an AI because that will make your software vulnerable.”
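The SQL lesson Lee cites is easy to show. The snippet below (a self-contained example using Python's built-in sqlite3 module; the table, data, and payload are invented for illustration) contrasts a query built by string concatenation with a parameterized one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'alice@example.com')")

user_input = "alice' OR '1'='1"  # classic injection payload

# Vulnerable: user input is concatenated into the query text, so the
# payload rewrites the query's logic and matches every row.
unsafe = conn.execute(
    "SELECT email FROM users WHERE name = '" + user_input + "'"
).fetchall()

# Parameterized: the driver passes the value separately from the SQL,
# so the payload is treated as data, never as query structure.
safe = conn.execute(
    "SELECT email FROM users WHERE name = ?", (user_input,)
).fetchall()

print(unsafe)  # row returned despite the bogus name: injection succeeded
print(safe)    # []: the payload is just a literal string, no match
```

Lee's point is that language models need an analogous discipline: untrusted content should be kept separate from the instructions that control the system, rather than mixed into one free-form prompt.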
The discovery of jailbreaking methods like Skeleton Key may erode public trust in AI, potentially slowing the adoption of beneficial AI technologies. According to Narayana Pappu, CEO of Zendata, transparency and independent verification are essential to rebuilding confidence.
“AI developers and organizations can strike a balance between creating powerful, versatile language models and ensuring robust safeguards against misuse,” he says. “They can do that via internal system transparency, understanding AI/data supply chain risks, and building evaluation tools into each stage of the development process.”