Securing Generative AI Solutions

Overview

Generative AI systems present a number of unique security challenges. Alongside the typical challenge of securing access to generative AI models, organizations must balance the creative power of large language models (LLMs) and other generative technologies against the risk that the models will generate incorrect or undesirable outputs, disclose sensitive or private information, or execute undesirable, incorrect, or disallowed actions.

The OWASP Top 10 for LLMs and Generative AI Apps

The Open Web Application Security Project (OWASP) has published version 1 of its Top 10 risks and vulnerabilities for LLM and generative AI applications. The diagram below illustrates these vulnerabilities in the context of an agentic AI architecture.

[Diagram: the OWASP Top 10 vulnerabilities mapped onto an agentic AI architecture]

  1. Prompt Injection occurs when an attacker is able to insert malicious content into LLM prompts. The content can range from instructions embedded within a larger prompt, to hyperlinks pointing at content the LLM will read (e.g., “Read and parse the text at the following URL..”), to other vectors. Prompt injection can enable an attacker to manipulate the model into ignoring its instructions and/or providing undesirable or incorrect outputs.

  2. Insecure Output Handling occurs when the outputs of an LLM are not sufficiently validated for malicious potential or intent. Examples include JavaScript code generated by an LLM being passed to the user’s browser for execution, and the direct execution of shell scripts or other ‘system’ code generated by an LLM (see the first sketch following this list).

  3. Training Data Poisoning occurs when an attacker is able to modify or manipulate a model’s training and/or configuration data to introduce vulnerabilities into the model. For example, an attacker could modify a business process description to allow unlimited transfers of money to a specific individual, or a competitor could modify fine-tuning data so the model recommends their products over those of the enterprise.

  4. Model Denial of Service occurs when an attacker is able to manipulate a model into consuming a high amount of resources, resulting in poor performance or the model becoming unavailable to other users. Examples include repeatedly submitting prompts that are just below the size of the model’s context window, consuming large amounts of memory, and submitting prompts that cause the model to recursively expand and process the context window (an endless loop); see the rate-limiting sketch following this list.

  5. Supply Chain Vulnerabilities include both the typical vulnerabilities associated with third-party software, which may contain unknown flaws an attacker can exploit, and the vulnerabilities introduced when models are trained on unverified and/or crowd-sourced data.

  6. Sensitive Information Disclosure occurs when a model discloses sensitive or personal information. This can occur as a consequence of a successful prompt injection attack, through insecure handling of enterprise system outputs, or through malicious prompts that manipulate the model into producing sensitive outputs, e.g., valid credit card numbers.

  7. Insecure Plugin Design occurs when tools called directly by models are not securely designed; e.g., tools running as an administrative user, or tools whose outputs enable prompt injection.

  8. Excessive Agency occurs when a model or autonomous agent has the ability to perform damaging or unauthorized actions in response to unexpected or ambiguous outputs from an LLM.

  9. Overreliance occurs when a model’s output is not verified for correctness against factual sources or procedural controls. The most common example is a model hallucinating and the incorrect output being accepted as factual, e.g., a chatbot giving a customer an incorrect answer about a store’s return policy. Overreliance can also occur with model-generated code or images.

  10. Model Theft occurs when an attacker is able to compromise, physically steal, or copy a model, its weights, and/or its parameters. Once in possession of a model, an attacker can capitalize on the valuable intellectual property embedded in the model, or create a duplicate of the model for their own use.
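
As an illustration of the output-handling risk in item 2, the following minimal Python sketch screens LLM-generated code before it is executed or passed downstream. The screen_generated_code helper and its block-list patterns are illustrative assumptions, not part of the OWASP guidance; a production system would combine allow-listing, sandboxing, and human review rather than rely on pattern matching alone.

```python
import re

# Illustrative block-list of patterns that should never appear in
# LLM-generated code destined for execution. A real deployment would
# pair this with sandboxing and allow-listing, not rely on it alone.
SUSPICIOUS_PATTERNS = [
    r"\beval\s*\(",        # dynamic evaluation
    r"\bexec\s*\(",        # dynamic execution
    r"\bos\.system\s*\(",  # shell escape
    r"\bsubprocess\.",     # spawning processes
    r"document\.cookie",   # cookie theft in generated JavaScript
    r"<script\b",          # script injection into rendered HTML
]

def screen_generated_code(code: str) -> list[str]:
    """Return the suspicious patterns found in generated code."""
    return [p for p in SUSPICIOUS_PATTERNS if re.search(p, code, re.IGNORECASE)]

generated = 'import subprocess; subprocess.run(["rm", "-rf", "/tmp/x"])'
findings = screen_generated_code(generated)
if findings:
    # Reject the output (or route it to human review) instead of executing it.
    print(f"Blocked generated code; matched patterns: {findings}")
```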
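
Similarly, the denial-of-service mitigations in item 4 can be sketched as a prompt-size cap set well below the context window plus a per-user rate limit. The specific limits and the in-memory request log below are assumptions for illustration; a production system would typically enforce these at an API gateway.

```python
import time
from collections import defaultdict

MAX_PROMPT_CHARS = 8_000   # assumed cap, set well below the context window
REQUESTS_PER_MINUTE = 20   # assumed per-user rate limit

_request_log: dict[str, list[float]] = defaultdict(list)

def admit_prompt(user_id: str, prompt: str) -> bool:
    """Reject oversized prompts and users exceeding the rate limit."""
    if len(prompt) > MAX_PROMPT_CHARS:
        return False
    now = time.monotonic()
    # Keep only the requests made within the last 60 seconds.
    window = [t for t in _request_log[user_id] if now - t < 60.0]
    if len(window) >= REQUESTS_PER_MINUTE:
        return False
    window.append(now)
    _request_log[user_id] = window
    return True

print(admit_prompt("alice", "Summarize this paragraph..."))  # True
print(admit_prompt("alice", "x" * 10_000))                   # False: too large
```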
     
Protecting Generative AI Systems

The figure below augments the architecture to show the placement of security components that protect against or mitigate the vulnerabilities in the OWASP Top 10.

[Diagram: the agentic AI architecture augmented with security components mapped to the OWASP Top 10]

An Identity and Access Management (IAM) component is added to provide strong user identities and roles, mitigating the risk of model theft by controlling access to the application functionality and APIs that could lead to theft or disclosure of the model.
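
As a sketch of how such a gate might look in application terms: the role names and permission strings below are hypothetical, and in practice the check would be enforced by the enterprise IAM product rather than hand-rolled code.

```python
# Hypothetical role-to-permission mapping; a real system would pull
# roles and entitlements from the enterprise IAM provider.
ROLE_PERMISSIONS = {
    "app_user":    {"model:invoke"},
    "ml_engineer": {"model:invoke", "model:read_config"},
    "admin":       {"model:invoke", "model:read_config", "model:export"},
}

def authorize(role: str, permission: str) -> bool:
    """Return True only if the caller's role grants the permission."""
    return permission in ROLE_PERMISSIONS.get(role, set())

# Exporting weights (a model-theft vector) requires an elevated role.
print(authorize("app_user", "model:export"))  # False
print(authorize("admin", "model:export"))     # True
```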

Agent identity and access control (Agent Access Control), which functions similarly to privileged user management, is added to match agent access rights to user identities and roles, guarding against excessive agency and abnormal agent actions that result from hallucinations or from poorly formed or ambiguous prompts.
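
One way to picture this control: the agent calls tools with the invoking user's entitlements rather than a privileged identity of its own. The tool registry and permission names in this sketch are illustrative assumptions.

```python
# Illustrative registry of tools an agent may call, each tagged with
# the permission it requires. The agent acts with the *user's* rights.
TOOL_REQUIREMENTS = {
    "search_catalog": "catalog:read",
    "issue_refund":   "payments:write",
}

def agent_call_tool(tool: str, user_permissions: set[str]) -> None:
    """Deny the tool call unless the invoking user holds the permission."""
    required = TOOL_REQUIREMENTS[tool]
    if required not in user_permissions:
        raise PermissionError(f"{tool} requires {required}")
    print(f"Calling {tool} on behalf of the user")

agent_call_tool("search_catalog", {"catalog:read"})    # allowed
try:
    agent_call_tool("issue_refund", {"catalog:read"})  # blocked
except PermissionError as err:
    print(f"Blocked: {err}")
```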

Generative AI monitoring components (GenAI Monitoring) are added throughout the architecture to guard against prompt injection, insecure output handling, sensitive data disclosure, and overreliance. A combination of GenAI Monitoring and traditional Data Leakage Monitoring is deployed to guard against prompt- and response-based attacks, e.g., a prompt injected into the results of a SQL query, as well as the disclosure of sensitive information that may appear in the results of API calls, database queries, and the like.
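
A minimal sketch of the data-leakage side of this monitoring: scanning model responses for candidate credit card numbers (digit sequences that pass the Luhn checksum) and redacting them before they reach the user. The regex and redaction policy are assumptions; real DLP tooling covers many more data types.

```python
import re

def luhn_valid(number: str) -> bool:
    """Standard Luhn checksum used to validate card-number candidates."""
    digits = [int(d) for d in number][::-1]
    total = sum(digits[::2]) + sum(sum(divmod(d * 2, 10)) for d in digits[1::2])
    return total % 10 == 0

def redact_card_numbers(text: str) -> str:
    """Redact 13-19 digit sequences that pass the Luhn check."""
    def _redact(match: re.Match) -> str:
        candidate = re.sub(r"[ -]", "", match.group())
        return "[REDACTED]" if luhn_valid(candidate) else match.group()
    # Digits optionally separated by spaces or hyphens, 13-19 digits total.
    return re.sub(r"\b\d(?:[ -]?\d){12,18}\b", _redact, text)

response = "Your card 4539 1488 0343 6467 is on file."
print(redact_card_numbers(response))  # "Your card [REDACTED] is on file."
```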

Training data poisoning attacks are mitigated by the addition of Configuration Management and monitoring tools, as well as a structured Version Control and release process around model training, fine-tuning, and configuration data.
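
One concrete control in this vein, sketched below: record SHA-256 digests of approved training and fine-tuning files in a version-controlled manifest, and refuse to train when any file's digest has drifted. The manifest format and file names are assumptions.

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large training sets do not exhaust memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_training_data(manifest_path: Path) -> list[str]:
    """Return the files whose current digest differs from the manifest.

    The manifest is assumed to be a JSON object mapping file paths to
    the SHA-256 digests recorded at release time, e.g.:
        {"data/fine_tune.jsonl": "ab12..."}
    """
    manifest = json.loads(manifest_path.read_text())
    return [f for f, digest in manifest.items() if sha256_of(Path(f)) != digest]

# Example: abort a training run if any manifest entry has drifted.
# (Assumes training_manifest.json exists alongside the data files.)
tampered = verify_training_data(Path("training_manifest.json"))
if tampered:
    print(f"Aborting training; modified files: {tampered}")
```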

Finally, an Integrated Behavior Monitoring and event correlation component is added to identify potential vulnerabilities and attacks from individual component logs. A Notification and Alerting component is added to notify system operators of potential issues, and a Response Orchestration component is added to automate and/or coordinate system and manual responses to identified issues.
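
As an illustration of the correlation step, the sketch below groups security signals by request ID and alerts when a single request trips guards in more than one component, a pattern more indicative of an attack than of noise. The event schema and threshold are assumptions.

```python
from collections import defaultdict

# Assumed event schema: each security component emits
# (request_id, component, signal) tuples into a shared log.
events = [
    ("req-42", "GenAI Monitoring", "possible prompt injection"),
    ("req-42", "Data Leakage Monitoring", "card number in response"),
    ("req-77", "GenAI Monitoring", "oversized prompt"),
]

def correlate(events, threshold: int = 2) -> list[str]:
    """Alert when one request triggers signals in >= threshold components."""
    by_request = defaultdict(set)
    for request_id, component, _signal in events:
        by_request[request_id].add(component)
    return [rid for rid, comps in by_request.items() if len(comps) >= threshold]

for rid in correlate(events):
    print(f"ALERT: correlated security signals for {rid}")  # notify operators
```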

Resources

IBM's Generative AI Architecture: the complete generative AI architecture in IBM IT Architect Assistant (IIAA), an architecture development and management tool. Using IIAA, architects can elaborate and customize the architecture to create their own generative AI solutions.

Next steps

Talk to our experts about how you can accelerate your adoption of generative AI.

Contributors

Chris Kirby, Wissam Dib, Manav Gupta


Updated: January 31, 2025