Date: 2024-10-22

Organizations are building a whole new pipeline of projects that leverage generative AI. During the data collection and handling phase, you need to collect huge volumes of data to feed the model, and you provide access to many different people, including data scientists, engineers, and developers. Centralizing all that data in one place and giving many people access to it inherently presents a risk. In effect, generative AI becomes a new type of data store, one that can also create new data based on existing organizational data. Whether you trained the model, fine-tuned it, or connected it to a vector database for retrieval-augmented generation (RAG), that data likely contains PII and other sensitive information. This mound of sensitive data is a blinking red target that attackers will try to access.

Within model development, applications are being built in a brand-new way, and the new vulnerabilities that come with it become entry points that attackers will try to exploit. Development often starts with data science teams downloading and repurposing pre-trained open-source machine learning models from online model repositories such as Hugging Face or TensorFlow Hub. Open-source model-sharing repositories grew out of the inherent complexity of data science, a shortage of practitioners, and the value they provide to organizations by dramatically reducing the time and effort required to adopt generative AI. However, such repositories can lack comprehensive security controls, which ultimately passes the risk on to the enterprise, and attackers are counting on it. They can inject a backdoor or malware into one of these models and upload the infected model back to the repository, affecting anyone who downloads it. The general scarcity of security around ML models, coupled with the increasingly sensitive data those models are exposed to, means that attacks targeting them can do serious damage.
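
One reason a downloaded model can carry malware is that some common serialization formats, such as Python's pickle, can embed instructions to import and call arbitrary code when the file is loaded. The sketch below is a minimal illustration, not a complete scanner: it uses the standard-library `pickletools` module to list opcodes that can trigger imports or calls, without ever loading the file. The opcode set, the `find_suspicious_opcodes` helper, and the `_DemoPayload` class are assumptions made for this example.

```python
import pickle
import pickletools

# Opcodes that can import names or call objects during unpickling.
# This set is an illustrative assumption, not an exhaustive list.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX",
}


def find_suspicious_opcodes(data: bytes) -> list:
    """Return the sorted names of suspicious opcodes found in a pickle stream.

    pickletools.genops only parses the byte stream; it does not execute it.
    """
    found = {
        opcode.name
        for opcode, _arg, _pos in pickletools.genops(data)
        if opcode.name in SUSPICIOUS_OPCODES
    }
    return sorted(found)


class _DemoPayload:
    """Hypothetical stand-in for a tampered model object (harmless here)."""

    def __reduce__(self):
        # On load, pickle would call print("payload ran"); a real attacker
        # would substitute something far more damaging.
        return (print, ("payload ran",))


if __name__ == "__main__":
    plain_weights = pickle.dumps({"weights": [0.1, 0.2], "bias": 0.0})
    tampered = pickle.dumps(_DemoPayload())
    print("plain:", find_suspicious_opcodes(plain_weights))
    print("tampered:", find_suspicious_opcodes(tampered))
```

Legitimate model checkpoints often use some of these opcodes as well (for example, to rebuild tensor objects), so a hit is a prompt to review which names are being imported rather than proof of compromise.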

Finally, during inference and live use, attackers can manipulate prompts to jailbreak guardrails and coax a model into generating disallowed responses, including biased, false, and otherwise toxic information, which inflicts reputational damage. Attackers can also query the model repeatedly and use the resulting input-output pairs to train a surrogate model that mimics the target's behavior, effectively “stealing” its capabilities and costing the enterprise its competitive advantage.
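
To make the surrogate-model idea concrete, here is a deliberately tiny sketch of extraction from input-output pairs. It assumes the simplest possible target, a linear scorer exposed only through a prediction function, where a handful of chosen queries recover the hidden parameters exactly. Attacks on large generative models are approximate and need far more queries. The names `make_target_model` and `extract_linear_surrogate`, and the secret weights, are hypothetical.

```python
from typing import Callable, List

Model = Callable[[List[float]], float]


def make_target_model() -> Model:
    """Hypothetical victim: the attacker can call it but cannot see inside."""
    secret_weights = [0.5, -2.0, 1.25]
    secret_bias = 3.0

    def predict(x: List[float]) -> float:
        return sum(w * xi for w, xi in zip(secret_weights, x)) + secret_bias

    return predict


def extract_linear_surrogate(query: Model, n_features: int) -> Model:
    """Build a surrogate using only input-output pairs from `query`."""
    # Querying the all-zeros input reveals the bias term.
    bias = query([0.0] * n_features)

    # Querying each unit vector reveals one weight at a time.
    weights = []
    for i in range(n_features):
        probe = [0.0] * n_features
        probe[i] = 1.0
        weights.append(query(probe) - bias)

    def surrogate(x: List[float]) -> float:
        return sum(w * xi for w, xi in zip(weights, x)) + bias

    return surrogate


if __name__ == "__main__":
    target = make_target_model()
    stolen = extract_linear_surrogate(target, n_features=3)
    sample = [2.0, 1.0, 4.0]
    print("target:", target(sample), "surrogate:", stolen(sample))
```

Only four queries are needed here because the target has four parameters. The general point carries over: every answered query leaks some information about the model behind the API.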