Microsoft wants your employees to use Copilot in Office when filing reports. PixelPose wants them to rely on a headshot generator instead of sitting for a photo shoot. Amazon suggests they write code with the help of its CodeWhisperer tool. And then there’s the ever-present temptation to outsource research and analysis to ChatGPT.
Workers at all kinds of organizations, pressured to deliver more and keep up with competition, turn to every available tool that might save them time, automate repetitive tasks or give an insightful edge—whether those tools are available through official channels…or not-so-official ones. But dabbling in the latter can create problems.
The unsanctioned use of consumer-facing generative AI tools like public LLMs is known as “shadow AI.” This is a growing subset of shadow IT, which is the deployment of any software, hardware or information technology (IT) on an enterprise network without the IT department’s or CIO’s approval, knowledge or oversight.
The rise in unsanctioned LLM use by employees increases the risk of leaking organizations’ sensitive data, such as financial data or trade secrets. In fact, according to published reports, employees have uploaded lines of proprietary code, sensitive business emails and more to ChatGPT and similar public AI tools. One data protection firm counted 6,352 attempts to input corporate data into ChatGPT for every 100,000 workers on its customers’ payrolls.1
When employees input sensitive data, these tools may retain it, and even use that data to train the model. The model may then output that data, creating a data breach and potentially exposing that information to the world.
How do organizations address this risk as employee demand for AI adoption grows? They can choose from four broad approaches: relying on the guardrails of public AI providers, setting and enforcing corporate AI policies (up to and including outright bans), offering corporate interfaces that sit between employees and public models, or investing in enterprise AI of their own.
In response to critics’ and customers’ concerns, OpenAI, the maker of ChatGPT, introduced data privacy and security guardrails in 2023, including data retention limits and additional protections for business subscribers.2, 3 But such policies are limited, explains IBM Fellow Jerry Cuomo, Vice President, Technology.
“Even though they’re saying that they’re not intentionally storing data, that doesn’t mean that they’re not storing data unintentionally,” says Cuomo.
There are myriad ways that LLMs might retain information from users. For instance, if a server for an LLM crashes, information about the crash is stored in an error log. If a user’s data was involved in the crash, that data might be stored in the log, too—and then exposed through a security breach.
The providers of public-facing LLMs “aren’t saying that the chance of data exposure is zero,” Cuomo says. “Depending on what’s at stake, that might either not be a problem…or it could be a big, big problem.”
Experts agree that it’s important and necessary to implement and enforce policies that govern AI use in the office, including employee use of consumer-focused services.
“Without corporate AI policies in place—and, importantly, policies that are observed—this can lead to issues regarding security, privacy [and] compliance,” IBM master inventor Martin Keen explains in an IBM Technology video.
The IBM Institute for Business Value (IBV) recommends that companies explore policies that address the use of confidential data, proprietary data or personally identifiable information (PII) within public AI models and third-party applications. Organizations can also monitor user activity, including the content of their inputs into public models, to confirm that employees aren’t uploading sensitive information.
Since 2023, multiple organizations have gone beyond restricting public-facing generative AI tools and banned them outright. Various banks, tech firms and government agencies determined they’d rather forbid access to third-party AI tools than police their usage.
Restrictive policies and explicit bans have their benefits and drawbacks.
By allowing some public generative AI use, organizations let employees bolster their productivity with tools they already know. But complicated policies can also be confusing, burdening workers with the task of interpreting whether a specific use of AI is allowed under company policy or not.
Bans are easier to interpret and follow. But when businesses don’t combine their bans with access to more secure AI tools, they can frustrate and disappoint their innovative employees. Some employees even circumvent these bans. Workers in certain online forums openly swap tips on how to obscure their ChatGPT usage from their bosses.4
Rather than relying on public LLM guardrails or banning public LLMs entirely, companies can institute their own corporate interfaces for accessing generative AI models. The bank JPMorgan Chase announced its launch of a corporate interface in August 2024, after restricting ChatGPT access for its employees the year before.5
Such interfaces act as intermediaries between enterprise users and public LLMs. They allow employees to take advantage of various AI tools while also including input filters that discourage or prevent sharing sensitive information with models hosted by third parties. Cuomo explains that IBM’s own platform, IBM Consulting Advantage, warns users when they’re potentially sharing sensitive information or informs them that it will not accept certain information because of its sensitive nature. In the case of the latter, the platform explains why the data is deemed sensitive and may suggest an alternative prompt that will help the user get the results they seek without compromising data security.
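A minimal sketch can make the idea of such an input filter concrete. The patterns, function name and blocking logic below are hypothetical and are not a description of IBM Consulting Advantage or any particular product; real corporate interfaces typically combine pattern matching with classifiers and policy rules.

```python
# Hypothetical illustration of a corporate-interface input filter.
# The patterns and helper name are assumptions for this sketch, not product logic.
import re

SENSITIVE_PATTERNS = {
    "payment card number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "US Social Security number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "internal project codename": re.compile(r"\bPROJECT-[A-Z]{3,}\b"),
}

def screen_prompt(prompt: str) -> tuple[bool, list[str]]:
    """Return (allowed, reasons); block the prompt if any sensitive pattern matches."""
    reasons = [label for label, pattern in SENSITIVE_PATTERNS.items()
               if pattern.search(prompt)]
    return (not reasons, reasons)

allowed, reasons = screen_prompt(
    "Draft a reply about the dispute on card 4111 1111 1111 1111")
if not allowed:
    # Tell the user why the prompt was stopped instead of silently forwarding it.
    print("Prompt blocked before reaching the public model:", ", ".join(reasons))
```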
But there are limits to the effectiveness of such interfaces if used exclusively with public AI models: If certain inputs are disallowed by the system, employees may still be dissatisfied with the ensuing outputs.
Fortunately, corporate interfaces can also include access to a range of other models, including proprietary, enterprise AI models that don’t require the same level of filtering. When employees have multiple options to choose from, they will likely find certain models to be a better choice for handling sensitive inputs.
“If you’re playing around with a personal email, you might use GPT-4,” Cuomo says. “If you’re doing something based on a response you need to make for a customer and you’re legally not allowed to share that on public sources, then you can use another model that perhaps your company securely set up.”
When companies deploy enterprise AI, they can meet employees’ desire to use cutting-edge tools without sacrificing data security.
Better data security starts with putting companies themselves in the driver’s seat. Instead of relying on a third party to administer an AI model, companies can take on those administrative duties and limit who has access to the information fed into their models. They can also deploy their AI systems in more secure private cloud environments or on premises.
For some businesses, especially small to medium-sized ones, the prospect of deploying their own models might sound intimidating. The good news is that creating and deploying models is now “orders of magnitude easier than it was, and it just keeps getting easier,” Cuomo says.
What’s behind this increasing ease? Foundation models, for starters. Also known as pre-trained models, these deep learning models are first trained on enormous datasets. Then they can be adapted for more specific AI use cases. Many of these models are available in open source libraries like those offered by IBM partner Hugging Face.
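To illustrate how low the barrier has become, the sketch below pulls one of those open-source pre-trained models and uses it without any training step. It assumes the Hugging Face transformers library is installed, and the model and task shown are illustrative choices rather than recommendations.

```python
# Minimal sketch: load an open-source pre-trained model from the Hugging Face Hub
# and run inference as-is. The model name is an illustrative choice.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

report = ("Quarterly revenue grew 8 percent year over year, driven by demand in EMEA, "
          "while support costs fell after the rollout of the new self-service portal.")
print(summarizer(report, max_length=40, min_length=10)[0]["summary_text"])
```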
In addition to foundation models, businesses today have other resources at their disposal.
In other words, businesses interested in enterprise AI don’t need to start from scratch, nor must they go it alone.
Outside of security concerns, there are other reasons for companies to invest in enterprise AI instead of relying on public LLMs. For one thing, businesses might find foundation models that offer a good fit for their use cases and deliver better outcomes. “If a model is pre-trained on a use case close to ours, it may perform better when processing our prompts,” IBM’s Martin Keen explains.
After the right foundation model or models are identified, businesses can unlock value from training those models on their proprietary data. That marks a sharp contrast to what public LLMs can deliver.
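As a rough sketch of what that training step can look like, the example below adapts an open-source model to a tiny, made-up internal dataset using Hugging Face Transformers. The base model, labels and records are placeholder assumptions; a real project would use a curated proprietary dataset inside a governed environment.

```python
# Hypothetical sketch of fine-tuning an open-source foundation model on
# proprietary examples that never leave the company's own environment.
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "distilbert-base-uncased"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# Made-up internal records; label 1 = "needs legal review", 0 = "routine".
data = Dataset.from_dict({
    "text": ["Renewal terms for contract 4411 with indemnity changes",
             "Customer asked where to download the mobile app"],
    "label": [1, 0],
})
tokenized = data.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
)
trainer.train()  # the adapted model and its weights stay under the company's control
```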
“You’re going to be able to do something that is hard for ChatGPT to replicate, because that’s your data,” Cuomo says. “ChatGPT runs on public data, but most of the world’s data is not public.”
As such, businesses can use their proprietary data and enterprise AI models to gain a competitive advantage.
“You should embrace your proprietary data and create something to build better customer experiences,” Cuomo adds. “Don’t give away your valuable data. Ask, ‘How do I use it to win?’”
Links reside outside of ibm.com.
1 “Generative AI leaks are a serious problem, experts say.” IT Brew. 8 May 2023.
2 “New ways to manage your data in ChatGPT.” OpenAI. 25 April 2023.
3 “Enterprise privacy at OpenAI.” OpenAI. Accessed 12 September 2024.
4 “The employees secretly using AI at work.” BBC. 24 October 2023.
5 “JPMorgan Chase is giving its employees an AI assistant powered by ChatGPT maker OpenAI.” CNBC. 9 August 2024.