Generative AI can no longer be said to be in an experimental phase. It’s embedded in enterprise roadmaps, strategies and operating models across various industries. Business leaders understand that generative models and the workflows they facilitate are inevitable, but the focus has shifted from capabilities to pressure to prove ROI, keep infrastructure costs down and build strong, scalable governance frameworks.
The organizations that win will produce solutions that may not be the flashiest, but are the most responsible and sustainable.
Earlier this year, IBM Chief Architect Gabe Goodhart declared the commodification of models.
“You can pick the model that fits your use case just right and be off to the races. The model itself is not going to be the main differentiator,” he said, emphasizing a new focus on orchestration. You’ve got all the models you need—small specialized models and large all-purpose models—now how do you get them to work efficiently and intelligently together?
Here are nine of the most critical generative AI challenges facing organizations today.
AI Observability allows organizations to monitor the behavior of generative artificial intelligence models. This isn’t a straightforward practice, as the outputs of generative AI models are probabilistic. Unlike traditional programming, the large language models (LLMs) that define generative AI don’t follow set rules, but arrive at decisions through the experience of training on massive amounts of data, and then using that training to make predictions on new data. Their intelligence emerges from optimization, pattern detection and probability distributions through complex neural networks.
However, observability tools can help provide insight into how models and agents are arriving at their outputs. AI observability techniques involve tracking token usage, changes in response patterns and variations in output quality, and generating interaction logs that describe agent decision-making.
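As a rough illustration of the tracking described above, the sketch below wraps a model call and records token counts and latency for each interaction. Everything in it is an assumption made for illustration: `UsageLog`, `CallRecord` and `fake_model` are hypothetical names, and the whitespace-based token count is a crude stand-in for a real tokenizer.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Callable, List


@dataclass
class CallRecord:
    """One logged model interaction."""
    model: str
    prompt_tokens: int
    completion_tokens: int
    latency_s: float


def count_tokens(text: str) -> int:
    # Crude stand-in: a real system would use the model's own tokenizer.
    return len(text.split())


class UsageLog:
    """Hypothetical in-memory log of model calls."""

    def __init__(self) -> None:
        self.records: List[CallRecord] = []

    def observe(self, model: str, prompt: str, generate: Callable[[str], str]) -> str:
        # Time the call, then store token counts alongside the latency.
        start = time.perf_counter()
        output = generate(prompt)
        latency = time.perf_counter() - start
        self.records.append(
            CallRecord(model, count_tokens(prompt), count_tokens(output), latency)
        )
        return output

    def total_tokens(self) -> int:
        return sum(r.prompt_tokens + r.completion_tokens for r in self.records)

    def to_jsonl(self) -> str:
        # One JSON object per line, a common shape for interaction logs.
        return "\n".join(json.dumps(asdict(r)) for r in self.records)


def fake_model(prompt: str) -> str:
    # Placeholder for a real model call.
    return "stub answer text"


log = UsageLog()
log.observe("demo-model", "summarize this quarterly report", fake_model)
print(log.total_tokens())
print(log.to_jsonl())
```

In practice, records like these would typically be shipped to an observability platform rather than held in memory, so that changes in token usage or latency can be compared over time.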
Observability trends include smarter observability platforms, using observability as part of an overall cost management strategy and the adoption of open observability standards.
Some organizations are still struggling to move from pilots to scaled production, and this can be especially challenging when the benefits of AI implementation initiatives are hard to quantify. On the latter point, there are best practices for measuring the productivity of gen AI in an enterprise.
However, it can also be beneficial to think beyond ROI. Nvidia CEO Jensen Huang made this argument in January.
“When your kids tell you they want to try something, you should say yes. We never ask questions at home like ‘What is the return on investment here?’”
This exploratory approach will require strong stakeholder buy-in across the organization, but may be necessary in the long term to provide durable business value.
Large models require expensive GPU clusters and specialized infrastructure. They’re hungry for energy, to the point where the hyperscalers are influencing national energy policies and long-term planning. Training costs are declining but inference at scale remains expensive.
Organizations can contain costs by using multiple models for different tasks. Frontier models aren’t always required for tasks that may not involve complex reasoning. Often a small, open-weight model is sufficient for tasks like classification and information extraction. Another approach might involve using retrieval augmented generation (RAG) instead of fine-tuning a model, since fine-tuning is comparatively expensive.
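One way to picture this multi-model approach is a simple router that sends routine tasks to a small model and everything else to a frontier model. The sketch below is a minimal illustration under stated assumptions: the model names and task labels are hypothetical placeholders, and a real router would likely use richer signals than a task label.

```python
from collections import Counter

# Hypothetical model identifiers, used only for illustration.
SMALL_MODEL = "small-open-weight-model"
FRONTIER_MODEL = "frontier-model"

# Task types assumed to be simple enough for the small model.
SIMPLE_TASKS = {"classification", "information_extraction"}


def pick_model(task_type: str) -> str:
    """Route simple tasks to the small model, everything else to the frontier model."""
    return SMALL_MODEL if task_type in SIMPLE_TASKS else FRONTIER_MODEL


requests = [
    "classification",
    "information_extraction",
    "complex_reasoning",
    "classification",
]

# Count how many requests land on each model.
routing = Counter(pick_model(task) for task in requests)
print(routing)
```

In this toy batch, three of the four requests never reach the more expensive model, which is the cost lever the paragraph above describes.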
Designers of AI systems can encourage shorter prompts, structured inputs, caching repeated queries, and smarter chunking.
Probably the easiest approach to containing costs in AI technologies is to simply pick the best and most repeatable use cases for automating high-frequency workflows and tasks with opportunities for clear time savings, without attempting to “AI everything.” AI strategies need not aspire to “full automation,” as such systems can be more expensive and risky than AI solutions that involve a human in the loop.
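Caching repeated queries, for example, can be sketched in a few lines. The wrapper below is a hypothetical illustration, not a production design: `CachedModel` is an invented name, the stand-in backend simply uppercases its input, and treating prompts that differ only in case or spacing as identical is an assumption that will not suit every application.

```python
import hashlib
from typing import Callable, Dict


class CachedModel:
    """Hypothetical wrapper that answers repeated prompts from a cache."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate
        self._cache: Dict[str, str] = {}
        self.backend_calls = 0

    @staticmethod
    def _key(prompt: str) -> str:
        # Assumption: prompts differing only in case or whitespace are equivalent.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def ask(self, prompt: str) -> str:
        key = self._key(prompt)
        if key not in self._cache:
            # Only pay for a model call the first time a prompt is seen.
            self.backend_calls += 1
            self._cache[key] = self._generate(prompt)
        return self._cache[key]


# Stand-in for a real model call.
model = CachedModel(lambda prompt: prompt.upper())

first = model.ask("What is RAG?")
second = model.ask("  what is   rag? ")
print(first, second, model.backend_calls)
```

Both calls return the same cached answer while the backend is invoked only once. A cache like this fits best where the same question should always produce the same answer.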
Generative AI performance depends heavily on clean, structured data. Cleaning data removes inconsistencies, duplicates and errors present in training data. While models can process unstructured data (like raw text and images), structured data is useful for minimizing hallucinations. Overall, high-quality proprietary data will become much more of a differentiator as models approach commoditization.
Copyright infringement, data privacy and other data sourcing pitfalls mandate a thoughtful approach to data quality that will look different depending upon the organization and the use case.
AI governance refers to the processes, standards and guardrails that help ensure AI systems and AI tools are safe and ethical. Governance is intended to identify and reduce potential harms, such as algorithmic biases and the unintended consequences of automated decisions. Governance is part of an overall responsible AI approach, which encourages broader principles like trust, fairness, robustness, transparency and privacy.
Because of how complex and fragmented enterprise data environments can be, governance remains a challenge, but a critical one to overcome.
As regions around the globe develop their own regulatory frameworks for AI (such as the EU AI Act), organizations need to be aware of evolving compliance obligations.
When the outputs of a large language model (LLM) seem inaccurate or nonsensical, the model may be hallucinating. The newest models and those that can retrieve data with retrieval augmented generation (RAG) are far less prone to hallucinations than earlier models, but the risk remains, and given the way machine learning algorithms and natural language processing (NLP) work currently, there’s no reason to believe hallucinations will soon be eradicated completely.
Generative systems predict the most likely next word given the context. They do not inherently verify facts, and they fill gaps with plausible guesses. The models are not optimizing for truth as we know it, but for linguistic plausibility. To make matters trickier, hallucinations are often delivered with what seems to human users like total confidence.
So enterprises need to think not only about how they will minimize hallucinations, but also about the ethical considerations involved in handling them when they occur, especially in a customer-facing context.
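A toy calculation can make the gap between plausibility and truth concrete. In the sketch below, the candidate words and their scores are invented for illustration, imagined as continuations of the prompt “The capital of Australia is”; a real model would produce different numbers over a vocabulary of many thousands of tokens.

```python
import math
from typing import Dict


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {token: math.exp(score - peak) for token, score in scores.items()}
    total = sum(exps.values())
    return {token: value / total for token, value in exps.items()}


# Invented scores: the most familiar city gets the highest score,
# even though Canberra is the correct answer.
toy_scores = {"Sydney": 2.0, "Canberra": 1.5, "Melbourne": 0.5}

probs = softmax(toy_scores)
best = max(probs, key=probs.get)

for token, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{token}: {p:.2f}")
print("Most likely next word:", best)
```

With these made-up scores, the highest-probability word is the wrong one, and nothing in the calculation checks it against a source of truth.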
Traditional automation mainly replaced repetitive manual tasks. Adoption of generative AI can automate creative work, which results in the reshaping of roles. Employees will need to learn how to design prompts, supervise generative systems, edit AI-generated outputs and verify results. They will need to be trained to successfully meet the ethical challenges posed by the use of generative AI.
“Workslop” refers to low-quality AI-generated content that may appear polished at first glance but lacks the substance and nuance of human expertise. Organizations will need to reskill employees such that they can take advantage of AI’s potential benefits while understanding its limitations, and produce the best possible work by letting AI excel where it is most capable, while providing human intelligence as needed.
New technology often introduces a new attack surface, and genAI is no exception, especially when it is connected to tools, data and autonomous agents. Threats include prompt injection, which involves using malicious inputs to make AI systems act in ways outside of their intended behavior. This technique can be used to trick a chatbot into leaking sensitive private data or to cause an agent to delete something important in a database. Data security will need to evolve to meet this challenge.
AI agents in particular present a major risk management challenge, since agents are able to access files, browse the web, call APIs, run code and control software. One can imagine how a fully integrated agentic AI system could go wrong, especially when a prompt like “get the job done, no matter what” could be interpreted by an agent as “feel free to bypass safeguards.” What’s more, such a dangerous prompt need not be intentionally malicious.
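One common way to limit this kind of risk is to check every tool call an agent proposes against an explicit policy, outside the model itself. The sketch below is a simplified illustration: the tool names, the `ToolCall` type and the three possible decisions are all hypothetical, and a real deployment would need much more than a name check.

```python
from dataclasses import dataclass

# Hypothetical tool names, grouped by how much damage they could do.
READ_ONLY_TOOLS = {"search_docs", "read_file"}
DESTRUCTIVE_TOOLS = {"delete_record", "run_code"}


@dataclass(frozen=True)
class ToolCall:
    name: str
    argument: str


def authorize(call: ToolCall, human_approved: bool = False) -> str:
    """Return 'allow', 'needs_approval' or 'deny' for a proposed tool call."""
    if call.name in READ_ONLY_TOOLS:
        return "allow"
    if call.name in DESTRUCTIVE_TOOLS:
        # Destructive actions go ahead only after a human signs off.
        return "allow" if human_approved else "needs_approval"
    # Anything not explicitly listed is refused by default.
    return "deny"


print(authorize(ToolCall("read_file", "report.txt")))
print(authorize(ToolCall("delete_record", "customer-42")))
print(authorize(ToolCall("delete_record", "customer-42"), human_approved=True))
print(authorize(ToolCall("send_wire_transfer", "all funds")))
```

Because the check runs in ordinary code rather than in the prompt, an instruction like “get the job done, no matter what” cannot talk its way past it.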
AI systems are powerful but can be fragile, and AI security practices are rapidly evolving to meet the challenge.
There’s still a lot of room for advancements in model development, but foundation models are moving toward commoditization, meaning that they will stop being a main differentiator. If virtually every organization has access to similar models, then chatbots and other AI-powered tools all start looking the same.
This trend will shift value away from the model layer and toward unique differentiators and adaptability that will represent the new cutting edge. Organizations that can bring more and better proprietary datasets to the table, along with smarter workflow integrations, better infrastructure, more intuitive customer experiences and stronger governance and compliance frameworks, will be able to add the most value.