Generative AI (gen AI) has gained massive traction in the enterprise world over a relatively short period of time. The technology has the potential to drive significant improvements in efficiency and innovation, from automating routine workflows to generating insights from large data sets.
Right now, AI assistants are increasing productivity by augmenting individual capabilities. The next evolution in ways of working and consulting is agentic AI, where a human oversees a team of autonomous AI agents that perform tasks and communicate with each other. According to Jill Goldstein, Global Managing Partner for HR and Talent Transformation at IBM Consulting, “Companies will need to reevaluate their current work processes and create new types of teams where humans oversee groups of autonomous AI agents.”
To fully harness the potential of AI, we must establish productivity measurement frameworks that not only measure individual output, but also the coordination of AI agents working alongside humans. But quantifying real-world impacts on productivity, particularly given how closely humans and machines work together to perform workplace tasks, can be a complex process. In other words, the question today isn’t whether AI will be deployed to increase productivity, but how best to measure and use the tools at an enterprise’s disposal.
At IBM Consulting®, we’ve addressed this question by creating an internal productivity measurement lab that develops frameworks and methods for measuring productivity as our consultants adopt AI. We believe these frameworks are critical not just for successful adoption, but for providing useful, tangible measurements of success. They’ve also been indispensable in providing actionable data to inform the ongoing development of our AI-powered delivery platform, IBM Consulting Advantage, which supercharges our consultants’ client delivery with AI agents, applications and more.
Through this process, we have identified 5 key lessons for measuring the productivity of AI in an enterprise setting:
When evaluating the impact of generative AI, it’s crucial to consider the specific context in which it is being applied. AI tools perform differently across industries, departments and tasks, meaning a one-size-fits-all evaluation won’t yield accurate insights.
Goldstein echoes this idea: “To capture the value of generative AI, leaders must first envision it within the context of their workforce. This means having the right technology in the right place and equipping the workforce with the technical acumen to use the tools effectively.”
For example, AI’s impact on an engineering team differs from its effect on a customer service employee. A developer who uses a coding assistant might see faster code deployment with fewer errors, while a customer experience agent might expect quicker response times.
A successful productivity measurement process identifies the specific problem AI is intended to solve, allowing researchers to assess its relevant impact with accuracy.
Truly understanding the impact of generative AI, and the way humans use an assistant or tool, requires measuring performance against a control group that isn’t using AI. This method allows researchers to see whether improvements are directly attributable to the AI system.
In our productivity measurement lab research, we identify user groups that are as similar as possible and ask them to run an identical project that mimics a real-world scenario: one group in a traditional manner, and another with AI augmentation. From there we’re able to quantify key metrics such as speed, quality, cost and accuracy between these 2 groups.
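To make this concrete, here is a minimal Python sketch of how such a between-group comparison might be quantified on a single metric. The per-participant completion times, group sizes and the use of a two-sample t-test are illustrative assumptions, not our lab’s actual data or tooling.

```python
# A minimal sketch of a controlled comparison, assuming per-participant
# task-completion times (in hours) were collected from both groups.
# All figures below are illustrative placeholders, not IBM lab data.
from statistics import mean
from scipy.stats import ttest_ind

control_hours = [42.0, 39.5, 45.2, 41.1, 44.0, 40.3]  # traditional delivery
ai_hours      = [31.4, 35.0, 29.8, 33.6, 30.9, 32.2]  # AI-augmented delivery

# Relative speed-up of the AI-augmented group over the control group
speedup = (mean(control_hours) - mean(ai_hours)) / mean(control_hours)

# Two-sample t-test: is the observed difference unlikely to be chance?
t_stat, p_value = ttest_ind(control_hours, ai_hours)

print(f"Mean speed-up: {speedup:.1%}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```

The same structure extends to the other metrics named above (quality, cost, accuracy), with each compared between the two groups in the same way.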
Generative AI’s impact on productivity can vary significantly depending on the skill level of the employee who uses the system. Given this, it’s important to assess how AI performs across a range of user expertise. Skill levels and expertise should not be viewed solely through the lens of seniority or years of experience, but rather through the relevant, targeted skills required for a particular task.
In one recent study evaluating a code assistant, we formed 2 teams performing the same task augmented with AI: one with a higher skill level, and one with less expertise. We found significant variation in each group’s level of productivity compared to the control group, suggesting human-machine interaction and the ability to communicate with the system effectively had a major impact on the tool’s return on investment.
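As a hypothetical illustration of that kind of finding, the sketch below compares two skill cohorts against a shared control baseline. The cohort labels and all numbers are invented for the example, not figures from our study.

```python
# Hypothetical per-cohort comparison against a shared control baseline.
# Cohort labels and hours are assumptions for illustration only.
control_baseline_hours = 42.0  # average time without AI augmentation

cohort_hours = {
    "higher_relevant_skill": 30.5,  # fluent with the tool and the task
    "less_targeted_expertise": 39.0,  # still learning to prompt effectively
}

for cohort, hours in cohort_hours.items():
    gain = (control_baseline_hours - hours) / control_baseline_hours
    print(f"{cohort}: {gain:.1%} productivity gain vs. control")
```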
The success of generative AI in an enterprise setting is often dependent on how quickly and effectively a workforce can adapt to it. Generative AI is designed to augment human capabilities, which can require a learning curve and a period of adjustment. Measuring human adoption and integration with AI systems is crucial in gauging the system’s overall impact.
In our research, we’ve found that some groups adapt less quickly to AI assistants, requiring more onboarding and experimentation before they're able to productively use the tool. We also found that an assistant’s integration with existing team-specific tools was a major factor in how it impacted productivity.
To effectively measure this variable, we recommend continuously monitoring and observing research subjects to identify how quickly they’re able to adapt.
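One way this kind of monitoring might be operationalized, sketched below under assumed data, is to track a group’s average task time week over week and record how long it takes to cross a target proficiency threshold. The weekly figures and the threshold are hypothetical.

```python
# Sketch: adoption measured as a learning curve. Weekly average task times
# and the proficiency threshold are hypothetical assumptions.
weekly_avg_hours = [9.5, 8.2, 6.9, 5.8, 5.6, 5.5]  # weeks 1..6 with the assistant
proficiency_threshold = 6.0  # assumed target task time in hours

# First week in which the group's average time meets the target
weeks_to_proficiency = next(
    (week for week, hours in enumerate(weekly_avg_hours, start=1)
     if hours <= proficiency_threshold),
    None,  # not reached within the observation window
)
print(f"Weeks to proficiency: {weeks_to_proficiency}")
```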
Generative AI’s impact on productivity extends to how its output needs to be maintained. Measuring how easy or challenging it is to update or manage AI-generated output is a key aspect of its overall effect.
For example, in a study of a code assistant’s productivity, we noted that some teams generated fewer lines of code while achieving the same results, leading to reduced maintenance.
In other AI applications, this measurement might involve calculating the human effort required to oversee or audit AI-generated content. If AI output requires extensive revisions or updates, net productivity might be lower than expected.
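A simple way to frame that caveat, shown in the hypothetical sketch below, is to net the review and rework effort against the gross time saved by generation. All hours are placeholder values.

```python
# A hedged sketch of "net productivity": time saved by AI generation minus
# the human effort spent reviewing and reworking its output.
# All hours below are hypothetical placeholders.
baseline_task_hours = 10.0  # doing the task without AI
ai_task_hours = 4.0         # producing the output with AI
review_hours = 2.5          # auditing the AI-generated output
rework_hours = 2.0          # revising or updating it afterwards

gross_saving = baseline_task_hours - ai_task_hours
net_saving = gross_saving - (review_hours + rework_hours)

print(f"Gross saving: {gross_saving:.1f} h, net saving: {net_saving:.1f} h")
# If review and rework costs approach the gross saving, net productivity
# may be far lower than expected, or even negative.
```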
Moving into 2025, research like this becomes even more imperative as enterprises seek to measure the impact of their generative AI investments. Goldstein underscores this notion, saying, “Organizations must develop productivity measurement frameworks to gain insights into how AI is augmenting workforce capabilities and addressing challenges. With this workforce data at their fingertips, leaders can pinpoint high-impact use cases, prioritize AI efforts and maximize ROI.”
Our early findings suggest that the value of an enterprise’s AI is deeply tied to how effectively humans can use it: whether they have the knowledge to query it well, and how smoothly the assistant integrates with the workflows they’re accustomed to using every day.
At IBM Consulting’s productivity measurement lab, we’re using these insights to continuously refine and extend our tools, with the goal of creating more efficient human-machine relationships and realizing the true power of AI.