What code LLMs mean for the future of software development

We have entered an era of code coauthored with machines.

When OpenAI’s ChatGPT first burst into the tech scene, it launched the age of generative AI, with code generation as one of the earliest use cases. Artificial intelligence (AI) coding assistants then followed, from the pioneering GitHub Copilot to the likes of Amazon Q Developer, Anthropic’s Claude Code, Google’s Gemini Code Assist, IBM watsonx Code Assistant™ and Mistral Code.

But underneath these coding tools lies a powerful technology: large language models (LLMs) for code. And they’re changing the way software is built.

We spoke with a few IBM experts to get the inside scoop on how code LLMs are redefining the software developer role and their forecasts for the future of these models. Sharing their insights are:

  • Fumiko Satoh, Senior Technical Staff Member and Senior Manager, AI for Code and Security

What are LLMs for code?

Code LLMs are specialized models trained on source code. They can either be built from scratch or crafted from a pretrained model fine-tuned on coding datasets. This training data must be of high quality and diverse enough to make sure code LLMs can handle various coding scenarios in different programming languages.

Because LLMs for code are typically derived from AI models designed for natural language processing (NLP), they often take natural language descriptions as prompts. They’re capable of accomplishing these real-world coding tasks:

  • Code completion (also known as autocomplete)

  • Code summarization

  • Generating documentation (as inline comments for code snippets)

  • Suggesting code optimizations

  • Translating existing code from one programming language to another (such as from Python to JavaScript)

Some popular examples of code LLMs include Google’s CodeGemma, Meta’s Code Llama and Mistral’s Codestral and Devstral. Meanwhile, open-source code models include DeepSeek Coder V2, IBM Granite Code and Qwen3 Coder.

Other LLMs that aren’t exclusively for programming but have been trained on and optimized for coding include Anthropic’s Claude Sonnet 4 and Claude Opus 4, and Google’s Gemini 2.5 Pro.

The transforming role of software developers

Code LLMs democratize access to software creation, supporting citizen coders, domain experts and other nontraditional developers who have no formal computer science education or training. They allow for rapid prototyping, faster iteration cycles and quicker onboarding for new software engineers.

Beyond reshaping software development, code LLMs are redefining the role of software developers.

From low level to high level

According to El Maghraoui, software engineers are evolving from code producers to code curators. “While coding remains essential, the emphasis is shifting toward prompt engineering. How do we frame the right queries in the context of these LLMs? And instead of writing every single line of code, developers are increasingly orchestrating AI-generated code, stitching the pieces together.”

This evolution, however, is just the tip of the iceberg. El Maghraoui believes developer roles will progress into what she dubs “intent-driven engineering”: shifting away from syntax toward structure, setting aside the finer details in favor of the bigger picture, and moving from the what to the why, with the emphasis on aims, outcomes and impact.

For McGinn, code LLMs can be treated as libraries imported into a program. “Problems can be solved more quickly in the same way libraries help us as engineers take functionality that’s already been built and not have to reinvent the wheel.”

This view aligns with generative computing. In this framework, a code model is integrated into systems as a modular software component and handled like a programmable interface. As such, it can abstract away low-level tasks, with developers directing more of their efforts toward higher-order problem-solving.
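One way to picture the generative computing framing is to put the model behind an ordinary programming interface, so application code depends on the interface rather than on any particular LLM. The sketch below is a hypothetical illustration: `CodeModel` and `EchoModel` are invented names, and the stub returns canned text so the example runs without a real model:

```python
# Sketch of "model as a modular software component": callers program
# against the CodeModel interface; any backend (local SLM, hosted API)
# can be swapped in. EchoModel is a deterministic stand-in.
from typing import Protocol

class CodeModel(Protocol):
    def generate(self, prompt: str) -> str: ...

class EchoModel:
    """Stub backend so the sketch runs without a real model."""
    def generate(self, prompt: str) -> str:
        return f"# generated for: {prompt}\npass"

def scaffold_function(model: CodeModel, description: str) -> str:
    """Higher-order task built on top of the component."""
    return model.generate(f"Write a Python function that {description}")

print(scaffold_function(EchoModel(), "reverses a string"))
```

Because `scaffold_function` only sees the interface, the low-level generation machinery is abstracted away, which is exactly the division of labor the framework describes.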

“They need to consider design thinking rather than writing code,” Satoh says. “We need higher-level engineers who can create the architecture of a system.”

It’s a role that’s becoming more multidisciplinary, incorporating not only design and architecture but also other dimensions like ethics and security. “It’s less manual and there is more automation, but it’s more strategic,” says El Maghraoui. “You’re blending software engineering with system-level thinking, product thinking and ethical reasoning. Now we’re not just coding-literate—we’re becoming AI-literate and AI-first engineers.”

This opens up avenues for upskilling on those higher-level tasks, especially for junior developers. “The goals of a lot of benchmarks are to be at the functionality of a junior developer. So if that’s a metric we’re aiming for, then the idea is to not have as much of a need for that,” McGinn says. “We need people who have a deeper level than a junior developer, or they might not need to have gone through the role of junior developer anymore to understand what a senior developer does.”

Human coders must remain in the loop

Panda notes that as with any large language model, code LLMs have their drawbacks. He suggests caution when it comes to copyright, bias and security, especially when working on crucial applications in sectors like finance and healthcare.

“You cannot just blindly rely on them,” says Panda. “You have to always take their generated code with a grain of salt.”

This skepticism must also be paired with performance optimization mechanisms and validation measures to verify soundness and accuracy. “You want to still make sure you’re writing test cases and you’re checking code as thoroughly as you would if it was manually written,” McGinn says.
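That discipline can be made mechanical: gate generated code behind the same unit checks you would write for hand-written code, and reject anything that fails. A minimal sketch, where the `generated` string is stand-in text rather than real model output and `accept_if_valid` is a hypothetical helper:

```python
# Sketch of a validation gate: candidate code is loaded into an
# isolated namespace and accepted only if every check passes.

generated = """
def fizzbuzz(n):
    if n % 15 == 0: return "FizzBuzz"
    if n % 3 == 0: return "Fizz"
    if n % 5 == 0: return "Buzz"
    return str(n)
"""

def accept_if_valid(source: str, checks) -> bool:
    ns: dict = {}
    try:
        exec(source, ns)          # load the candidate code
        return all(check(ns) for check in checks)
    except Exception:
        return False              # any failure means rejection

checks = [
    lambda ns: ns["fizzbuzz"](15) == "FizzBuzz",
    lambda ns: ns["fizzbuzz"](7) == "7",
]
print(accept_if_valid(generated, checks))  # True
```

In practice the checks would be a real test suite (and the execution sandboxed), but the shape is the same: model output is a candidate, not an answer, until the tests say otherwise.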

El Maghraoui emphasizes support from rather than reliance on code LLMs. “If we use them as a teaching aid or a pair programmer or an idea generator, they can boost our learning, creativity and productivity. But if we use them as a crutch—without introspection, without validation—they can erode our judgment and accountability.”

Foundational programming matters even more

Validation is only possible if developers grasp the underlying computing principles. After all, if you don’t understand the fundamentals of programming, how can you confirm the validity of the code generated by these models?

“It’s fast coding but it’s not always robust or correct or secure,” El Maghraoui says. She adds that using generated code as it is can be dangerous. “It may cause fragility in codebases. If you overly rely on these models or overly trust their outputs, this can propagate subtle bugs or inefficiencies, especially in critical systems. That’s why it’s important to understand what’s happening.”

This is where deep expertise comes in, made possible by cardinal software development concepts.

“In schools, we still teach people to do long division manually, but what does that serve if they’ll be using a calculator? It’s similar to coding. What are the foundational pieces we think are important to understand what’s going on?” says McGinn.

Some of these technical underpinnings can include compilers, computer architecture, databases, memory management and operating systems. “Developers might start to skip learning these concepts deeply because code LLMs are abstracting them,” El Maghraoui says. However, it’s essential to understand software development’s inner workings because of their implications on source code functionality and performance, she adds.

And while code LLMs can reduce cognitive overhead through automating repetitive tasks, they also have the potential to increase “cognitive atrophy,” as El Maghraoui calls it. She likens it to the greater use of GPS eroding our natural sense of navigation. “If developers rely heavily on code suggestions, they will be less fluent in debugging. Code LLMs can weaken our ability to think algorithmically if we are not balancing them with foundational practice or foundational knowledge.”

Predictions for the next frontier of code LLMs

As code LLMs advance, they’ll unlock new features and even novel use cases, as demonstrated by the vibe coding trend. So what’s in store for these models in the future? Our experts lay out their predictions.

Rise of the agents

The world is abuzz with talk of AI agents transforming the future of work in almost every industry—including software development.

“I see the shift toward these multiagent coding systems with self-healing codebases,” says El Maghraoui. These agentic AI systems are already taking shape. For instance, IBM’s software engineering (SWE) agents can autonomously resolve GitHub issues by first “localizing” to where the bugs are in a codebase and then editing those lines of code to resolve them.
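The localize-then-edit loop has a simple two-phase shape: first pinpoint where the bug lives, then patch those lines. Real SWE agents use much richer signals than this, but the toy Python sketch below (with invented helper names, using a traceback as the localization signal) illustrates the structure:

```python
# Toy sketch of a localize-then-edit loop. Phase 1 finds the buggy
# file and line; phase 2 replaces that line with a candidate fix.
import re

def localize(traceback_text: str) -> tuple[str, int]:
    """Extract (file, line) of the deepest frame in a Python traceback."""
    frames = re.findall(r'File "(.+?)", line (\d+)', traceback_text)
    path, line = frames[-1]
    return path, int(line)

def edit(lines: list[str], lineno: int, replacement: str) -> list[str]:
    """Replace the offending line (1-indexed) with a candidate fix."""
    patched = lines.copy()
    patched[lineno - 1] = replacement
    return patched

tb = '''Traceback (most recent call last):
  File "app.py", line 3, in <module>
    print(totl)
NameError: name 'totl' is not defined'''

print(localize(tb))  # ('app.py', 3)
```

An agent would then re-run the tests after the edit and iterate, which is where the "autonomously resolve" part comes in.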

Small and sustainable

McGinn is hoping for a more energy-efficient strategy “where everything isn’t done by the largest model, but we can save some energy with a smaller agentic approach” that entails different agents fulfilling specific programming tasks.

Similarly, Panda is excited to make small language models (SLMs) like Granite Code available on resource-constrained devices such as laptops and edge devices. “At the same time, there are some things which are challenging for SLMs to solve, so then the bigger Granite models will come into the picture. It’s a balance between multiple models, not just a single model to solve everything.”
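That balance between models can be expressed as a simple routing policy: try the small, cheap model first and escalate to the larger one only when the small model signals low confidence. Both "models" in this Python sketch are hypothetical stubs, and the length-based confidence heuristic is purely illustrative:

```python
# Sketch of small-model-first routing: stay on the cheap path when
# the SLM is confident, escalate to the larger model otherwise.

def small_model(task: str) -> tuple[str, float]:
    """Stub SLM: confident on short tasks, unsure on long ones."""
    confidence = 0.9 if len(task) < 40 else 0.3
    return f"slm-answer({task})", confidence

def large_model(task: str) -> str:
    """Stub LLM: slower and costlier, assumed more capable."""
    return f"llm-answer({task})"

def route(task: str, threshold: float = 0.7) -> str:
    answer, confidence = small_model(task)
    if confidence >= threshold:
        return answer            # cheap path: small model suffices
    return large_model(task)     # escalate to the bigger model

print(route("rename a variable"))  # handled by the SLM
```

A production router would estimate difficulty from better signals (model log-probabilities, task type, past failures), but the shape is the same: multiple models, one policy deciding which to invoke.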

Memory and reasoning boost

Integrating contextual memory and reasoning into code LLMs is another improvement to watch out for, notes El Maghraoui. “They can remember the entire history of the project and how it evolved. It allows for better suggestions for the future.”

The reasoning component aids in “long-horizon tasks,” as Panda terms it. “Code LLMs will be interacting with your code repositories to take many actions sequentially, not just a one-stop action. Only if your code LLM is strong in reasoning can it do these tasks.”

Evolving developer environments and deployments

Today’s integrated development environments (IDEs) incorporate code models and coding assistants as services or plug-ins. But tomorrow’s IDEs might push toward AI-native developer environments. “We will see these IDEs be redesigned with code LLMs from the ground up—not just as plug-ins but as part of their core,” El Maghraoui says.

She also sees increasing private infrastructure deployments of code LLMs as a growing trend. For instance, the Mistral Code AI assistant, powered by Codestral and Devstral, supports deployment on premises.

“Private deployments for these code LLMs are being accelerated by performance, cost and privacy concerns,” says El Maghraoui. “Enterprises want to avoid sending proprietary code to third-party APIs. Private deployments ensure their source code, internal libraries and stack traces never leave their internal company network. There are also regulatory compliance issues, especially in finance or healthcare, and the cost of running high-throughput use cases in-house can be cheaper on a large scale.”

Bridging the coding and testing gap

Writing code and creating tests are distinct parts of the software development process. Satoh envisions code LLMs helping seamlessly link these two stages, aiding in both code generation and test generation.

“We should connect them with the same model to make a seamless product,” she says. Satoh ventures even further, foreseeing code models “supporting the entire software development lifecycle,” including earlier phases such as requirements specification and design.

But no matter where this growing era of code coauthored with machines takes us, El Maghraoui remains cautious yet optimistic. “It’s like a knife that has two sides. It could be something that hurts us as developers or something that helps us. So it’s really important to understand how to properly use these code LLMs so you don’t lose your edge but you become more creative.” And those who will likely thrive, she believes, will be “developers and companies who embrace these workflows and treat AI not just as a tool, but as a partner.”
