Open-source software has been a cornerstone of many software technologies, and artificial intelligence (AI) is no exception. Within the AI realm, open source's influence is expected to extend beyond models such as large language models (LLMs) to the entire AI ecosystem.
This connection flows both ways, with AI in turn impacting how open-source software is built. Since this relationship will define the future of AI and open source, it’s worth examining. Here are a few ways AI is helping—and hindering—open-source development.
Generative AI's clearest strength today is as a coding assistant. “Responsible open-source developers are using AI as a rubber duck,” says JJ Asghar, a developer advocate at IBM.
He’s referencing the programming practice of “rubber duck debugging.” This technique involves talking to a rubber duck or other inanimate object about a specific error in the code, which can help developers figure out the solution in the process of articulating the problem out loud.
“Now with AI, you can ask questions about what you’re trying to do, and it can give you real feedback, helping guide you to the correct place,” Asghar says. “That is the way the most responsible open-source developers are using AI. They’re not just copying code directly from it—they are coming to the resolution through engaging with AI.”
For first-time contributors or younger maintainers, AI is breaking down barriers to entry. According to a 2024 survey of open-source maintainers conducted by open-source security specialist Tidelift, nearly half of the respondents are using AI-based coding tools, with those under the age of 26 more likely to use them.
“There is a significant number of early-career engineers who leverage AI as their pair programmer and as an expert they can reference,” says Asghar. “More senior developers don’t have any huge desire to leverage it, apart from administrative work, mainly because a lot of senior developers see it as a crutch.”
One form this administrative support takes is writing unit tests, which validate that a function or method in the source code works as expected. “The ability to automatically create unit tests for code is extremely useful,” Asghar says. “The challenge, though, is you can’t just take it at face value. You still need to verify the work, so it’s ‘trust but verify’.”
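As a minimal sketch of what “trust but verify” looks like in practice (the slugify utility and its tests below are hypothetical, not drawn from any project Asghar describes), an assistant might generate pytest cases like these, each of which the maintainer still reviews before merging:

```python
import re

# A small, hypothetical utility a maintainer might ask an assistant to test
def slugify(text: str) -> str:
    """Lowercase text and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.strip().lower()).strip("-")

# AI-generated tests: each assertion still needs a human to confirm
# that it encodes the behavior the project actually wants.
def test_basic_phrase():
    assert slugify("Hello, World!") == "hello-world"

def test_collapses_whitespace_and_punctuation():
    assert slugify("  Open   Source -- AI  ") == "open-source-ai"

def test_empty_string():
    # "Trust but verify": is empty-in, empty-out really the intended contract?
    assert slugify("") == ""
```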
Additionally, AI can help automate and enhance the creation and updating of release notes, README files, changelogs and code-level documentation—necessary yet tedious tasks, especially for maintainers managing large or multiple open-source projects. In an episode of the IBM Think podcast AI in Action, Miha Kralj, global senior partner of IBM Hybrid Cloud, notes that generative AI is a boon for developers when it comes to writing good documentation.
“A developer these days is just going to write the whole class or function or routine…, typically without massive amounts of comments in there, and then ask generative AI to comment out the code, to actually explain in human language what the flow of the code is,” he says.
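A hypothetical before-and-after makes that workflow concrete. The developer writes the bare routine; the model then annotates it so the flow reads in plain language (both versions below are illustrative, not from any cited codebase):

```python
# Before: the routine as the developer first writes it, with no comments
def rolling_mean(values, window):
    out, total = [], 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]
        if i >= window - 1:
            out.append(total / window)
    return out

# After: the same routine, with AI-generated comments explaining the flow
def rolling_mean_commented(values, window):
    out, total = [], 0.0                  # one mean per full window; running sum
    for i, v in enumerate(values):
        total += v                        # slide the newest value into the window
        if i >= window:
            total -= values[i - window]   # drop the value that just slid out
        if i >= window - 1:
            out.append(total / window)    # window is full: record its mean
    return out
```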
For all its help in the peripheral parts of open-source development, AI can also hamper the process.
“There’s a level of rigor that open-source maintainers have to maintain that is exhausting. And some people are just abusing open-source maintainers with creating ‘AI slop’ that doesn’t do anything or breaks projects,” says Asghar.
Seth Larson, security developer-in-residence at the Python Software Foundation, echoes Asghar’s sentiment in a recent blog post, writing that he has “noticed an uptick in extremely low-quality, spammy, and LLM-hallucinated security reports to open-source projects” and pointing to similar findings elsewhere. While these reports appear legitimate at first, Larson observes, further investigation reveals them to be false positives, wasting time and effort that maintainers could have devoted to more vital work.
“The overhead in the open-source community of AI trash being upstreamed is crippling to some projects, to the point where actual things can’t get done,” Asghar says.
The problem is that AI lacks the context an open-source project requires, so it generates pull requests that seem correct but can actually break the codebase. Human oversight and developer expertise therefore remain crucial, particularly for matters involving security in open-source software.
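A hypothetical illustration of the failure mode (neither function comes from a real project): a one-line “simplification” that reads naturally and could pass a casual review, yet silently changes behavior for a valid input:

```python
# Maintainer's original: substitute the default only when no value was given
def effective_timeout(timeout):
    if timeout is None:     # 0 is a valid, deliberate "no timeout" choice
        return 30
    return timeout

# A plausible-looking AI rewrite a reviewer might wave through
def effective_timeout_ai(timeout):
    if not timeout:         # bug: 0 and 0.0 are falsy, so they become 30 too
        return 30
    return timeout

assert effective_timeout(0) == 0      # original honors the caller's 0
assert effective_timeout_ai(0) == 30  # the "equivalent" rewrite does not
```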
AI also falls short in coming up with the right solution for deeper, more complex issues. As Kralj says in the AI in Action podcast, generative AI can suggest constructs that might work but “will make code unmaintainable [or] incomprehensible, and it is not a good code to then actually commit. Code is an extremely human collaborative thing [and it] needs to be understandable for generations.”
And while he considers generative AI to be a powerful tool, it “is not replacing the creativity that we still expect from human developers to shine through.”
Looking to the future, Asghar envisions the open-source community using AI in more directed ways.
“I’m in the camp that [believes] the agentic ecosystem of micro-LLMs can exist,” says Asghar. “The idea that we can have agents focused on one job and one job well is how we’re going to make sure AI is best for our industry.” As an example, he cites using IBM® Granite™ and InstructLab to build small language models (SLMs) and fine-tune them, equipping these SLMs with knowledge and skills tailored to an open-source project’s codebase.
Kralj illustrates this scenario in the AI in Action podcast, describing a chain of agents in which one generates code, another tackles semantic parsing, the next handles static code analysis and the last conducts full end-to-end black-box testing.
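A minimal sketch of that kind of chain, with stub functions standing in for the specialized models (the agent names and hand-offs here are illustrative, not a specific IBM pipeline):

```python
# Illustrative agent chain: generate -> parse -> static-check -> black-box test.
# Each "agent" is a stub standing in for a small, specialized model.
import ast

def generate_code(task: str) -> str:
    # Stand-in for a code-generation agent (a micro-LLM in Asghar's framing)
    return "def add(a, b):\n    return a + b\n"

def parse_agent(source: str) -> str:
    # Semantic-parsing agent: reject code that is not even valid Python
    ast.parse(source)
    return source

def static_check_agent(source: str) -> str:
    # Static-analysis agent: a toy rule standing in for a real linter
    if "eval(" in source:
        raise ValueError("disallowed construct: eval")
    return source

def blackbox_test_agent(source: str) -> str:
    # End-to-end testing agent: execute the code and probe its behavior
    namespace: dict = {}
    exec(source, namespace)
    assert namespace["add"](2, 3) == 5
    return source

def pipeline(task: str) -> str:
    code = generate_code(task)
    for agent in (parse_agent, static_check_agent, blackbox_test_agent):
        code = agent(code)   # any agent can raise and stop the chain
    return code

print(pipeline("write an add function"))
```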
This trend is further reflected in GitHub’s 2024 Octoverse report on the state of open-source software. The report notes that “the role of generative AI models in software development has shifted from helping developers write code to a new building block in developing applications,” adding that “there’s a growing need among developers for smaller models with good performance and lower compute costs.”
For now, open-source maintainers and contributors will need to make the most of AI’s capabilities while remaining vigilant against its drawbacks. As Asghar puts it, “the world is going to use AI, but we need to find a way to use it responsibly.”