Open-source AI refers to artificial intelligence systems that can be used, examined, altered and distributed for any purpose, without having to request permission.
These freedoms align with the definition of open-source AI established by the Open Source Initiative (OSI), which is regarded globally as the steward of open-source principles and policies.1
The rise of generative AI helped catalyze the era of open-source AI. According to a report by Economist Impact, two-thirds of the large language models (LLMs)—a category of foundation models commonly used to craft genAI applications like chatbots and coding assistants—released in 2023 were open-source.2
For software to be considered open source, anyone must be able to use, study, modify and redistribute its source code as they see fit, usually at no cost. However, the scope of open-source AI is much broader than that of open-source software.
AI systems encompass not only the AI models themselves but also the datasets used during training, the model weights and parameters and the source code. This source code includes code for filtering and processing training data, code for model training and testing, any supporting libraries and the inference code for running the model. All these components must adhere to and be made available under open-source AI terms.
The OSI’s open-source AI definition allows the exclusion of unshareable non-public training data, such as personally identifiable information (PII).3 For this type of data, a detailed description must be provided, including its provenance, characteristics and scope, how the data was collected and selected, any labeling procedures and data processing and filtering methods.4
Weights are the central parameters of pretrained models. They’re learned during training and determine how a model interprets new data and makes predictions.
Open weights are publicly shared and typically available under open-source licenses, providing a peek into a deep learning model’s final state. While they represent a step toward transparency in AI, open weights still don’t offer the full picture that open-source AI does. Without the training data or training code, others can’t scrutinize or recreate the training process.
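The distinction is easier to see with a toy model. In the sketch below (plain Python; the data and learning rate are illustrative, not from any real model), a single weight is learned from data by gradient descent. An open-weight release shares the final value of `w`; an open-source AI release would also share the data and the training loop that produced it.

```python
# Toy illustration: a "model" with one weight, learned from data.
# Open-weight releases share the final weight; the training data
# and this training loop are what they may omit.

data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # inputs x, targets y = 3x

w = 0.0    # the model's single weight, before training
lr = 0.05  # learning rate

for _ in range(200):
    for x, y in data:
        pred = w * x
        grad = 2 * (pred - y) * x  # gradient of squared error w.r.t. w
        w -= lr * grad

print(round(w, 2))  # the learned weight, close to 3.0
```

Given only the final `w`, a third party can run the model and make predictions, but cannot verify how it was trained or what data shaped it, which is precisely the gap between open weights and open-source AI.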
According to a recent IBM study, more than 80% of surveyed IT decision-makers reported that at least a quarter of their company’s AI platforms or solutions are based on open source. And the enterprises harnessing open-source ecosystems are more likely to achieve positive ROI than those that aren’t.
In addition to driving ROI, open-source AI offers these key advantages:
● Accessibility
● Collaborative innovation
● Cost-efficiency
● Customization
● Transparency
Open-source AI breaks down barriers to entry, especially for those new to the field. It also provides access to organizations that are unable to invest significant financial resources in AI development, such as small businesses or companies without specialized expertise.
Community is at the heart of open source, with AI developers, researchers, organizations and other stakeholders working together to continuously improve AI technologies. This collective effort leads to learning and sharing, opening up opportunities to build on the work of others and spurring innovation.
Open-source AI models are generally free to use. This allows enterprises to save on the initial costs of developing and training their own models or procuring them from closed-source providers with high subscription pricing or licensing fees.
Organizations can modify open-source AI systems on their own terms, giving them greater control. They can tailor these systems to their particular needs and use cases, fine-tuning open-source AI models on their own business data and optimizing those models for specific tasks.
The open nature of open-source AI cultivates AI transparency. Knowing how an AI system was built and trained and how it makes decisions helps instill confidence and trust, especially for industries where AI outcomes can impact lives, such as healthcare, human resources and the justice system.
This transparency also makes it easier to pinpoint bugs, identify biases and detect security flaws for AI developers to quickly address. Additionally, the visibility into open-source AI’s inner workings allows for better auditability by policymakers in sectors like government and finance where regulatory compliance is paramount.
Despite its many benefits, open-source AI comes with limitations. Here are some challenges associated with open-source AI:
● Lack of dedicated or timely support
● Possibility of misuse
● Security vulnerabilities
Unlike proprietary models, open-source AI models often lack set response times for urgent issues, a dedicated support team to help resolve problems or consistent timelines for releasing security patches and updates. Enterprises must take it upon themselves to monitor their AI applications and create their own support procedures.
Because anyone can use open-source AI for whatever their aims are, it has the potential to be employed for malicious purposes. Threat actors can apply open-source AI to automate cyberattacks, generate deepfakes or spread misinformation and disinformation.
While transparency is one of open-source AI’s strengths, that same visibility exposes security vulnerabilities that bad actors can exploit. Again, the responsibility falls on organizations to establish guardrails around their open-source AI solutions.
Myriad open-source AI models exist, most of which can be accessed on Hugging Face or through their GitHub repositories. Here are some popular ones:
● Amber
● Crystal
● DeepSeek-R1
● Falcon-7B and Falcon-40B
● Granite
● OLMo
● Pythia
● Qwen
● T5
Amber is a 7-billion-parameter English language model developed by LLM360, an initiative for community-owned AI through open-source large model research and development. Amber is based on Meta’s Llama architecture and is available under the Apache 2.0 license. According to the OSI, Amber complies with the OSI’s open-source AI definition.1
Crystal is another large language model from LLM360 with a parameter size of 7 billion. It’s released under the Apache 2.0 license and excels in balancing coding and natural language processing (NLP) tasks. According to the OSI, Crystal complies with the OSI’s open-source AI definition.1
DeepSeek-R1 is a reasoning model from Chinese AI startup DeepSeek. It uses a Mixture of Experts (MoE) machine learning architecture and was trained using large-scale reinforcement learning to refine its reasoning abilities. It’s available under the MIT license.
Falcon-7B and Falcon-40B are causal decoder-only models with 7 and 40 billion parameters, respectively. Developed by researchers at the UAE’s Technology Innovation Institute (TII), both models were trained on TII’s own RefinedWeb, a huge dataset containing filtered English web data. Falcon-7B and Falcon-40B are available under the Apache 2.0 license.
IBM® Granite™ is a series of enterprise-ready multimodal AI models. They’re built on a foundation of open-source instruction datasets with permissive licenses alongside internally curated synthetic datasets. The models are available under the Apache 2.0 license.
The Granite foundation models consist of small language models with reasoning capabilities designed for agentic workflows, a vision model specialized in document and image understanding, speech models for automatic speech recognition and translation, and code models for code generation tasks.
OLMo is a family of language models from Ai2, a nonprofit AI research institute. The models come in parameter sizes of 1, 7, 13 and 32 billion. The models, training code, evaluation suite to reproduce OLMo’s results and training data used across each phase—including pretraining, mid-training and post-training—are all freely available under the Apache 2.0 license. According to the OSI, OLMo complies with the OSI’s open-source AI definition.1
Developed by nonprofit research lab EleutherAI, Pythia is a suite of LLMs ranging in size from 14 million to 12 billion parameters and released under the Apache 2.0 license. All associated data, code, models and checkpoints are publicly available, along with instructions to replicate training, with the aim of furthering AI interpretability, AI ethics and transparency. According to the OSI, Pythia complies with the OSI’s open-source AI definition.1
Qwen is a series of LLMs from Chinese cloud computing company Alibaba Cloud. Qwen includes language models, a vision language model and variants optimized for audio, coding and math. Most Qwen models are available under the Apache 2.0 license, though larger models have proprietary licenses.
T5 is a text-to-text transfer transformer model developed by researchers at Google. It excels in a broad array of NLP tasks and is released under the Apache 2.0 license. According to the OSI, T5 complies with the OSI’s open-source AI definition.1
The OSI has also analyzed Meta’s Llama 2, Microsoft’s Phi-2, Mistral’s Mixtral and xAI’s Grok and concluded that these models don’t comply with the OSI’s open-source AI definition “because they lack required components and/or their legal agreements are incompatible with the Open Source principles.”1
Working on open-source AI projects can get overwhelming. Here are some well-known open-source AI tools that can help:
● Keras
● OpenCV
● PyTorch
● Scikit-learn
● TensorFlow
Keras is an application programming interface (API) written in Python for building, training and evaluating deep learning models. It’s compatible with and can run on top of JAX, PyTorch or TensorFlow frameworks.
OpenCV is an open-source computer vision library operated by the Open Source Vision Foundation. It houses more than 2,500 optimized algorithms for real-time vision applications, including image recognition, image classification, object detection and object tracking.
PyTorch is a framework originally developed by Meta and now part of the Linux Foundation. It supports dynamic neural networks and GPU acceleration, integrates seamlessly with Python libraries and packages, offers an intuitive interface and has minimal framework overhead.
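PyTorch’s dynamic approach can be shown in a few lines: a computation is expressed directly in Python, and gradients are computed automatically by autograd (a minimal sketch using core PyTorch tensor operations).

```python
# Minimal PyTorch example: a dynamic computation with automatic differentiation.
import torch

# A tensor that tracks operations for gradient computation
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

y = (x ** 2).sum()  # y = 1 + 4 + 9 = 14
y.backward()        # autograd computes dy/dx = 2x

print(y.item())         # 14.0
print(x.grad.tolist())  # [2.0, 4.0, 6.0]
```

Because the graph is built as the code runs, models can use ordinary Python control flow (loops, conditionals) that changes from one forward pass to the next.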
Scikit-learn is a Python module for machine learning. It features algorithms for classification, clustering and regression, among others, and offers tools for data processing, model selection and evaluation and creating visualizations.
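A typical scikit-learn workflow follows a consistent fit/predict pattern. The sketch below trains a logistic regression classifier on the library’s bundled Iris dataset and evaluates it on held-out data (the dataset, model and split are illustrative choices).

```python
# Minimal scikit-learn workflow: split data, fit a classifier, evaluate it.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset of flower measurements and species labels
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = LogisticRegression(max_iter=500)  # raised max_iter to ensure convergence
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)  # fraction of correct test predictions
print(f"test accuracy: {accuracy:.2f}")
```

The same `fit`/`predict`/`score` interface applies across scikit-learn’s estimators, which is why swapping in a different algorithm usually requires changing only one line.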
TensorFlow is a platform for building and deploying machine learning models. Created by Google, TensorFlow contains a library of datasets and models, APIs for different programming languages and tools for optimizing machine learning workflows. It also has a robust open-source community and helps people build their machine learning expertise through books, curated curriculums and online courses.
1 The Open Source AI Definition – 1.0, Open Source Initiative, Accessed 12 May 2025
2 Open sourcing the AI revolution, Economist Impact, 2024
3 Answers to frequently asked questions, Open Source Initiative, 29 October 2024
4 The Open Source AI Definition – 1.0, Open Source Initiative, Accessed 12 May 2025