For software to be considered open source, anyone must be able to use, study, modify and redistribute its source code as they see fit, usually at no cost. However, the scope of open-source AI is much broader than that of open-source software.

AI systems encompass not only the AI models themselves but also the datasets used during training, the model weights and parameters, and the source code. This source code includes code for filtering and processing training data, code for model training and testing, any supporting libraries and the inference code for running the model. Every one of these components must comply with, and be made available under, open-source AI terms.

The OSI’s open-source AI definition allows the exclusion of unshareable non-public training data, such as personally identifiable information (PII) [3]. For this type of data, a detailed description must be provided instead, covering its provenance, characteristics and scope, how the data was collected and selected, any labeling procedures, and the data processing and filtering methods [4].
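
As a rough illustration, the description required for excluded data can be thought of as a structured record with one entry per item listed above. The sketch below is only an assumption about how a team might track this internally: the class name `TrainingDataDescription`, its field names and the helper `missing_fields` are hypothetical and are not part of the OSI definition or any official tooling.

```python
from dataclasses import dataclass, fields


@dataclass
class TrainingDataDescription:
    """Hypothetical record describing non-public training data that cannot be shared."""

    provenance: str = ""
    characteristics: str = ""
    scope: str = ""
    collection_and_selection: str = ""
    labeling_procedures: str = ""
    processing_and_filtering: str = ""


def missing_fields(description: TrainingDataDescription) -> list[str]:
    """Return the names of the fields that are still empty or whitespace-only."""
    return [
        f.name
        for f in fields(description)
        if not getattr(description, f.name).strip()
    ]


# Illustrative, made-up values: only three of the six items are filled in.
example = TrainingDataDescription(
    provenance="Customer support tickets held by a hypothetical vendor",
    characteristics="English-language text containing PII",
    scope="Tickets from a four-year period",
)

print(missing_fields(example))
```

A check like this only confirms that each item has been written down; whether a description is detailed enough is still a judgment call for a human reviewer.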