Governed ModelOps with Anaconda and IBM Cloud Pak® for Data

By | 3 minute read | July 2, 2021

Using open source packages and libraries during the development stage for artificial intelligence and machine learning (AI/ML) models can enable data scientists to capitalize on the latest innovations. But these packages and libraries also pose security and governance challenges for enterprises.

Given the excitement and growth in data science and AI, companies around the world have been developing AI/ML models on a large scale over the past few years. However, the reality is that many of these models never make it to production. Ultimately, this is because models must meet demanding requirements, such as legal compliance, before they can be put into production.

Unmanaged open source technology comes with risk

According to Anaconda’s 2020 State of Data Science report, developers and system administrators cite IT security standards as their biggest blocker to getting models into production. And a concerning 30% of respondents who have knowledge of their company’s security practices stated that their organization has no mechanism in place to secure open source data science. Given the prevalence of open source software in production workflows, this gap creates risks with potentially far-reaching consequences.

Often, admins need to ensure that their developers and data scientists use only approved packages in enterprise projects. In addition, enterprises may have their own proprietary packages that also need to be made available to data scientists. How can enterprises operationalize machine learning models for production use and align with necessary compliance and internal requirements?

Supporting enterprise data science

IBM and Anaconda have partnered to integrate Anaconda Team Edition with IBM Cloud Pak® for Data to address these challenges. Adding the benefits of Anaconda to your ModelOps strategy allows admins to block, exclude, and include packages according to enterprise standards. You can also keep vulnerabilities and unreliable software out of your data science and machine learning pipeline.

A model is more than just a set of weights: it includes the code, libraries, and packages used to build the model and execute its inference operations. Because of this, it’s essential that the specific libraries, and the versions of those libraries, are consistent between the model developer’s environment, the model validator’s environment, and the final deployment platform. This is a genuine reproducibility challenge, and one that the Anaconda package ecosystem aims to address.
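As an illustration of the consistency check described above, here is a minimal Python sketch that compares package versions across environments. It assumes each environment has been exported as text in the `name=version=build` format that `conda list --export` produces. The function names `parse_export` and `find_mismatches` and the environment labels are hypothetical, chosen for this example only; they are not part of Anaconda Repository or IBM Cloud Pak for Data.

```python
from typing import Dict, Optional


def parse_export(text: str) -> Dict[str, str]:
    """Parse 'name=version=build' lines into {name: version}.

    Blank lines and comment lines (starting with '#') are skipped.
    """
    pins: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split("=")
        if len(parts) >= 2:
            pins[parts[0]] = parts[1]
    return pins


def find_mismatches(
    envs: Dict[str, Dict[str, str]]
) -> Dict[str, Dict[str, Optional[str]]]:
    """Return packages whose version differs (or is missing) in any environment."""
    all_names = set().union(*envs.values())
    mismatches: Dict[str, Dict[str, Optional[str]]] = {}
    for name in sorted(all_names):
        # None marks a package that is absent from an environment.
        versions = {env: pins.get(name) for env, pins in envs.items()}
        if len(set(versions.values())) > 1:
            mismatches[name] = versions
    return mismatches


# Hypothetical exports from a developer and a deployment environment.
dev = parse_export("# exported\nnumpy=1.19.2=py38\npandas=1.2.0=py38\n")
prod = parse_export("numpy=1.19.2=py38\npandas=1.1.5=py38\n")
print(find_mismatches({"dev": dev, "prod": prod}))
```

Running a check like this before handing a model to validation or deployment surfaces version drift early, instead of as a behavior difference in production.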

With Anaconda Repository for IBM Cloud Pak for Data, admins can govern access to open source packages based upon users, groups, and roles. To ensure developers and data scientists only use approved packages, customers may block access to packages on the internet from the IBM Cloud Pak for Data environment, forcing all package loading to go through Anaconda Repository for IBM. Anaconda Repository for IBM caches packages originating from the internet and allows admins to upload a customer’s proprietary packages alongside them. Packages are served up securely and with consistent performance.
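To make the idea of routing all package loading through a governed repository more concrete, the following Python sketch flags any channel in an environment definition that does not point at an approved internal repository. The URL `https://repo.example.internal/` and the function name `unapproved_channels` are placeholders invented for this example; the actual channel URLs and enforcement mechanism depend on how Anaconda Repository is deployed in your environment.

```python
from typing import Iterable, List

# Hypothetical base URL of the internal, governed package repository.
APPROVED_PREFIX = "https://repo.example.internal/"


def unapproved_channels(
    channels: Iterable[str], approved_prefix: str = APPROVED_PREFIX
) -> List[str]:
    """Return the channels that are not served from the approved repository."""
    return [c for c in channels if not c.startswith(approved_prefix)]


# Hypothetical environment definition, shaped like a parsed environment file.
env_spec = {
    "name": "fraud-model",
    "channels": [
        "https://repo.example.internal/main",
        "conda-forge",  # public channel, should be flagged
    ],
    "dependencies": ["python=3.8", "scikit-learn=0.24"],
}

flagged = unapproved_channels(env_spec["channels"])
if flagged:
    print("Channels outside the approved repository:", flagged)
```

A check like this is only a convenience for catching misconfigured environments early. The actual guarantee comes from blocking internet access at the platform level, as described above.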

As visualized in the picture below, custom runtime environments can be defined to load packages from Conda channels served by Anaconda Repository for IBM Cloud Pak for Data, to run Notebooks and Scripts using these packages. Alternatively, code in Notebooks or Scripts can load approved packages via Conda.

Joint customers using Anaconda Repository for IBM Cloud Pak for Data have already begun securely managing and governing open source packages and libraries for production models. Example use cases include opioid abuse tracking, COVID case analysis, and data analytics for education programs. The possibilities for AI and ML models are endless, and governed package management ultimately makes for a quicker way to take advantage of open source innovation without taking on data or security risks.

Next steps

Join this upcoming webinar on July 14 to learn more about governed ModelOps and how Anaconda Repository for IBM Cloud Pak for Data can help you manage and secure open source data science in the enterprise. Register now.