This post walks through the general process of machine learning model deployment and shows how to realize it with IBM Cloud Pak® for Data.
IBM Cloud Pak® for Data is an integrated data and AI platform to support the complete data science lifecycle. It enables rapid prototyping, production-ready scalable model development and deployment, and delivers trust and transparency in artificial intelligence (AI) models. In the previous articles (see links at the end of this post), we already covered the data collection, data governance, and model building steps of the AI Model Lifecycle.
Deployment process
After data scientists build (train) an AI model that meets their performance criteria, they make that model available for other collaborators — such as software engineers, AI Operations, and business analysts — to validate (or quality test) the model before it gets deployed to production.
Once a model has gone through the iterations of development, build, and test, the AI Operations (or Model Ops) team deploys the model into production. Deployment is the process of configuring an analytic asset for integration with other applications, or for access by business users, to serve production workloads at scale.
There are two main types of deployment:
- Online: A real-time request/response deployment option. With this option, models or functions are invoked through a REST API, and a single row or multiple rows of data can be passed in with the request (see the sketch after this list).
- Batch: A deployment option that reads and writes from/to a static data source. A batch deployment can be invoked with a REST API.
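To make the online option concrete, here is a minimal Python sketch of invoking an online deployment through its REST scoring endpoint. The endpoint URL, token, field names, and values are placeholders, and the payload shape follows the Watson Machine Learning v4 scoring format; treat it as an illustration rather than a definitive integration.

```python
import requests

# Placeholder values; substitute your own cluster URL, deployment ID, and token.
SCORING_URL = "https://<cluster-url>/ml/v4/deployments/<deployment-id>/predictions?version=2020-09-01"
TOKEN = "<bearer-token-obtained-from-the-platform>"

# One row or multiple rows can be passed in a single request.
payload = {
    "input_data": [{
        "fields": ["age", "income", "education"],
        "values": [[42, 55000, "masters"],
                   [31, 48000, "bachelors"]],
    }]
}

response = requests.post(
    SCORING_URL,
    json=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
)
print(response.json())  # predictions for each input row
```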
Typically, an enterprise has separate clusters or namespaces (depending on isolation needs) to support the various stages of training, validating, and deploying AI models, and the roles of collaborators differ at each stage. Data scientists use the development system for experimentation and exploration, where they collaborate with other data scientists, data engineers, and business SMEs to identify the right data sets and train the best-performing AI models. The AI Operations team deploys the trained model into a pre-production environment. Model validators and quality assurance (QA) engineers validate the model in pre-production, and once it is validated and approved, the AI Operations team deploys the model to production.
Four steps of the deployment process
- Step 1: Once a model is trained, the assets (typically code assets and metadata) are checked into the enterprise’s Git repository, which, in turn, triggers the CI/CD (continuous integration/continuous delivery) approval process. Additionally, any relevant project assets (data, scripts, notebooks, flows) are copied to the build system. The main collaborators in the build system are IT and DevOps Engineers, part of the AI Operations team.
- Step 2: The CI/CD process initiates a set of actions in the build system to run scripts (or, less desirably, notebooks) that validate the data and push the trained binary model to the enterprise’s deployable assets repository (such as a binary repository).
- Step 3: Checking the trained model into the enterprise’s binary repository triggers further CI/CD approval processes, which invoke a set of unit tests in the test system to validate the performance and quality of the trained model against holdout (unseen) test data. The test system is sometimes called the quality assurance, validation, or user acceptance test system; all of these names refer to the same task of independently validating trained AI models on unseen test data. A sketch of such a quality gate follows this list.
- Step 4: Once the model passes the quality tests, the CI/CD process initiates deployment of the model to the production system at scale to support production workloads.
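As a concrete (and hypothetical) example of the quality gate in Step 3, a CI/CD pipeline could run a test like the following against the holdout data before allowing promotion. The model path, data path, column names, and accuracy threshold are illustrative assumptions, not part of any platform.

```python
# test_model_quality.py -- illustrative CI/CD quality gate, runnable with pytest.
# Paths, threshold, and column names are hypothetical.
import pickle

import pandas as pd
from sklearn.metrics import accuracy_score

ACCURACY_THRESHOLD = 0.85  # agreed-upon quality bar for promotion


def test_holdout_accuracy():
    # Load the trained binary model pulled from the binary repository.
    with open("artifacts/model.pkl", "rb") as f:
        model = pickle.load(f)

    # Holdout (unseen) test data, kept out of training and validation.
    holdout = pd.read_csv("data/holdout.csv")
    X, y = holdout.drop(columns=["label"]), holdout["label"]

    accuracy = accuracy_score(y, model.predict(X))
    assert accuracy >= ACCURACY_THRESHOLD, (
        f"Model accuracy {accuracy:.3f} is below threshold {ACCURACY_THRESHOLD}"
    )
```

If the assertion fails, the pipeline stops and the model is never promoted, which keeps the validation step independent of the team that trained the model.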
AI model deployment with IBM Cloud Pak for Data
The deployment process described above can be implemented effectively with IBM Cloud Pak for Data. Figure 1 below outlines a representative enterprise flow for model training, validation, and deployment as it integrates with CI/CD processes in Cloud Pak for Data.
IBM Watson Machine Learning (WML) is the component of the platform that enables deployment of trained AI models at scale. WML provides APIs that can be called to store, deploy, update, and delete models.
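The sketch below shows what that store-and-deploy flow can look like with the ibm-watson-machine-learning Python client. The credentials, deployment space ID, model name, and the framework and runtime version strings are placeholders, and exact metadata property names can vary between client versions, so treat this as a minimal illustration rather than the definitive API usage.

```python
from ibm_watson_machine_learning import APIClient
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Stand-in for the model produced by the build step.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
model = LogisticRegression().fit(X, y)

# Placeholder credentials for a Cloud Pak for Data cluster.
wml_credentials = {
    "url": "https://<cluster-url>",
    "username": "<username>",
    "apikey": "<api-key>",
    "instance_id": "openshift",
    "version": "4.0",
}
client = APIClient(wml_credentials)
client.set.default_space("<deployment-space-id>")

# Store the trained model in the deployment space.
model_props = {
    client.repository.ModelMetaNames.NAME: "churn-classifier",
    client.repository.ModelMetaNames.TYPE: "scikit-learn_1.0",
    client.repository.ModelMetaNames.SOFTWARE_SPEC_UID:
        client.software_specifications.get_uid_by_name("runtime-22.1-py3.9"),
}
stored_model = client.repository.store_model(model=model, meta_props=model_props)
model_uid = client.repository.get_model_uid(stored_model)

# Create an online (real-time) deployment for the stored model.
deploy_props = {
    client.deployments.ConfigurationMetaNames.NAME: "churn-classifier-online",
    client.deployments.ConfigurationMetaNames.ONLINE: {},
}
deployment = client.deployments.create(model_uid, meta_props=deploy_props)
```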
Various systems (Dev/Build/Test/Prod) in Figure 1 may consist of different hardware — like different proportions of GPUs (Graphics Processing Units) and CPUs (Central Processing Units). The Dev system may consist primarily of GPUs to enable efficient experimentation and algorithm exploration, while the Test and Prod systems may consist mainly of CPUs to optimize cost.
Note the data pipeline illustrated at the bottom of Figure 1, which shows data collected from various sources and processed through ETL and governance pipelines to deliver business-ready data for training and testing AI models. It is important that production workload data be processed through the same data pipeline so that the final data schema and structure are consistent with the data used for training and testing.
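One common way to help keep training-time and scoring-time data consistent (a complement to, not a substitute for, a governed data pipeline) is to package the preprocessing steps with the model itself, for example as a scikit-learn Pipeline. The feature names below are illustrative assumptions.

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature columns; substitute your own schema.
numeric_cols = ["age", "income"]
categorical_cols = ["education"]

preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

# The deployed asset carries its own preprocessing, so scoring requests
# are transformed exactly as the training data was.
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression()),
])
# model.fit(train_df, train_labels) would be called during the build step.
```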
Watson Machine Learning supports three deployment options: the online and batch options described above, plus a third:
- Virtual: A deployment option that saves a model for deployment in a CoreML runtime. Note that CoreML is not included with Watson Studio.
In Watson Machine Learning, data scientists deploy models (and other analytics assets) into a deployment space, a packaging environment used to organize analytics assets for deployment. The deployment space is a useful concept because it separates the roles of data scientists and the AI Operations (ML/DevOps) engineers who bring models to production.
The deployment flow for most asset types consists of the following steps:
- Create a deployment space.
- Associate a deployment space with a project.
- Save the asset (model, PMML, SPSS, etc.) into the project repository.
- Promote the asset to the deployment space.
- Configure deployment (online or batch).
- Test the deployed asset (see the sketch after this list).
- Integrate the deployed asset with another application (via REST API).
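For the testing step, here is a minimal sketch that continues the earlier ibm-watson-machine-learning client example; the input values are illustrative, and the payload shape may vary with the model's schema.

```python
# Continuing the earlier client sketch: `client` is an authenticated APIClient
# with the deployment space set, and `deployment` is the result of
# client.deployments.create(...).
deployment_uid = client.deployments.get_uid(deployment)

scoring_payload = {
    client.deployments.ScoringMetaNames.INPUT_DATA: [{
        "values": [[0.1, -1.2, 0.5, 2.3, -0.7]],  # one row of 5 features
    }]
}
result = client.deployments.score(deployment_uid, scoring_payload)
print(result)  # predictions returned by the deployed model
```

For the final integration step, applications call the deployment's REST scoring endpoint directly, as in the requests example shown earlier in this post.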
Learn more
IBM Cloud Pak for Data empowers data scientists and ML engineers to simplify AI deployment into production at scale. It supports a hybrid multicloud environment — available both on-premises and in the cloud — for flexibility in training, deploying, and optimizing machine learning models.
IBM Watson Machine Learning supports Git integration and the concept of deployment spaces for packaging of relevant AI model assets, enabling CI/CD processes to promote assets to test (pre-prod) and production environments for validation and deployment at scale.
Furthermore, Cloud Pak for Data enables a clear separation of roles and responsibilities for training, testing, and deploying AI models.
To learn about the other phases of AI Model Lifecycle Management, check out the blog series linked below or see the detailed white paper.
- AI Model Lifecycle Management: Overview
- AI Model Lifecycle Management: Collect Phase
- AI Model Lifecycle Management: Organize Phase
- AI Model Lifecycle Management: Deploy Phase
- AI Model Lifecycle Management: Monitor Phase (Technical Perspective)
- AI Model Lifecycle Management: Monitor Phase (Customer Perspective)