Important:

IBM Cloud Pak® for Data Version 4.6 will reach end of support (EOS) on 31 July, 2025. For more information, see the Discontinuance of service announcement for IBM Cloud Pak for Data Version 4.X.

Upgrade to IBM Software Hub Version 5.1 before IBM Cloud Pak for Data Version 4.6 reaches end of support. For more information, see Upgrading IBM Software Hub in the IBM Software Hub Version 5.1 documentation.

AutoAI tutorial: Build a Binary Classification Model

This tutorial guides you through training a model to predict whether a customer is likely to subscribe to a bank promotion. In this tutorial, you create an AutoAI experiment that analyzes your data and selects the best model type and algorithms to produce, train, and optimize pipelines, which are model candidates. After you review the pipelines, you save one as a model, deploy it, and test it to get a prediction.

Overview of the data sets

If you preview the sample data, you can see that it is structured demographic data, organized in rows and columns and saved in a .csv file.

Preview of training data that contains customer information

The data set comes from the direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to train a model that predicts whether a new client will subscribe to a term deposit (yes/no), which is recorded in the variable y.
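
Before you build the experiment, you can check how the values in the y column are distributed. The following minimal Python sketch uses only the standard library and assumes a local copy of bank-full.csv with a header row, a column named y, and comma-separated values (pass a different delimiter, such as a semicolon, if your copy uses one). The helper names are hypothetical, and the rule in is_binary, that exactly two distinct labels indicate a binary classification problem, only illustrates the idea; it is not how AutoAI selects a model type.

```python
import csv
from collections import Counter


def count_labels(path, target="y", delimiter=","):
    """Count how often each value appears in the target column of a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        # Accessing fieldnames reads the header row.
        if reader.fieldnames is None or target not in reader.fieldnames:
            raise ValueError(f"column {target!r} not found in {path}")
        # Normalize case and whitespace so "Yes" and "yes " count as one label.
        return Counter((row[target] or "").strip().lower() for row in reader)


def is_binary(label_counts):
    """Illustrative rule only: exactly two distinct labels suggests binary classification."""
    return len(label_counts) == 2
```

For example, count_labels("bank-full.csv") returns a Counter of the yes and no labels, which also shows how imbalanced the two classes are.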

Tasks overview

This tutorial presents the basic steps for building and training a machine learning model with AutoAI:

  1. Create a project
  2. Create an AutoAI experiment
  3. Train the experiment
  4. Deploy the trained model
  5. Test the deployed model
  6. Create a batch job to score the model

Task 1: Create a project

  1. Download the sample training data file, bank-full.csv, and the sample payload file, bank_payload.csv. For each file, click the Download or Raw button, then right-click and select Save as to save it as a .csv file.

  2. To create a new project, go to the Projects page and select New Project.

    1. Select Create an empty project.
    2. Enter your project name.
    3. Click Create.

Task 2: Create an AutoAI experiment

In this section, you define an experiment and run it on the banking data to generate pipelines, or model candidates.

  1. On the project page, select New asset and choose AutoAI experiment.
  2. Specify a name and optional description for your new experiment, then click Create.
  3. To add a data source, do one of the following:
    1. To use your local copy of the file, upload the training data file, bank-full.csv, from your computer by dragging the file onto the data panel or by clicking browse and following the prompts.
    2. If you already uploaded the file to your project, click select from project, select the data asset tab, and choose bank-full.csv.

Task 3: Train the experiment

After adding the data, you choose a prediction column, which represents the problem you are trying to solve with the experiment. For this experiment, we want to know if a new bank customer will subscribe to a bank promotion, represented by the column labeled y.

  1. In Configuration details, select No for the option to create a Time Series Forecast.

  2. Select y as the column to predict. You can see that when you choose a column to predict, AutoAI selects a model type that matches the data. AutoAI analyzes your data and determines that the y column contains Yes/No information, making this data suitable for a binary classification model.

  3. Click Run experiment. As the model trains, you see an infographic that shows the process of building the pipelines. For a list of algorithms or estimators that are available with each machine learning technique in AutoAI, see AutoAI implementation detail.

  4. After all the pipelines are created, you can compare their accuracy on the Pipeline leaderboard.

Pipeline leaderboard

  5. You can also click the Pipeline comparison tab to view the differences between pipelines. When you are done reviewing the pipelines, choose one to save as a model.

Metric chart of pipeline comparison

  6. Select the pipeline with Rank 1 and click Save as to create your model. Then, click Create. The pipeline is saved as a model in the Models section of the Assets tab.
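
The leaderboard orders the pipelines by a performance metric, and the pipeline with Rank 1 has the best score. To illustrate that ranking idea only, the following hypothetical Python sketch scores each pipeline by accuracy, the fraction of predictions that match the true labels, and assigns rank 1 to the highest score. The pipeline names and predictions are placeholders that you supply; AutoAI computes its own metrics on its own holdout data, and the metric that it optimizes might not be accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def rank_pipelines(y_true, predictions_by_pipeline):
    """Return (rank, name, accuracy) tuples, best first; rank 1 is the top score."""
    scored = [(name, accuracy(y_true, preds))
              for name, preds in predictions_by_pipeline.items()]
    # Highest score first; sort is stable, so ties keep their input order.
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(rank, name, score) for rank, (name, score) in enumerate(scored, start=1)]
```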

Task 4: Deploy the trained model

  1. You can deploy the model from the model details page. You can access the model details page in one of these ways:
    1. Click the model’s name in the notification that is displayed when you save the model.
    2. Open the Assets tab for the project, select the Models section, and select the model’s name.
  2. Click Promote to Deployment Space, then select or create the space where the model will be deployed.
    1. To create a deployment space:
      1. Enter a name.
      2. Select Create.
  3. After you create a deployment space or select an existing one, select Promote.
  4. Click the deployment space link from the notification.
  5. From the deployment space do one of the following:
    1. Click New deployment.
    2. Hover over the model’s name and click the deploy icon.
  6. In the page that opens, complete the fields:
    1. Select Online as the Deployment type.
    2. Specify a name for the deployment.
    3. Click Create.

After the deployment is complete, click the deployment name to view the details page.

Task 5: Test the deployed model

You can test the deployed model from the deployment details page.

  1. On the Test tab of the deployment details page, browse for the payload file, bank_payload.csv, that you downloaded as part of the setup. The values from the CSV file populate the test interface and provide the input values for the deployment.

Test the deployment

  2. Click Predict. The resulting prediction indicates that a customer with the attributes entered has a low probability of signing up for the bank promotion.

Viewing the prediction for an online deployment
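
The Test tab builds the scoring request from the CSV values for you. If you later call the online deployment from your own code, Watson Machine Learning online scoring requests typically send a JSON body with an input_data list that holds fields (column names) and values (rows). The following Python sketch only builds such a body from a payload CSV file and does not send it. Treat the body layout as an assumption to verify against the API reference for your release; the endpoint URL and authentication are not shown, and the function names are hypothetical.

```python
import csv


def _convert(value):
    """Turn numeric-looking strings into int or float; leave other strings as-is."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def build_scoring_payload(path, delimiter=","):
    """Build an input_data request body (fields + values) from a payload CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter=delimiter)
        fields = next(reader)  # first row holds the column names
        values = [[_convert(cell) for cell in row] for row in reader if row]
    return {"input_data": [{"fields": fields, "values": values}]}
```

Passing the result of build_scoring_payload("bank_payload.csv") to json.dumps produces the JSON text for a request body. Numeric-looking cells are converted to numbers so that they are not sent as strings.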

Task 6: Create a batch job to score the model

For a batch deployment, you provide input data, also known as the model payload, in a CSV file. The data must be structured like the training data, with the same column headers, except that it does not include the prediction column, y. The batch job processes each row of data and creates a corresponding prediction.

In a real scenario, you would submit new data to the model to get a score. However, this tutorial creates and runs a batch deployment by using a copy of the training data, saved without the y column as bank-payload.csv. When deploying a model, you can add the payload data to a project, upload it to a space, or link to it in a storage repository such as a Cloud Object Storage bucket. In this case, you upload the file directly to the deployment space.
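
Because the batch job expects the columns of the training data without the prediction column, a quick header check before you upload the payload can catch mistakes such as a leftover y column. This is a hypothetical Python sketch that assumes comma-separated files with a header row. It conservatively also reports a different column order, which might not matter for your deployment.

```python
import csv


def read_header(path, delimiter=","):
    """Return the list of column names in the first row of a CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f, delimiter=delimiter))


def check_payload_columns(training_path, payload_path, target="y", delimiter=","):
    """Return a list of problems; an empty list means the headers line up."""
    # Expected payload columns: the training columns minus the target.
    expected = [c for c in read_header(training_path, delimiter) if c != target]
    actual = read_header(payload_path, delimiter)
    missing = [c for c in expected if c not in actual]
    extra = [c for c in actual if c not in expected]
    problems = []
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    if not missing and not extra and actual != expected:
        problems.append("columns are present but in a different order")
    return problems
```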

Step 1: Set up the batch deployment

  1. Open a local copy of the training data, bank-full.csv.
  2. Delete the y column.
  3. Save the file as bank-payload.csv.
  4. Upload the bank-payload.csv file that you saved locally to the deployment space.
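
If you prefer to script the first three steps instead of editing the file by hand, the following Python sketch copies the training data without the y column. It assumes comma-separated values with a header row, and the function name is hypothetical. Write to a different file than the source, because opening the destination for writing truncates it before it is read.

```python
import csv


def drop_column(src, dst, column="y", delimiter=","):
    """Copy a CSV file to dst without the named column (for example, the target y)."""
    with open(src, newline="", encoding="utf-8") as fin, \
         open(dst, "w", newline="", encoding="utf-8") as fout:
        reader = csv.DictReader(fin, delimiter=delimiter)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise ValueError(f"column {column!r} not found in {src}")
        keep = [name for name in reader.fieldnames if name != column]
        # extrasaction="ignore" skips the dropped column when each row is written.
        writer = csv.DictWriter(fout, fieldnames=keep, delimiter=delimiter,
                                extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
```

For example, drop_column("bank-full.csv", "bank-payload.csv") writes the payload file that you upload in step 4.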

Step 2: Create the batch deployment

Now you can define the batch deployment.

  1. In the deployment space, go to the Assets tab, hover over the model’s name, and click the deploy icon.
  2. Enter a name for the deployment.
  3. Select Batch as the Deployment type.
  4. Choose the smallest hardware specification.
  5. Click Create.

Step 3: Create the batch job

The batch job executes the deployment. To create the job, you must specify the input data and the name for the output file. You can set up a job to run on a schedule or run immediately.

  1. Click New job.
  2. Specify a name for the job.
  3. Select the smallest hardware specification.
  4. (Optional) Set a schedule and configure notifications.
  5. Upload the input file: bank-payload.csv
  6. Name the output file: bank-tutorial-output.csv
  7. Review and click Create to run the job.

Step 4: View the output

When the deployment status changes to Deployed, confirm that the file bank-tutorial-output.csv was created and added to your assets list.

Download the output file and open it in an editor. You can review the prediction results for the customer information submitted for batch processing.

View the batch predictions

For each row, the prediction returned indicates whether the customer is likely to subscribe to the bank promotion, along with a confidence score.
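
To summarize the batch results rather than reading each row, you can compute the share of customers that the model predicts will subscribe. This Python sketch assumes that bank-tutorial-output.csv has a header row and a column that holds the predicted label. The default column name prediction and the label yes are assumptions, so check the header and values in your output file and pass the actual ones; the function name is hypothetical.

```python
import csv


def share_of_label(path, column="prediction", label="yes", delimiter=","):
    """Return the fraction of rows whose predicted value equals label."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise ValueError(f"column {column!r} not in header {reader.fieldnames}")
        # Normalize case and whitespace before comparing labels.
        values = [(row[column] or "").strip().lower() for row in reader]
    if not values:
        raise ValueError("output file has no data rows")
    return sum(v == label.lower() for v in values) / len(values)
```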

Next steps

Building an AutoAI experiment
