March 19, 2018 By IBM Blog 5 min read

A predictive Machine Learning model from Build to Retrain

This post is an excerpt from our solution tutorial that walks you through the process of building a predictive machine learning model, deploying it as an API to be used in applications, testing the model and retraining the model with feedback data. All of this happening in an integrated and unified self-service experience on IBM Cloud.

In this post, the famous Iris flower data set is used for creating a machine learning model to classify species of flowers.

In the terminology of machine learning, classification is considered an instance of supervised learning, i.e. learning where a training set of correctly identified observations is available.

For a deep dive into the differences between supervised vs. unsupervised learning, check out “Supervised vs. Unsupervised Learning: What’s the Difference?

Import data to a project

A project is how you organize your resources to achieve a particular goal within Watson Data Platform. Your project resources can include data, collaborators, and analytic tools like Jupyter notebooks and machine learning models.

You can create a project to add data and open a data asset in the data refiner for cleansing and shaping your data.

Create a project:

  1. Go to the IBM® Cloud catalog and select Data Science Experience under the Data & Analytics section. Create the service. Click on the Get Started button to launch the Data Science Experience dashboard.

  2. Create a New Project (Projects > All Projects > New Project). Add a name say iris_project and optional description for the project.

  3. Leave the Restrict who can be a collaborator checkbox unchecked as there’s no confidential data.

  4. Under Define Storage, Click on Add and choose an existing object storage service or create a new one (Select Lite plan > Create). Hit Refresh to see the created service.

  5. Under Define compute engine, Click on Add and choose an existing Spark service or create a new one.

  6. Click Create. Your new project opens and you can start adding resources to it.

Import data:

As mentioned earlier, you will be using the Iris data set. The Iris dataset was used in R.A. Fisher’s classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository. This small dataset is often used for testing out machine learning algorithms and visualizations. The aim is to classify Iris flowers among three species (Setosa, Versicolor or Virginica) from measurements of length and width of sepals and petals. The iris data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant.

Courtesy: DataCamp

Download iris_initial.csv which consists of 40 instances of each class. You will use the rest 10 instances of each class to re-train your model.

  1. Under Assets in your project, click the Find and Add Data icon

2. Under Load, click on browse and upload the downloaded iris_initial.csv.

3. Once added, you should see iris_initial.csv in the Data assets section of the project. Click on the name to see the contents of the data set.

Build a machine learning model

  1. Back in the Assets overview, under Models click on New model. In the dialog, add iris-model as name and an optional description.

  2. Under Machine Learning Service section, click on Associate a Machine Learning service instance to bind a machine learning service (Lite plan) to your project. Click Reload.


3. Select Model builder as your model type and Manual to manually create a model. Click Create.

For the automatic method, you rely on automatic data preparation (ADP) completely. For the manual method, in addition to some functions that are handled by the ADP transformer, you can add and configure your own estimators, which are the algorithms used in the analysis.

4. On the next page, select iris_initial.csv as your data set and click Next.

5. On the Select a technique page, based on the data set added, Label columns and feature columns are pre-populated. Select species (String) as your Label Col and petal_length (Decimal) and petal_width (Decimal) as your Feature columns.

6. Choose Multiclass Classification as your suggested technique.


7. For Validation Split configure the following setting:

  • Train: 50%,

  • Test 25%,

  • Holdout: 25%

8. Click on Add Estimators and select Decision Tree Classifier, then Add.

You can evaluate multiple estimators in one go. For example, you can add Decision Tree Classifier and Random Forest Classifier as estimators to train your model and choose the best fit based on the evaluation output.

9. Click Next to train the model. Once you see the status as Trained & Evaluated, click Save.


10. Click on Overview to check the details of the model.

Your journey doesn’t halt here.Following the steps below, you will deploy your model as an API, test it and retrain by creating a feedback data connection.

Was this article helpful?

More from Cloud

Enhance your data security posture with a no-code approach to application-level encryption

4 min read - Data is the lifeblood of every organization. As your organization’s data footprint expands across the clouds and between your own business lines to drive value, it is essential to secure data at all stages of the cloud adoption and throughout the data lifecycle. While there are different mechanisms available to encrypt data throughout its lifecycle (in transit, at rest and in use), application-level encryption (ALE) provides an additional layer of protection by encrypting data at its source. ALE can enhance…

Attention new clients: exciting financial incentives for VMware Cloud Foundation on IBM Cloud

4 min read - New client specials: Get up to 50% off when you commit to a 1- or 3-year term contract on new VCF-as-a-Service offerings, plus an additional value of up to USD 200K in credits through 30 June 2025 when you migrate your VMware workloads to IBM Cloud®.1 Low starting prices: On-demand VCF-as-a-Service deployments begin under USD 200 per month.2 The IBM Cloud benefit: See the potential for a 201%3 return on investment (ROI) over 3 years with reduced downtime, cost and…

The history of the central processing unit (CPU)

10 min read - The central processing unit (CPU) is the computer’s brain. It handles the assignment and processing of tasks, in addition to functions that make a computer run. There’s no way to overstate the importance of the CPU to computing. Virtually all computer systems contain, at the least, some type of basic CPU. Regardless of whether they’re used in personal computers (PCs), laptops, tablets, smartphones or even in supercomputers whose output is so strong it must be measured in floating-point operations per…

IBM Newsletters

Get our newsletters and topic updates that deliver the latest thought leadership and insights on emerging trends.
Subscribe now More newsletters