March 19, 2018 By IBM Blog 5 min read

A predictive Machine Learning model from Build to Retrain

This post is an excerpt from our solution tutorial that walks you through the process of building a predictive machine learning model, deploying it as an API to be used in applications, testing the model, and retraining the model with feedback data. All of this happens in an integrated and unified self-service experience on IBM Cloud.


In this post, the famous Iris flower data set is used for creating a machine learning model to classify species of flowers.

In the terminology of machine learning, classification is considered an instance of supervised learning, i.e. learning where a training set of correctly identified observations is available.

For a deep dive into the differences between supervised and unsupervised learning, check out “Supervised vs. Unsupervised Learning: What’s the Difference?”

Import data to a project

A project is how you organize your resources to achieve a particular goal within Watson Data Platform. Your project resources can include data, collaborators, and analytic tools like Jupyter notebooks and machine learning models.

You can create a project to add data and open a data asset in the data refiner for cleansing and shaping your data.

Create a project:

  1. Go to the IBM® Cloud catalog and select Data Science Experience under the Data & Analytics section. Create the service. Click on the Get Started button to launch the Data Science Experience dashboard.


  2. Create a New Project (Projects > All Projects > New Project). Add a name, say iris_project, and an optional description for the project.

  3. Leave the Restrict who can be a collaborator checkbox unchecked as there’s no confidential data.

  4. Under Define Storage, click on Add and choose an existing object storage service or create a new one (Select Lite plan > Create). Hit Refresh to see the created service.

  5. Under Define compute engine, click on Add and choose an existing Spark service or create a new one.

  6. Click Create. Your new project opens and you can start adding resources to it.

Import data:

As mentioned earlier, you will be using the Iris data set. The Iris data set was used in R.A. Fisher’s classic 1936 paper, The Use of Multiple Measurements in Taxonomic Problems, and can also be found on the UCI Machine Learning Repository. This small data set is often used for testing out machine learning algorithms and visualizations. The aim is to classify Iris flowers into one of three species (Setosa, Versicolor or Virginica) from measurements of the length and width of sepals and petals. The Iris data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant.


Courtesy: DataCamp

Download iris_initial.csv, which consists of 40 instances of each class. You will use the remaining 10 instances of each class to retrain your model.
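To make the 40/10 division concrete, here is a minimal sketch of a per-class split. It assumes the first 40 rows of each class go into the initial set, which is only one possible choice (how iris_initial.csv was actually cut is not stated here). The function name `split_per_class` and the placeholder rows are hypothetical and carry labels only, not real measurements.

```python
from collections import defaultdict


def split_per_class(rows, label_key="species", n_initial=40):
    """Return (initial, feedback): the first n_initial rows of each class
    go to initial, and any further rows of that class go to feedback."""
    seen = defaultdict(int)
    initial, feedback = [], []
    for row in rows:
        label = row[label_key]
        if seen[label] < n_initial:
            initial.append(row)
        else:
            feedback.append(row)
        seen[label] += 1
    return initial, feedback


# Placeholder rows (labels only): 3 classes of 50 instances each.
species = ["setosa", "versicolor", "virginica"]
rows = [{"id": i, "species": s} for s in species for i in range(50)]

initial, feedback = split_per_class(rows)
print(len(initial), len(feedback))  # 120 30
```

With 3 classes of 50 instances, this leaves 120 rows for the initial model and 30 rows for retraining.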

  1. Under Assets in your project, click the Find and Add Data icon.

  2. Under Load, click on browse and upload the downloaded iris_initial.csv.

  3. Once added, you should see iris_initial.csv in the Data assets section of the project. Click on the name to see the contents of the data set.
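If you prefer to sanity-check the file outside the UI, the following sketch counts the instances per class with Python's standard `csv` module. The column names `species`, `petal_length` and `petal_width` match the columns used later in this post; the sepal column names, the exact label spelling, and the helper name `count_per_class` are assumptions. The three in-memory rows are illustrative stand-ins, not the contents of the real file.

```python
import csv
import io
from collections import Counter


def count_per_class(csv_file, label_column="species"):
    """Count how many rows of an open CSV file belong to each class."""
    reader = csv.DictReader(csv_file)
    return Counter(row[label_column] for row in reader)


# Tiny in-memory stand-in for iris_initial.csv (header names partly assumed).
sample = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "7.0,3.2,4.7,1.4,versicolor\n"
    "6.3,3.3,6.0,2.5,virginica\n"
)
counts = count_per_class(sample)
print(dict(counts))
```

Against the downloaded file, you would open it with `open("iris_initial.csv", newline="")` and pass the file object in; given the description above, each class should show 40 instances.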

Build a machine learning model

  1. Back in the Assets overview, under Models click on New model. In the dialog, add iris-model as the name and an optional description.

  2. Under the Machine Learning Service section, click on Associate a Machine Learning service instance to bind a machine learning service (Lite plan) to your project. Click Reload.

3. Select Model builder as your model type and Manual to manually create a model. Click Create.

For the automatic method, you rely on automatic data preparation (ADP) completely. For the manual method, in addition to some functions that are handled by the ADP transformer, you can add and configure your own estimators, which are the algorithms used in the analysis.

4. On the next page, select iris_initial.csv as your data set and click Next.

5. On the Select a technique page, based on the data set added, Label columns and feature columns are pre-populated. Select species (String) as your Label Col and petal_length (Decimal) and petal_width (Decimal) as your Feature columns.

6. Choose Multiclass Classification as your suggested technique.
 


 

7. For Validation Split, configure the following settings:

  • Train: 50%

  • Test: 25%

  • Holdout: 25%

8. Click on Add Estimators and select Decision Tree Classifier, then Add.

You can evaluate multiple estimators in one go. For example, you can add Decision Tree Classifier and Random Forest Classifier as estimators to train your model and choose the best fit based on the evaluation output.

9. Click Next to train the model. Once you see the status as Trained & Evaluated, click Save.


 

10. Click on Overview to check the details of the model.
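Steps 7 and 8 above can be sketched in plain Python to show the idea behind them. This is not how Model builder is implemented: the split below only reproduces the 50/25/25 proportions, and the two "estimators" are hypothetical hand-written rules standing in for trained models such as a Decision Tree Classifier. The helper names and the six evaluation rows (petal_length, petal_width, species) are illustrative assumptions.

```python
import random


def three_way_split(rows, train=0.50, test=0.25, seed=42):
    """Shuffle rows and cut them into train/test/holdout parts.
    The holdout part receives whatever remains after train and test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_test = int(len(shuffled) * test)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_test],
        shuffled[n_train + n_test:],
    )


def accuracy(predict, labeled_rows):
    """Fraction of rows for which predict(features) equals the label."""
    correct = sum(1 for features, label in labeled_rows if predict(features) == label)
    return correct / len(labeled_rows)


def pick_best(candidates, labeled_rows):
    """Score every candidate and return (name of the best one, all scores)."""
    scores = {name: accuracy(fn, labeled_rows) for name, fn in candidates.items()}
    return max(scores, key=scores.get), scores


# 120 placeholder row ids stand in for the rows of iris_initial.csv.
train_rows, test_rows, holdout_rows = three_way_split(range(120))
print(len(train_rows), len(test_rows), len(holdout_rows))  # 60 30 30


# Hypothetical stand-ins for trained estimators.
def tree_like_rule(features):
    petal_length, petal_width = features
    if petal_length < 2.5:
        return "setosa"
    return "versicolor" if petal_width < 1.75 else "virginica"


def constant_baseline(features):
    return "versicolor"


# Illustrative (petal_length, petal_width) measurements with labels.
evaluation_rows = [
    ((1.4, 0.2), "setosa"),
    ((1.5, 0.2), "setosa"),
    ((4.7, 1.4), "versicolor"),
    ((4.5, 1.5), "versicolor"),
    ((6.0, 2.5), "virginica"),
    ((5.1, 1.9), "virginica"),
]
best_name, scores = pick_best(
    {"tree_like_rule": tree_like_rule, "constant_baseline": constant_baseline},
    evaluation_rows,
)
print(best_name, scores[best_name])  # tree_like_rule 1.0
```

The same pattern applies when you add several estimators in the UI: each one is scored on data it was not trained on, and you keep the one with the best evaluation result.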

Your journey doesn’t halt here. By following the remaining steps of the solution tutorial, you will deploy your model as an API, test it, and retrain it by creating a feedback data connection.
