Introduction

Learn how to create a machine learning model in IBM Cloud Pak for Data to predict coffee sales of the day based on factors such as temperature, day of the week, holiday, promotion, location, and store type.

This paper uses the following tools:

IBM Cloud Pak for Data
AutoAI in IBM Watson Machine Learning (WML) for a no-code approach
A Jupyter notebook in IBM Watson Studio for a self-paced manual approach

Note: This document describes a proof of concept solution only.

Note: The IBM Cloud Pak for Data menu options and screens are subject to change. Refer to the official documentation for the latest versions.

The architecture diagram above shows the following:

IBM Cloud Pak for Data is installed on IBM Z and LinuxONE.
The dataset required for model creation is loaded from an IBM Cloud Db2 account into IBM Cloud Pak for Data using connectors.
IBM Data Refinery then analyzes, visualizes, and refines the dataset in an integrated Jupyter Notebook (code editor).
A model is developed through either a no-code or a self-paced approach.
The created model is deployed for testing.
Note: Retrieving the dataset from the IBM Cloud account is optional. You can also upload the dataset to IBM Cloud Pak for Data from your local machine.

Requirements

You need to install these products:

IBM Cloud Pak for Data self-hosted version on your OpenShift Cluster. Installation information is available here.
IBM Watson Machine Learning (WML). Installation information is available here.
IBM Watson Studio service in your installed version of IBM Cloud Pak for Data. Installation information is available here.

Concepts

IBM Cloud Pak for Data is an integrated platform that simplifies data collection, organization, and analysis, enabling organizations to gain insights and make informed decisions. In addition, IBM Cloud Pak for Data supports the entire AI lifecycle by providing tools and capabilities that help organizations develop, deploy, and manage AI solutions more effectively. Integrating data and AI tools into a single platform reduces complexity and shortens time to insight.

IBM Watson Studio is a collaborative environment for data scientists, developers, and domain experts to prepare, analyze, and model data. Whether you’re an expert or a novice, you can find a tool to suit your needs from the wide range of open source, graphical canvas, and automatic builder tools.

IBM Watson Machine Learning (WML) provides a full range of tools and services so that you can build, train, and deploy machine learning models.

IBM Data Refinery reduces the amount of time it takes to prepare data. Use predefined operations in data flows to transform large amounts of raw data into consumable, quality data that’s ready for analysis.

Scope

This paper outlines the development of a predictive machine learning model using two distinct approaches:

no-code
self-paced

Each approach follows five key steps.

No-code: You use AutoAI to develop your model without requiring any previous coding experience.

Self-paced: You use a Jupyter Notebook, available with IBM Watson Studio, to build your own model in a self-paced way.

Steps to create a machine learning model:

Problem definition: Predict the coffee sales of the day based on factors such as temperature, day of the week, holiday, promotion, location, and store type.
Data preparation: Collect the data needed to solve the problem from the cloud database and refine, visualize, and analyze the data as required.
Algorithm selection: Select the most appropriate algorithm for the model.
Model training: Train the model to enable it to learn from the data, improve accuracy, adapt to the particular task, and solve the problem.
Model evaluation: Use the testing set to evaluate the model performance after it has been trained. To improve accuracy, refine the data again, modify the algorithm, and so on as necessary.
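
To make the five steps concrete before you start, here is a minimal sketch in plain Python (standard library only). It is not the model built in this paper. The feature names, the synthetic data, and the relationship between features and sales are all made up for illustration, and ordinary least squares linear regression is an assumed stand-in for whatever algorithm AutoAI or your notebook ends up selecting. The day-of-week, location, and store-type factors are reduced to a single hypothetical is_weekend flag to keep the example short.

```python
import random

# Hypothetical feature names; the real dataset's columns may differ.
FEATURES = ["temperature", "is_weekend", "holiday", "promotion"]

def make_synthetic_rows(n, seed=0):
    """Generate made-up rows so the sketch runs without the real dataset."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        temperature = rng.uniform(0, 35)
        is_weekend = rng.choice([0, 1])
        holiday = rng.choice([0, 1])
        promotion = rng.choice([0, 1])
        # Arbitrary made-up relationship plus noise, for illustration only.
        sales = (200 - 2.0 * temperature + 40 * is_weekend
                 + 25 * holiday + 60 * promotion + rng.gauss(0, 5))
        rows.append(([temperature, is_weekend, holiday, promotion], sales))
    return rows

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_linear_regression(rows):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    xs = [[1.0] + feats for feats, _ in rows]  # leading 1.0 is the intercept
    ys = [target for _, target in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(weights, feats):
    """Predicted sales: intercept plus the weighted sum of the features."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], feats))

def r_squared(weights, rows):
    """Coefficient of determination of the predictions on the given rows."""
    ys = [t for _, t in rows]
    mean_y = sum(ys) / len(ys)
    ss_res = sum((t - predict(weights, f)) ** 2 for f, t in rows)
    ss_tot = sum((t - mean_y) ** 2 for t in ys)
    return 1 - ss_res / ss_tot

rows = make_synthetic_rows(500)
train, test = rows[:400], rows[400:]    # data preparation: hold out a test set
weights = fit_linear_regression(train)  # algorithm selection and training
score = r_squared(weights, test)        # evaluation on rows not used for training
print(f"R^2 on the test set: {score:.3f}")
```

The point of the sketch is the shape of the workflow: split the data, fit on the training rows only, and score on the held-out rows. If the score is poor, you would go back and refine the data or try a different algorithm, as the evaluation step describes.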

Check out the videos here to find out more about how to create a machine learning model.

Next step: Prepare the data