Introduction to modeling
A model is a set of rules, formulas, or equations that can be used to predict an outcome based on a set of input fields or variables. For example, a financial institution might use a model to predict whether loan applicants are likely to be good or bad risks, based on information that is already known about past applicants.
The ability to predict an outcome is the central goal of predictive analytics, and understanding the modeling process is the key to using flows in Watson Studio.
This example uses a decision tree model, which classifies records (and predicts a response) using a series of decision rules. For example:
IF income = Medium AND cards < 5 THEN -> 'Good'
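To make the rule concrete, here is a minimal Python sketch of how a single decision rule like this one could be applied to one record. The function name `apply_rule` and the keys `income` and `cards` are hypothetical, chosen only for illustration. A real tree contains many such rules, so this sketch returns `None` when the one rule shown does not apply.

```python
from typing import Optional


def apply_rule(record: dict) -> Optional[str]:
    """Apply the single example rule; return None when it does not fire."""
    # Hypothetical keys: "income" holds a label, "cards" holds a count.
    if record["income"] == "Medium" and record["cards"] < 5:
        return "Good"
    # Other branches of the tree would be handled by other rules.
    return None


print(apply_rule({"income": "Medium", "cards": 3}))  # prints Good
print(apply_rule({"income": "High", "cards": 3}))    # prints None
```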
While this example uses a CHAID (Chi-squared Automatic Interaction Detection) model, it is intended as a general introduction, and most of the concepts apply broadly to other modeling types in Watson Studio.
To understand any model, you first need to understand the data that goes into it. The data in this example contains information about the customers of a bank. The following fields are used:
|Field name|Description|
|---|---|
|Credit_rating|Credit rating: 0=Bad, 1=Good, 9=missing values|
|Age|Age in years|
|Income|Income level: 1=Low, 2=Medium, 3=High|
|Credit_cards|Number of credit cards held: 1=Less than five, 2=Five or more|
|Education|Level of education: 1=High school, 2=College|
|Car_loans|Number of car loans taken out: 1=None or one, 2=More than two|
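The codings in the table can be written down as a lookup, which is sometimes useful when reading raw records. The following is a sketch only, assuming the values are stored as the numeric codes listed above (the actual file may store labels instead). The names `CODES` and `decode` and the sample record are hypothetical.

```python
# Code-to-label mappings taken from the field table; Age is a plain number.
CODES = {
    "Credit_rating": {0: "Bad", 1: "Good", 9: None},  # 9 marks a missing value
    "Income": {1: "Low", 2: "Medium", 3: "High"},
    "Credit_cards": {1: "Less than five", 2: "Five or more"},
    "Education": {1: "High school", 2: "College"},
    "Car_loans": {1: "None or one", 2: "More than two"},
}


def decode(record: dict) -> dict:
    """Return a copy with coded fields replaced by labels; other fields pass through."""
    return {
        field: CODES[field].get(value, value) if field in CODES else value
        for field, value in record.items()
    }


# Hypothetical record, not taken from the data file.
sample = {"Credit_rating": 1, "Age": 36, "Income": 2,
          "Credit_cards": 1, "Education": 2, "Car_loans": 1}
print(decode(sample))
```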
The bank maintains a database of historical information on customers who have taken out loans with the bank, including whether they repaid the loans (Credit rating = Good) or defaulted (Credit rating = Bad). Using this existing data, the bank wants to build a model that predicts how likely future loan applicants are to default on a loan.
Using a decision tree model, you can analyze the characteristics of the two groups of customers and predict the likelihood of loan defaults.
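As its name suggests, CHAID relies on chi-squared tests to judge how strongly a field is associated with the outcome. The sketch below computes only the basic Pearson chi-squared statistic for a table of counts. It is not the Watson Studio implementation, and a full CHAID procedure also merges categories and adjusts significance values. The counts are made up for illustration and do not come from tree_credit.csv.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the field and the outcome were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Made-up counts. Columns are the outcome (Bad, Good).
by_income = [[40, 10], [25, 25], [10, 40]]  # rows: Low, Medium, High
by_education = [[38, 37], [37, 38]]         # rows: High school, College

print(chi_squared(by_income))     # large value: strong association with the outcome
print(chi_squared(by_education))  # near zero: almost no association
```

With these illustrative counts, income separates the two groups far better than education does, so a chi-squared based tree would prefer to split on income first.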
This example uses the flow named Introduction to Modeling, available in the example project you imported previously. The data file is tree_credit.csv.
Let's take a look at the flow.
- Open the Example Project.
- Scroll down to the Modeler flows section, click View all, and select the Introduction to Modeling flow.