Overview
In the GSDB database, transaction data is stored in two tables: GOSALES.ORDER_HEADER and GOSALES.ORDER_DETAILS. Information about items that were returned is stored in the table GOSALES.RETURNED_ITEM. You will add all three table sources to your mining flow.
Then, you will join the tables to produce a single table that can provide a Prediction operator with details about each order, the items included in each order, and whether an item was returned.
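The shape of the joined table can be sketched outside the mining flow. The following is a minimal illustration that uses Python's built-in sqlite3 module and tiny in-memory tables. The column names (ORDER_NUMBER, ORDER_DETAIL_CODE, RETURN_CODE, and so on), the sample rows, the IS_RETURNED flag, and the choice of a left outer join are all assumptions made for this sketch. They are not the settings of the Table Join operators in the tutorial.

```python
import sqlite3

# Hypothetical, heavily simplified stand-ins for the three tables.
# Column names and rows are assumptions made for this illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ORDER_HEADER (ORDER_NUMBER INTEGER PRIMARY KEY,
                               ORDER_METHOD_CODE INTEGER);
    CREATE TABLE ORDER_DETAILS (ORDER_DETAIL_CODE INTEGER PRIMARY KEY,
                                ORDER_NUMBER INTEGER,
                                PRODUCT_NUMBER INTEGER,
                                QUANTITY INTEGER);
    CREATE TABLE RETURNED_ITEM (RETURN_CODE INTEGER PRIMARY KEY,
                                ORDER_DETAIL_CODE INTEGER,
                                RETURN_QUANTITY INTEGER);

    INSERT INTO ORDER_HEADER VALUES (100, 1), (101, 2);
    INSERT INTO ORDER_DETAILS VALUES (1, 100, 5001, 3),
                                     (2, 100, 5002, 1),
                                     (3, 101, 5001, 2);
    INSERT INTO RETURNED_ITEM VALUES (900, 2, 1);
""")

# Inner join: every detail line belongs to exactly one order header.
# Left outer join: detail lines that were never returned are kept,
# and the flag IS_RETURNED records whether a matching return exists.
query = """
    SELECT h.ORDER_NUMBER,
           d.ORDER_DETAIL_CODE,
           d.PRODUCT_NUMBER,
           d.QUANTITY,
           CASE WHEN r.RETURN_CODE IS NULL THEN 0 ELSE 1 END AS IS_RETURNED
    FROM ORDER_HEADER h
    JOIN ORDER_DETAILS d ON d.ORDER_NUMBER = h.ORDER_NUMBER
    LEFT OUTER JOIN RETURNED_ITEM r
           ON r.ORDER_DETAIL_CODE = d.ORDER_DETAIL_CODE
    ORDER BY d.ORDER_DETAIL_CODE
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In this sketch, each output row describes one order line together with its order and a returned/not-returned flag, which is the kind of single table that a prediction step needs.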
A stratified sample contains an approximately equal number of data points for each possible value of the field that you want to predict. You should use a stratified sample when you want to predict a result that is rare in your training data. For example, suppose you want to predict a result that is present in only 1,000 of the 1,000,000 records in your training data. A prediction model could predict that the rare result is never present and still be 99.9% accurate for your data. However, such a model would have no real predictive power, because it would never identify a single occurrence of the result.
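The arithmetic behind this example can be checked with a short calculation. The sketch below uses the record counts from the paragraph above; the variable names and the recall measure (the share of rare results that the model actually finds) are introduced here for illustration only.

```python
# Counts taken from the example: 1,000 rare results in 1,000,000 records.
total_records = 1_000_000
rare_records = 1_000

# A trivial model that always predicts "result not present".
true_positives = 0                             # it never flags a rare result
true_negatives = total_records - rare_records  # every common record is "correct"

accuracy = (true_positives + true_negatives) / total_records
recall = true_positives / rare_records

print(f"accuracy: {accuracy:.1%}")
print(f"recall: {recall:.1%}")
```

The accuracy comes out at 99.9%, while the recall is 0.0%: the model is right almost all of the time and still finds none of the results that you care about.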
In this tutorial, you train your prediction model with a stratified sample that contains an equal number of items that were returned and items that were not returned. By using a stratified sample, your prediction model can better identify the factors that correlate with a customer returning an item.
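As a rough illustration of what a balanced sample looks like, the following sketch downsamples every class to the size of the smallest class. It is not the algorithm that the Random Split operator uses; the function balanced_sample, the field name is_returned, and the generated records are all hypothetical.

```python
import random

def balanced_sample(records, label_key, seed=0):
    """Return a sample with the same number of records for each label value.

    Every class is downsampled to the size of the smallest class.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable

    # Group the records by the value of the label field.
    by_label = {}
    for record in records:
        by_label.setdefault(record[label_key], []).append(record)

    # Draw the same number of records from each group, without replacement.
    smallest = min(len(group) for group in by_label.values())
    sample = []
    for group in by_label.values():
        sample.extend(rng.sample(group, smallest))

    rng.shuffle(sample)
    return sample

# Hypothetical data: 1,000 order lines, of which 20 were returned.
records = [
    {"order_detail_code": i, "is_returned": 1 if i % 50 == 0 else 0}
    for i in range(1000)
]

sample = balanced_sample(records, "is_returned")
returned = sum(r["is_returned"] for r in sample)
print(len(sample), returned, len(sample) - returned)
```

With this made-up data, the sample holds 40 records: all 20 returned items and 20 randomly chosen items that were not returned. Downsampling discards most of the majority class, which is acceptable only when enough records of the rare class exist to train on.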
Tasks in this lesson
Procedure
To add Table Source operators:
Procedure
To add Table Join operators:
Procedure
To add a Random Split operator: