Predicting with Watson Analytics: A quick start guide


When you know your data well, you can use the Watson Analytics predict feature as a shortcut to analytic insights. For a prediction, you work with data that you have already uploaded; predicting is a different task from exploring. To learn about exploration, see “Exploring data with Watson Analytics: A quick start blog,” which you can also find here in the Watson Analytics Community.

Here’s how you can use your data to create predictions.

1. Log in to Watson Analytics. From the Welcome page, click Predict.


2. From the data selection page that opens, select the set of data you want to use by clicking its name.

3. When the Create a Prediction page appears, you see that Watson Analytics has taken your source data and provided you with suggested targets. In this example, the target is churn. You can have up to 5 targets. You can also edit the target fields (for example, adding labels). After you make any changes or if you are satisfied with the suggested target, enter a name for your workbook at the top and click Create Prediction.



4. Watson Analytics builds a prediction workbook by automatically running thousands of algorithms to find the right model and likely predictors.


5. After the workbook is created, Watson Analytics launches a page with a spiral graph and relevant predictors.


6. This is where it gets fun. Click through the predictor level chooser to see what other fields affect your target, and look at the results in the visualization tiles to the right. You can also add fields to create a combination model, which lets you drill into the rules behind a decision and navigate down into its individual aspects.




And that’s it: the fastest way to start using Watson Analytics to predict and to explain the meaning behind the data. For more tutorials, browse the Watson Analytics Community.



More Predict Stories

Getting Started

Using Customer Behavior Data to Improve Customer Retention

We’ve uploaded some sample data sets in the IBM Watson Analytics community for you to work with as you learn more about Watson Analytics. This expert blog uses the Telco Customer Churn data set (WA_Fn-UseC_-Telco-Customer-Churn).

What’s in the Telco Customer Churn data set?

This data set provides info to help you predict behavior to retain customers. You can analyze all relevant customer data and develop focused customer retention programs.

A telecommunications company is concerned about the number of customers leaving their landline business for cable competitors. They need to understand who is leaving. Imagine that you’re an analyst at this company and you have to find out who is leaving and why.

The data set includes information about:

- Customers who left within the last month: the column is called Churn
- Services that each customer has signed up for: phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
- Customer account information: how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges
- Demographic info about customers: gender, age range, and if they have partners and dependents

If you don’t have the data set…

1. Go to Download the Telco Customer Churn sample data file.
2. In Watson Analytics, tap Add and upload Telco Customer Churn. The filename is a bit longer: WA_Fn-UseC_-Telco-Customer-Churn.csv.

The data set appears as a tile in the Welcome page and you’re ready to get to work.

Which customers are likely to leave?

To find the answer to this question, tap the WA_Fn-UseC_-Telco-Customer-Churn tile and tap Prediction. You want to learn more about customers who’ve left the company in the past month: this is the target that you want to investigate. The data is in the column called Churn, which is the column we’ve already picked as the target for the prediction. Let’s find out which variables influence customers who leave.

Name the prediction and tap Create Prediction. Watson Analytics analyzes the data and generates visualizations to provide insights into this issue. The spiral shows you the top predictors, or key drivers, of churn in color; other drivers appear in gray. The closer the driver is to the center of the spiral, the stronger the predictive strength of the driver is.

The key drivers are tenure, contract, and online security. The visualizations to the right of the spiral show how one driver at a time drives churn. The blue or green dots in the upper right of the visualizations identify which driver is being shown.

Tap tenure drives Churn. This new visualization shows that customers who have been customers for shorter periods are more likely to leave. Close this visualization by tapping the X in its upper right corner. You can look at the visualizations for the other drivers on your own.

Let’s move on and explore churn in more depth. To the left of the spiral are options for creating visualizations that show more than one driver at a time. Let’s go straight to the deeper and more predictive analysis of the data. Tap Combination. You get a new set of visualizations on the right, including a decision tree, that show the combination of variables that influence your target.

Let’s look at the combination of key drivers that influence whether customers leave. Tap the decision tree.

Let’s look at a word cloud about the key factors that influence churn. Tap Predictor Importance. Contract, Internet Service, Tenure, and Total Charges are the most important factors.

Let’s get some more details on who is leaving so we can predict who is likely to leave in the future. Tap Top Decision Rules. The rules are specific and detailed, and are sorted by accuracy. They currently focus on customers who do not leave. We need to change that. Change the No to Yes. A clearer view emerges.

Customers who leave tend to be ones who are on a month-to-month contract, have fiber optic internet service, and have been customers for shorter periods. You can now predict which customers are at risk to churn. Use the decision rules to identify customers who fit the churn profile so you can proactively offer them an incentive to stay.
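
Outside Watson Analytics, the churn profile described above could be applied to customer records with a few lines of Python. This is only a rough sketch: the column names (customerID, Contract, InternetService, tenure), the value spellings, the 12-month cutoff for "shorter periods," and the sample records are all assumptions made for illustration. They are not rules exported from Watson Analytics.

```python
# Illustrative only: the column names, value spellings, and tenure cutoff
# are assumptions, not decision rules produced by Watson Analytics.
SHORT_TENURE_MONTHS = 12  # hypothetical cutoff for "shorter periods"


def fits_churn_profile(customer, max_tenure=SHORT_TENURE_MONTHS):
    """Return True when a customer record matches all three profile traits."""
    return (
        customer["Contract"] == "Month-to-month"
        and customer["InternetService"] == "Fiber optic"
        and int(customer["tenure"]) <= max_tenure
    )


def at_risk_ids(customers, max_tenure=SHORT_TENURE_MONTHS):
    """Collect the IDs of customers who fit the profile, in input order."""
    return [c["customerID"] for c in customers if fits_churn_profile(c, max_tenure)]


# Made-up sample records, used only to demonstrate the filter.
sample_customers = [
    {"customerID": "A-001", "Contract": "Month-to-month",
     "InternetService": "Fiber optic", "tenure": "3"},
    {"customerID": "A-002", "Contract": "Two year",
     "InternetService": "Fiber optic", "tenure": "40"},
    {"customerID": "A-003", "Contract": "Month-to-month",
     "InternetService": "DSL", "tenure": "5"},
    {"customerID": "A-004", "Contract": "Month-to-month",
     "InternetService": "Fiber optic", "tenure": "30"},
]

print(at_risk_ids(sample_customers))
```

The cutoff is a parameter so that you can replace it with whatever tenure value your own decision rules report.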

Getting Started

Quality In, Quality Out

When you add a data set, IBM Watson Analytics reads the data and assesses it for data quality. The data quality score measures the degree to which the data is suitable for predictive analysis. Data sets with low quality scores may be suitable for data exploration even if they are not suitable for predictive analysis. The overall score is an average of the data quality score for every field in the data set, as determined by missing and constant values, influential categories, outliers, imbalance and skewness.

In this example from SportsData_NFL_2014_REG_PST_players.csv (which is available here), Watson Analytics excludes fields with more than 25% missing values and fields with constant values. You access the Data Quality Report from a prediction, using the menu in the upper-left corner. The Data Quality Report highlights areas where you could optimize your source data. Adding more rows and columns to the data often improves the quality of the data. The more data that Watson Analytics has available to choose from, the more accurate its results are.

Note that you can choose to include a field that Watson Analytics has excluded; for example you may want to use a field that has more than 25% missing values because you know this field is important to your analysis. In this case, use the Predict Menu to select Field Properties, change the role of the field to input or target, and regenerate your prediction. This action may affect the quality of your prediction.

How to influence data quality?

Do your best to clean your data before you add it into Watson Analytics. List files work best.
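
To see ahead of time which fields might be excluded, you could approximate the exclusion rule described above (more than 25% missing values, or constant values) in Python before uploading. This is a sketch of that one rule only, not the Watson Analytics data quality score. The function names and the sample field names are made up for illustration, and treating blank strings as missing is an assumption.

```python
MISSING_THRESHOLD = 0.25  # from the text: more than 25% missing values


def is_missing(value):
    """Assumption: None and blank strings count as missing."""
    return value is None or str(value).strip() == ""


def missing_fraction(values):
    """Share of values in a field that are missing."""
    if not values:
        return 0.0
    return sum(1 for v in values if is_missing(v)) / len(values)


def is_constant(values):
    """True when the non-missing values are all identical."""
    present = {str(v).strip() for v in values if not is_missing(v)}
    return len(present) <= 1


def fields_to_exclude(columns):
    """Map each flagged field name to the reason it would be flagged."""
    flagged = {}
    for name, values in columns.items():
        if missing_fraction(values) > MISSING_THRESHOLD:
            flagged[name] = "missing"
        elif is_constant(values):
            flagged[name] = "constant"
    return flagged


# Made-up columns: field name -> list of cell values.
sample_columns = {
    "team": ["A", "B", "A", "C"],
    "injury_note": ["", "", "knee", ""],          # 75% missing
    "season": ["2014", "2014", "2014", "2014"],   # constant
    "height": ["72", "", "75", "70"],             # exactly 25% missing
}

print(fields_to_exclude(sample_columns))
```

In this sketch a field with exactly 25% missing values is not flagged, because the text says "more than 25%."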
Some of the typical issues with data sets can be resolved by:

- Removing blank rows from your data file
- Removing summary rows and columns from your data file
- Eliminating column headings and row headings that appear in the same cell
- Avoiding lookup tables
- Avoiding subtotals and aggregations

More tips for cleaning your data before uploading to Watson Analytics:

- Watson Analytics assumes that the first row of your file contains headers; descriptive column headers are preferred.
- You must have a header for every column. Watson Analytics assumes that the number of columns in the header row is the number of columns of data. For example, if the first six columns have headers but there are eight columns of data, the last two columns of data are ignored.
- You cannot have empty columns inserted before the data.
- You can have empty rows above the data. Empty rows preceding the data are ignored.
- You cannot have textual rows above the header row. For example, if you have a title or description of what the data is about above the header row, the file is not read appropriately.
- You cannot have textual rows following the data. For example, a row following the data that says “This information came from…” is considered to be part of the data.

More details are in this helpful document: Introduction to Data Loading and Data Quality, including specific conditions that apply to MS Excel and CSV files.
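
A few of these tips can be checked automatically before you upload a CSV file. The sketch below uses Python's standard csv module to flag columns without a header, blank rows inside the data, and rows with more cells than the header row. The function names and the warning wording are invented for this example, and the checks are only a rough approximation of the tips above, not how Watson Analytics reads files.

```python
import csv
import io


def is_blank(row):
    """True when a parsed row has no non-empty cells."""
    return not any(cell.strip() for cell in row)


def precheck_csv(text):
    """Return a list of warnings for a CSV document given as a string."""
    rows = list(csv.reader(io.StringIO(text)))

    # Empty rows before the header are skipped, matching the tip above.
    start = 0
    while start < len(rows) and is_blank(rows[start]):
        start += 1
    if start == len(rows):
        return ["no header row found"]

    header = rows[start]
    warnings = []
    for index, name in enumerate(header, start=1):
        if not name.strip():
            warnings.append(f"column {index} has no header")

    # Row numbers are 1-based positions in the parsed file.
    for row_no, row in enumerate(rows[start + 1:], start=start + 2):
        if is_blank(row):
            warnings.append(f"row {row_no} is blank")
        elif len(row) > len(header):
            warnings.append(
                f"row {row_no} has {len(row)} cells but only {len(header)} headers"
            )
    return warnings


# Made-up file content with three problems.
sample_text = "name,team,\nAlice,A,10\n\nBob,B,12,extra\n"
for warning in precheck_csv(sample_text):
    print(warning)
```

The sketch cannot tell a title row above the header or a footnote row after the data from real content, so those still need a manual look.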


Displaying top predictors and predictive strength

Once you have a new prediction displayed in Watson Analytics, you can click the View All option near the upper right to display charts that rank the top predictors along with their predictive strength values. Each predictive strength value is displayed in parentheses after its predictor. To see the statistical details behind a predictor, click its chart. From the Main Insight blade, you can choose to show or hide statistical details.