Getting Started

Quality In, Quality Out

When you add a data set, IBM Watson Analytics reads the data and assesses it for data quality. The data quality score measures the degree to which the data is suitable for predictive analysis. Data sets with low quality scores may be suitable for data exploration even if they are not suitable for predictive analysis.

The overall score is an average of the data quality score for every field in the data set, as determined by missing and constant values, influential categories, outliers, imbalance and skewness. In this example from SportsData_NFL_2014_REG_PST_players.csv (which is available here), Watson Analytics excludes fields with more than 25% missing values and fields with constant values.
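The exclusion rule described above can be approximated on your own data before uploading. The following is a minimal sketch, not Watson Analytics' actual algorithm: the function names `field_profile` and `excluded_fields` are hypothetical, empty cells are assumed to represent missing values, and the sample data is made up. Only the 25% missing-value threshold and the constant-value rule come from the text.

```python
import csv
import io

# From the text: fields with more than 25% missing values are excluded.
MISSING_THRESHOLD = 0.25


def field_profile(rows, field):
    """Return (missing_ratio, distinct_non_missing_count) for one field."""
    values = [(row.get(field) or "").strip() for row in rows]
    missing = sum(1 for v in values if v == "")
    distinct = {v for v in values if v != ""}
    ratio = missing / len(values) if values else 0.0
    return ratio, len(distinct)


def excluded_fields(rows, fields):
    """Map each flagged field to the reason a pre-check would flag it."""
    flagged = {}
    for field in fields:
        ratio, distinct = field_profile(rows, field)
        if ratio > MISSING_THRESHOLD:
            flagged[field] = "more than 25% missing"
        elif distinct <= 1:
            flagged[field] = "constant value"
    return flagged


# Tiny made-up sample, for illustration only.
SAMPLE = """player,team,college,league
A,NE,,NFL
B,NE,,NFL
C,DAL,Texas,NFL
D,SEA,,NFL
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
rows = list(reader)
flagged = excluded_fields(rows, reader.fieldnames)
print(flagged)
```

In this sample, `college` is flagged because three of its four values are missing, and `league` is flagged because every row holds the same value.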

Data Quality Score

You access the Data Quality Report from a prediction, using the menu in the upper-left corner.

Predict Menu

The Data Quality Report highlights areas where you could optimize your source data. Adding more rows and columns to the data often improves the quality of the data. The more data that Watson Analytics has available to choose from, the more accurate its results are.

Note that you can choose to include a field that Watson Analytics has excluded. For example, you may want to use a field that has more than 25% missing values because you know this field is important to your analysis. In this case, use the Predict Menu to select Field Properties, change the role of the field to input or target, and regenerate your prediction. This action may affect the quality of your prediction.

How to influence data quality?
Do your best to clean your data before you add it to Watson Analytics. Simple list-style files, with one header row followed by rows of data, work best. Some of the typical issues with data sets can be resolved by:

  • Removing blank rows from your data file
  • Removing summary rows and columns from your data file
  • Eliminating column headings and row headings that appear in the same cell
  • Avoiding lookup tables
  • Avoiding subtotals and aggregations
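Some of this cleanup can be scripted before upload. The sketch below assumes a CSV file in which summary rows are identified by a label such as "Total" in the first column; the `SUMMARY_LABELS` set and the `clean_rows` function are hypothetical names chosen for illustration, and the labels should be adjusted to match your own data.

```python
import csv
import io

# Assumed labels that mark summary rows; adjust to your own file.
SUMMARY_LABELS = {"total", "subtotal", "grand total"}


def clean_rows(text):
    """Drop blank rows and rows whose first cell is a summary label."""
    cleaned = []
    for row in csv.reader(io.StringIO(text)):
        if not any(cell.strip() for cell in row):
            continue  # blank row: no cells, or only empty cells
        if row[0].strip().lower() in SUMMARY_LABELS:
            continue  # summary or subtotal row
        cleaned.append([cell.strip() for cell in row])
    return cleaned


# Made-up sample containing two blank rows and one total row.
RAW = """Region,Sales
East,100

West,250
,
Total,350
"""

for row in clean_rows(RAW):
    print(row)
```

The remaining rows form a plain list: one header row followed by one row per record.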

More tips for cleaning your data before uploading to Watson Analytics:

  • Watson Analytics assumes that the first row of your file contains column headers; descriptive column headers are preferred.
  • You must have a header for every column. The number of columns in the header row is assumed by Watson Analytics to be the number of columns of data. For example, if the first six columns have headers but there are eight columns of data, the last two columns of data are ignored.
  • You cannot have empty columns inserted before the data.
  • You can have empty rows above the data. Empty rows preceding the data are ignored.
  • You cannot have textual rows above the header row. For example, if you have a title or description of what the data is about above the header row, the file is not read appropriately.
  • You cannot have textual rows following the data. For example, a row following the data that says “This information came from…” is considered to be part of the data.
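Several of these layout rules can be checked with a short script before uploading. This is a rough sketch under stated assumptions: it treats the first non-empty row as the header row (empty rows above the data are ignored, as noted above), it assumes a CSV file with no multi-line quoted cells, and the `check_layout` function and its warning messages are hypothetical, not part of Watson Analytics.

```python
import csv
import io


def check_layout(text):
    """Return a list of warnings for layout problems in CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    # Skip empty rows above the data to find the header row.
    start = next(
        (i for i, r in enumerate(rows) if any(c.strip() for c in r)), None
    )
    if start is None:
        return ["file contains no data"]
    header, data = rows[start], rows[start + 1:]
    warnings = []
    # Every column needs a header.
    for pos, name in enumerate(header, start=1):
        if not name.strip():
            warnings.append(f"column {pos} has no header")
    # Line numbers are 1-based and count from the top of the file.
    for line_no, row in enumerate(data, start=start + 2):
        filled = [c for c in row if c.strip()]
        if len(row) > len(header):
            # Also triggers when a title row sits above the real header.
            warnings.append(f"line {line_no}: more columns than headers")
        elif len(header) > 1 and len(filled) == 1 and row[0].strip():
            warnings.append(f"line {line_no}: looks like a text note, not data")
    return warnings


GOOD = "name,team,age\nA,NE,25\nB,DAL,31\n"
BAD = "name,team,\nA,NE,25\nThis information came from a sample\n"

print(check_layout(GOOD))
print(check_layout(BAD))
```

A title row placed above the header is reported indirectly: it is read as a one-column header, so every following row is flagged as having more columns than headers.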

More details are in this helpful document: Introduction to Data Loading and Data Quality, including specific conditions that apply to MS Excel and CSV files.

More Getting Started Stories

Getting Started Tutorial for IBM Watson Analytics

We all ask questions about our data every day. Some questions are about a status or situation. Some are about why something happened. In short, when it comes to data, we want to know what is happening, why it’s happening, and what insights need to be communicated to others. IBM® Watson™ Analytics can help you understand your data, find insights that are hidden in your data and provide you with answers to make confident decisions – all on your own. This tutorial uses sample data to walk you through the skills you need to get started. Each chapter covers a different area of Watson Analytics, from importing data to discovering insights to sharing these insights in a dashboard to social media analytics. Take 15-30 minutes to complete a chapter, or complete the entire tutorial in about 90 minutes.

Getting started with IBM Watson Analytics
Last updated: 2017-11-22

VIDEO: Welcome to Watson Analytics

Take a short tour and see how quickly you can get started on analyzing your data.

Filters

Applying filters in a Watson Analytics dashboard

There are a few different ways to apply filters to your visualizations in a dashboard. Here’s an overview of the different types of filters and how they work.

You can filter visualizations in your dashboard in three main ways:

  • Filter all visualizations in your dashboard
  • Filter one visualization based on a column in the visualization (Keep/Exclude)
  • Filter one visualization based on a column not in the visualization (Local Filter)

What’s Filtered Right Now?
To get started, here’s a quick way to check filter status.

TIP: Click the Filter Status icon in a visualization to see the current filtering that is applied.

Applying a global filter across all visualizations in the dashboard
Use the data tray to configure a filter that applies to all visualizations in the view. This type of filter applies across all the tabs in the view for any visualization that uses that same data set.

Click a column title in the data tray and then click the filter icon. Select your filter criteria and then click away from the filter menu to close it. Here’s an example of a global filter for the Region column set to only “Mid-Atlantic” and “Northeast”.

TIP: The blue line above a column in the data tray means that column has a global filter.

Filter a single visualization using the Keep/Exclude option
Use the Keep/Exclude filter to display or hide specific data points in a visualization. A data point can be any element displayed in the visualization: for example, a bar in a bar chart, a bubble in a bubble chart, an item in a legend or an item on an axis.

Right-click one or more data points in a visualization and then choose Keep or Exclude. The filter is applied to that visualization only. The other visualizations in the view do not update. After setting a filter, you can click the Filter Status icon in the visualization to see the current filter status.

TIP: This type of filter can also be configured in the column panel when you edit a visualization.

Filter a single visualization for a column not displayed
Use the Local filter option to slice your data on a column that’s not displayed in a visualization. This type of filter is available only for visualizations you create in Assemble and does not update any other visualizations in your view.

  1. Change the view into Edit mode and then click the Expand icon for the visualization.
  2. Drag the column you want to filter on from the data tray to the Local filters option.
  3. Select or type the criteria for the filter, and then click away from the filter pane.
  4. Click the Collapse icon to return to the view.

To verify the filter, click the filter icon on the border of the visualization.

For more information and details, see the following resources:

  • Documentation: IBM Watson Analytics Docs > Assemble > Filtering
  • Video: How to filter all visualizations in a dashboard or story https://www.youtube.com/watch?v=FiU2d_2PRSE