Blog

What's happening? What's new? What can I do? Find answers to these questions in the blog.

Archive Results

Getting Started

Quality In, Quality Out

When you add a data set, IBM Watson Analytics reads the data and assesses it for data quality. The data quality score measures the degree to which the data is suitable for predictive analysis. Data sets with low quality scores may be suitable for data exploration even if they are not suitable for predictive analysis. The overall score is an average of the data quality score for every field in the data set, as determined by missing and constant values, influential categories, outliers, imbalance, and skewness.

In this example from SportsData_NFL_2014_REG_PST_players.csv (which is available here), Watson Analytics excludes fields with more than 25% missing values and fields with constant values. You access the Data Quality Report from a prediction, using the menu in the upper-left corner.

The Data Quality Report highlights areas where you could optimize your source data. Adding more rows and columns to the data often improves the quality of the data. The more data that Watson Analytics has available to choose from, the more accurate its results are.

Note that you can choose to include a field that Watson Analytics has excluded. For example, you may want to use a field that has more than 25% missing values because you know this field is important to your analysis. In this case, use the Predict Menu to select Field Properties, change the role of the field to input or target, and regenerate your prediction. This action may affect the quality of your prediction.

How to influence data quality?

Do your best to clean your data before you add it into Watson Analytics. List files work best.
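One way to see in advance which fields are likely to be excluded is to apply the two rules quoted above (more than 25% missing values, or a constant value) to the file yourself. The following is a minimal sketch, not Watson Analytics' own logic: the function name `flag_fields`, the treatment of empty cells as "missing", and the sample rows are all assumptions made for illustration.

```python
import csv
import io

MISSING_THRESHOLD = 0.25  # the 25% figure quoted in the post


def flag_fields(csv_text):
    """Return {field: reason} for fields that exceed the missing-value
    threshold or hold a single constant value (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    flagged = {}
    for field in reader.fieldnames or []:
        # Assumption: an empty or whitespace-only cell counts as missing.
        values = [(row.get(field) or "").strip() for row in rows]
        present = [v for v in values if v]
        missing_ratio = 1 - len(present) / len(values) if values else 1.0
        if missing_ratio > MISSING_THRESHOLD:
            flagged[field] = f"{missing_ratio:.0%} missing"
        elif len(set(present)) == 1:
            flagged[field] = "constant value"
    return flagged


# Invented sample rows, not taken from the NFL data set.
sample = (
    "player,team,college,league\n"
    "Player A,KC,Utah,NFL\n"
    "Player B,GB,,NFL\n"
    "Player C,NE,,NFL\n"
    "Player D,SF,Ohio State,NFL\n"
)
print(flag_fields(sample))
# {'college': '50% missing', 'league': 'constant value'}
```

A field flagged here is only a candidate for exclusion; the Data Quality Report remains the place to confirm what Watson Analytics actually did with it.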
Some of the typical issues with data sets can be resolved by:

- Removing blank rows from your data file
- Removing summary rows and columns from your data file
- Eliminating column headings and row headings that appear in the same cell
- Avoiding lookup tables
- Avoiding subtotals and aggregations

More tips for cleaning your data before uploading to Watson Analytics:

- Watson Analytics assumes that the first row of your file contains headers; descriptive column headers are preferred.
- You must have a header for every column. The number of columns in the header row is assumed by Watson Analytics to be the number of columns of data. For example, if the first six columns have headers but there are eight columns of data, the last two columns of data are ignored.
- You cannot have empty columns inserted before the data.
- You can have empty rows above the data. Empty rows preceding the data are ignored.
- You cannot have textual rows above the header row. For example, if you have a title or description of what the data is about above the header row, the file is not read correctly.
- You cannot have textual rows following the data. For example, a row following the data that says “This information came from…” is considered to be part of the data.

More details are in this helpful document: Introduction to Data Loading and Data Quality, including specific conditions that apply to MS Excel and CSV files.
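The header rules above lend themselves to a quick check before uploading a CSV file. The sketch below is one possible approach, not a Watson Analytics tool: `check_layout` is a hypothetical helper, the "last row looks like a note" test is a simple heuristic (a single filled cell in a multi-column file), and the sample data is invented.

```python
import csv
import io


def check_layout(csv_text):
    """Return a list of warnings about header and column layout."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Skip blank rows so they do not affect the checks below.
    rows = [r for r in rows if any(cell.strip() for cell in r)]
    if not rows:
        return ["file has no content"]
    header, data = rows[0], rows[1:]
    warnings = []
    # Every column needs a header.
    if any(not name.strip() for name in header):
        warnings.append("header row has a blank column name")
    # Columns beyond the header row would be ignored on load.
    widest = max((len(r) for r in data), default=0)
    if widest > len(header):
        warnings.append(
            f"{widest - len(header)} column(s) have data but no header"
        )
    # Heuristic: a final row with one filled cell is probably a footnote.
    if data and len(header) > 1:
        filled_last = sum(1 for cell in data[-1] if cell.strip())
        if filled_last == 1:
            warnings.append("last row looks like a note, not data")
    return warnings


# Invented sample: six headers, eight data columns, and a trailing note.
sample = (
    "\n"
    "name,team,position,age,height,weight\n"
    "Player A,KC,QB,27,75,220,x,y\n"
    "Player B,GB,WR,24,72,198,x,y\n"
    "This information came from a hypothetical source\n"
)
for warning in check_layout(sample):
    print(warning)
# 2 column(s) have data but no header
# last row looks like a note, not data
```

A title row placed above the real header tends to show up in the same way, as data rows that are wider than the first non-blank row.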

Resources

Loading data: A Watson Analytics quick start guide

Watson Analytics takes your data, then reviews and processes it. It then presents you with interpretations you can understand and visualizations that show you trends and patterns you might not be aware of. All you have to do is upload your data to get started. This quick start guide shows you how to do that.

1. Log in to Watson Analytics and from the Welcome page, click Add.
2. Click Drop here or tap to browse.
3. Browse for your data, select it, and click Create to start the upload.
4. When the upload completes, the Welcome page will be populated with your new set of data. Click it to see the options for creating your new workbook.

To learn how to explore your data, look for the quick start guide for exploring data here in the Watson Community, along with other guides and tutorials.

Getting Started

Introduction to Data Loading and Data Quality

Are you having trouble loading your data into IBM Watson Analytics? Or do you have a low Data Quality score? Some of the typical issues with data sets can be resolved by:

- Removing blank rows from your data file
- Removing summary rows and columns from your data file
- Eliminating column headings and row headings that appear in the same cell
- Avoiding lookup tables
- Avoiding subtotals and aggregations

Note that if a Microsoft Excel file contains several worksheets, the first worksheet is added as a data set. More details are in this helpful document, including specific conditions that apply to MS Excel and CSV files.
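Two of the fixes listed above, removing blank rows and removing summary rows, can be scripted for CSV files. This is a rough sketch under stated assumptions: `clean_rows` is a hypothetical helper, and it recognizes a summary row only by an assumed label such as "Total" in the first cell, so adjust `SUMMARY_LABELS` to whatever your own files use.

```python
import csv
import io

# Assumed labels that mark a summary row; not a Watson Analytics list.
SUMMARY_LABELS = ("total", "subtotal", "grand total", "average")


def clean_rows(csv_text):
    """Drop blank rows and rows whose first cell is a summary label."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        if not any(cell.strip() for cell in row):
            continue  # blank row (no cells, or only empty cells)
        if row[0].strip().lower() in SUMMARY_LABELS:
            continue  # summary or subtotal row
        writer.writerow(row)
    return out.getvalue()


# Invented sample with a blank line, an empty row, and a total row.
sample = "region,sales\nEast,100\n\nWest,150\n,\nTotal,250\n"
print(clean_rows(sample), end="")
# region,sales
# East,100
# West,150
```

Summary columns, merged heading cells, and lookup tables are harder to detect automatically and are usually easier to fix by hand in the source spreadsheet.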