Flashback to the 1970s, when cars were big, heavy and thirsty for gas. The Auto MPG sample data set is a collection of 398 automobile records from 1970 to 1982. It contains attributes like car name, MPG, number of cylinders, horsepower and weight. With Watson Analytics, I was able to use modern capabilities to quickly explore and predict the relationships between retro MPG, horsepower and weight data. You can use this data to practice some useful analysis techniques and visualizations that you can then apply to your own data sets.
Here’s a quick overview of the data and its relationships. I created this image in Watson Analytics Assemble with some key visualizations that I saved from Explore.
Want to try this same data set? Download a copy of the data from the University of California, Irvine (UCI) Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/Auto+MPG
The actual data is in the “auto-mpg.data” file and the column names are in the “auto-mpg.names” file.
The raw data file needs a title row before uploading, so I used a text editor to add the following column names as the first row:
After that, I saved and named the file in .csv format. For example: “auto-mpg.data.csv”. The file looked like this.
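If you'd rather script this step than use a text editor, here's a minimal Python sketch of the same idea: add a header row and write the result as CSV. The column names below are my assumption, based on the attribute list in "auto-mpg.names", and may not match the header row used in this post. The sample rows are meant to mirror lines of "auto-mpg.data", so check them against your own download.

```python
import csv
import io
import shlex

# Assumed header: attribute names from auto-mpg.names, with underscores.
# The header row used in the post is not shown, so treat these as a guess.
COLUMNS = ["mpg", "cylinders", "displacement", "horsepower", "weight",
           "acceleration", "model_year", "origin", "car_name"]

# Sample rows in the layout of auto-mpg.data: whitespace-separated values
# followed by a quoted car name. "?" marks a missing horsepower value.
RAW_SAMPLE = '''18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"
15.0   8   350.0      165.0      3693.      11.5   70  1\t"buick skylark 320"
25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"
'''


def raw_to_csv(raw_text):
    """Return CSV text: a header row followed by the parsed data rows."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)
    for line in raw_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = shlex.split(line)  # keeps the quoted car name as one field
        if len(fields) != len(COLUMNS):
            raise ValueError(f"unexpected field count in line: {line!r}")
        writer.writerow(fields)
    return out.getvalue()


csv_text = raw_to_csv(RAW_SAMPLE)
print(csv_text)
```

To convert the real file, you could read "auto-mpg.data" into a string, pass it to raw_to_csv, and write the result to "auto-mpg.data.csv". Missing values stay as "?" in this sketch, so you may want to blank them out before uploading.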
Exploring the data and relationships
The first thing I did was use Explore to visualize the relationship between MPG, horsepower and weight. Unfortunately, more horsepower means a heavier car and lower fuel efficiency (MPG). Here’s an example I created in Explore to show this downward trend with some notes I added.
Quick Tip: Turn on data labels to see the actual car names. Use the Show item labels option to display labels at each data point.
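To put a number on this kind of trend outside of Watson Analytics, one option is a Pearson correlation coefficient. The following is a rough sketch, not what Explore computes internally. The six sample cars are meant to mirror rows of the raw data file and are only there to make the example runnable, so verify them against your own copy and use the full data set for real conclusions.

```python
import math

# (mpg, horsepower, weight) for a few cars, meant to mirror rows of
# auto-mpg.data. Illustrative only: verify against your own download.
SAMPLE_CARS = [
    (18.0, 130.0, 3504.0),  # chevrolet chevelle malibu
    (15.0, 165.0, 3693.0),  # buick skylark 320
    (14.0, 220.0, 4354.0),  # chevrolet impala
    (24.0, 95.0, 2372.0),   # toyota corona mark ii
    (27.0, 88.0, 2130.0),   # datsun pl510
    (26.0, 46.0, 1835.0),   # volkswagen 1131 deluxe sedan
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


mpg = [car[0] for car in SAMPLE_CARS]
horsepower = [car[1] for car in SAMPLE_CARS]
weight = [car[2] for car in SAMPLE_CARS]

print("horsepower vs mpg:   ", round(pearson(horsepower, mpg), 2))
print("weight vs mpg:       ", round(pearson(weight, mpg), 2))
print("horsepower vs weight:", round(pearson(horsepower, weight), 2))
```

A value near -1 indicates a strong downward trend and a value near +1 a strong upward trend, which is the pattern the scatter plot shows visually.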
Another way to analyze horsepower is to look at the main attributes of a car’s engine: number of cylinders and engine size (displacement). A more powerful engine (as measured by horsepower) usually means more cylinders and a larger displacement value. This relationship is shown in the following visualization.
Here, I combined horsepower and weight while also displaying the data grouped into three clusters based on number of cylinders (4, 6, or 8). I assigned the bubble size to represent engine displacement.
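The same grouping can be approximated in a few lines of Python by averaging displacement, horsepower and weight for each cylinder count. This is a sketch of the idea only. The sample rows are meant to mirror the raw data file and should be checked against your own copy.

```python
from collections import defaultdict

# (cylinders, displacement, horsepower, weight), meant to mirror rows of
# auto-mpg.data. Illustrative only: verify against your own download.
ENGINE_SAMPLE = [
    (8, 307.0, 130.0, 3504.0),  # chevrolet chevelle malibu
    (8, 350.0, 165.0, 3693.0),  # buick skylark 320
    (6, 198.0, 95.0, 2833.0),   # plymouth duster
    (6, 199.0, 97.0, 2774.0),   # amc hornet
    (4, 113.0, 95.0, 2372.0),   # toyota corona mark ii
    (4, 97.0, 88.0, 2130.0),    # datsun pl510
]


def mean_by_cylinders(cars):
    """Map cylinder count to (mean displacement, mean horsepower, mean weight)."""
    groups = defaultdict(list)
    for cylinders, displacement, horsepower, weight in cars:
        groups[cylinders].append((displacement, horsepower, weight))
    summary = {}
    for cylinders, rows in sorted(groups.items()):
        count = len(rows)
        # zip(*rows) turns the list of rows into one tuple per column
        summary[cylinders] = tuple(sum(col) / count for col in zip(*rows))
    return summary


summary = mean_by_cylinders(ENGINE_SAMPLE)
for cylinders, (disp, hp, wt) in summary.items():
    print(f"{cylinders} cylinders: displacement {disp:.1f}, "
          f"horsepower {hp:.1f}, weight {wt:.1f}")
```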
How about looking at where these cars were manufactured? Here’s a tree map visualization that breaks down some of the automobiles by where they were manufactured (field = origin). In this case, I did some preprocessing to extract the car make from the combined make and model text. I also used Refine to re-encode the values for origin (1, 2, 3) into “North America”, “Europe”, and “Asia”. More on this in a future blog dedicated to Refine.
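For readers who want to do this preprocessing in a script instead, here's a minimal Python sketch. It assumes the make is simply the first word of the car name, which is a simplification, and it uses hypothetical column names ("origin", "car_name", "make") that may differ from the ones in your file. The origin labels are the ones described above.

```python
# Labels for the origin codes, as described in the post.
ORIGIN_LABELS = {"1": "North America", "2": "Europe", "3": "Asia"}


def car_make(car_name):
    """Return the first word of the car name (assumed to be the make)."""
    words = car_name.split()
    return words[0] if words else ""


def refine_row(row):
    """Return a copy of the row with a readable origin and a new 'make' field.

    The keys 'origin', 'car_name' and 'make' are hypothetical column names.
    Unknown origin codes are left unchanged.
    """
    refined = dict(row)
    refined["origin"] = ORIGIN_LABELS.get(row["origin"], row["origin"])
    refined["make"] = car_make(row["car_name"])
    return refined


rows = [
    {"origin": "1", "car_name": "ford torino"},
    {"origin": "2", "car_name": "peugeot 504"},
    {"origin": "3", "car_name": "toyota corona mark ii"},
]
refined_rows = [refine_row(row) for row in rows]
for row in refined_rows:
    print(row["make"], "-", row["origin"])
```

Because the first-word rule takes car names at face value, any misspelled or abbreviated makes in the data would still need manual cleanup.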
Verifying the trends in Predict
With Predict, I was able to verify the main trends I saw when exploring the data: weight, horsepower and engine displacement all impact MPG. The main prediction screen summarizes these impacts with color-coded visualizations for the target (MPG) and each of the main predictor attributes.
Here’s a closer look at the details that display when you hover over the predictors in the spiral diagram. I took some screenshots, combined them in an image editor, and then added some text to create the following basic infographic.
The Predict feature also provided some deeper and more statistical insights into these findings.
Weight has a negative impact on MPG (negative correlation)
I clicked the top predictor, wt drives mpg, to view the following main insight. This insight displays the negative correlation between MPG and weight by showing different groups of weight values. Color intensity is used to denote the related ranges of MPG values.
More horsepower means more weight (positive correlation)
Here’s an example of the positive correlation between horsepower and weight with some added notes.
I wanted to see an approximation of the correlation, so I turned on the following option to display a smoothed line that represents the relationship between weight and MPG.
A smoothed line displays as a fit to the data.
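I don't know which smoothing method Watson Analytics uses for this line, so the sketch below swaps in a plain moving average to show the general idea: sort the points by weight, then average MPG over a small sliding window. The function name and window size are my own choices, and the sample points are meant to mirror rows of the raw data file, so verify them against your own copy.

```python
# (weight, mpg) pairs, meant to mirror rows of auto-mpg.data.
# Illustrative only: verify against your own download.
WEIGHT_MPG = [
    (3504.0, 18.0),  # chevrolet chevelle malibu
    (3693.0, 15.0),  # buick skylark 320
    (4354.0, 14.0),  # chevrolet impala
    (2372.0, 24.0),  # toyota corona mark ii
    (2130.0, 27.0),  # datsun pl510
    (1835.0, 26.0),  # volkswagen 1131 deluxe sedan
]


def moving_average(points, window=3):
    """Sort points by x, then average y over a centered sliding window.

    The window is truncated at both ends, so edge points use fewer values.
    """
    ordered = sorted(points)
    half = window // 2
    smoothed = []
    for i, (x, _) in enumerate(ordered):
        start = max(0, i - half)
        stop = min(len(ordered), i + half + 1)
        ys = [y for _, y in ordered[start:stop]]
        smoothed.append((x, sum(ys) / len(ys)))
    return smoothed


smoothed_line = moving_average(WEIGHT_MPG, window=3)
for weight, mpg in smoothed_line:
    print(f"weight {weight:.0f}: smoothed mpg {mpg:.1f}")
```

A wider window gives a smoother line at the cost of hiding local detail, which is the usual trade-off with this kind of fit.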
More horsepower also means lower MPG (negative correlation)
The flip side of lots of horsepower is that it usually comes with more weight, too, which of course means lower MPG. For this visualization, I added some notes to emphasize the negative correlation.
Here’s the same visualization about horsepower and MPG, but with the smoothed line displayed to approximate the correlation.
Combining visualizations and infographics to communicate these findings
After using Explore and Predict, I was ready to jump into Assemble so I could combine multiple visualizations and create some infographics.
I started by creating this word cloud of all the car names in the data set, filtered for 1970 to 1974, sized by horsepower and color-coded by number of cylinders.
Here’s a combination I created with a packed bubble and a word cloud visualization. I filtered this data to show only cars with 8 cylinder engines from 1970 to 1974.
In this next example, I took one of the bubble plot visualizations I saved from Explore and added it into a new view. I then enhanced the visualization with images that I found on Wikipedia of some of the exact cars in the data set.
Displaying web pages in a view
Watson Analytics also enables you to include web pages in a view so you can build interactive “information and data mashups.” Here are some examples, but more on this in a future blog.
In this example, I added a word cloud and the Wikipedia search web page into the same view. I used this as a quick way to look up and research cars from the data set by copying and pasting car names into the search box of the embedded web page.
By blending data and web pages, I created a dynamic and interactive mashup of data and information.
What does all this mean for you?
As I mentioned earlier, exploring publicly available data is a very good way to practice using Watson Analytics so you can confidently use it on your own data in the future. In addition, even historic data can be used to help make business decisions. In this case, the insights I found could be used today to help the auto industry build more fuel efficient cars as emissions regulations continue to tighten. Or, the auto industry could compare this data with their own current MPG data to identify where there have been improvements and where more work needs to be done. You can access the data from https://archive.ics.uci.edu/ml/datasets/Auto+MPG
Imagine what you can do with Watson Analytics. It’s easy to get started. Just visit www.watsonanalytics.com and sign up for free.
Go pro and get more of what you love about Watson Analytics. Learn more by viewing the Watson Analytics Professional Demo.
Data set reference
Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Images from Wikipedia
“1974 Pontiac Grand Safari” by Josephew at English Wikipedia. Licensed under CC BY-SA 3.0 via Wikimedia Commons: https://commons.wikimedia.org/wiki/File:1974_Pontiac_Grand_Safari.jpg#/media/File:1974_Pontiac_Grand_Safari.jpg
“1973-1978 Honda Civic 5-door hatchback 01” by OSX – Own work. Licensed under Public Domain via Wikimedia Commons: https://commons.wikimedia.org/wiki/File:1973-1978_Honda_Civic_5-door_hatchback_01.jpg#/media/File:1973-1978_Honda_Civic_5-door_hatchback_01.jpg
“Pontiac Catalina front” by IFCAR – Own work. Licensed under Public Domain via Wikimedia Commons: https://commons.wikimedia.org/wiki/File:Pontiac_Catalina_front.jpg#/media/File:Pontiac_Catalina_front.jpg