The Iris flower data set is a classic, well-known example data set for data mining and data exploration. It contains 150 records, 50 for each of three types (classes) of iris flowers, with numeric values for petal length and width and for sepal length and width.
This data set is traditionally used for classification and prediction – to see which features of an iris can identify the flower as a certain type of iris. The values for length and width can be used to classify an iris into one of three iris types: Iris setosa, Iris versicolor, or Iris virginica. Visually exploring this data also lets you see the grouping (clustering) of the records into these three different types of irises.
I used IBM Watson Analytics to explore this classic data set by employing the different features available in Explore, Predict, and Assemble.
You can find a copy of the data set here:
Preparing and uploading the data set
I downloaded a copy of the data set, converted it to .csv format, and uploaded it to Watson Analytics. I used the following steps.
- Download the “data” file and rename it to “iris-data.csv”.
- Open the file in a text editor and add the following five column titles, separated by commas, as the first line in the file.
- Save the file.
- Upload the file to Watson Analytics.
On the Welcome page, tap Add, then browse to and select the file on your local computer.
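If you would rather script the header step than edit the file by hand, here is a minimal Python sketch. It assumes the five column titles are sepal-length, sepal-width, petal-length, petal-width, and class. These are the field names used later in this post, listed in the column order of the UCI file. The `add_header` function and the file paths in the comment are hypothetical.

```python
import csv
import io

# Assumed column titles, in the column order of the raw UCI file.
HEADER = ["sepal-length", "sepal-width", "petal-length", "petal-width", "class"]


def add_header(raw_text: str) -> str:
    """Return CSV text with HEADER as the first line, dropping blank lines."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    for row in csv.reader(io.StringIO(raw_text)):
        if row:  # the raw file can end with empty lines; skip them
            writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Two records in the raw comma-separated layout (no header), for illustration.
    sample = "5.1,3.5,1.4,0.2,Iris-setosa\n7.0,3.2,4.7,1.4,Iris-versicolor\n\n"
    print(add_header(sample), end="")
    # Converting a real file might look like this (hypothetical paths):
    # with open("data") as src, open("iris-data.csv", "w", newline="") as dst:
    #     dst.write(add_header(src.read()))
```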
My new data set was added to the Welcome page.
I used the Refine feature to view the new data set. Refine enables you to format and re-encode the data if needed. If you make any changes, you can save the new version as a separate refined data set.
In my case, on the Welcome page, I tapped the new data set and then tapped Refine.
You can also use Refine to browse and inspect your uploaded data: view the columns and data values in the data set, a quality score for each column, and thumbnail visualizations of the value distribution for each column.
In this case, the data set was ready to go as-is, so I didn’t need to make any changes. I closed the Refine page and returned to the Welcome page.
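Refine’s quality score is specific to Watson Analytics, but a basic per-column inspection like the one Refine shows can be approximated outside it. The sketch below reports min/max/mean for each numeric column and value counts for the class column. It assumes the header names sepal-length, sepal-width, petal-length, petal-width, and class, and uses a few illustrative rows rather than the full file. `summarize` is a hypothetical helper, not part of any Watson Analytics API.

```python
import csv
import io
import statistics
from collections import Counter

# Illustrative rows in the assumed iris-data.csv layout (header + records).
SAMPLE = """sepal-length,sepal-width,petal-length,petal-width,class
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""


def summarize(csv_text: str) -> dict:
    """Summarize each column: min/max/mean if numeric, value counts otherwise."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    summary = {}
    for column in reader.fieldnames:
        values = [row[column] for row in rows]
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            # Non-numeric column (here, "class"): count each distinct value.
            summary[column] = dict(Counter(values))
        else:
            summary[column] = {
                "min": min(numbers),
                "max": max(numbers),
                "mean": round(statistics.mean(numbers), 3),
            }
    return summary


if __name__ == "__main__":
    for column, stats in summarize(SAMPLE).items():
        print(column, stats)
```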
I used Explore to visualize the relationship between petal length, petal width and class.
On the Welcome page, I tapped the data set to see a set of starting points. Right away, Watson Analytics suggested several of them based on the key attributes of length and width compared to class.
I tapped one of the starting points to visualize and explore the relationship between length, width and class.
When I chose petal-length by class, Watson Analytics provided a visualization of how each class has a different average petal length.
When I chose “What is the relationship between petal-length and petal-width by class,” I saw how average petal length and width group the flowers into the three distinct classes of setosa, versicolor, and virginica. I could have used the Columns > Size by option to display another feature of the data, such as sepal-length or sepal-width, in the same visualization.
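The averages behind these two visualizations come from a simple group-by calculation. As a rough sketch in plain Python, using a handful of illustrative records rather than all 150, the per-class mean petal length and width can be computed as follows. The function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Illustrative (petal-length, petal-width, class) records, not the full data set.
ROWS = [
    (1.4, 0.2, "Iris-setosa"),
    (1.3, 0.2, "Iris-setosa"),
    (4.7, 1.4, "Iris-versicolor"),
    (4.5, 1.5, "Iris-versicolor"),
    (6.0, 2.5, "Iris-virginica"),
    (5.1, 1.9, "Iris-virginica"),
]


def petal_means_by_class(rows):
    """Return {class: (mean petal-length, mean petal-width)}, rounded to 2 places."""
    groups = defaultdict(list)
    for length, width, label in rows:
        groups[label].append((length, width))
    return {
        label: (
            round(mean(length for length, _ in pairs), 2),
            round(mean(width for _, width in pairs), 2),
        )
        for label, pairs in groups.items()
    }


if __name__ == "__main__":
    for label, (length, width) in petal_means_by_class(ROWS).items():
        print(f"{label}: mean petal-length {length}, mean petal-width {width}")
```

On the full data set the classes fall in the same order (setosa smallest, virginica largest), which is the grouping the Explore visualization shows.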
What type of iris is that? Setosa? Versicolor? Virginica? I used Predict to see how petal length and width are predictors of iris type (class) by following these steps.
- On the Welcome page, tap the data set and then tap Predict.
- Enter a name for the new Prediction.
- Set the prediction target to “class.” (Remove any other fields that are set as targets.)
- Tap Create Prediction.
I hovered over the spiral diagram to see which fields are strong predictors of the target class. Petal-width (88.0%) and petal-length (84.7%) turned out to be the top predictors of how to classify an iris into one of the three types.
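The post does not describe how Watson Analytics computes its predictive-strength percentages. The sketch below therefore uses a different and much simpler measure of how well one field predicts class. For one feature at a time, it assigns each record to the class whose mean for that feature is closest, then reports the fraction of records classified correctly. The measure is evaluated on the same illustrative rows it was fitted to, so it overstates real accuracy, and its numbers are not comparable to the 88.0% and 84.7% above. `single_feature_accuracy` is a hypothetical name.

```python
from statistics import mean

FEATURES = ["sepal-length", "sepal-width", "petal-length", "petal-width"]

# Illustrative records: (sepal-length, sepal-width, petal-length, petal-width, class).
ROWS = [
    (5.1, 3.5, 1.4, 0.2, "Iris-setosa"),
    (4.9, 3.0, 1.4, 0.2, "Iris-setosa"),
    (4.7, 3.2, 1.3, 0.2, "Iris-setosa"),
    (7.0, 3.2, 4.7, 1.4, "Iris-versicolor"),
    (6.4, 3.2, 4.5, 1.5, "Iris-versicolor"),
    (6.9, 3.1, 4.9, 1.5, "Iris-versicolor"),
    (6.3, 3.3, 6.0, 2.5, "Iris-virginica"),
    (5.8, 2.7, 5.1, 1.9, "Iris-virginica"),
    (7.1, 3.0, 5.9, 2.1, "Iris-virginica"),
]


def single_feature_accuracy(rows, index):
    """Fraction of rows whose class is the one with the nearest class mean
    for the feature at position `index` (scored on the same rows)."""
    labels = sorted({row[-1] for row in rows})
    centers = {lab: mean(r[index] for r in rows if r[-1] == lab) for lab in labels}
    correct = 0
    for row in rows:
        predicted = min(labels, key=lambda lab: abs(row[index] - centers[lab]))
        if predicted == row[-1]:
            correct += 1
    return correct / len(rows)


if __name__ == "__main__":
    for i, name in enumerate(FEATURES):
        print(f"{name}: {single_feature_accuracy(ROWS, i):.2f}")
```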
To get more details about the prediction, I tapped the main insight to see a distribution of iris class compared to petal width.
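A distribution of class against petal width essentially bins petal width and counts the classes in each bin. Here is a minimal sketch, assuming fixed 0.5-wide bins and a few illustrative records. The bin width and the `class_counts_by_bin` name are assumptions, not Watson Analytics settings.

```python
from collections import Counter

BIN_WIDTH = 0.5  # assumed bin width

# Illustrative (petal-width, class) pairs, not the full data set.
ROWS = [
    (0.2, "Iris-setosa"), (0.2, "Iris-setosa"), (0.4, "Iris-setosa"),
    (1.4, "Iris-versicolor"), (1.5, "Iris-versicolor"), (1.3, "Iris-versicolor"),
    (2.5, "Iris-virginica"), (1.9, "Iris-virginica"), (2.1, "Iris-virginica"),
]


def class_counts_by_bin(rows, bin_width=BIN_WIDTH):
    """Count classes within fixed-width petal-width bins, keyed by bin start."""
    counts = {}
    for width, label in rows:
        start = round((width // bin_width) * bin_width, 2)
        counts.setdefault(start, Counter())[label] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    for start, counts in class_counts_by_bin(ROWS).items():
        print(f"{start:.1f} to {start + BIN_WIDTH:.1f}: {dict(counts)}")
```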
I used Assemble to create a multimedia infographic about the data set, combining text, images, videos, and shapes with different visualizations of the data. In the infographic, I:
- Created an interactive word cloud of the iris names, assigning a data value, such as petal-width, to set the text size of each name.
- Added a scatter plot of petal-width and petal-length colored by type of iris (class).
- Added a grid to display a summary of the numeric values from the data set.
- Added images of iris flowers showing the petal and sepal width and length.
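The post does not say how Assemble turns a data value into word-cloud text size. One common approach is linear scaling between a minimum and a maximum point size, and the sketch below shows that approach as an assumption. The `font_sizes` helper, the 12 to 48 pt range, and the input values are all hypothetical.

```python
def font_sizes(values, min_pt=12.0, max_pt=48.0):
    """Map each value linearly onto [min_pt, max_pt]; equal values get min_pt."""
    low, high = min(values.values()), max(values.values())
    span = high - low
    return {
        name: min_pt if span == 0 else min_pt + (value - low) / span * (max_pt - min_pt)
        for name, value in values.items()
    }


if __name__ == "__main__":
    # Hypothetical per-class petal-width values, for illustration only.
    widths = {"Iris-setosa": 0.2, "Iris-versicolor": 1.5, "Iris-virginica": 2.2}
    for name, size in font_sizes(widths).items():
        print(f"{name}: {size:.1f} pt")
```

With linear scaling, the smallest value always gets the minimum size and the largest gets the maximum. A square-root scale is a common alternative when a few large values would otherwise dominate.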
Any visualizations I added to the view were automatically linked together and updated when I interacted with the data points.
When I selected a word in the word cloud, it updated the other visualizations and highlighted the related records.
When I selected an individual data point in the scatter plot, it highlighted just that record and updated the grid with the numeric values for that point.
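Linked views like these are often built around a shared selection that notifies every view when it changes. The sketch below illustrates that idea with a hypothetical `LinkedSelection` class and a few illustrative records. It is not how Watson Analytics implements linking, which the post does not describe.

```python
from collections.abc import Callable

# Illustrative records keyed by id: (petal-length, petal-width, class).
RECORDS = {
    1: (1.4, 0.2, "Iris-setosa"),
    2: (4.7, 1.4, "Iris-versicolor"),
    3: (6.0, 2.5, "Iris-virginica"),
    4: (4.5, 1.5, "Iris-versicolor"),
}


class LinkedSelection:
    """Shared selection state; every subscribed view is notified on change."""

    def __init__(self) -> None:
        self.selected: set[int] = set()
        self._views: list[Callable[[set[int]], None]] = []

    def subscribe(self, view: Callable[[set[int]], None]) -> None:
        self._views.append(view)

    def select(self, ids: set[int]) -> None:
        self.selected = set(ids)
        for view in self._views:
            view(self.selected)

    def select_class(self, label: str) -> None:
        # Word-cloud click: highlight every record of that class.
        self.select({i for i, rec in RECORDS.items() if rec[2] == label})

    def select_point(self, record_id: int) -> None:
        # Scatter-plot click: highlight just that one record.
        self.select({record_id})


if __name__ == "__main__":
    selection = LinkedSelection()
    selection.subscribe(lambda ids: print("scatter highlights", sorted(ids)))
    selection.subscribe(lambda ids: print("grid shows", [RECORDS[i] for i in sorted(ids)]))
    selection.select_class("Iris-versicolor")
    selection.select_point(3)
```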
So there you go. The classic iris data set: explored and visualized in Watson Analytics. Try it out with the iris data or upload your own data to explore and visualize.
If you haven’t used Watson Analytics yet, now’s a great time to sign up for free at www.watsonanalytics.com.
Data set reference:
Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Iris flower data set. Wikipedia. https://en.wikipedia.org/wiki/Iris_flower_data_set