Watson Analytics has recently been expanded with a new set of visualizations that can help you find more informative answers to your data questions quickly. In this blog post, I highlight these capabilities in the Freemium version of Watson Analytics, which, among other things, help users visualize network data more effectively.
Network data is a very common but underused data type. Conceptually, network data consists of a collection of items and a collection of connections, each linking a pair of items. Items, in this case, could be people on a social network site, and a connection could exist if person A has friended person B. Or, items could be warehouse locations, and a connection means there is a direct supply route between the two locations. Many real-world problems can be modeled as networks, and should be presented back to the user as a visual network as well.
However, most business intelligence tools don’t support querying or visualizing network data. To illustrate how Watson Analytics can help, I’ll use a dataset obtained from the US Bureau of Transportation Statistics that describes airline departure and arrival delays for all US domestic flights, which can be downloaded for free here. For each departing US flight, this dataset lists the carrier, the flight origin and destination, and the amount of and reason for any delay. In this case we’ll use part of the data for December 2014, which consists of a little more than 500 thousand rows.
Since the dataset contains departure and arrival locations in separate columns, I should be able to extract a visual route map of each airline by treating a single flight as a connection between an origin and a destination city. Loading up the data in Watson Analytics and starting a new data exploration quickly gets me to the following screen.
Rather than having to shape the data into a network or load it into an external network visualization tool, I can ask Watson Analytics for the connections between each origin and destination city in the data. Clicking the most relevant suggestion shows me all connections between the origins and destinations of all US domestic airlines. In passing, Watson Analytics has also detected that destination state forms a hierarchy with destination city and has automatically included state names for each city to help disambiguate them. Each city (called a node) is indicated with a blue dot, and a line (also called an edge) is drawn between two cities if the dataset contains a flight connecting them. The size of each dot is proportional to the number of inbound and outbound connections, so hubs tend to have a larger size. The visual weight of each edge is proportional to the number of rows, but we could easily change this to the total incurred minutes of delay by changing the line weight mapping.
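Watson Analytics derives all of this for you, but if you are curious what the underlying bookkeeping might look like, here is a minimal Python sketch. The sample rows, the tuple layout and the function name `build_network` are made up for illustration and are not part of Watson Analytics or the BTS file. I also assume that a node's size is based on the number of distinct cities it connects to, which may differ from what the product actually computes.

```python
from collections import Counter, defaultdict

# Invented sample rows: (carrier, origin city, destination city, delay in minutes).
flights = [
    ("AS", "Seattle, WA", "Juneau, AK", 12),
    ("AS", "Seattle, WA", "Juneau, AK", 0),
    ("AS", "Juneau, AK", "Seattle, WA", 5),
    ("DL", "Atlanta, GA", "Seattle, WA", 30),
    ("DL", "Atlanta, GA", "Macon, GA", 67),
]

def build_network(rows):
    """Return (edge_counts, edge_delay, node_degree) for a list of flight rows."""
    edge_counts = Counter()        # edge weight option 1: number of rows per route
    edge_delay = Counter()         # edge weight option 2: total delay minutes per route
    neighbors = defaultdict(set)   # distinct cities each city is connected to
    for _carrier, origin, dest, delay in rows:
        edge = (origin, dest)
        edge_counts[edge] += 1
        edge_delay[edge] += delay
        # Count a connection for both endpoints, regardless of direction.
        neighbors[origin].add(dest)
        neighbors[dest].add(origin)
    node_degree = {city: len(adjacent) for city, adjacent in neighbors.items()}
    return edge_counts, edge_delay, node_degree

edge_counts, edge_delay, node_degree = build_network(flights)
```

Switching the line weight from row count to total delay then amounts to reading from `edge_delay` instead of `edge_counts`.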
Although this network diagram is not very readable because there is a very dense cluster of major cities in the center, I can already see Alaskan cities on the periphery of the large central blob. This means that airports like Wrangell, AK, are at least 4 stops away from any other airport in the US. However, a good way to break apart a complex diagram like this is to filter the data down by carrier. In the graphic below I’ve extracted route information for three airlines by simply using the filter capabilities at the bottom of the user interface. You can clearly see how the carriers differ in network size, hubs and geographic region.
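To make the two ideas in this paragraph concrete, here is a small Python sketch of filtering routes by carrier and counting how many hops separate one airport from the others. Everything in it is an assumption for illustration: the airport and carrier names are placeholders rather than real routes, routes are treated as usable in both directions, and the helper names `filter_by_carrier` and `hop_distances` are my own.

```python
from collections import defaultdict, deque

# Placeholder rows: (carrier, origin, destination). These are not real routes.
flights = [
    ("XX", "Remote", "Regional"),
    ("XX", "Regional", "Hub"),
    ("YY", "Hub", "Metro"),
    ("YY", "Metro", "Coastal"),
]

def filter_by_carrier(rows, carrier):
    """Keep only the rows flown by the given carrier."""
    return [row for row in rows if row[0] == carrier]

def hop_distances(rows, start):
    """Breadth-first search: minimum number of flights from start to each reachable airport."""
    adjacency = defaultdict(set)
    for _carrier, origin, dest in rows:
        # Assumption: every route can be flown in both directions.
        adjacency[origin].add(dest)
        adjacency[dest].add(origin)
    distances = {start: 0}
    queue = deque([start])
    while queue:
        airport = queue.popleft()
        for neighbor in adjacency[airport]:
            if neighbor not in distances:
                distances[neighbor] = distances[airport] + 1
                queue.append(neighbor)
    return distances

all_hops = hop_distances(flights, "Remote")
xx_hops = hop_distances(filter_by_carrier(flights, "XX"), "Remote")
```

Airports on the periphery of the diagram are the ones with large hop counts in `all_hops`, and filtering by carrier simply shrinks the set of airports that can be reached at all.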
In this case I’ve used a network visualization to show connections between cities where the data contained both origin city and destination city fields. In the same way, you could use Watson Analytics to show a diagram of connections between friends in a social network or, if that interests you, connections between genes in a genetic network. A different but also useful case for network visualization is where you have two different sets of items, and you want to show how one type of item, such as a customer, relates to another type of item, such as a product. These types of networks are technically called bipartite networks because their nodes fall into two sets and you only see connections between items in different sets, not among items in the same set. In this case, I can choose to map carriers and destination cities to highlight geographic differences between carriers. I can easily modify my visualization by dragging the “Destination City” attribute to the ‘From’ slot in the visualization and dragging “Carrier” to the ‘To’ slot.
One group of nodes (green) represents the carriers, the other (blue) represents cities. This visualization clearly shows geographical differences between airlines. State-based airlines like Alaska or Hawaiian are obviously catering to these remote states. SkyWest caters to most small airports in the northwestern states, while American Eagle and ExpressJet operate in smaller airports in the southeast. Bipartite graphs are useful if you want to see how different data items associate with other types of data items. These could be age groups and products, tweets and locations or stores and products. Computing layouts for network diagrams is computationally expensive, but as long as a dataset has at most a few thousand items, a network diagram can be applied to almost any dataset.
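For readers who like to see the structure spelled out, a bipartite carrier and city network can be sketched in a few lines of Python. The carrier and city labels are placeholders, and `bipartite_edges` and `exclusive_cities` are hypothetical helper names, not anything Watson Analytics exposes.

```python
from collections import defaultdict

# Placeholder rows: (carrier, destination city).
rows = [
    ("Carrier 1", "City A"),
    ("Carrier 1", "City B"),
    ("Carrier 2", "City B"),
    ("Carrier 2", "City C"),
    ("Carrier 2", "City C"),  # repeated flights collapse into a single edge
]

def bipartite_edges(rows):
    """Edges always run from a city node to a carrier node, never within one set."""
    return {(city, carrier) for carrier, city in rows}

def exclusive_cities(edges):
    """Group the cities that are served by exactly one carrier under that carrier."""
    carriers_by_city = defaultdict(set)
    for city, carrier in edges:
        carriers_by_city[city].add(carrier)
    exclusive = defaultdict(set)
    for city, carriers in carriers_by_city.items():
        if len(carriers) == 1:
            exclusive[next(iter(carriers))].add(city)
    return dict(exclusive)

edges = bipartite_edges(rows)
only_served_by = exclusive_cities(edges)
```

Cities that end up in `only_served_by` are the ones that hang off a single carrier node in the diagram, which is what makes the regional carriers stand out visually.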
We also deployed two other visualizations that help users quickly determine the most salient items in a set. The word cloud is a well-known visualization type that uses a compact arrangement of words to show the most salient terms in a larger set. Individual words can be colored by a different attribute if needed. In this case we can use the word cloud to show the carriers with the highest average arrival delay:
The packed bubble is a highly scalable visualization that compactly represents data items as circles, where the radius of the circle represents a value of interest. Here we show the largest arrival delays for all 300 departure cities. The visualization below shows that the average departure delay in Macon, GA, for December 2014 was 67 (!) minutes.
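Both the word cloud and the packed bubble boil down to one aggregated value per item, which then drives the word or circle size. A minimal Python sketch of that aggregation could look like the following. The rows and the names `average_by` and `ranked` are invented for illustration and do not reflect the real delay figures or the way Watson Analytics computes them.

```python
from collections import defaultdict

# Invented rows: (carrier, departure city, arrival delay in minutes).
rows = [
    ("AA", "City A", 10.0),
    ("AA", "City B", 30.0),
    ("BB", "City A", 50.0),
    ("BB", "City A", 70.0),
]

def average_by(rows, key_index, value_index=2):
    """Average the value column for each distinct key (carrier or city)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[value_index]
        counts[row[key_index]] += 1
    return {key: totals[key] / counts[key] for key in totals}

def ranked(averages):
    """Sort items from largest to smallest average, the order in which they stand out."""
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

by_carrier = ranked(average_by(rows, key_index=0))  # input for a word cloud
by_city = ranked(average_by(rows, key_index=1))     # input for a packed bubble chart
```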
I hope I have shown you how to get more out of your data using these new visualization types. They’re free for everyone to try, so please let us know what you think of them.
Frank Van Ham
Master Inventor. Information Visualization and Visual Interaction Expert. IBM Watson Analytics Architect