Watson Analytics – classic
After reading the recent blog on network diagram visualizations in the Watson Analytics community, I was interested to see how a network diagram can show relationships even in dense datasets. Looking at the dataset in more detail, I was curious as to which origin cities had more delayed or cancelled flights, and it got me thinking about people’s reactions to flight delays. To find out, I compared delayed flights from November 2015 through January 2016 with social media sentiment on flight delays for the same time period. For the analysis, I chose larger cities. Using the Refine feature, I selected flights whose departure was delayed by at least 30 minutes and selected the major cities. The major city with the most flight delays was Chicago, which may not be a surprise since it is a major hub and the weather in this timeframe is not ideal. As you can see, many hubs saw flight delays, including Atlanta, Denver and Los Angeles. Among flights delayed by 30 minutes or more, the average delay for Chicago was 84 minutes, but that was not the highest. The lowest average delay times were in Baltimore, Las Vegas and Los Angeles. Now let’s compare this to the volume of social conversations about flight delays for the cities. I created a social media project that captures conversations about flight delays (in English), as shown in the following diagram. I also created Themes for major cities to get a sense of which cities people reference when talking about delayed flights. After running the analysis on social media, I can see that Los Angeles, New York and Chicago have the largest conversation volume for the same time period. Let's focus on sentiment using a network diagram. The network diagram works well for showing how negative the conversations are, which is expected since we are evaluating flight delay conversations.
More importantly, you can select any of the nodes in the diagram to highlight its relationships, as we see below. By clicking Los Angeles, I see that conversations about Los Angeles are predominantly negative; this is surprising considering that delays occur more often for Chicago. I can also look at the cities by the number of mentions and filter that based on negative sentiment, as shown in the next visualization, which shows that Los Angeles is much more negative than the other major cities. Next I want to see all of the visualizations in a single view. Because I can pin each of these visualizations to the collection, Watson Analytics makes it easy for me to create a dashboard with key elements from both datasets. From this dashboard, it is easy to conclude that the Los Angeles audience is much less tolerant of flight delays. If you were working in the airline industry, you might want to pay particular attention to the Los Angeles market when dealing with delayed flights. While I have combined social data with non-social data in the past, I have not done it with this much ease. I strongly recommend that you use analytics with social data together with your transactional data. The observations that you can derive from social and non-social data can be very interesting. You will be able to vet the data and insights better than with a single data set. Try it out yourself!
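For readers who want to reproduce the Refine step outside Watson Analytics, here is a minimal Python sketch of the same logic: keep departures delayed by at least 30 minutes, then count and average the delays per origin city. The rows, the field names (origin_city, dep_delay_min) and the numbers are made up for illustration and are not taken from the dataset used in this post.

```python
from collections import defaultdict

# Hypothetical sample rows; field names and values are illustrative only.
flights = [
    {"origin_city": "Chicago", "dep_delay_min": 95},
    {"origin_city": "Chicago", "dep_delay_min": 12},
    {"origin_city": "Chicago", "dep_delay_min": 45},
    {"origin_city": "Los Angeles", "dep_delay_min": 30},
    {"origin_city": "Los Angeles", "dep_delay_min": 5},
    {"origin_city": "Denver", "dep_delay_min": 120},
]

MIN_DELAY = 30  # threshold used in the post: at least 30 minutes late


def summarize_delays(rows, min_delay=MIN_DELAY):
    """Return {city: (delayed_flight_count, average_delay_minutes)}."""
    delays_by_city = defaultdict(list)
    for row in rows:
        # Keep only departures at or above the delay threshold.
        if row["dep_delay_min"] >= min_delay:
            delays_by_city[row["origin_city"]].append(row["dep_delay_min"])
    return {
        city: (len(delays), sum(delays) / len(delays))
        for city, delays in delays_by_city.items()
    }


summary = summarize_delays(flights)
for city, (count, avg) in sorted(summary.items()):
    print(f"{city}: {count} delayed flights, average delay {avg:.0f} min")
```

Note that the count and the average answer different questions: a city can have the most delayed flights without having the longest average delay, which is the pattern described for Chicago.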
There are a few different ways to apply filters to your visualizations in a dashboard. Here’s an overview of the different types of filters and how they work.

You can filter visualizations in your dashboard in three main ways:
- Filter all visualizations in your dashboard
- Filter one visualization based on a column in the visualization (Keep/Exclude)
- Filter one visualization based on a column not in the visualization (Local Filter)

What’s Filtered Right Now?
To get started, here’s a quick way to check filter status.
TIP: Click the Filter Status icon in a visualization to see the current filtering that is applied.

Applying a global filter across all visualizations in the dashboard
Use the data tray to configure a filter that applies to all visualizations in the view. This type of filter applies across all the tabs in the view for any visualization that uses that same data set. Click a column title in the data tray and then click the filter icon. Select your filter criteria and then click away from the filter menu to close it. Here’s an example of a global filter for the Region column set to only “Mid-Atlantic” and “Northeast”.
TIP: The blue line above a column in the data tray means that column has a global filter.

Filter a single visualization using the Keep/Exclude option
Use the Keep/Exclude filter to display or hide specific data points in a visualization. A data point is any element displayed in the visualization, for example a bar in a bar chart, a bubble in a bubble chart, an item in a legend or an item on an axis. Right-click one or more data points in a visualization and then choose Keep or Exclude. The filter is applied to that visualization only. The other visualizations in the view do not update. After setting a filter, you can click the Filter Status icon in the visualization to see the current filter status.
TIP: This type of filter can also be configured in the column panel when you edit a visualization.

Filter a single visualization for a column not displayed
Use the Local filter option to slice your data on a column that’s not displayed in a visualization. This type of filter is available only for visualizations you create in Assemble and does not update any other visualizations in your view.
1. Change the view into Edit mode and then click the Expand icon for the visualization.
2. Drag the column you want to filter on from the data tray to the Local filters option.
3. Select or type the criteria for the filter, and then click away from the filter pane.
4. Click the Collapse icon to return to the view.
To verify the filter, click the filter icon on the border of the visualization.

For more information and details, see the following resources:
Documentation: IBM Watson Analytics Docs > Assemble > Filtering
Video: How to filter all visualizations in a dashboard or story https://www.youtube.com/watch?v=FiU2d_2PRSE
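
The difference between the three filter types is mostly a question of scope. The Python sketch below is only a conceptual model of that scoping, not how Watson Analytics implements filtering: the rows, the column names and the helper functions (in_selected_regions, not_watches, web_only, revenue_by) are all hypothetical.

```python
# Hypothetical rows; column names and values are illustrative only.
ROWS = [
    {"region": "Northeast", "product": "Eyewear", "channel": "Web", "revenue": 120},
    {"region": "Mid-Atlantic", "product": "Watches", "channel": "Store", "revenue": 80},
    {"region": "Southwest", "product": "Eyewear", "channel": "Store", "revenue": 60},
    {"region": "Northeast", "product": "Watches", "channel": "Web", "revenue": 40},
]


def in_selected_regions(row):
    """Global filter: applies to every visualization on the data set."""
    return row["region"] in {"Mid-Atlantic", "Northeast"}


def not_watches(row):
    """Keep/Exclude: hides a data point that one chart displays."""
    return row["product"] != "Watches"


def web_only(row):
    """Local filter: slices one chart on a column it does not display."""
    return row["channel"] == "Web"


def revenue_by(rows, column, *filters):
    """Total revenue per value of `column` for rows passing all filters."""
    totals = {}
    for row in rows:
        if all(f(row) for f in filters):
            totals[row[column]] = totals.get(row[column], 0) + row["revenue"]
    return totals


GLOBAL_FILTERS = (in_selected_regions,)

# Every chart receives the global filter; each extra filter affects one chart.
chart_a = revenue_by(ROWS, "product", *GLOBAL_FILTERS, not_watches)
chart_b = revenue_by(ROWS, "product", *GLOBAL_FILTERS, web_only)
chart_c = revenue_by(ROWS, "region", *GLOBAL_FILTERS)

print(chart_a)
print(chart_b)
print(chart_c)
```

In this model chart_b is filtered on channel even though it only displays product and revenue, which is the situation the Local filter option is meant for.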
Recently announced at IBM InterConnect is the Analytics Exchange beta, available in Bluemix. The Analytics Exchange gives you access to free and open data in categories such as economy and business, leisure, transportation, and others. To access the Analytics Exchange, register for Bluemix here with your IBM ID. Once you’re signed in with your IBM ID, click Dashboards. Then click Work with Data under “Data & Analytics.” Click Exchange in the left-hand column. This will bring you to the Analytics Exchange. To access a data set, simply click one of the topics that interests you. I’ve clicked Environment, which returns 31 results. I’m going to select the data set Country Statistics: Refined Petroleum Products – Consumption. This brings up a brief description of the data, a preview, column details, and more information about where the data set came from. To explore the data set in Watson Analytics, click Explore Data in the top right. You’ll be asked to accept the terms and conditions and then whether you want to open the data in Watson Analytics. The data is now in your Watson Analytics instance. From there it can be analyzed like any other data set in Watson Analytics. We hope you enjoy access to these exciting data sets, which can help you get up and running on Watson Analytics so you can find solutions to your business problems.
If you are Irish or just support the St. Patrick’s Day rallying cry, find out what is top of mind with your compatriots. I happen to know university campuses are all abuzz with the green spirit because I drove by some this morning and saw a good number of green hats milling about. But what are the topics that are capturing the interest of the leprechauns on this day? I took a quick peek at Watson Analytics topic suggestions to find out what people are chattering about. Below you will see some interesting Topic Suggestions from Watson Analytics for Social Media for St. Patrick’s Day. After a quick analysis, the demographics show close to a 1:1 ratio of males to females, with slightly more females talking about St. Patrick’s Day. That is interesting. I was not expecting this demographic. Here is a breakdown of the things people were talking about. I am a little surprised that “luck” is in the top three conversation themes. I was expecting beer, wearing green or parades to be up on this list. I guess you don’t know what you don’t know until you do the analysis. I am going to stop here, put on my green cap and shamrock, and paint the town green. I encourage you to discover your own “Pot o’ Gold” with Watson Analytics! Let me know what you find with your comments on the community forum! It’s fun, easy and insightful. If you are not already using Watson Analytics, sign up for free here!
Watson Analytics has recently been expanded with a new set of visualizations that can help you find more informative answers to your data questions quickly. In this blog post, I highlight these capabilities in the Freemium version of Watson Analytics, which (among other things) help users better visualize network data. Network data is a very common but underused data type. Conceptually, network data consists of a collection of items and a collection of connections between pairs of items. Items, in this case, could be people on a social network site, and a connection could exist if person A has friended person B. Or items could be warehouse locations, where a connection means there is a direct supply route between two locations. Many real-world problems can be modeled as networks, and should be presented back to the user as visual networks as well. However, business intelligence tools typically don’t support querying or visualizing network data. To illustrate how Watson Analytics can help, I’ll use a dataset obtained from the US Bureau of Transportation Statistics that describes airline departure and arrival delays for all US domestic flights, which can be downloaded for free here. For each departing US flight, this dataset has the flight carrier, flight origin and destination, as well as the amount of and reason for the delay. In this case we’ll use part of the data for December 2014, which consists of a little more than 500 thousand rows. Since the dataset contains departure and arrival locations in separate columns, I should be able to extract a visual route map of each airline by treating a single flight as a connection between an origin and a destination city. Loading the data into Watson Analytics and starting a new data exploration quickly gets me to the following screen.
Rather than having to shape the data into a network or load the data into an external network visualization tool, I can ask Watson Analytics for the connections between each origin and destination city in the data. Clicking the most relevant suggestion shows me all connections between the origins and destinations of all domestic US airlines. In passing, Watson Analytics has also detected that destination state forms a hierarchy with destination city and has automatically included state names for each city to help disambiguate them. Each city (called a node) is indicated with a blue dot, and a line (also called an edge) is drawn between two cities if the dataset contains a flight connecting them. The size of each dot is proportional to the number of inbound and outbound connections, so hubs tend to have a larger size. The visual weight of each edge is proportional to the number of rows, but we could easily change this to the total incurred minutes of delay by changing the line weight mapping. Although this network diagram is not very readable because there is a very dense cluster of major cities in the center, I can already see Alaskan cities on the periphery of the large central blob. This means that airports like Wrangell, AK, are at least 4 stops away from any other airport in the US. However, a good way to break apart a complex diagram like this is to filter the data down by carrier. In the graphic below I’ve extracted route information for three airlines by simply using the filter capabilities at the bottom of the user interface. You can clearly see the differences in the carriers’ network size, hubs and geographic region. In this case I’ve used a network visualization to show connections between cities where the data contained both origin city and destination city fields. In the same way, you could use Watson Analytics to show a diagram of connections between friends in a social network or, if that interests you, connections between genes in a genetic network.
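To make the mapping from flight rows to a network concrete, here is a small Python sketch of the idea described above: each row contributes to an edge between its origin and destination, edge weight is the number of rows, and node size is the number of distinct inbound and outbound connections. This is one plausible reading of those mappings, not Watson Analytics' actual implementation, and the carrier labels, rows and the build_network helper are made up for illustration.

```python
from collections import Counter

# Hypothetical (carrier, origin, destination) rows; not real route data.
FLIGHTS = [
    ("A", "Chicago, IL", "Denver, CO"),
    ("A", "Chicago, IL", "Denver, CO"),
    ("B", "Denver, CO", "Chicago, IL"),
    ("A", "Chicago, IL", "Atlanta, GA"),
    ("C", "Seattle, WA", "Juneau, AK"),
]


def build_network(rows, carrier=None):
    """Return (edge_weights, node_sizes), optionally for a single carrier."""
    # Edge weight: number of rows sharing the same origin and destination.
    edge_weights = Counter(
        (origin, dest)
        for row_carrier, origin, dest in rows
        if carrier is None or row_carrier == carrier
    )
    # Node size: number of distinct outbound plus inbound connections.
    node_sizes = Counter()
    for origin, dest in edge_weights:
        node_sizes[origin] += 1
        node_sizes[dest] += 1
    return edge_weights, node_sizes


all_edges, all_nodes = build_network(FLIGHTS)
a_edges, a_nodes = build_network(FLIGHTS, carrier="A")

print(all_nodes.most_common(1))  # the best-connected city acts as the hub
print(dict(a_edges))             # route map for carrier "A" only
```

Filtering by carrier before building the edges mirrors the filter step described above: each carrier gets its own, much sparser network.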
A different but also useful case for network visualization is where you have two different sets of items, and you want to show how one type of item such as a customer relates to another type of item such as a product. These types of networks are technically called bipartite networks because their nodes fall into two sets and you only see connections between items in different sets, not among items in the same set. In this case, I can choose to map carriers and destination cities, to highlight geographic differences between carriers. I can easily modify my visualization by dragging the “Destination City” attribute to the ‘From’ slot in the visualization and dragging “Carrier” to the ‘To’ slot. One group of nodes (green) represents the carriers, the other (blue) represents cities. This visualization clearly shows geographical differences between airlines. State-based airlines like Alaska or Hawaiian obviously cater to these remote states. SkyWest caters to most small airports in the northwestern states, while American Eagle and ExpressJet operate in smaller airports in the southeast. Bipartite graphs are useful if you want to see how different data items associate with other types of data items. These could be age groups and products, tweets and locations or stores and products. Computing layouts for network diagrams is computationally very expensive, but as long as there are at most a few thousand items, the network diagram is applicable to almost any dataset. We also deployed two other visualizations that help users quickly determine the most salient items in a set. The wordcloud is a well-known visualization type that uses a compact arrangement of words to show the most salient terms in a larger set. Individual words can be colored by a different attribute if needed.
In this case we can use the wordcloud to show the carriers with the highest average arrival delay. The packed bubble is a highly scalable visualization that compactly represents data items as circles, where the radius of the circle represents a value of interest. Here we show the largest arrival delays for all 300 departure cities. The visualization below shows that the average departure delay in Macon, GA, for December 2014 was 67 (!) minutes. I hope I have shown you how to get more out of your data using these new visualization types. They’re freely available for everyone to try, so please let us know what you think of them.

Frank Van Ham
Master Inventor. Information Visualization and Visual Interaction Expert. IBM Watson Analytics Architect
If you're a fan of the lighter side of using Watson Analytics, such as using it to pick fantasy soccer teams or learn more about craft beer preferences, the IBM Watson Analytics team has created a new blog just for that kind of information. It will also be the place where you can learn about Watson Analytics events or even where Watson Analytics might be appearing in your area. This new Watson Analytics Blog will help you get quick answers to the following questions: What’s happening? Visit the blog to find out when the latest Watson Analytics webinars, presentations, events, expert demonstrations, and live sessions are being held. What’s new? Visit the blog to learn about new additions to Watson Analytics and other product announcements. How do I…? If you're looking for videos and other resources that show the lighter side of using Watson Analytics, you can find them on the blog. What can I do? The new blog has all kinds of articles that show you how Watson Analytics enables people like you to find new insights and hidden patterns in your data--from sales win-loss data to regional firework sales. Come check it all out here. And, don't forget to keep coming back here to the Community to get in-depth information on how to use Watson Analytics and all of its great features and add-ons. Are you ready for easy analytics? How can you start finding new insights and hidden patterns in your data? Sign up for Watson Analytics for free here.
Watson Analytics for Social Media was introduced recently and has certainly created a buzz. We have had a few requests for a quick tour on creating a social project. Ask and you will be answered! We have just posted a short video that takes you on a whirlwind tour of using Social Media analysis within Watson Analytics. The tour should get you started very quickly. It is fun and easy, and you can follow along with the tour yourself! Need more? To help you along with your own analysis in Watson Analytics for Social Media, we have provided a few tips. When first approaching Watson Analytics for Social Media, you may be confused about how to use Topics and Themes; if this is the case, have a look at this video! If you are curious as to how or when to use Context Terms with Topics or Themes, the video tip here may be useful! Similarly, if you are interested in how to refine your Topic or Theme with Exclude Terms, this “vidtip” is for you! Topic Suggestions is a great feature, but if you are using your inside voice and asking, “I am not getting any Topic Suggestions, what can I do?”, we have put together 4 videos to help you get Topic Suggestions working for you. If you are looking to improve Include Terms and are hoping to get Topic Suggestions working better with them, check out this video. Changing the dates configuration can help you with Topic Suggestions; see how here! Trying to improve Context Terms in light of Topic Suggestions? If so, check out this tip. Exclude Terms can be a little funky with Topic Suggestions; the video here may help “de-funk” Topic Suggestions in light of Exclude Terms. Need more tips? Just ask: post a topic or ask a question in the Discussion Forum!
Realizing goals with Watson Analytics! The general session on day 3 of IBM InterConnect 2016 started with the theme of “Personal Transformation” by introducing a human interest story. Simon Wheatcroft inspired the audience with his personal story of how a person who is blind became an ultramarathon runner. The audience was captivated by how Simon used Runkeeper’s fitness app to help him with the challenges of his runs, which included safety and navigation. This story explained Runkeeper’s approach to technology, which includes how they put data to work to make a runner’s life better. The discussion covered the data and technology infrastructure used to achieve this feat. At the end of the segment with Simon and Runkeeper, both were asked “What next?” Simon responded that his next challenge was to tackle new terrain in Namibia’s desert landscape. Runkeeper described their “new terrain” as a move toward personalization. When you attend general sessions like these, the art of the possible usually seems a little too futuristic. In this case, Runkeeper’s next “terrain,” as it was stated, is very realistic considering what they said they're already doing. Let’s look at what was discussed and what could be attainable.

Know your peeps
Runkeeper is targeting personalization, which effectively means know your “peeps” (people). Simon Wheatcroft illustrated how diverse Runkeeper’s peeps can be. Simon started by running a soccer pitch, which is less than a mile, then progressed to a mile, to 10 miles, to 100 miles and beyond. Knowing your peeps means knowing more than just runners. It could be a person walking, biking or just working out. So, there are various degrees of fitness and various fitness interests. In the session, Runkeeper’s CEO Jason Jacobs explained that they were using analytics from IBM Cloud Data Services through dashDB, IBM Graph and IBM Cloudant.
Over 100 terabytes of data powered the analytics on a customer base of more than 50 million globally. Data being tracked included run times, routes, preferences, distances and locations, which provides compelling geolocation data and enables Runkeeper to analyze routes and make route recommendations. The recommendations cover how challenging a route will be, based on elevation changes and distance, geographic heat maps and geospatial data analysis. Runkeeper already has registration data to combine with the geospatial and activity data. With these key information resources already in place, Runkeeper is well positioned to reach its goals. Runkeeper also talked about using music data as a form of personalization. By adding features based on music playlists in their app, Runkeeper can easily accommodate these music data goals. Also described was social media analytics through Watson Analytics, showing who was talking about their Runkeeper activities. The data presented in the general session was demographic data identifying predominantly males speaking about the Global5K run. This analysis was just the tip of the iceberg! What was not shown in the session was how much can be learned from social data. As stated earlier, people involved with fitness are varied and evolving, and you need technology that addresses that evolution in the form of discovery. Social data enables Runkeeper to look at their whole audience, not just the individual, and even at audiences beyond the app. Using social data and behavioral data, the technology needs to be dynamic to truly deliver a fully personalized experience.

Let’s explore this discovery with Watson Analytics for Social Media
Watson Analytics for Social Media allowed for easy capture of demographic data, but what were these people talking about? It has to be more than just Runkeeper or the Global5K run.
The initial analysis was set up as a baseline on running and jogging and really did not capture the true nature of the conversations. We used the topic suggestions feature to add hashtag keywords, and used themes to break down the conversations in Watson Analytics for Social Media. By including these additional metrics, we gain greater insight into what people are talking about on subsequent runs on this topic. We now know what the audience is talking about. These insights drive at the heart of personalization. What matters to your peeps? How do you connect them? We see that the top discussions in these conversations are “completing a challenge, time or run,” “walking,” “biking,” “working out,” “marathon” and “personal best.” While some of these topics are about running, they span a variety of interests and goals. Additionally, you can assess sentiment with Watson Analytics for Social Media. By combining these social insights with the registration, behavior and route data, the possibilities really start to get interesting. There are opportunities to segment the users on conversation and behaviors.

Using the predictive capabilities of Watson Analytics for a new level of personalization
By using Watson Analytics and weather data, Runkeeper is looking at a very rich opportunity to personalize experiences and recommendations for users. Runkeeper can pull the data together in Watson Analytics to reach new insights for personalization. This matches directly with what Runkeeper described around being a trusted advisor, including biometrics, training data, how music affects performance, social response, route data and weather data, to bring it all to the user for the ultimate experience.
This type of acumen will certainly accelerate Runkeeper’s ambition to “enable everyone to take part in fitness.” It makes the next step to “transformational personalization” very attainable and brings the experience for Simon and his fitness cohorts to new levels.
Did you notice the new visualization types in Explore? These are available to all users and are recommended to you based on your data. Watson Analytics will generate starting points for you, or you can pick them manually. Take a closer look at the visualizations that have been added to Explore.

Packed bubble
You can use a packed bubble visualization when you want to show relationships among columns that contain numeric values, such as revenue. It is similar to the bubble visualization but the bubbles are tightly packed instead of spread over a grid. A packed bubble visualization shows a large amount of data in a small space. The bubbles are in different sizes and colors. Because a packed bubble visualization uses area to represent numbers, it is best for positive values. If your data set includes negative values, use Color by to show the positive and negative numbers in different colors. If your data set has many negative numbers, consider using a bar visualization. For example, this packed bubble visualization shows how each product type is performing in terms of revenue. Each bubble is a different product type. The size of each bubble is determined by the revenue for that product type.

Word cloud
Use a word cloud visualization when you want to see a visual representation of text values. The size of each text value indicates its frequency or importance. For example, this word cloud visualization shows revenue for all product types. The Eyewear product type brings in the most revenue.

Summary
A summary visualization can be used when you want to see the total for the measure that is shown in the visualization. This is ideal when you want to accentuate a very large or small number. For example, this summary visualization shows total revenue for all product types.

Network
The network visualization is used when you want to see the connections among columns in your data set. A network visualization is a good choice when the data is hierarchical in nature.
For example, this network visualization shows which positions are in each department.

The categorical visualization is now called the heat map visualization. A categorical heat map visualization groups a column and uses a shape for each item in the column. It then uses size and color to show the relationship between two numeric columns. If you want a categorical visualization, add a column to the Points data slot. For example, this categorical heat map visualization shows revenue for each product line. You see that revenue varies within each quarter, with revenue for some product lines being flat while the Personal Accessories product line consistently brings in more revenue.

Use a heat map visualization when you want to visualize the relationship between columns in a matrix-type view. A heat map visualization uses color and intensity of the color to show the relationship between two columns. For example, this heat map visualization shows revenue for each product type and quarter. You can see how Eyewear compares to other products across quarters. It consistently brings in the most revenue every quarter.

In Watson Analytics you can see the visualizations in the "Change the visualization" menu, and also in the starting points dialog. Let us know what you think by starting a Discussion in our forum.
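
For readers curious about what sits behind a heat map like the product type by quarter example, the Python sketch below aggregates rows into one cell per (product type, quarter) pair and scales each cell to a 0 to 1 color intensity relative to the largest cell. This is only an illustration of the general idea, assuming positive values; the rows, the numbers and the heat_map_cells helper are hypothetical, and Watson Analytics may choose its color scale differently.

```python
from collections import defaultdict

# Hypothetical (product_type, quarter, revenue) rows; values are made up.
SALES = [
    ("Eyewear", "Q1", 500),
    ("Eyewear", "Q1", 100),
    ("Eyewear", "Q2", 650),
    ("Watches", "Q1", 300),
    ("Watches", "Q2", 250),
]


def heat_map_cells(rows):
    """Return {(product_type, quarter): intensity between 0 and 1}."""
    totals = defaultdict(float)
    for product_type, quarter, revenue in rows:
        # One cell per combination of the two columns.
        totals[(product_type, quarter)] += revenue
    if not totals:
        return {}
    peak = max(totals.values())  # assumes at least one positive total
    return {cell: value / peak for cell, value in totals.items()}


cells = heat_map_cells(SALES)
for (product_type, quarter), intensity in sorted(cells.items()):
    print(f"{product_type} {quarter}: intensity {intensity:.2f}")
```

The darkest cell is the one with intensity 1.0, so a product type that leads in every quarter shows up as a consistently dark row.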