Blog

What's happening? What's new? What can I do? Find answers to these questions in the blog.

Blog

VIDEO: How to upload data to #WatsonAnalytics (Tutorial)

Uploading data to Watson Analytics for easy data analysis is a snap.

Blog

Using Customer Behavior Data to Improve Customer Retention

Telco Customer Dataset
This demo uses the sample data included in Watson Analytics, so be sure to import the sample data set rather than your own.

What’s in the Protect Your Customer data set?
This data set provides information to help you predict customer behavior so that you can retain customers. You can analyze all relevant customer data and develop focused customer retention programs.

A telecommunications company is concerned about revenue and about the number of customers leaving its landline business for cable competitors. It needs to understand who is leaving. Imagine that we are analysts at this company and we have to find out who is leaving and why.

The data set includes information about:
Customers who left within the last month (this column is called Churn)
Services that each customer has signed up for: phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
Customer account information: how long they’ve been a customer, contract type, payment method, paperless billing, monthly charges, and total charges
Demographic information about customers: gender, age range, and whether they have partners and dependents

Getting the data
Under the Data tab in Watson Analytics, tap the + New Data button. Tap Import > Sample Data, and then select and import the Protect Your Customer dataset. The data set appears as a tile in the Personal folder of the Data tab, and you’re ready to get to work. It may take a few seconds to appear while Watson Analytics analyzes the data to help you work with it.

Which customers have high value?
To find the answer to this question, tap the Protect Your Customers CSV data set tile. You want to know where the revenue comes from and which customers you want to protect.

To better understand the business, you may want to look at total charges by internet service type by asking “What is the average TotalCharges by InternetService?” Select the first tile, because it best represents this line of inquiry.
Note that the image on the tile indicates that you should expect a bar chart for this comparison. Looking at the results, we see that Fiber Optic is clearly the internet service that brings in the bulk of the revenue.

Next, you want to find out about total charges by contract type. Press the Plus button circled below to add another tab to your discovery set. Enter the question “What are the average TotalCharges by contract?” and select the first suggestion (tile) from Watson Analytics. The result shows that two-year contracts generate the most income, whereas month-to-month contracts generate the least. I would have expected average charges to be lower with longer contracts, so this is a little surprising. One likely explanation is that total charges accumulate over the life of an account, so customers who stay longer build up higher totals. Clearly, we want to protect customers with Fiber Optic service and longer contracts.

Let’s add to the discovery set again with the Plus button and find out how long customers stay with the service for each contract type by asking “What is the average tenure by contract type?” Again, the first suggestion from Watson Analytics is exactly the line of inquiry we want to explore, so we select the first tile. The results show that month-to-month customers stay with the service for 18 months on average, whereas one-year and two-year customers stay for 42 and 56 months on average, respectively. You can hover over the bars of the chart to get the actual numbers. Month-to-month customers are not leaving immediately, but we should think about how to move them into longer-term contracts.

What drives customer tenure and churn?
In thinking this through, we want to nail down the factors that drive customer tenure. Let’s add to the discovery set and ask “What drives Tenure?”
The first suggestion from Watson Analytics brings you to a spiral diagram that highlights TotalCharges and InternetService as the key factors for Tenure, with a predictive strength of 91%. Looking further down the list of relationships, I see that churn also affects tenure, which makes a lot of sense.

Let’s see what drives churn by adding to the discovery set and asking “What drives Churn?” This time, we look at the second tile, because it shows a decision tree for Churn. By scrolling down the decision tree and hovering over each node, we can see that customers with a month-to-month contract, less than six months of tenure, and Fiber Optic service churn 75% of the time. This churn rate is very high, and we need to understand it better. Perhaps our service is weaker than what the competition provides, and these new customers notice the difference. In any case, we need to share this finding with customer service and our hardware team, because it directly affects Fiber Optic revenue, which is key to our business. You can watch the narrated video for this use case here.
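If you want to double-check these numbers outside Watson Analytics, the two calculations behind this walkthrough (an average of one column grouped by another, and the churn rate within one customer segment) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the column names (TotalCharges, InternetService, Contract, Tenure, Churn) and values such as "Fiber optic", "Month-to-month", and "Yes" are assumed to match your CSV export, the file name is hypothetical, and the toy rows are made up for illustration. It reproduces simple descriptive statistics only, not the Watson Analytics predictive models.

```python
import csv
from collections import defaultdict


def load_rows(path):
    """Read a CSV export into a list of dicts keyed by column heading."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def average_by(rows, value_col, group_col):
    """Mean of value_col for each distinct value of group_col ("average X by Y")."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        value = row[value_col].strip()
        if not value:  # skip blank cells rather than counting them as zero
            continue
        totals[row[group_col]] += float(value)
        counts[row[group_col]] += 1
    return {group: totals[group] / counts[group] for group in totals}


def churn_rate(rows, predicate):
    """Fraction of rows matching predicate whose Churn value is "Yes" (None if none match)."""
    segment = [row for row in rows if predicate(row)]
    if not segment:
        return None
    return sum(1 for row in segment if row["Churn"] == "Yes") / len(segment)


def new_fiber_month_to_month(row):
    """Assumed encoding of the segment: month-to-month, under 6 months, Fiber optic."""
    return (
        row["Contract"] == "Month-to-month"
        and int(row["Tenure"]) < 6
        and row["InternetService"] == "Fiber optic"
    )


# Made-up rows for illustration only; call load_rows("telco_churn.csv")
# (hypothetical file name) to work with a real export instead.
TOY_ROWS = [
    {"InternetService": "Fiber optic", "Contract": "Month-to-month", "Tenure": "2", "TotalCharges": "150.0", "Churn": "Yes"},
    {"InternetService": "Fiber optic", "Contract": "Month-to-month", "Tenure": "4", "TotalCharges": "350.0", "Churn": "No"},
    {"InternetService": "Fiber optic", "Contract": "Two year", "Tenure": "60", "TotalCharges": "5000.0", "Churn": "No"},
    {"InternetService": "DSL", "Contract": "One year", "Tenure": "30", "TotalCharges": "1500.0", "Churn": "No"},
    {"InternetService": "DSL", "Contract": "Month-to-month", "Tenure": "1", "TotalCharges": "50.0", "Churn": "Yes"},
    {"InternetService": "No", "Contract": "Two year", "Tenure": "0", "TotalCharges": " ", "Churn": "No"},
]

if __name__ == "__main__":
    print(average_by(TOY_ROWS, "TotalCharges", "InternetService"))
    print(churn_rate(TOY_ROWS, new_fiber_month_to_month))
```

Skipping blank TotalCharges cells, rather than treating them as zero, keeps brand-new accounts from dragging an average down; whether that is the right choice for your data is a judgment call.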

Blog

Exploring Banking Loss Event data with Watson Analytics

Download the Dataset
This IBM Watson Analytics use case shows you how to analyze loss event data from IBM OpenPages GRC using the updated Watson Analytics user experience. (If you haven’t already signed up for Watson Analytics, you can do so here for free.)

For the purposes of this use case, we are working on the risk team of a financial services enterprise, and we need to review and analyze 7 years of loss events recorded in OpenPages, which can be downloaded from here. After we log in, we see three main tabs, and we start in the Data tab. The Discover tab is where you explore and discover the data you have in Watson Analytics. The Display tab takes your discoveries and assembles them into rich stories, dashboards, and infographics to share.

It all begins with your data, so the first thing to do is import the spreadsheet into Watson Analytics.
1. Tap the New Data button.
2. Tap the Local file tab, then tap the Browse button to select the loss events spreadsheet you downloaded.
3. Tap the Import button.

After the import, we can work directly with the Excel file exported from the Loss Events pages in OpenPages. We can analyze the data in five steps.

Step 1. Discover your data
When we click the tile created by the import, Watson Analytics immediately takes us into Discover, which provides a set of suggested questions, or starting points, that you can use right away. You can also type your own question. If a question in the tiles makes sense for you, simply click the tile to get the result. Note that each tile has a graphic showing you what to expect in the result.

Let’s start by looking into the trend of net loss by year. Watson Analytics presents a list of possible interpretations of what you asked. As it turns out, the first tile identifies exactly what we want, so let’s select this question.
When I look at the figures, it appears that the budget safeguard we put in place in early 2014 worked as expected after that big loss in 2013. Next, we should check the trend of net loss by region by dragging the “Region” field (circled above in red) from the data tray directly onto the data visualization. Now we see that the safeguard also worked as expected for all the countries: a great result.

Step 2. Be open-minded and take suggestions from Watson Analytics
Watson Analytics provides suggested lines of inquiry on the right, based on interesting data distributions it finds adjacent to our current analysis. These suggestions change as I change my line of inquiry. I note that Net loss by business is very relevant, so let’s evaluate this discovery by clicking its tile. When we do that, we learn that our Corporate Finance and Retail Banking businesses account for close to half of our company’s net loss. Yikes! Let’s switch the visualization to a treemap by clicking the “Visualization” icon on the left and then selecting the Treemap visualization. We now see an interesting view of net loss by business.

Step 3. Use Predict to review a model with net loss as the target
Watson Analytics discovery capabilities also apply to predictive analytics. Next, we ask the question “What drives Net Loss?” Watson Analytics creates a spiral diagram in which the factors most likely to correlate with the outcome (called “predictors” or “drivers”) appear closest to the center. Here, business unit and risk sub-category are the top predictors, with a predictive strength of 75%. By clicking the icon next to an item in the list of drivers, we can zoom into the details of this model. We can see that the top issue behind net loss is the relationship with vendors or suppliers in our Corporate Finance business. Mouse over the cells to get details, as shown below.

Let’s rename the tabs for our Discovery Set and then save it:
1. Click a tab name, and then click the “pencil” icon to edit the tab name.
2. Click the disk icon on the top right and provide a name for the Discovery Set.
3. Close the Discovery Set using the drop-down menu, as shown below.

Step 4. Assemble the data within a Display
We can quickly put these findings into an interesting Display, such as a dashboard or infographic, to share with others. Click the Display tab and then click “+ New display”. Select the Dashboard option and then select the four-quadrant display template. On the left, you can find the previously saved Discovery Set in the Personal folder, assuming you saved it in the default location. Expand your Discovery Set and select a visualization. Drag or click the four visualizations from the Discovery Set onto the quadrants of the template (the blue box glows to show you when to drop), and save the Display using the disk icon, as you did before.

Step 5. Share the new insights!
These findings are significant, and we will want to share them with the VP of Risk. We could use shared folders if the VP is also a user in the same Watson Analytics account, or we can share with anyone using PDF, PowerPoint, or image files via email or download. Click the share icon, select Email, and then select “PDF”. When you share by download or email, the person you are sharing with does not even need Watson Analytics to benefit from the analysis. Great job! Don’t stop there: apply these analytics to your own data!
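The questions in Steps 1 and 2 (the trend of net loss by year, and each business’s share of net loss) boil down to summing one column per group. Here is a minimal Python sketch, assuming the exported spreadsheet has been saved as CSV with columns named "Year", "Business", and "Net Loss". These are hypothetical names: the real OpenPages export headings may differ, and the year may need to be derived from an event date. The rows are invented for illustration.

```python
from collections import defaultdict


def total_by(rows, value_col, group_col):
    """Sum value_col for each distinct value of group_col."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_col]] += float(row[value_col])
    return dict(totals)


def shares(totals):
    """Express each group's total as a fraction of the grand total."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {group: 0.0 for group in totals}
    return {group: amount / grand_total for group, amount in totals.items()}


# Invented loss events; "Year", "Business", and "Net Loss" are hypothetical headings.
TOY_LOSSES = [
    {"Year": "2013", "Business": "Corporate Finance", "Net Loss": "900"},
    {"Year": "2013", "Business": "Retail Banking", "Net Loss": "300"},
    {"Year": "2014", "Business": "Corporate Finance", "Net Loss": "100"},
    {"Year": "2014", "Business": "Asset Management", "Net Loss": "400"},
    {"Year": "2015", "Business": "Retail Banking", "Net Loss": "200"},
    {"Year": "2015", "Business": "Asset Management", "Net Loss": "100"},
]

if __name__ == "__main__":
    # Trend of net loss by year, in chronological order.
    for year, amount in sorted(total_by(TOY_LOSSES, "Net Loss", "Year").items()):
        print(year, amount)
    # Share of total net loss per business (the proportions a treemap shows).
    print(shares(total_by(TOY_LOSSES, "Net Loss", "Business")))
```

Sorting the year keys as strings gives chronological order here only because every year has four digits.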

Blog

What will a graduate degree give me? Exploring the American Time Use Survey data set

American Time Use Survey data
The American Time Use Survey data is included in Watson Analytics as a sample data set called American Time Use Survey.csv.

Imagine you’re a university student thinking about going to graduate school and wondering what the impact would be on your income and how it would affect your free time over the long term. The American Time Use Survey data set contains data about the amount of time people spend doing various activities, such as paid work, volunteering, childcare, and socializing. This demographic data covers a subset of Americans but can be applied more widely.

It all starts with your data!
In Watson Analytics, click the New Data button. Click the Sample Data icon. Select American Time Use Survey.csv, scroll down, and then click the Import button. The data set appears as a tile in your Personal data folder. Watson Analytics analyzes the data and metadata when uploading the CSV file to provide smarter data discovery and analysis. In this process, Watson Analytics identifies field names and concepts, possible measurements, and hierarchies in your data, and captures metadata including data quality, data distributions, skewness, and missing values. Let’s ask our first question.

Does higher education lead to higher earnings?
Tap the American Time Use Survey data set tile. You are taken into a new Discovery set, where you start interacting with the data. That single tap gives you a list of Starting points, which are different ways to launch yourself into data analysis and visualizations.

Let’s enter our question, “Does higher education lead to higher earnings?”, and then press Enter. You now see different Starting points based on your question, ranked by relevance, with the most relevant inquiries at the top of the list. Select the Starting point “What is the breakdown of Weekly Earnings by Education Level?” The results are shown in a treemap visualization.
The size of each rectangle below indicates the relative size of weekly earnings by education level. The largest rectangles are for those with advanced degrees. This visualization covers all ages. Let’s see how weekly earnings by education level break down when age is added.

At the very bottom of the window is the Data Tray, which shows all the column headings in the data set. Add Age Range to the visualization: just drag it from the data tray (the grey strip at the bottom) and drop it anywhere on the visualization. Note: you can also drop it on the Data Slot beside the drop-down for Education Level, on the bottom left just below the visualization.

There’s a lot more detail in the visualization now, perhaps too much. Let’s focus on people with college or university degrees. Below the visualization, you can modify what is displayed. Select Education Level and tap the items listed from 9th grade down to Some College to remove them from the visualization. You may need to scroll down in the box to complete this.

Some of the smaller rectangles are for age groups that aren’t really relevant to the question we’re exploring. People aged 0-19 have generally not completed university or college, and those aged 70 and older have generally retired from paid work. Let’s filter out these groups:
1. Tap Age Range at the bottom of the visualization.
2. Select 0-19, 70-79, and 80+ to remove them.
3. Tap Done, or tap outside the Age Range list to close it.

Try a different visualization type
Different visualization types communicate information about data in different ways. Let’s see what else we can learn by using a different visualization type. Tap the Visualization icon to the left of your visualization to see what Watson Analytics recommends. You can, of course, pick any type you want. Tap the first recommended visualization, the bar chart. You see that earnings peak when people are in their 30s and 40s, regardless of education level.

But what about work-life balance?
Earnings are one way to look at it. However, life is about more than how much money you earn. Does someone with more education work longer hours? Do they have time to spend with family and friends? Let’s add to this discovery set by clicking the plus button circled below, and then ask the question “How do weekly hours worked compare by education level?” By clicking the insight tile circled above, you will see the treemap. We can see that people with more advanced education spend more time working.

In the previous inquiry on weekly hours worked by education level, I see other questions we could ask that are more predictive in nature. Let’s add to the Discovery Set again and ask “What drives Weekly Earnings?” Select the circled insight tile. This insight may take a few minutes to process, because Watson Analytics is going through many predictive models to determine what drives weekly earnings. Once it has evaluated thousands of models, it presents a short list of predictive relationships. Based on what we have already seen, it is not surprising that weekly hours worked and education level have a relatively strong relationship with weekly earnings, with a predictive strength of 45%. If you want to see more drivers, tap the “Show more drivers” link. If we tap the button to the right of a driver, we can see more details about it. As we mouse over the blue blocks in this heatmap, which show the key elements of the relationship, the cell values for weekly earnings (shown as color intensity) generally increase as you move up and to the right.

What did we learn?
These findings suggest that the hard work of earning good marks in school and attaining a higher degree does not stop there. We will need to keep working after attaining an advanced degree to continue building up our weekly earnings. This, of course, affects our free time.
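The filtering and breakdown from this walkthrough can also be sketched in Python, for example to sanity-check a treemap against the raw file. This is a minimal sketch, assuming rows with the "Weekly Earnings", "Education Level", and "Age Range" columns named in the walkthrough; the education labels and earnings values in the toy rows are hypothetical, and only the excluded age ranges come from the steps above.

```python
from collections import defaultdict

# Age ranges removed in the walkthrough.
EXCLUDED_AGE_RANGES = {"0-19", "70-79", "80+"}


def filter_rows(rows, excluded_education):
    """Drop rows in the excluded age ranges or in the excluded education levels."""
    return [
        row
        for row in rows
        if row["Age Range"] not in EXCLUDED_AGE_RANGES
        and row["Education Level"] not in excluded_education
    ]


def mean_earnings_by(rows, *group_cols):
    """Average Weekly Earnings for each combination of the given columns."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = tuple(row[col] for col in group_cols)
        totals[key] += float(row["Weekly Earnings"])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical rows; the real education labels in the sample data may differ.
TOY_ROWS = [
    {"Education Level": "Bachelor", "Age Range": "30-39", "Weekly Earnings": "1200"},
    {"Education Level": "Bachelor", "Age Range": "30-39", "Weekly Earnings": "1000"},
    {"Education Level": "Master", "Age Range": "40-49", "Weekly Earnings": "1600"},
    {"Education Level": "Master", "Age Range": "70-79", "Weekly Earnings": "900"},
    {"Education Level": "High school", "Age Range": "30-39", "Weekly Earnings": "700"},
    {"Education Level": "Bachelor", "Age Range": "0-19", "Weekly Earnings": "300"},
]

if __name__ == "__main__":
    kept = filter_rows(TOY_ROWS, excluded_education={"High school"})
    print(mean_earnings_by(kept, "Education Level", "Age Range"))
```

Passing one column to mean_earnings_by gives the all-ages view; passing both mirrors dropping Age Range onto the visualization.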
Don’t stop there: try this type of analysis with your own data set!

Blog

Starting with your Data in Watson Analytics

You may have already noticed that Watson Analytics has a new look and feel, built around three major functions: Data, Discovery, and Display. However, you always need to start with data. Once you load your data into Watson Analytics, you can move on to the Discovery and Display functions to gain data-driven insights. There are several ways to get data into Watson Analytics. This video explains why your data is the starting point in Watson Analytics and covers many of the ways you can bring it in. You will see how to start small with the sample data in Watson Analytics and then bring in other, more interesting data sets, including local, social, and cloud-based data. Enjoy this video!

Blog

Using NYPD Data and Watson Analytics to Better Understand Vehicle Accidents

What can data about vehicle accidents tell us? I recently took data from the New York Police Department that spanned 2013-2015, refined it, and uploaded it into Watson Analytics to see what I could learn about accidents and the injuries and deaths they cause. I was especially interested in discerning latent patterns.

Before we get started
The real-world data set I used is courtesy of NYPD Motor Vehicle Collisions open data and is based on motor vehicle accidents in the New York area. This data set was later merged with hourly weather data based on the latitude, longitude, and time of each accident. The data was shaped and customized by removing all rows where “Borough” was blank or where “Contributing Factor (for Vehicle 1)” was blank or unspecified, because my plan was to analyze only those accidents that had information about both the borough and the contributing factor for the primary vehicle. This calculation was also created:
# Persons killed or injured = Persons Killed + Persons Injured

It begins with a simple question
The question I asked Watson Analytics was “I want to understand accidents by borough.” As you can see here, Manhattan was the most accident-prone area for the three-year period I was analyzing, while Staten Island was the least. Looking at those accidents over the years, I immediately realized not only that there had been a steady increase in accidents for all boroughs, but also that the increase was pretty steep for Brooklyn and Queens compared to the others. When I replaced year with quarters, I saw the same pattern of gradual increase in accidents as the quarters progressed. Essentially, accidents are increasing by the day!

More accidents do not necessarily mean more injuries or deaths
But not every accident results in an injury or a fatality, and it’s really interesting to make that distinction. Although Manhattan had the most accidents, Brooklyn and Queens had more injuries and deaths resulting from them.
Further, it’s clear that passenger vehicles were involved in the majority of the accidents that led to injuries or deaths. To bolster the value of this analysis, the transactional data could be augmented with relevant contextual data. In this case, that could be the make and model of the specific vehicle involved, highway information, demographic details of the driver, and other related data.

Digging deeper for validation and new factors not easily seen
Although a few important characteristics were easy to spot, questions remained about what drove injuries or deaths during an accident. So, I turned to another aspect of Watson Analytics, the simplified predictive model, which can be used to:
Validate existing understanding, knowledge, or gut feel
Discover novel factors that are not obvious when you first examine data

The contributing factor (for the primary vehicle involved in the accident) seems to be the most important driver of injuries or deaths, and its combination with borough seems to have the most significant bearing. I can not only see all the factor combinations ranked by their bearing on the target, but also drill down into each one of them to learn more. Interestingly, weather plays a pivotal role too.

Looking more closely at the top 5 contributing factors across the three boroughs with the highest injuries and deaths from accidents, it’s evident that Driver Inattention/Distraction tops the list, followed by Failure to Yield Right-of-Way. This is ironic, given that both are avoidable, and yet they led to so many injuries and deaths in New York boroughs over the last three years! But this is a useful insight, because it makes it possible to fine-tune existing policies and focus areas, with more education on the topic along with higher penalties for distracted or inattentive driving. An analyst, an office manager, or any other regular user with the dataset can arrive at these insights within moments.
No prior understanding of statistics is required.

How’s the weather there?
Based on what I learned from the predictive insight, I decided to learn more about the 66% impact that the combination of Wind Direction and Weather Conditions had on injuries or deaths resulting from an accident. Although it might be common sense that the higher the wind speed, the higher the chances of accidents (and hence of injuries or deaths), it is nice to have statistical evidence that a specific wind direction could have a bearing on injuries or deaths during an accident. The NYPD could use these insights to monitor and respond to such conditions with alerts and real-time dashboards.

Displaying what I’ve learned
With Watson Analytics, I brought my insights together in easy-to-build, interactive displays. The daily view of accidents and of the resulting injuries or deaths can be filtered to show any month. We can further break down injuries and deaths into cyclists, pedestrians, and motorists to understand each group by region, time, and other factors, and to yield actionable insights. For example, my displays show that pedestrians in all boroughs run a higher risk of accident-related injury than cyclists.

Try Watson Analytics
If you haven’t used Watson Analytics yet, today’s a great day to try it. Visit www.watsonanalytics.com for more details.
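The data shaping described under “Before we get started” (keeping only rows that have both a borough and a specific contributing factor, then adding the combined killed-or-injured count) can be sketched in Python. This is a minimal sketch, assuming the column headings used in this post; the raw NYPD open-data export uses different headings, so adjust the names. Treating a blank casualty count as zero is also an assumption, the toy rows are invented, and the weather merge is not shown.

```python
BOROUGH = "Borough"
FACTOR = "Contributing Factor (for Vehicle 1)"
DERIVED = "# Persons killed or injured"


def to_count(value):
    """Parse a casualty count, treating a blank cell as 0 (an assumption)."""
    value = value.strip()
    return int(value) if value else 0


def is_usable(row):
    """Keep rows with a borough and a contributing factor that is neither blank nor unspecified."""
    factor = row[FACTOR].strip().lower()
    return bool(row[BOROUGH].strip()) and factor not in ("", "unspecified")


def shape(rows):
    """Drop unusable rows and add the derived killed-or-injured column."""
    shaped = []
    for row in rows:
        if not is_usable(row):
            continue
        new_row = dict(row)  # copy so the input rows are left unchanged
        new_row[DERIVED] = to_count(row["Persons Killed"]) + to_count(row["Persons Injured"])
        shaped.append(new_row)
    return shaped


# Invented rows for illustration only.
TOY_ROWS = [
    {BOROUGH: "BROOKLYN", FACTOR: "Driver Inattention/Distraction", "Persons Killed": "0", "Persons Injured": "2"},
    {BOROUGH: "", FACTOR: "Driver Inattention/Distraction", "Persons Killed": "0", "Persons Injured": "1"},
    {BOROUGH: "QUEENS", FACTOR: "Unspecified", "Persons Killed": "0", "Persons Injured": "0"},
    {BOROUGH: "MANHATTAN", FACTOR: "", "Persons Killed": "0", "Persons Injured": "1"},
    {BOROUGH: "QUEENS", FACTOR: "Failure to Yield Right-of-Way", "Persons Killed": "1", "Persons Injured": ""},
]

if __name__ == "__main__":
    for row in shape(TOY_ROWS):
        print(row[BOROUGH], row[FACTOR], row[DERIVED])
```

A row is dropped if either field is missing, which matches the stated goal of analyzing only accidents that have both a borough and a contributing factor.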

Blog

VIDEO: Asking questions about your data in IBM Watson Analytics

See this video to understand how you can ask questions to discover your data in IBM Watson Analytics.

Blog

VIDEO: Adding Data to IBM Watson Analytics

This video shows you how to add data in IBM Watson Analytics.

Resources

VIDEO: Using Customer Behavior to Improve Customer Retention

Using Watson Analytics, you can predict behavior to retain your customers. You can analyze all relevant customer data and develop focused customer retention programs.
Watson Analytics Sample Dataset - Telco Customer Churn
Accompanying video

Resources

SAMPLE DATA: Operations - Bike Sharing

Update 9/2015: Watson Analytics Sample Dataset
Manage the process of matching supply and demand. Explore usage data and external factors to understand their effects on a bike share program in a major U.S. city, and analyze patterns such as the relationship between humidity and ridership by month, or how temperature and holidays drive demand.
Download here: WA_Fn-UseC_-Operations-Dem-Planning_-BikeShare.xlsx
About the data: 12 columns, 17,379 rows.