Analytics for all

Basketball, brackets, and dataviz: Time to take offense


It’s mid-March. For most of the world, this means either spring or fall is just around the corner. But in the US, it marks the start of an intense period of craziness known as the “NCAA Tournament.” For the last weeks in March and the first weekend in April, those who love men’s college basketball go nuts. And I’m one of them.

How it works

[Illustration: March 2017 calendar leaf]

The tournament works like this: 68 teams, each of which either won its conference’s tournament or earned an at-large bid, vie for the national championship. It starts on a Tuesday in the middle of March with the “First Four” games. Four of the 68 teams play that day to determine who will face higher-seeded teams in what most fans call “the first round” and what Duke and a few other fans call “the second round.” The other four First Four teams play the next day. After those games, the rest of the tournament starts on Thursday.

You mad? I am!

In one of the biggest nods to the human love affair with predicting things, people take the tournament bracket and fill it out with who they think will win each match-up. You can do this on your own, in groups, or in sponsored contests. You earn points for each correct pick, and the person with the most points (naturally) wins.
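To make the scoring concrete, here is a minimal sketch of how bracket entries might be scored. The point values (10 per correct first-round pick, doubling each round) are an assumption for illustration only; every contest sets its own rules. The function name, entrants, and team names are hypothetical.

```python
# Hypothetical scoring scheme: each correct pick earns points, and the
# value doubles every round. The exact numbers are an assumption.
ROUND_POINTS = [10, 20, 40, 80, 160, 320]


def score_bracket(picks: list[set[str]], results: list[set[str]]) -> int:
    """Score one bracket entry.

    picks[r]: teams this entrant picked to win a game in round r
    results[r]: teams that actually won a game in round r
    Only rounds that have been played (len(results)) are scored.
    """
    return sum(
        ROUND_POINTS[r] * len(picks[r] & results[r])
        for r in range(len(results))
    )


# Made-up entrants and results, purely for illustration.
results = [{"Team A", "Team D"}, {"Team A"}]
entries = {
    "Me": [{"Team A", "Team C"}, {"Team A"}],
    "Friend": [{"Team B", "Team D"}, {"Team D"}],
}
scores = {name: score_bracket(p, results) for name, p in entries.items()}
winner = max(scores, key=scores.get)  # most points wins
print(scores, winner)  # {'Me': 30, 'Friend': 10} Me
```

Each set intersection counts the correct picks in a round, so later rounds weigh more heavily under this assumed scheme.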

I’ve been filling out tournament challenge brackets for fun for 20 years now. I’ve been watching basketball since the 1970s, so I do know the sport. In addition, I went to a university known for basketball, and I was there when we won a national championship. There’s nothing quite like it.

Sadly, I’ve won a bracket challenge only once, in 1997. I won it because, back then, I had time and lived in a viewing area that carried three college basketball conferences. As a result, I could watch a whole lot of games and judge with my own eyes how teams played.

I’ve gotten busier since then, so for the last 10 years or so, I’ve relied on analysts’ opinions. That has been a mistake. Unintentional bias, a hot streak in a tournament, and hype tinge their predictions. So, for the third year in a row, I’m ditching the opinions of the basketball “experts” and turning to Watson Analytics.

Sticking with what I know: Pomeroy’s basketball stats

There are three major sets of college basketball statistics: the RPI, Sagarin, and Pomeroy. Each has its merits, but I always seem to gravitate to Pomeroy. Sagarin and RPI put a lot of weight on subjective factors. For example, they base rankings on strength of schedule. Unfortunately, strength of schedule assumes that beating or losing to a team that has been a basketball powerhouse for decades or has a beloved coach counts for more than beating or losing to a team that has neither. That isn’t always fair.

Pomeroy is also somewhat subjective. However, his statistics include ratings of how each team plays the game, such as offensive efficiency, defensive efficiency, and tempo. The NCAA tournament is a “win or go home” tourney. So, in my opinion, how a team played over a long season against higher- or lower-ranked teams is less relevant than the mechanics of its game. But is this the right tactic? Should I consider the others?

Watson Analytics to the rescue

To get started, I downloaded Pomeroy’s data from his website (I subscribe). I removed all the teams that weren’t in the tournament and added wins and losses. Then I did something very bold for me: I added RPI rankings. Last year, I did complicated things like adding a win-loss ratio on top of the wins and losses. The lesson learned? All that eye-crossing effort got me no closer to a great bracket-challenge performance than my simpler tactics did in 2015. So I skipped it this year.
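The cleanup step described above (keep only tournament teams, then add wins, losses, and RPI rank) can be sketched in a few lines of Python. This is a hypothetical illustration, not the actual workflow used here: it assumes the Pomeroy export has already been read into a list of dictionaries, and the column names and all numbers are placeholders rather than real data.

```python
# Minimal sketch of the cleanup step. Column names ("TeamName", "AdjO",
# "AdjD", "AdjEM") and every value below are hypothetical placeholders.


def prepare(ratings, tournament_teams, records, rpi_ranks):
    """Keep only tournament teams and attach wins, losses, and RPI rank."""
    prepared = []
    for row in ratings:
        name = row["TeamName"]
        if name not in tournament_teams:
            continue  # drop teams that didn't make the 68-team field
        wins, losses = records[name]
        prepared.append(
            {**row, "Wins": wins, "Losses": losses, "RPI": rpi_ranks[name]}
        )
    return prepared


ratings = [
    {"TeamName": "Team A", "AdjO": 120.0, "AdjD": 95.0, "AdjEM": 25.0},
    {"TeamName": "Team B", "AdjO": 105.0, "AdjD": 101.0, "AdjEM": 4.0},
    {"TeamName": "Team C", "AdjO": 114.0, "AdjD": 96.0, "AdjEM": 18.0},
]
field = {"Team A", "Team C"}  # teams in the tournament
records = {"Team A": (30, 4), "Team C": (25, 8)}  # (wins, losses)
rpi_ranks = {"Team A": 2, "Team C": 21}

for team in prepare(ratings, field, records, rpi_ranks):
    print(team["TeamName"], team["Wins"], team["Losses"], team["RPI"])
```

Filtering before the lookups means the win-loss and RPI tables only need entries for tournament teams.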

I clicked Discover, skipped the automatic starting points, and asked Watson Analytics, “What drives wins?” Watson Analytics, which has had a major interface revamp since last year’s tournament, showed me an impressive array of results, far more than it did last year.

[Image: basketball bracket drivers]

As I looked over the results, I noticed that RPI popped up only once, in combination with offensive efficiency, so I now know I can discount it next year. After losses, which topped the list, the next predictive driver was the combination of AdjEM (adjusted overall efficiency margin, the difference between a team’s offensive and defensive efficiency) and OE (offensive efficiency). Watson Analytics rated the predictive strength of that combination at 70%. That was different from last year, so it was time to check it out.
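Efficiency in Pomeroy’s sense is points per 100 possessions, so the margin is simple subtraction. The sketch below computes raw (unadjusted) versions from hypothetical season totals; Pomeroy’s published adjusted figures also account for opponent strength, which this sketch does not attempt. For defensive efficiency, lower is better, so a bigger margin means a stronger team.

```python
def efficiency(points: float, possessions: float) -> float:
    """Points per 100 possessions (raw, not adjusted for opponents)."""
    return 100.0 * points / possessions


def efficiency_margin(off_eff: float, def_eff: float) -> float:
    """Offensive efficiency minus defensive efficiency."""
    return off_eff - def_eff


# Hypothetical season totals for one team.
oe = efficiency(points=2400, possessions=2000)  # points scored
de = efficiency(points=1900, possessions=2000)  # points allowed
print(oe, de, efficiency_margin(oe, de))  # 120.0 95.0 25.0
```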

Pop the bubbly: The efficiency association dataviz

Asking Watson Analytics for more information was easy. I saved my predictive spiral, clicked the + symbol next to it, and typed my question: “How are AdjEM and OE associated by teamname?” Watson Analytics immediately responded with a bubble chart. It isn’t the easiest dataviz to read, but it suited my purposes.

[Image: basketball dataviz, a bubble chart of AdjEM and OE by team name]

With this dataviz as my guide, I made my picks. Here is the result:

[Image: final bracket]

How I did it

After looking at the dataviz, I found that OE and AdjEM generally tracked each other: a team with a high OE usually had a high adjusted efficiency margin, too. When I had to choose between a team with a higher OE and one with a higher efficiency margin, I went with offensive efficiency. That’s because, once I set aside “losses” as the main driver, almost every remaining driver included offensive efficiency in some form. And that worked out, because it gave me the two requisite (according to legend) upsets of 5 seeds by 12 seeds.
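The pick rule above can be sketched as a comparison function plus a loop that fills in a bracket. This is a minimal sketch under stated assumptions, not the actual process used here (which happened in Watson Analytics and by eye): `Team`, `pick`, `fill_bracket`, and all ratings are hypothetical, and the sketch treats “higher OE wins, AdjEM breaks exact ties” as the whole rule. It also ignores the one personal override described in the caveat below.

```python
from typing import NamedTuple


class Team(NamedTuple):
    name: str
    oe: float      # offensive efficiency (higher is better)
    adj_em: float  # adjusted efficiency margin (higher is better)


def pick(a: Team, b: Team) -> Team:
    """Pick the higher-OE team; AdjEM breaks an exact OE tie.

    If both numbers tie, max() returns the first argument, so `a` advances.
    """
    return max(a, b, key=lambda t: (t.oe, t.adj_em))


def fill_bracket(teams: list[Team]) -> list[list[Team]]:
    """Advance winners round by round.

    `teams` must be in bracket order (1st plays 2nd, 3rd plays 4th, ...)
    and contain a power-of-two number of teams.
    """
    n = len(teams)
    if n == 0 or n & (n - 1):
        raise ValueError("need a power-of-two number of teams in bracket order")
    rounds = [teams]
    while len(rounds[-1]) > 1:
        current = rounds[-1]
        rounds.append([pick(current[i], current[i + 1])
                       for i in range(0, len(current), 2)])
    return rounds


# Made-up four-team region; these are not real 2017 ratings.
field = [
    Team("Team A", 118.0, 25.0),
    Team("Team B", 110.0, 20.0),
    Team("Team C", 121.0, 18.0),
    Team("Team D", 115.0, 27.0),
]
for round_no, survivors in enumerate(fill_bracket(field)[1:], start=1):
    print(f"Round {round_no} winners:", [t.name for t in survivors])
```

In this made-up region, Team C beats Team D even though Team D has the higher AdjEM, because the rule gives OE priority.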

I have one caveat. When picking between UNC and UCLA, I did something I shouldn’t have: I let personal bias affect my choice. Even though UCLA had better offensive efficiency (#1 of the 68 teams), I chose UNC, my alma mater, mostly because I’m a homer. In my defense, UNC’s adjusted efficiency margin was very close to those of the two top contenders, Villanova and Gonzaga. And despite its impressive offensive efficiency, UCLA looked like an outlier once I factored in the adjusted efficiency margin.

You, too, can use dataviz for your brackets – and your business

Watson Analytics is available as a free trial, which also includes Watson Analytics for Social Media. You can use it to analyze basketball stats for your bracket or, better yet, use it in your business. It’s easy to ask what drives business outcomes and then explore how those drivers are associated. Visit www.watsonanalytics.com to learn more.

For those interested in using these stats, alone or in combination, here are the three sources I’ve mentioned:

  • Pomeroy (Pomeroy provides HTML stats on his website, but to get his full set of statistics, you have to subscribe to his site for a fee.)
  • Sagarin (Sagarin’s statistics seem to be available only in HTML format.)
  • RPI (This is the NCAA’s RPI site, the RPI source least cluttered with ads. Again, HTML appears to be the only format.)

I downloaded my interactive and printable bracket from this website. Good luck to all the teams in the tournament, and may the Tar Heels (ahem, the best team) win!

2 Comments


Paul Bard Mar 15, 2017

Great article, and very illuminating as to the benefits of using Watson Analytics to take the subjectivity out of the bracket-building process. However, statistics notwithstanding, I take issue with your picking Middle Tennessee to beat the University of Minnesota Golden Gophers in the first round. I’m hoping to see the Gophers up against your North Carolina Tar Heels in the third round. And may the best team (Gophers–not the Tar Heels!) win!


Forsyth Alexander Mar 17, 2017

Although this was a very good year for Minnesota, I’m pretty sure that they would have to go home after meeting UNC. 🙂
