Analytics for all

How data science conquered baseball – and why fantasy baseball is next


You just finished your fantasy baseball draft. You can't believe that after 10 years of playing, you still had butterflies as the clock ticked down each time it was your turn to pick.

You’re feeling super confident because you really studied up this year — Yahoo!, Rotoworld, ESPN, the more obscure draft kits and rankings.

You came in second last year. But you were so close. That rash of injuries down the stretch caused you to stumble at the finish (Mark Teixeira, I knew you were too good to be true).

This year is going to be different.

Or will it be?

Like many fantasy baseball players, I’m looking for an edge — something to really take my team to the next level. As a relatively new IBMer, I’m starting to learn more and more about data science, and I’ve begun wondering: Could it provide the fantasy help I’m looking for?

My search for the answer led me on a crazy link-hopping expedition. It began with what was supposed to be a cursory look for some useful data science-fueled tools and ended up becoming an exploration of the evolution of data science in baseball. Eventually, my curiosity led to conversations with some experts, including IBM Watson Analytics Vice President Marc Altshuller, former Los Angeles Dodgers Executive Vice President and General Manager Fred Claire and Fred’s business partner Ari Kaplan, who’s one of the top data analysts in baseball. More on those conversations in a bit.

The evolution of data science in baseball

Stats and data have always been a huge part of the game, but I think the beginnings of data science in baseball trace back to the 1970s. Since then, we’ve moved through three phases as data science has grown and taken hold.

Phase 1. Data science steps up to the plate: The sabermetrics revolution

Sabermetrics, made famous by Michael Lewis's Moneyball and the 2011 movie based on the book, was baseball's first dip of a toe into the data science waters. Essentially, sabermetrics looks at a whole bunch of nontraditional baseball stats and uses them to compare players and, to a degree, predict how they'll perform. Sabermetrics was mostly for stat geeks until Billy Beane brought it into a Major League Baseball front office at the turn of the century. But I won't rehash that whole story here. The Moneyball movie, which recounts Beane's data-driven approach to running the Oakland Athletics, earned six Academy Award nominations, so I assume many readers know the basics.
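To make "nontraditional stats" a little more concrete, here's a minimal Python sketch comparing traditional batting average with on-base percentage (OBP), a stat that sabermetricians and Moneyball famously championed. The OBP formula is the standard one; the `BattingLine` class and both hitters' numbers are hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Season counting stats for one hitter (hypothetical example data)."""
    name: str
    at_bats: int
    hits: int
    walks: int
    hit_by_pitch: int
    sac_flies: int

    def batting_average(self) -> float:
        # Traditional stat: hits per official at-bat (walks don't count).
        return self.hits / self.at_bats

    def on_base_percentage(self) -> float:
        # OBP = (H + BB + HBP) / (AB + BB + HBP + SF)
        times_on_base = self.hits + self.walks + self.hit_by_pitch
        chances = self.at_bats + self.walks + self.hit_by_pitch + self.sac_flies
        return times_on_base / chances


# Two hypothetical hitters with identical batting averages.
hitter_a = BattingLine("Hitter A", at_bats=500, hits=130, walks=30, hit_by_pitch=2, sac_flies=3)
hitter_b = BattingLine("Hitter B", at_bats=500, hits=130, walks=90, hit_by_pitch=5, sac_flies=5)

for hitter in (hitter_a, hitter_b):
    print(f"{hitter.name}: AVG {hitter.batting_average():.3f}, OBP {hitter.on_base_percentage():.3f}")
```

Both hitters bat .260, but Hitter B reaches base far more often (.375 OBP versus about .303), the kind of difference a batting-average-only comparison hides.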

Phase 2. The crowd (and data collection and analysis) goes wild!

In the short 15 years or so since Billy Beane brought Bill James's ideas into a big-league front office, data collection and analytics capabilities have grown exponentially. They're now used across industries, and baseball is arguably chief among them.

All 30 MLB stadiums now sport high-resolution cameras and radar that record everything players and balls do on the field. Services like Statcast and Sportvision process all this data to calculate runners’ base-to-base speed, pitch velocity, spin and location, where every batted ball lands and way more. Many major league teams draw from these and several other data sources, including their own scouting data, player medical information and contract data.
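For a sense of the arithmetic behind a base-to-base speed, the sketch below converts a timed run over the 90-foot base path into feet per second and miles per hour. It's a simplified illustration, not Statcast's actual method: the function names are hypothetical, and the 3.6-second first-to-second time is an assumed value.

```python
BASE_PATH_FEET = 90.0   # distance between consecutive bases in MLB
FEET_PER_MILE = 5280.0
SECONDS_PER_HOUR = 3600.0


def average_speed_fps(distance_ft: float, elapsed_s: float) -> float:
    """Average speed over a run, in feet per second."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return distance_ft / elapsed_s


def fps_to_mph(feet_per_second: float) -> float:
    """Convert feet per second to miles per hour."""
    return feet_per_second * SECONDS_PER_HOUR / FEET_PER_MILE


# Hypothetical first-to-second time for a runner, in seconds.
first_to_second_s = 3.6
fps = average_speed_fps(BASE_PATH_FEET, first_to_second_s)
print(f"{fps:.1f} ft/s = {fps_to_mph(fps):.1f} mph")
```

Because it averages over the whole run, including the runner's acceleration, this number understates the runner's top speed.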

Some baseball data services even get a bit predictive: FIELDf/x, for example, uses data it collects from the field to calculate the probability that a given player will make a catch.
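To show what "the probability that a given player will make a catch" can look like as a model, here's a minimal sketch that takes the speed a fielder would need to reach the ball (distance divided by hang time) and maps it to a probability with a logistic curve. This is an illustration, not FIELDf/x's method; the function name, the 20 ft/s midpoint and the steepness value are all hypothetical.

```python
import math


def catch_probability(distance_ft: float, hang_time_s: float,
                      midpoint_fps: float = 20.0, steepness: float = 0.8) -> float:
    """Hypothetical catch-probability model (illustrative parameters, not fitted).

    The fielder must cover distance_ft before the ball lands, so the required
    speed is distance_ft / hang_time_s. A logistic curve maps that speed to a
    probability: exactly 0.5 at midpoint_fps, falling as required speed rises.
    """
    if hang_time_s <= 0:
        raise ValueError("hang time must be positive")
    required_speed = distance_ft / hang_time_s
    z = steepness * (required_speed - midpoint_fps)
    # Numerically stable form of 1 / (1 + exp(z)).
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


# A routine fly ball versus a ball hit into a gap (hypothetical inputs).
routine = catch_probability(distance_ft=40, hang_time_s=4.5)
gapper = catch_probability(distance_ft=85, hang_time_s=4.0)
print(f"routine: {routine:.2f}, gapper: {gapper:.2f}")
```

A real system would fit a curve like this to thousands of tracked batted balls rather than picking the parameters by hand.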

Television networks carrying the games use some of this robust data to enhance the viewing experience. Teams use the data to help them make tons of decisions, like how to align their defenses from batter to batter or even from pitch to pitch.

Brought to you by MBA@Syracuse: Tools of Baseball Analytics

Phase 3. Data science begins to round the bases

But that all just scratches the surface of leveraging the billions of baseball data points available and unlocking their truly predictive potential. This brings us to my conversation with Ari Kaplan, who’s on a mission to accomplish such feats.

Kaplan has worked with two-thirds of MLB teams to improve player acquisition and performance through data and analytics. He built scouting and player development database systems for four teams, including the Baltimore Orioles and Houston Astros, and served as the Manager of Statistical Analysis for the Chicago Cubs.

Kaplan already provides his MLB clients with a "secret analytics sauce" (as IBM Watson Analytics VP Marc Altshuller describes it) that helps management teams make data-driven decisions. Announcers like Orel Hershiser use Kaplan's stats on air to beef up their play-by-play and in-game analysis.

Kaplan’s always looking to crunch ever-larger and more diverse data sets and find more ingredients for the sauce, as well as more recipes for his clients’ success. Just recently, he fed about 800,000 records on every pitch thrown in baseball in 2015 into IBM Watson Analytics. He’s hoping Watson Analytics can recommend how pitching variables (speed, spin, etc.) might be related.
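Watson Analytics' internals aren't covered here, but the simplest version of Kaplan's question ("how are these pitching variables related?") is a pairwise correlation. Below is a minimal Python sketch that computes the Pearson correlation between every pair of numeric fields in a list of pitch records. The field names (`velocity_mph`, `spin_rpm`, `vertical_break_in`) and the handful of records are hypothetical stand-ins for a real 800,000-row pitch file.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sd_x * sd_y)


def pairwise_correlations(records, fields):
    """Correlation for every pair of fields across a list of dict records."""
    columns = {f: [r[f] for r in records] for f in fields}
    return {(a, b): pearson(columns[a], columns[b]) for a, b in combinations(fields, 2)}


# Hypothetical pitch records; a real season has hundreds of thousands.
pitches = [
    {"velocity_mph": 95.1, "spin_rpm": 2350, "vertical_break_in": 16.0},
    {"velocity_mph": 93.4, "spin_rpm": 2240, "vertical_break_in": 14.1},
    {"velocity_mph": 88.0, "spin_rpm": 2480, "vertical_break_in": 4.2},
    {"velocity_mph": 79.5, "spin_rpm": 2650, "vertical_break_in": -8.7},
    {"velocity_mph": 91.2, "spin_rpm": 2100, "vertical_break_in": 12.5},
]

fields = ["velocity_mph", "spin_rpm", "vertical_break_in"]
for (a, b), r in pairwise_correlations(pitches, fields).items():
    print(f"{a} vs {b}: r = {r:+.2f}")
```

Pearson correlation only captures linear relationships between one pair of variables at a time, so it's a starting point rather than a substitute for richer models.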

“What I like about Watson Analytics is that there are lots of analytics tools based on a simple regression analysis, but with Watson Analytics, it’s more machine learning and more different types of analysis,” Kaplan told me. “And because of that, Watson Analytics can come up with insights that your traditional engines can’t. That’s where you win in fantasy.”

Altshuller pointed out that Watson Analytics also will help Kaplan work a lot faster since he won’t have to spend a ton of time testing a bunch of variables to see if they’re meaningful.
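For contrast, here's roughly what testing variables one at a time with simple regression can look like: fit a one-variable least-squares line of a target on each candidate predictor, then rank the candidates by R². This is a minimal sketch, not how Watson Analytics works internally; the per-pitcher fields such as `whiff_pct` and their values are hypothetical.

```python
def simple_linear_regression(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("predictor and target must both vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1.0 - ss_res / ss_tot


def rank_predictors(records, target, candidates):
    """Fit target on each candidate separately; return (name, r_squared), best first."""
    ys = [r[target] for r in records]
    scores = []
    for name in candidates:
        xs = [r[name] for r in records]
        _, _, r_squared = simple_linear_regression(xs, ys)
        scores.append((name, r_squared))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical per-pitcher summaries; "whiff_pct" is the target to explain.
pitchers = [
    {"velocity_mph": 91.0, "spin_rpm": 2200, "whiff_pct": 21.0},
    {"velocity_mph": 93.5, "spin_rpm": 2450, "whiff_pct": 26.5},
    {"velocity_mph": 95.0, "spin_rpm": 2300, "whiff_pct": 27.0},
    {"velocity_mph": 89.5, "spin_rpm": 2500, "whiff_pct": 22.5},
]
print(rank_predictors(pitchers, "whiff_pct", ["velocity_mph", "spin_rpm"]))
```

Every extra candidate means another fit to run and interpret, which is the time cost Altshuller expects Watson Analytics to cut; a one-variable R² also misses interactions between variables.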

Can data science help me (and you) win?

Can I be an armchair data scientist and use Watson Analytics to help me with fantasy baseball? The answer is yes. You or I can sign up for Watson Analytics for free and start geeking out with data to our heart’s content. There are plenty of raw baseball data sources available and lots of instructional resources that show us how to use Watson Analytics. It’s a pretty intuitive tool, so we can likely dive right in and start playing around. And then maybe the secret formula will be ours (cue evil cackle), and perhaps we’ll come in first in our leagues this year.

For those who prefer pre-crunched data, Kaplan is now unleashing some that was once only available to MLB teams. Through Scoutables, his new venture with his partner/former Dodgers GM Fred Claire and some advisory help from IBM’s Altshuller, Kaplan will provide deep views into how a player performs and how that performance evolves as the season progresses.


The Scoutables application officially launches on Opening Day, but there's a beta version online right now where you can enter any player name and get some pretty interesting facts. Some don't help with fantasy, but they're cool nonetheless.

For example, I entered one of the Mets (big fan — don’t hate on me), Curtis Granderson. Those of us who follow him know he has walked a ton recently, but did you know that in 2015:

Granderson avoided swinging at pitches in the upper left of the hitting zone: he swung at only 52 percent of the 513 pitches he saw there (67 percent is average). He also laid off pitches above the zone and away, swinging at only 7 percent of the 325 pitches he saw there (31 percent is average).
Plus, he saw a lot of pitches in each at-bat: 4.37 pitches per plate appearance (3.8 is average).
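Those percentages are easier to read as counts and gaps. This small sketch turns each zone split into an approximate number of swings and a percentage-point gap versus the league average, using only the Granderson figures quoted above; because the published percentages are rounded, the swing counts are approximate. The `ZoneSplit` class and zone labels are illustrative helpers, not part of Scoutables.

```python
from dataclasses import dataclass


@dataclass
class ZoneSplit:
    """One hitter's swing rate in a zone versus the league average."""
    zone: str
    pitches_seen: int
    swing_pct: float         # hitter's swing rate in this zone, in percent
    league_swing_pct: float  # average swing rate in this zone, in percent

    def approx_swings(self) -> int:
        # The published percentages are rounded, so this count is approximate.
        return round(self.pitches_seen * self.swing_pct / 100)

    def gap_vs_league(self) -> float:
        # Negative means the hitter swings less often than average.
        return self.swing_pct - self.league_swing_pct


# Curtis Granderson, 2015, figures as quoted above.
splits = [
    ZoneSplit("upper left of zone", pitches_seen=513, swing_pct=52, league_swing_pct=67),
    ZoneSplit("above zone and away", pitches_seen=325, swing_pct=7, league_swing_pct=31),
]
for s in splits:
    print(f"{s.zone}: ~{s.approx_swings()} swings, {s.gap_vs_league():+.0f} points vs. average")

pitches_per_pa, league_pitches_per_pa = 4.37, 3.8
print(f"pitches per plate appearance: {pitches_per_pa - league_pitches_per_pa:+.2f} vs. average")
```

In other words, at the league-average rate he'd have swung at about 344 of those 513 upper-left pitches instead of about 267.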

Whew, I have to come up for air — that’s deep!

Data that might help fantasy players a bit more includes the information in the “Compared to Recent” section, which could provide direction when selecting daily fantasy lineups. This data shows where a player has done better and worse in the last few weeks. And the “Compared to Last Year” sections could have some value in the draft. There are upgrades on tap throughout the season, with more visuals — like spray charts, hot zones and cold zones — to be added to player pages.
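A "Compared to Recent" view boils down to comparing a player's last few weeks against a longer baseline. Here's a minimal sketch of that idea, assuming a game log stored as (hits, at-bats) pairs and using batting average as the stat. Scoutables' actual metrics and window sizes aren't described in this post, so the function names, the five-game window and the log are hypothetical.

```python
from typing import List, Tuple

GameLog = List[Tuple[int, int]]  # one (hits, at_bats) pair per game, oldest first


def batting_average(games: GameLog) -> float:
    """Hits divided by at-bats over the given games (0.0 if there are no at-bats)."""
    hits = sum(h for h, _ in games)
    at_bats = sum(ab for _, ab in games)
    return hits / at_bats if at_bats else 0.0


def recent_vs_season(games: GameLog, window: int) -> Tuple[float, float, float]:
    """Return (recent_avg, season_avg, difference) for the last `window` games."""
    if window <= 0:
        raise ValueError("window must be positive")
    recent_avg = batting_average(games[-window:])
    season_avg = batting_average(games)
    return recent_avg, season_avg, recent_avg - season_avg


# Hypothetical log: a slow start, then a hot last five games.
log = [(1, 4)] * 10 + [(2, 4)] * 5
recent, season, diff = recent_vs_season(log, window=5)
print(f"last 5: {recent:.3f}, season: {season:.3f}, change: {diff:+.3f}")
```

A positive change flags a player who is trending up, which is the kind of signal that could inform a daily lineup, though short windows are noisy.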

Kaplan wants to do a ton more, though. For example, right now you might look at a batter's history against a particular pitcher, one he has faced in only, say, eight at-bats. Maybe he's 0-for-8, but that's a tiny sample, and it doesn't really help you figure out what's going to happen with this batter today. But what if you knew that this pitcher has recently thrown a lot of soft pitches low and away, and that this particular hitter eats up those types of pitches? That would be much more helpful, and you can't quite find that level of predictive depth in fantasy data today. Kaplan hopes to bring this to us soon, and in a visual way.
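Here's a minimal sketch of the kind of matchup estimate described above, under simplifying assumptions (this is not Kaplan's actual method): bucket pitches into a few categories, weight the hitter's results against each category by how often this pitcher has recently thrown it, and sum. The category names, the pitch mix and the hitter's rates are all hypothetical.

```python
def matchup_estimate(pitch_mix: dict, hitter_rates: dict) -> float:
    """Expected hitter rate against a pitcher, weighted by the pitcher's pitch mix.

    pitch_mix: share of this pitcher's recent pitches in each category (sums to 1).
    hitter_rates: the hitter's success rate (e.g. hits per ball in play) per category.
    """
    total_share = sum(pitch_mix.values())
    if abs(total_share - 1.0) > 1e-6:
        raise ValueError("pitch mix shares must sum to 1")
    missing = set(pitch_mix) - set(hitter_rates)
    if missing:
        raise KeyError(f"no hitter rate for categories: {sorted(missing)}")
    return sum(share * hitter_rates[category] for category, share in pitch_mix.items())


# Hypothetical pitcher who has recently leaned on soft stuff low and away.
pitch_mix = {"soft_low_away": 0.5, "hard_up_in": 0.3, "other": 0.2}
# Hypothetical hitter who eats up soft, low-and-away pitches.
hitter_rates = {"soft_low_away": 0.400, "hard_up_in": 0.150, "other": 0.250}

print(f"expected rate in this matchup: {matchup_estimate(pitch_mix, hitter_rates):.3f}")
```

Because each category rate can draw on every pitch of that type the hitter has seen, not just the eight at-bats against this pitcher, the estimate rests on far more data than the head-to-head line.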

For now, good luck to all my fellow players (except those in my league) as we start the season. I’d like to know: What data sources and analysis tools are your favorites right now?

And what data would you love to have to help you with drafting and in-season moves? I know I want to see some predictive data on who’s most likely to get injured, the “baseball black swan,” as Billy Beane calls it.

Learn more about Watson Analytics

To learn more about Watson Analytics and to register for the free edition, which includes free use of Watson Analytics Professional for 30 days, visit the Watson Analytics website.


This blog was originally published in the Center For Applied Insights blog on 30 March 2016.

Image sources and credits:

Moneyball poster: Image credit: IMDB

1988 edition of Bill James’ Annual Historical Baseball Abstract: Image credit: Amazon

Tools infographic: Brought to you by MBA@Syracuse: Tools of Baseball Analytics



