Fantasy golf at The Masters with Watson

By , Jenna Miller, Gray Cannon, and Nick Wilkin | 9 minute read | April 6, 2021

With contributions by: Monica Ellingson, Stephen Hammer, Karl Schaffer, and Lee Tilt.

The Masters is one of the most anticipated sporting events of the year for the world’s best golfers and millions of sports fans across the globe. Even though the world is recovering from unprecedented circumstances, one thing remains constant for IBM and the Masters: delivering one of the best digital experiences in the world of sports.

This year, IBM and the Masters have found a way to elevate the digital experience with “Masters Fantasy.” This new fantasy game for 2021 will allow users to select golfers from four categories to build up a fantasy foursome for the Tournament as well as compete for daily prizes. IBM Watson is infusing insights about each player to help fans select players for their foursome. The AI-powered factoids are generated using IBM Watson Discovery and its natural language processing capability, which has access to thousands of articles and blogs. AI is also used to generate text from historical statistical data. Providing users with factoids and statistics about all 90+ participating golfers in the field can help them make informed decisions based on a player’s previous performance at the Masters, how they are currently playing in the tournaments that led up to our favorite April event, and who is looking hot in the pursuit of the 2021 Green Jacket.

Fantasy golf

The Masters Fantasy experience enables fans to pick their own lineup of golfers from four categories: First Timers, Past Champions, United States, and International. Each golfer’s real-world performance on the golf course translates to fantasy points. For example, each player will receive 8 points for a double eagle, 5 points for an eagle, and 2 points for a birdie. On the flip side, players lose 1 point for a bogey and 3 points for a double bogey or worse. Over the course of the Masters Tournament, each golf team manager’s goal is to have the highest score among teams in their league.
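The scoring rules above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the article does not say how a par is scored (0 points is assumed here), and any score of three or more under par is treated as a double eagle.

```python
# Points per hole result, taken from the scoring rules described above.
POINTS = {
    "double_eagle": 8,
    "eagle": 5,
    "birdie": 2,
    "par": 0,  # assumption: the article does not state a value for par
    "bogey": -1,
    "double_bogey_or_worse": -3,
}


def classify(strokes: int, par: int) -> str:
    """Name the result of a hole from the strokes taken and the hole's par."""
    diff = strokes - par
    if diff <= -3:
        return "double_eagle"  # assumption: anything better is scored the same
    if diff == -2:
        return "eagle"
    if diff == -1:
        return "birdie"
    if diff == 0:
        return "par"
    if diff == 1:
        return "bogey"
    return "double_bogey_or_worse"


def fantasy_points(holes: list[tuple[int, int]]) -> int:
    """Sum fantasy points over (strokes, par) pairs."""
    return sum(POINTS[classify(strokes, par)] for strokes, par in holes)


# Birdie, bogey, par, eagle, double bogey: 2 - 1 + 0 + 5 - 3
sample_round = [(3, 4), (5, 4), (4, 4), (3, 5), (6, 4)]
print(fantasy_points(sample_round))  # prints 3
```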

Selecting golfers based on name recognition and high rankings is not enough to assemble a league-winning team. The context of play and the selection of players from all 4 categories require a new level of insight to be competitive in fantasy golf. In partnership with the Masters, we built a system that uses natural language processing and artificial intelligence techniques to distill a high volume and variety of data into natural language insights. This information is particularly critical for fans who need to select a replacement player after the cut is made. In addition, after play is over for a round, we create a fantasy golf round recap video to help team managers understand each golfer’s level of play going into the next round.

Looking at the overall architecture in Figure 1, insights are created along two parallel paths. First, we create factoids by mining thousands of arguments from Watson Discovery. At the same time, we generate natural language from the statistics of play during the Tournament. A series of Watson AI services then joins the factoids and the natural language-generated sentences (NLG stats) through optimization techniques. A package of factoids and NLG stats is created so that the content provides diverse insights about a player. The insights and packages are saved to IBM Cloudant, which powers the Insights Human Review Tool where content is moderated.

Figure 1. The overall fantasy golf system architecture

While natural language processing technologies are processing data, our AI Highlights video solution creates round recaps for the fantasy game. Video is sliced into individual shots for every player on every hole. These clips are sent to our deep learning algorithms, which measure the excitement level of each clip based on gestures and sound. A second crowd excitement score is generated by IBM AutoAI and Watson Machine Learning based on debiased historical sound scores. The resulting metadata is stored in Cloudant and references a clip within a dynamic content store. Humans review the highlights before they are approved.

Next, an IBM Cloud Functions serverless application retrieves approved factoids, statistics, and fantasy golf videos. The data is then merged into a JavaScript Object Notation (JSON) structure and saved to IBM Cloud Object Storage. All of the content is available through a Content Delivery Network (CDN) for fantasy golf experiences around the world.
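As a rough illustration of the merge step, the sketch below groups approved items by player and serializes them as JSON. The field names (player_id, content, approved) and the function name are hypothetical; the article does not describe the actual schema, and the retrieval from Cloudant and the upload to Cloud Object Storage are left out.

```python
import json
from collections import defaultdict


def build_player_feed(factoids, nlg_stats, videos):
    """Group approved items by player and serialize the result as JSON."""
    feed = defaultdict(lambda: {"factoids": [], "stats": [], "videos": []})
    sources = (("factoids", factoids), ("stats", nlg_stats), ("videos", videos))
    for key, items in sources:
        for item in items:
            # Only content that passed human review is published.
            if item.get("approved"):
                feed[item["player_id"]][key].append(item["content"])
    return json.dumps(feed, indent=2, sort_keys=True)


# Made-up records, for illustration only.
feed_json = build_player_feed(
    factoids=[
        {"player_id": "p1", "content": "Example factoid text.", "approved": True},
        {"player_id": "p1", "content": "Rejected factoid.", "approved": False},
    ],
    nlg_stats=[
        {"player_id": "p1", "content": "He had 3 sand saves.", "approved": True},
    ],
    videos=[
        {"player_id": "p2", "content": "clips/round1/p2.mp4", "approved": True},
    ],
)
print(feed_json)
```

Keying the structure by player keeps every insight for one golfer in a single lookup, which suits a roster page that loads one player at a time.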

When a fan first joins the experience, they fill out their 4 roster slots and pick a tiebreaker answer such as estimating the aggregate number of pars across the Tournament. After picking their roster, the fan can view their players and watch their performance as play progresses. For example, in the following figure, you can see the landing page for your roster within fantasy golf. After the cut, if a team manager’s player does not continue in the Tournament, a replacement player can be chosen.

Figure 2. The four slots on a roster for fantasy golf

When each fan is making their player selections, they can view insights about each player. Within the In the Media tab, AI factoids discovered in news sources are delivered for each golfer. The Statistics tab shows NLG stats, simple sentences that summarize each golfer’s play. The additional context gives every fan the knowledge they need to make informed picks.

Figure 3. Factoids and NLG stats delivered for each golf player

Let’s get a deeper understanding about the three phases of natural language processing and AI that create a contextualized fantasy golf experience.

Phase 1. Player insights natural language generation

Figure 4. NLG stats system architecture

The action on the course at Augusta National generates dozens of distinct statistics for Masters Digital users to explore. With the launch of the Masters Fantasy experience in 2021, there was an even greater need for fans to have relevant statistics at their fingertips. Fantasy participants consider the relative strengths and weaknesses among players to select a winning roster. Following each round, participants seek information on how each player fared and contributed to the overall fantasy score. Our team decided that the best way to present this information was through natural language. While experienced golf analysts enjoy diving into data tables, more casual fans are often more comfortable consuming information in natural language. Additionally, a natural language format is more flexible than rigid data structures and therefore allowed us to present the statistics alongside supporting context.

IBM maintains databases that store Masters statistics and other relevant information using IBM Db2 on Cloud. From the database, the relevant data is analyzed using the pandas Python package. Each statistic is converted to a rank value with respect to that statistic among the entire Tournament field. The most extreme values in rank terms are the items that will be most interesting to the audience, so these are the statistics we select for natural language generation.
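The rank-and-select step might look like the following pandas sketch. The players, statistic names, and values are made up, and measuring "extreme" as distance from the middle rank of the field is an assumption; the article does not give the exact criterion.

```python
import pandas as pd

# Hypothetical round statistics for a four-player field.
stats = pd.DataFrame(
    {
        "sand_saves": [3, 1, 0, 2],
        "driving_distance": [310.5, 295.0, 301.2, 288.4],
        "putts_per_round": [28, 31, 29, 27],
    },
    index=["Player A", "Player B", "Player C", "Player D"],
)

# Rank every statistic within the field (1 = lowest value in that column).
ranks = stats.rank()

# Distance of each rank from the middle of the field; larger = more extreme.
middle_rank = (len(stats) + 1) / 2
extremeness = (ranks - middle_rank).abs()


def most_extreme(player: str, n: int = 2) -> list[str]:
    """Return the n statistics on which the player stands out the most."""
    return extremeness.loc[player].nlargest(n).index.tolist()


print(most_extreme("Player A"))
```

Here Player A leads the field in sand saves and driving distance, so those two statistics would be passed on to natural language generation.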

To convert structured data into natural language, the IBM team leveraged techniques from both IBM Research and the open source community. The starting point is a basic sentence pattern such as “He had 3 sand saves.” To improve the basic sentence, the IBM team trained a deep learning model to generate new sentences that transferred the language style of a golf commentator onto the basic sentence such as “He managed par saves from the sand 3 times.” Using PyTorch, the team fine-tuned a T5 transformer language model to learn golf phrasing from examples. The model generated multiple stylized sentence variants for each statistic. After generating variants for each of a player’s statistics, the optimal package of sentences was selected using IBM Decision Optimization. Specifically, the sentences were selected such that the number of phrase repetitions across the package was minimized subject to the constraint that each statistic type must be present in the package. The final task of the Natural Language Statistics web service was to persist each player’s statistical package and corresponding metadata to a Cloudant NoSQL database on IBM Cloud, which feeds into the Insights Human Review Tool.
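To make the package selection concrete, here is a brute-force stand-in for the optimization step. It is not IBM Decision Optimization: it simply tries every combination with one sentence per statistic type, which satisfies the coverage constraint, and keeps the combination with the fewest repeated words. The sentence variants are illustrative, and counting repeated words longer than three characters is an assumed proxy for phrase repetition.

```python
from collections import Counter
from itertools import product

# Hypothetical stylized variants, keyed by statistic type.
variants = {
    "sand_saves": [
        "He managed par saves from the sand 3 times.",
        "He escaped the sand for par 3 times.",
    ],
    "birdies": [
        "He managed to card 5 birdies.",
        "He rolled in 5 birdies.",
    ],
}


def repetitions(sentences) -> int:
    """Count how many times longer words recur across different sentences."""
    words = Counter(
        word
        for sentence in sentences
        for word in set(sentence.lower().rstrip(".").split())
        if len(word) > 3  # assumption: skip short function words
    )
    return sum(count - 1 for count in words.values())


def best_package(variants_by_stat):
    """Pick one sentence per statistic type, minimizing repeated words."""
    return min(product(*variants_by_stat.values()), key=repetitions)


package = best_package(variants)
print(package)
```

Exhaustive search is only practical for a handful of statistics and variants; a dedicated solver becomes worthwhile as the number of combinations grows.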

Phase 2. Player insights factoids

Figure 5. Factoid system architecture

Next, we decided to focus on core media outlets to answer key questions. What makes a player interesting? What happened in their career to lead them to play at the Masters Tournament? Player Insights with Watson seeks to uncover the answers to these types of questions, along with any other facets of a player’s background that make them stand out from the field and help users make decisions for their roster.

To achieve this, Watson searches for information on a given player across millions of news articles, blog posts, and other online media, supplemented by deep dives on a targeted selection of golf sources. Watson gains a deeper understanding of the editorial content through natural language processing enrichments. Articles are categorized by their prevalent topics and concepts, and relationships are drawn between entities such as people and places. Articles deemed relevant to both the player and the topic domain are then summarized using state-of-the-art extractive algorithms.

The nature of extracting sentences from a body of text means that a degree of context is lost in the process. Pronouns and any time-relative references such as “two years ago” might be disconnected from their roots, leaving the summarized sentence difficult to understand. To mitigate this, we attempt to resolve orphaned coreferences using sentences within +/- 2 of the extracted summary.
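The +/- 2 sentence window can be sketched as follows. This shows only how the surrounding context might be gathered; the coreference resolution itself is not shown, and the function name and sample sentences are hypothetical.

```python
def context_window(sentences, index, radius=2):
    """Return the extracted sentence plus up to `radius` neighbors on each side."""
    start = max(0, index - radius)
    end = min(len(sentences), index + radius + 1)
    return sentences[start:end]


# Made-up article text, for illustration only.
article = [
    "The golfer turned professional early.",
    "His first major start came soon after.",
    "Two years ago, he finished in the top ten.",
    "He has not missed a cut since.",
    "His coach credits a new putting routine.",
    "The routine took months to settle.",
]

# Sentence 3 was extracted; "He" must be resolved from nearby sentences.
print(context_window(article, 3))
```

Clamping the window at the start and end of the article means an extracted first or last sentence still gets whatever context is available.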

Having collected relevant articles, extracted salient information, and resolved any lingering coreferences, we next assess the quality of each sentence. Two dimensions determine the quality of a given snippet: its grammatical coherence, determined by scikit-learn surface form parse rules and a decision tree, and its topic alignment, measured by a trained machine learning model. Sentences that pass a quality threshold are considered insightful and are stored in our Cloudant natural language processing store as factoids.

Phase 3. Insights optimization

Finally, we need to relate the factoids and NLG stats to each other. Each day, a diverse set of factoids and NLG stats is produced about each player at the Masters. Presented all at once, that volume of content is still too much for a fan to absorb. A process known as bin packing associates factoids and NLG stats together into small packages. The association creates digestible content that has complementary components.
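For readers unfamiliar with bin packing, the sketch below uses the classic first-fit decreasing heuristic. The article does not say which formulation the production system solves, so the heuristic, the item sizes (for example, character counts), and the capacity are all assumptions made for illustration.

```python
def first_fit_decreasing(items, capacity):
    """Pack (label, size) items into as few bins of `capacity` as the heuristic finds."""
    bins = []  # each bin tracks its remaining space and its labels
    for label, size in sorted(items, key=lambda item: item[1], reverse=True):
        if size > capacity:
            raise ValueError(f"{label!r} does not fit in an empty bin")
        for current in bins:
            if current["remaining"] >= size:
                current["labels"].append(label)
                current["remaining"] -= size
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append({"remaining": capacity - size, "labels": [label]})
    return [current["labels"] for current in bins]


# Hypothetical content sizes for one player.
content = [
    ("factoid A", 120),
    ("stat A", 60),
    ("factoid B", 90),
    ("stat B", 40),
    ("stat C", 50),
]
packages = first_fit_decreasing(content, capacity=180)
print(packages)
# [['factoid A', 'stat A'], ['factoid B', 'stat C', 'stat B']]
```

Placing the largest items first tends to leave small gaps that shorter NLG stats can fill, which is why each package here ends up with a factoid and at least one stat.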

A Red Hat OpenShift Python application retrieves all of the factoids and NLG stats that have been approved. The optimization algorithm is defined and built through the IBM Decision Optimization service. The resulting model is deployed within IBM Watson Machine Learning. A payload containing both NLG stats and factoids is sent to Watson Machine Learning for packaging. The packages are saved within Cloudant. The packaging logic runs again each time a factoid or NLG stats job finishes. The status of each job is stored within Redis.

Figure 6. Natural language processing optimization architecture

Let’s play

In our current and future experiences, technology helps us to engage with those we might not be able to see in person and to extend physical events to millions of fans around the world. Now you can enjoy the Masters with your own team and immerse yourself in the golfing action. Create your own fantasy team to compete against your family, friends, and the entire Masters Fantasy community. Get your bragging rights ready: the first round of the Tournament starts Thursday, April 8.