Don’t let a sea of unstructured data hide good sake
What makes a good Japanese sake? Sake is sensitive to temperature. Its flavor changes depending on whether it is served cold, at room temperature, warm or hot. In particular, heating sake to a preferred temperature in a precise amount of time is no easy task to master. Oh, and knowing what type of sake to pair with tempura versus sushi makes the rice brew taste even better.
IBM Research-Tokyo’s Tetsuya Nasukawa is known as the inventor of TAKMI, Text Analysis and Knowledge Mining. It is a cognitive technology that mines unstructured data to identify hidden knowledge and give businesses insights for making better decisions. Today, TAKMI is the core technology for IBM Watson Explorer Content Analytics.
And it’s also behind Tetsuya’s search for Tokyo’s best sake bistro.
“The benefit of text mining technology is to make you aware of what you have not yet noticed, and that’s how I found this bistro, where I became a regular customer,” said Tetsuya… with a sad look on his face.
In late 2014, Tetsuya’s favorite bistro near Tokyo Station quietly closed. The bistro, Yanagi, had a small counter and three tables. Just the right size for the husband-and-wife team of Otosan (which translates as “father/darling”) and Okasan (or “mother/honey”) to manage. And they served the best Kanzake (warm sake) with a taste of Okasan’s home cooking.
The bistro gradually won fans like Tetsuya through word-of-mouth as a precious place that you want to introduce to close friends. So, Tetsuya was shocked to receive the call from Otosan with the news that Yanagi was closing. He wasn’t the only regular customer sad about the closure. During Tetsuya’s last visit, Otosan softly muttered that if they could have had this many customers on a regular basis, perhaps they would not have had to make such a difficult decision.
Closing, though, came down to numbers. Customer review numbers. Yanagi had good reviews for its quality sake and homemade cooking. Just not enough of them to rank on popular restaurant review sites. It was a hidden treasure that unfortunately stayed hidden.
With Otosan’s words in his ear, Tetsuya decided to reveal how he found Yanagi, to shed light on other quality-conscious bistros like it.
The social sentiment of a good sake bistro
First, Tetsuya analyzed tweets containing bistro names to determine whether they held information that indicates a good bistro. Working from millions of tweets and tens of thousands of candidate bistros, he looked not only at the tweets about bistros but also at what kind of people tweet about them, to narrow down the definition of a good bistro.
He also gathered tweets that contained either “nihonshu” (sake) or “beer,” while eliminating tweets with industry terms and expressions, such as “goraiten” (a formal way to say “we look forward to your visit”). He included tweets containing the word “beer” because he wanted to gather broader information about sake. People often start by toasting with beer before moving on to sake, so including these tweets might raise the chance of finding information on sake.
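The keyword step described above can be sketched in a few lines of Python. This is only an illustration, not Tetsuya’s actual system: the function names are hypothetical, the term lists contain only the three words the article mentions, and the sample tweets are made up.

```python
# Illustrative sketch only; not the actual TAKMI-based pipeline.
# The term lists hold just the words named in the article.
INCLUDE_TERMS = ("nihonshu", "beer")   # drink terms to keep
EXCLUDE_TERMS = ("goraiten",)          # industry expressions to drop


def is_candidate(tweet: str) -> bool:
    """Return True if the tweet mentions a drink term and no industry term."""
    text = tweet.lower()
    mentions_drink = any(term in text for term in INCLUDE_TERMS)
    sounds_promotional = any(term in text for term in EXCLUDE_TERMS)
    return mentions_drink and not sounds_promotional


def filter_tweets(tweets):
    """Keep only the tweets that pass the keyword filter, in original order."""
    return [tweet for tweet in tweets if is_candidate(tweet)]


if __name__ == "__main__":
    # Made-up sample tweets, for demonstration only.
    sample = [
        "Great nihonshu at Yanagi tonight",
        "Goraiten! Fresh beer on tap",
        "Nice weather today",
    ]
    for tweet in filter_tweets(sample):
        print(tweet)
```

A real system would need Japanese-aware tokenization rather than plain substring matching; the sketch only shows the include-and-exclude logic.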
He then referenced reviews and public information such as location, size, ambiance and menu against the definitions that emerged from his Twitter analysis. To do this, he mined the text of 4 million tweets to identify hidden bistros that may be good, and then cross-referenced reviews and public information to further narrow down which bistros might be good. For example, a tweet of “going to [bistro name] for some sake” might not seem significant, but it’s a good lead to connect with other findings.
Finally, Tetsuya matched bistros that had low rankings on review sites but favorable tweets. In about 30 minutes, his system could deliver a potential bistro near a specific location, like Tokyo Station.
To determine whether his analytics technique worked, he decided to go with an old-fashioned yet reliable way to confirm. Whenever he identified a promising bistro, he tried it out after work with colleagues who also love sake. They quickly became members of the Japanese Sake Club hosted by Tetsuya.
Based on his advanced analytics, field trips to 15 excellent bistros from Tokyo to Kyoto, and field work by the Japanese Sake Club, Tetsuya wrote “Mining a large amount of tweets for discovering bistro serving good sake: an attempt for using micro blogs as knowledge,” an academic paper (in Japanese) that he and his club members presented last March at the 21st annual meeting of the Association for Natural Language Processing of Japan.
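As a rough illustration of that matching step, the sketch below flags bistros with few reviews but mostly favorable tweets. Everything in it is an assumption made for the example: the `Bistro` fields, the use of review count as a stand-in for a low ranking, and the threshold values are hypothetical and do not come from Tetsuya’s paper.

```python
from dataclasses import dataclass


@dataclass
class Bistro:
    name: str
    review_count: int      # hypothetical proxy for ranking on a review site
    favorable_tweets: int  # tweets judged favorable (how is out of scope here)
    total_tweets: int      # all tweets that mention the bistro


def hidden_gems(bistros, max_reviews=10, min_favorable_ratio=0.7, min_tweets=3):
    """Return names of bistros with few reviews but mostly favorable tweets.

    All thresholds are illustrative defaults, not values from the article.
    """
    picks = []
    for bistro in bistros:
        if bistro.total_tweets < min_tweets:
            continue  # too little evidence on Twitter to judge
        ratio = bistro.favorable_tweets / bistro.total_tweets
        if bistro.review_count <= max_reviews and ratio >= min_favorable_ratio:
            picks.append((ratio, bistro.name))
    # Highest favorable ratio first; ties broken alphabetically by name.
    picks.sort(key=lambda pick: (-pick[0], pick[1]))
    return [name for _, name in picks]
```

Filtering by location, such as bistros near Tokyo Station, would be one more condition in the loop and is left out to keep the sketch short.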
Today, Tetsuya continues working on solving ambiguity problems, not losing sight of the fact that noisy data may hold hidden insights.
For example, bistros are often named after their owners’ family names. This makes identification challenging for natural language processing because it’s difficult to tell whether a name refers to the bistro, a family or something else.
He wants to further analyze the people tweeting about sake and sake bistros, using IBM Watson Personality Insights. Tetsuya also wants to integrate image analysis to better identify bistros and locations. And he is always trying to add some fun to his research, with his Japanese Sake Club actively supporting him, particularly on field trips.