Serving up sake bistro analysis

Don’t let a sea of unstructured data hide good sake

What makes a good Japanese sake? Sake is sensitive to temperature: its flavor changes depending on whether it is served cold, at room temperature, warm or hot. Heating sake to a preferred temperature in a precise amount of time is a particularly hard task to master. Oh, and knowing what type of sake to pair with tempura versus sushi makes the rice brew taste even better.

IBM Research-Tokyo’s Tetsuya Nasukawa is known as the inventor of TAKMI (Text Analysis and Knowledge Mining), a cognitive technology that mines unstructured data to identify hidden knowledge and give businesses insights for making better decisions. Today, TAKMI is the core technology of IBM Watson Explorer Content Analytics.

And it’s also behind Tetsuya’s search for Tokyo’s best sake bistro.

“The benefit of text mining technology is to make you aware of what you have not yet noticed, and that’s how I found this bistro, where I became a regular customer,” said Tetsuya, with a sad look on his face.

In late 2014, Tetsuya’s favorite bistro near Tokyo Station quietly closed. The bistro, Yanagi, was run by a couple and had a small counter and three tables: just the right size for the husband-and-wife team of Otosan (which translates as “father/darling”) and Okasan (or “mother/honey”) to manage. And they served the best Kanzake (warm sake) with a taste of Okasan’s home cooking.

The bistro gradually won fans like Tetsuya through word of mouth, as the kind of precious place you want to introduce only to close friends. So Tetsuya was shocked to receive a call from Otosan with the news that Yanagi was closing. He wasn’t the only regular customer saddened by the closure. During Tetsuya’s last visit, Otosan softly muttered that if they could have had this many customers on a regular basis, perhaps they would not have had to make such a difficult decision.

Closing, though, came down to numbers. Customer review numbers. Yanagi had good reviews for its quality sake and homemade cooking. Just not enough of them to rank on popular restaurant review sites. It was a hidden treasure that unfortunately stayed hidden.

With Otosan’s words in his ears, Tetsuya decided to reveal how he had found Yanagi, in order to shed light on other quality-conscious bistros like it.

The social sentiment of a good sake bistro

First, Tetsuya analyzed tweets that mention bistro names to determine whether they contain information that indicates a good bistro. Working from millions of tweets and tens of thousands of candidates, he looked not only at the tweets about bistros but also at what kind of people tweet about them, in order to narrow down the definition of a good bistro.

He also gathered tweets that contained either “nihonshu” (sake) or “beer,” while eliminating tweets with industry terms and expressions, such as “goraiten” (part of a formal way of saying “we look forward to your visit”). He included tweets containing the word “beer” because he wanted to gather broader information about sake: people often start by toasting with beer before moving on to sake, so these tweets raise the chance of finding information on sake.
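The keyword filter described above can be illustrated with a small sketch. This is not Tetsuya's actual implementation, which the post does not show; the function names and the romanized term lists are assumptions made for illustration, and a real system would match the Japanese-script forms of these words.

```python
# A minimal sketch of the include/exclude keyword filter, under stated assumptions.
# The term lists are hypothetical: the post names only "nihonshu", "beer" and
# "goraiten", and real tweets would contain the Japanese-script forms.
INCLUDE_TERMS = ("nihonshu", "beer")
EXCLUDE_TERMS = ("goraiten",)


def is_candidate(tweet: str) -> bool:
    """Return True if the tweet mentions sake or beer and has no industry phrasing."""
    text = tweet.lower()
    # Drop tweets that look like they were written by the bistro itself.
    if any(term in text for term in EXCLUDE_TERMS):
        return False
    # Keep tweets that mention at least one drink term.
    return any(term in text for term in INCLUDE_TERMS)


def filter_tweets(tweets):
    """Keep only candidate tweets, preserving their original order."""
    return [t for t in tweets if is_candidate(t)]
```

Excluding first and including second means a promotional tweet is dropped even when it also mentions sake or beer, which matches the intent of removing industry voices from the sample.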

He then referenced reviews and public information such as location, size, ambiance and menu against the definitions that emerged from his Twitter analysis. To do this, he mined the text of 4 million tweets to identify hidden bistros that may be good, and then cross-referenced reviews and public information to further narrow down which bistros might be good. For example, a tweet of “going to [bistro name] for some sake” might not seem significant, but it’s a good lead to connect with other findings.

Finally, Tetsuya looked for bistros with low rankings on review sites but favorable tweets. In about 30 minutes, his system could deliver a potential bistro near a specific location, like Tokyo Station.
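The matching step might be sketched as follows. Everything here is an assumption made for illustration: the post gives no data model, scores or thresholds, so the `Bistro` fields, the use of review count as a stand-in for review-site ranking, and the cutoff values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bistro:
    # Hypothetical record; the post does not describe the real data model.
    name: str
    review_count: int      # stand-in for a low ranking on review sites
    favorable_tweets: int  # tweets judged favorable by some upstream analysis
    total_tweets: int


def hidden_gems(bistros, max_reviews=20, min_favorable_ratio=0.7, min_tweets=3):
    """Return names of bistros with few reviews but mostly favorable tweets.

    All thresholds are illustrative assumptions, not values from the post.
    """
    picks = []
    for b in bistros:
        # Too few tweets to say anything meaningful about sentiment.
        if b.total_tweets < min_tweets:
            continue
        ratio = b.favorable_tweets / b.total_tweets
        if b.review_count <= max_reviews and ratio >= min_favorable_ratio:
            picks.append((ratio, b.name))
    # Most favorable first; ties broken alphabetically for stable output.
    picks.sort(key=lambda p: (-p[0], p[1]))
    return [name for _, name in picks]
```

The minimum-tweet guard keeps a bistro with one or two glowing tweets from outranking places with a consistent record, which is one way to handle the sparse data that hid Yanagi in the first place.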

To determine whether his analytics technique worked, he chose an old-fashioned yet reliable way to confirm it. Whenever he identified a promising bistro, he tried it out after work with colleagues who also love sake. They quickly became members of the Japanese Sake club hosted by Tetsuya.

Based on his advanced analytics of how to discover a good sake bistro, plus field trips to 15 excellent bistros from Tokyo to Kyoto and field work by the Japanese Sake club, Tetsuya wrote “Mining a large amount of tweets for discovering bistro serving good sake: an attempt for using micro blogs as knowledge,” an academic paper (in Japanese) that he and his club members presented last March at the 21st annual meeting of the Association for Natural Language Processing of Japan.

Today, Tetsuya continues working on solving ambiguity problems, not losing sight of the fact that noisy data may hold hidden insights.

For example, bistros are often named after the owner’s family. This makes identification challenging for natural language processing, because it is difficult to tell whether a name refers to the bistro, a family or something else.

He wants to further analyze the people tweeting about sake and sake bistros, using IBM Watson Personality Insights. Tetsuya also wants to integrate image analysis to better identify bistros and locations. And he is always trying to add some fun to his research, with his Japanese Sake club actively supporting him, particularly on field trips.

Field work: Tetsuya and some of the Japanese Sake club members giving a toast at a newly discovered bistro after work (From left: Tetsuya, Yohei, Hideo, Shohei, Risa, Yachiko and Takayuki).

