IBM’s Squawk Bot AI helps make sense of financial data flood

Analysts’ reports, corporate earnings, stock prices, interest rates. Financial data isn’t an easy read. And there’s a lot of it.

Typically, teams of human experts go through and make sense of financial data. But as the volume of sources keeps surging, it’s becoming increasingly difficult for any human to read, absorb, understand, correlate, and act on all the available information.

We want to help.

In our recent work, “The Squawk Bot”: Joint Learning of Time Series and Text Data Modalities for Automated Financial Information Filtering, we detail an AI and machine learning mechanism that helps to correlate a large body of text with numerical data series describing financial performance as it evolves over time. Presented at the 2021 International Joint Conference on Artificial Intelligence (IJCAI), our deep learning-based system sifts through vast amounts of textual data and pulls out the potentially relevant descriptions that explain the performance of a financial metric of interest, without the need for human experts or labelled data.

Dubbed The Squawk Bot, the technology falls within the sub-field of AI and machine learning known as multimodal learning. This type of learning attempts to combine and model the data obtained from multiple data sources, potentially represented in different data types and forms.

While multimodal learning has been extensively used for video captioning, audio transcription, and other applications that combine video, images, and audio with text, there have been far fewer studies on linking text with numerical time series data.

It’s this gap that has sparked our interest, along with real-world discussions on popular financial commentary shows, like CNBC’s “Squawk Box” and other financial news programs. There, financial and business experts draw on their domain expertise, together with commentary and information from a variety of sources, to explain the performance of a financial asset (say, a stock price) or some activity in an economic sector.

The Squawk Bot automatically filters large amounts of textual information and extracts specific bits that might be related to the performance of an entity of interest as it evolves over time. The AI does so without the strict requirement of pre-aligning the text and time series data, or even explicitly labeling the data. The model automatically finds these cross-modality correlations and ranks their importance, providing user guidance for understanding the results.
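To make the ranking idea concrete, here is a minimal sketch of how texts could be scored against a time series once both have been encoded as vectors. It is not the model from our paper: the dot-product scoring, the softmax weighting, the function names, and the tiny three-dimensional embeddings are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_texts(series_vec, text_vecs):
    """Return (text_index, weight) pairs, most relevant text first.

    Relevance here is a plain dot product between the (hypothetical)
    time series embedding and each (hypothetical) text embedding.
    """
    scores = [sum(a * b for a, b in zip(series_vec, vec)) for vec in text_vecs]
    weights = softmax(scores)
    return sorted(enumerate(weights), key=lambda pair: pair[1], reverse=True)


# Made-up embeddings: one stock price window and three news articles.
series_vec = [1.0, 0.0, 0.5]
text_vecs = [
    [0.9, 0.1, 0.4],   # article 0
    [-0.5, 0.8, 0.0],  # article 1
    [0.2, 0.2, 0.2],   # article 2
]

ranking = rank_texts(series_vec, text_vecs)
for index, weight in ranking:
    print(f"article {index}: weight {weight:.3f}")
```

In this toy setup the weights double as the importance ranking: a reader can look at the top few articles for a given price window and ignore the rest.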

Initial evaluation of our mechanism has shown promising results on large-scale financial news and stock price data spanning several years. Our model automatically retrieved more than 80 percent of the relevant textual information (such as news articles) related to the stock prices we had selected for our experiments. The model did so without prior knowledge from any human expert, without any application domain expertise, and without the use of any specific keywords or phrases.
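A retrieval figure like this can be read as recall: the share of relevant items that the system actually returned. The sketch below shows that calculation under this reading, which is an assumption about the metric. The article IDs and the `recall` helper are hypothetical and are not data or code from our experiments.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant items that appear among the retrieved items."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("relevant set must not be empty")
    return len(relevant & set(retrieved)) / len(relevant)


# Hypothetical example: five articles are known to be relevant to a stock,
# and the system returns five articles, four of which are relevant.
relevant_ids = {"a1", "a2", "a3", "a4", "a5"}
retrieved_ids = ["a1", "a2", "a4", "a5", "b9"]

score = recall(retrieved_ids, relevant_ids)
print(f"recall = {score:.0%}")
```

Recall says nothing about how many irrelevant items were returned alongside the relevant ones, so it is usually reported together with a precision-style measure.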

The research is still ongoing, and we are now exploring various approaches to reduce the amount of data required by the model through data augmentation and few-shot learning. We are also looking into enriching the model with domain knowledge to make the retrieval of relevant content more targeted and to further improve the explainability of the learning process.

The next step is to broaden the applications of our model in the wider world of investment management, particularly for the analysis of financial text data and insights on investment decisions. We are also investigating how the model could be used as a “noise reduction” system: the AI would retrieve only the most relevant text for a given financial asset, so that the text could be used to extract trading signals.

Another interesting application of the Squawk Bot would be in marketing campaigns, in particular for the discovery of the content that would best resonate with a given marketing performance metric. Soon, financial data won’t be that difficult to make sense of — for anyone.



Principal Research Staff Member & Manager, IBM Research

Xuan-Hong Dang

Research Staff Member, IBM Research

Syed Yousaf Shah

Research Staff Member, IBM Research
