January 14, 2021 | Written by: Petros Zerfos, Xuan-Hong Dang, and Syed Yousaf Shah
Analysts’ reports, corporate earnings, stock prices, interest rates. Financial data isn’t an easy read. And there’s a lot of it.
Typically, teams of human experts go through and make sense of financial data. But as the volume of sources keeps surging, it’s becoming increasingly difficult for any human to read, absorb, understand, correlate, and act on all the available information.
We want to help.
In our recent work, “The Squawk Bot”: Joint Learning of Time Series and Text Data Modalities for Automated Financial Information Filtering, we detail an AI and machine learning mechanism that helps to correlate a large body of text with numerical data series describing financial performance as it evolves over time. Presented at the 2021 International Joint Conferences on Artificial Intelligence Organization (IJCAI), our deep learning-based system sifts through vast amounts of text and pulls out the descriptions that potentially explain the performance of a financial metric of interest, without the need for human experts or labelled data.
Dubbed The Squawk Bot, the technology falls within the sub-field of AI and machine learning known as multimodal learning. This type of learning attempts to combine and model the data obtained from multiple data sources, potentially represented in different data types and forms.
While multimodal learning has been used extensively for video captioning, audio transcription, and other applications that combine video, images, and audio with text, far fewer studies have looked at linking text with numerical time series data.
It’s this gap that has sparked our interest, along with real-world discussions on popular financial commentary shows, like CNBC’s “The Squawk Box” and other financial news programs. There, financial and business experts attempt to explain the performance of a financial asset — say, a stock price — or some activity in an economic sector through commentary and information from a variety of sources based on their domain expertise.
The Squawk Bot automatically filters large amounts of textual information and extracts specific bits that might be related to the performance of an entity of interest as it evolves over time. The AI does so without the strict requirement of pre-aligning the text and time series data, or even explicitly labeling the data. The model automatically finds these cross-modality correlations and ranks their importance, providing user guidance for understanding the results.
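The filtering-and-ranking idea described above can be illustrated with a small sketch. This is not the model from the paper: it assumes that a time series window and each candidate text have already been encoded as vectors of the same size, and it uses plain scaled dot-product attention to weight the texts. The function name `rank_texts` and the toy vectors are hypothetical and exist only for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def rank_texts(series_vec, text_vecs):
    """Rank texts by their attention weight against a time series vector.

    series_vec: shape (d,), an assumed encoding of the time series window.
    text_vecs:  shape (n, d), assumed encodings of n candidate texts.
    Returns (order, weights), where order lists text indices from most
    to least relevant and weights sums to 1.
    """
    d = series_vec.shape[0]
    scores = text_vecs @ series_vec / np.sqrt(d)  # scaled dot products
    weights = softmax(scores)
    order = np.argsort(-weights)
    return order, weights


# Toy example: three "news" vectors and one "stock price" vector.
series_vec = np.array([1.0, 0.0, 0.0, 0.0])
text_vecs = np.array([
    [0.1, 0.9, 0.0, 0.0],  # weakly aligned with the series
    [0.9, 0.1, 0.0, 0.0],  # strongly aligned with the series
    [0.0, 0.0, 1.0, 0.0],  # unrelated
])

order, weights = rank_texts(series_vec, text_vecs)
for idx in order:
    print(f"text {idx}: weight {weights[idx]:.3f}")
```

In a sketch like this, the weights double as the ranking a user would see: the higher the weight, the more the text is judged to relate to the series. In a real system the encoders would be learned jointly rather than fixed by hand.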
Initial evaluation of our mechanism has shown promising results on large-scale financial news and stock price data spanning several years. Our model automatically retrieved more than 80 percent of the relevant textual information (such as news articles) related to the stock prices we had selected for our experiments. The model did so without prior knowledge from any human expert, without application domain expertise, and without the use of any specific keywords or phrases.
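A figure such as "more than 80 percent of the relevant textual information" reads like a recall measurement: the share of relevant items that the system managed to retrieve. The post does not spell out the exact evaluation protocol, so the following is only a sketch of how recall is commonly computed. The function name and the article IDs are made up for illustration and are not data from our experiments.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant items that appear among the retrieved items."""
    relevant_set = set(relevant)
    if not relevant_set:
        raise ValueError("recall is undefined when there are no relevant items")
    hits = relevant_set & set(retrieved)
    return len(hits) / len(relevant_set)


# Hypothetical article IDs, for illustration only.
retrieved_ids = ["a1", "a2", "a3", "a7"]
relevant_ids = ["a1", "a2", "a3", "a4", "a5"]

score = recall(retrieved_ids, relevant_ids)
print(f"recall = {score:.2f}")  # 3 of the 5 relevant articles were retrieved
```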
The research is still ongoing, and we are now exploring various approaches to reduce the amount of data required by the model through data augmentation and few-shot learning. We are also looking into enriching the model with domain knowledge, both to make the retrieval of relevant content more targeted and to further improve the explainability of the learning process.
The next step is to broaden the applications of our model in the wider world of investment management, particularly for analyzing financial text data and drawing insights for investment decisions. We are also investigating how the model could be used as a “noise reduction” system: the AI would retrieve only the most relevant text for a given financial asset, so that the text could then be used to extract trading signals.
Another interesting application of the Squawk Bot would be in marketing campaigns, in particular for the discovery of the content that would best resonate with a given marketing performance metric. Soon, financial data won’t be that difficult to make sense of — for anyone.