May 1, 2020 | Written by: JR Rao and Xiaokui Shu
“Modern computing systems act as black boxes in that they accept inputs and generate outputs but provide little to no visibility of their internal workings. This greatly limits the potential to understand cyber behaviors at the level of detail necessary to detect and counter some of the most important types of cyber threats, particularly advanced persistent threats (APTs).” — DARPA Transparent Computing Program.
Cybersecurity is one of the most complex and pervasive issues of our time. The IBM Research Security team is focused on building trust in all of IBM’s technologies, services and solutions. Security is a collaborative endeavor: we work with clients, partners, the industry and government entities to build a good offense and play a good defense against attackers. Four years ago, the Defense Advanced Research Projects Agency (DARPA) put out a call to academics and researchers to explore and create a suite of technologies in a program on Transparent Computing (TC), in essence to create solutions that stave off Advanced Persistent Threats (APTs). These attacks are extremely well planned and well organized, and they are costly to an organization from both an economic and a reputational standpoint.
The TC program was segmented into several teams, targeting three questions that DARPA had put forth: (i) how to provide visibility into insidious APTs as they target enterprise and cloud environments, (ii) how to make use of big data to discover hidden threats and take prompt action, and (iii) how to do this at enterprise and cloud scale.
IBM Research was selected to participate in the TC program. With IBM Research as the prime researcher, the team comprised distinguished and leading security researchers from Stony Brook University, University of Illinois at Chicago and Northwestern University. The team, called MARPLE, for “Mitigating APT Reasoning with Provenance in Large Enterprise networks,” was charged with developing an umbrella of technologies and systems to answer the second question, and it went on to achieve a leadership position in multiple DARPA red team evaluations over a period of four years. With work ranging from the development of a new cyber reasoning methodology to the realization of a multi-aspect detection system, the more than 20 researchers, professors and students on the MARPLE team mapped out feasible future directions for cyber reasoning over big security data.
How does all of this translate to today’s digital world, and how does it apply to winning the battle for cybersecurity? Here’s a fictional case study that illustrates APTs and the need for deeper visibility into the so-called black boxes that MARPLE investigated:
A well-known app company suffers a data breach: credit card information is leaking from its mobile app. The company’s technical team investigates the incident and reviews the design of the app and its back-end database, but finds nothing. The team then digs further into the app’s source code, and it takes weeks to unearth a stealthy attack targeting the company: the chief engineer’s laptop was hacked, and the attacker implanted malicious logic into the app’s source code, where it remained hidden and undetected. The attacker was able to penetrate multiple systems of the company, understand its development cycle, retrieve its source code, and pinpoint which lines to change and where. Security monitoring using industry-standard tooling lacks visibility into such attacks. It may have raised sporadic alerts on some steps of the APT, but these were diluted among the millions of alerts that such tools customarily raise, and they were never analyzed. The forensic analysis team needs more data to track the provenance. Real-time detection systems need more visibility and new algorithms to dig around alerts, associate them with larger plots, and connect with human intelligence to perform sophisticated reasoning procedures that uncover the threat before it reaches its final stage.
A New Cyber Reasoning Paradigm
One technology that we developed to efficiently exploit the connectivity in big security data and perform comprehensive reasoning on top of a heterogeneous information network is a new methodology for cyber reasoning named Threat Intelligence Computing. The methodology bridges the practice of threat hunting and traditional security software development into a uniform programming paradigm. It transforms existing ad-hoc threat hunting practices into systematic procedures for codifying, sharing, and applying knowledge. This enables us to understand system behaviors, build on learned knowledge of behaviors and threats, and compose hunting playbooks during red team engagements with little prior knowledge of the systems that need to be protected and the attacks that need to be discovered. IBM calls the paradigm τ-calculus, a graph computation platform consisting of a domain-specific language with syntax tailored for cyber reasoning, a distributed graph database, an interactive console, and a graph visualization and inspection tool. By exploring system behaviors and developing new detection strategies on the fly with the new paradigm, MARPLE can respond quickly to small traces of unmodeled attacks, discover the story behind them, and automate their detection.
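To make the idea of reasoning over a graph of security data more concrete, here is a minimal Python sketch of one common hunting step: walking backward from a suspicious entity to everything that could have influenced it. This is not τ-calculus syntax and not MARPLE code. The event tuples, entity names, and the `backtrack` helper are all hypothetical, and the sketch ignores timestamps, which a real provenance analysis would have to respect.

```python
from collections import defaultdict, deque

# Hypothetical audit events, written as (source, action, target) so that
# information or control flows from source to target. All names are made up.
EVENTS = [
    ("sshd", "fork", "bash"),
    ("bash", "exec", "curl"),
    ("203.0.113.7:443", "recv", "curl"),
    ("curl", "write", "/tmp/payload"),
    ("/tmp/payload", "exec", "payload_proc"),
    ("payload_proc", "write", "/repo/app/checkout.py"),
    ("cron", "fork", "backup.sh"),
    ("/repo/app/checkout.py", "read", "backup.sh"),
]


def backtrack(events, start):
    """Return (ancestors, edges): every entity that can reach `start`
    through the flow graph, plus the edges visited along the way."""
    # Reverse index: for each target, the (source, action) pairs feeding it.
    parents = defaultdict(list)
    for src, action, dst in events:
        parents[dst].append((src, action))

    seen = {start}
    queue = deque([start])
    edges = []
    while queue:  # breadth-first walk against the direction of flow
        node = queue.popleft()
        for src, action in parents.get(node, []):
            edges.append((src, action, node))
            if src not in seen:
                seen.add(src)
                queue.append(src)
    seen.discard(start)
    return seen, edges


if __name__ == "__main__":
    ancestors, edges = backtrack(EVENTS, "/repo/app/checkout.py")
    for src, action, dst in edges:
        print(f"{src} --{action}--> {dst}")
```

Starting from the modified source file, the walk reaches the downloaded payload, the shell that launched it, and the remote address it came from, while the unrelated backup job that only read the file is left out. A hunting playbook in the sense described above would codify steps like this so they can be shared and reapplied.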
Eliminating Detection Blind Spots
There is no silver bullet for cybersecurity. Every detection model is designed and optimized for a purpose. MARPLE achieved high threat detection coverage by combining multiple pre-built detection models with dynamic threat hypothesis creation and verification using τ-calculus. IBM also developed a scalable anomaly detection system that uses sketch algorithms to alert on behavior variants. Stony Brook University developed the SLEUTH system, which performs detection with rules over provenance tags. University of Illinois at Chicago developed the HOLMES system to connect low-level security data with known APT knowledge in the MITRE framework. Northwestern University collected hundreds of Windows malware samples and extracted their behaviors for detection. Together with several more pioneering detection modules developed within MARPLE and tuned by IBM, and a Kafka-based security information exchange service, the complete MARPLE system combines the strengths of multiple advanced detection modules and dynamically performed threat hunting procedures. As evidenced in the series of red team engagements, it minimizes blind spots in detection: it successfully discovered a wide range of attack steps, from port scans to process exploits, DLL injection, data exfiltration, ransomware and stolen credentials, and uncovered several APT campaigns behind those steps.
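The post does not say which sketch algorithms the anomaly detection system uses. As an illustration of the general idea only, the following Python sketch uses a count-min sketch, a well-known structure that estimates how often each item has been seen in fixed memory, and flags behaviors whose estimated count is low. The class, the `rare_behaviors` helper, the threshold, and the behavior strings are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency estimator. Estimates never undercount;
    hash collisions can only push an estimate upward."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, key):
        # One independent hash per row, derived by salting BLAKE2b
        # with the row number.
        for row in range(self.depth):
            digest = hashlib.blake2b(
                key.encode("utf-8"),
                digest_size=8,
                salt=row.to_bytes(8, "little"),
            ).digest()
            yield row, int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row, col in self._cells(key):
            self.rows[row][col] += count

    def estimate(self, key):
        # The smallest counter across rows is the least inflated one.
        return min(self.rows[row][col] for row, col in self._cells(key))


def rare_behaviors(sketch, observed, threshold=2):
    """Return observed behaviors whose estimated count is below threshold."""
    return [b for b in observed if sketch.estimate(b) < threshold]


if __name__ == "__main__":
    baseline = CountMinSketch()
    # Hypothetical behavior strings seen during normal operation.
    baseline.add("bash->exec->ls", 500)
    baseline.add("sshd->fork->bash", 120)
    today = ["bash->exec->ls", "powershell->inject->lsass.exe"]
    print(rare_behaviors(baseline, today))
```

Because a count-min sketch can overestimate but never underestimate, a frequently seen behavior is never flagged as rare by this check. The trade-off is that a genuinely new behavior can occasionally be missed when its counters collide with those of common ones, which is the price of bounded memory.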
Not only was the TC program successful in developing basic technologies that are separable and usable in isolation, but it was also a long-term project in which IBM researchers worked collaboratively with colleagues from academia to find security solutions for the digital age.
This project was sponsored by the Air Force Research Laboratory (AFRL) and the Defense Advanced Research Projects Agency (DARPA) under the award number FA8650-15-C-7561. The views, opinions, and/or findings contained in this article are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.
Xiaokui Shu, Frederico Araujo, Douglas L. Schales, Marc Ph. Stoecklin, Jiyong Jang, Heqing Huang, and Josyula R. Rao. 2018. Threat Intelligence Computing. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS ’18). ACM, New York, NY, USA, 1883-1898. DOI: https://doi.org/10.1145/3243734.3243829
Hossain MN, Milajerdi SM, Wang J, Eshete B, Gjomemo R, Sekar R, Stoller S, Venkatakrishnan VN. SLEUTH: Real-time Attack Scenario Reconstruction from COTS Audit Data. In Proceedings of the 26th USENIX Security Symposium. USENIX Security, 2017.
Milajerdi SM, Gjomemo R, Eshete B, Sekar R, Venkatakrishnan VN. HOLMES: Real-time APT Detection through Correlation of Suspicious Information Flows. In Proceedings of the IEEE Symposium on Security and Privacy (S&P). IEEE, May 2019.