
They stole… what? Identifying exfiltrated data: Part 1

When a cyberattack leaves 5,000 files from a company share deleted, the challenge isn’t just detecting the breach but also recovering the lost data. In this three-part series, we’ll guide you through a realistic data exfiltration and recovery scenario in which a threat actor has taken these files, compressed them into archives to stage them for upload and then deleted them.

This series will provide an overview of the disk and memory forensic techniques needed to identify and recover lost data. In Part 1, we look at some of the challenges in getting visibility into this crucial area of response to an incident. Part 2 covers a more technical overview of opportunities using disk forensics. Finally, Part 3 explores opportunities using memory forensics.

Where is the evidence?

When a breach occurs, one of the most critical questions to answer is what data the threat actor accessed after gaining entry to the environment. During an incident response investigation, it often becomes clear how the threat actor achieved initial access, how they moved laterally and even the volume of data exfiltrated. What is usually unclear, however, is the nature of the data itself.

Were sensitive patient records involved? Did the data include company or government secrets, credit card details or merely publicly accessible information?

The answers to these questions directly inform many response decisions, including legal obligations and regulatory considerations, mandatory reporting requirements, risks to users and customers, and the potential impact on ransom negotiations. Given how critical this information is, why is it so difficult to determine, even for experienced forensic professionals?

Why is this so hard?

The challenge is twofold. First, operating systems limit the verbosity of activity logging, mainly due to storage and resource limitations. For instance, a connection from one system to another might generate and store a few network logs and some authentication logs, but the access to, and subsequent copying and writing of tens of thousands of files, could generate tens of thousands of log entries that need to be processed and written to disk somewhere. The resulting impact on storage and performance is why operating systems, like Microsoft Windows, often disable these object (file) activity logs by default.

Second, threat actors can delete or tamper with potential evidence on the compromised system. This deletion will generate events or even alerts, but even if they are noticed, the exfiltration will likely already have taken place.

What can we do?

So what options are there to capture and retain this critical information? The first approach is proactive. Organizations can improve visibility into object-specific access by implementing a Windows audit policy, which is covered in this Microsoft article. This policy enables audit logs and stores them on the local machine, just like other Windows event logs. If a centralized logging solution is in place, which is a critical security control, these additional audit logs end up in the same location.
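To illustrate the kind of question these audit logs can answer, here is a minimal Python sketch. It assumes the events have already been exported into a list of dictionaries; that input shape, the function name and the sample data are hypothetical. The field names (SubjectUserName, ObjectName) and Event ID 4663 follow the Windows file system auditing event for object access attempts.

```python
from collections import Counter

# Event ID 4663: "An attempt was made to access an object". Windows writes it
# when file system auditing is enabled and the object's SACL covers the access.
FILE_ACCESS_EVENT_ID = 4663


def files_accessed_by_account(events, account):
    """Count audited accesses per file path for a single account.

    `events` is assumed to be an iterable of dicts from some export step
    (hypothetical shape), keyed by the field names of the 4663 event.
    """
    counts = Counter()
    for event in events:
        if event.get("EventID") != FILE_ACCESS_EVENT_ID:
            continue  # ignore logons and other unrelated events
        if event.get("SubjectUserName", "").lower() != account.lower():
            continue  # Windows account names are case-insensitive
        path = event.get("ObjectName")
        if path:
            counts[path] += 1
    return counts


# Hypothetical sample data, for illustration only.
sample_events = [
    {"EventID": 4663, "SubjectUserName": "svc_backup", "ObjectName": r"D:\Share\hr\payroll.xlsx"},
    {"EventID": 4663, "SubjectUserName": "svc_backup", "ObjectName": r"D:\Share\hr\payroll.xlsx"},
    {"EventID": 4663, "SubjectUserName": "svc_backup", "ObjectName": r"D:\Share\legal\contracts.docx"},
    {"EventID": 4663, "SubjectUserName": "alice", "ObjectName": r"D:\Share\public\menu.pdf"},
    {"EventID": 4624, "TargetUserName": "svc_backup"},
]

for path, hits in files_accessed_by_account(sample_events, "svc_backup").most_common():
    print(hits, path)
```

In a real investigation the same filter would typically run as a query in the centralized logging platform rather than in a script, but the logic is the same: select the object access events for the compromised account and list the files they reference.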

However, this additional verbosity will require a lot more storage. Based on the average time to detect a compromise and the time to identify and acquire the required log files, it is important to ensure log retention remains at a minimum of 90 days. If retention is governed by a storage limit rather than a time period, that capacity needs to be greatly increased to ensure that object-specific audit policies do not reduce the retention times of existing logs.
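The effect on retention can be estimated with simple arithmetic. The sketch below is only an illustration: the event rates, the average event size and the 10 GiB limit are hypothetical placeholders, not measurements, and should be replaced with figures from your own environment.

```python
GIB = 1024 ** 3  # bytes per GiB


def required_log_storage_gib(events_per_day, avg_event_bytes, retention_days=90):
    """Storage needed to keep `retention_days` of logs at a steady event rate."""
    return events_per_day * avg_event_bytes * retention_days / GIB


def retention_days_for_capacity(capacity_gib, events_per_day, avg_event_bytes):
    """Days of history a fixed-size log store holds before old events roll off."""
    return capacity_gib * GIB / (events_per_day * avg_event_bytes)


# Hypothetical figures, for illustration only.
BASELINE_EVENTS_PER_DAY = 200_000   # assumed existing log volume
AUDIT_EVENTS_PER_DAY = 2_000_000    # assumed extra file-audit volume
AVG_EVENT_BYTES = 500               # assumed average size of one stored event

baseline_gib = required_log_storage_gib(BASELINE_EVENTS_PER_DAY, AVG_EVENT_BYTES)
with_audit_gib = required_log_storage_gib(
    BASELINE_EVENTS_PER_DAY + AUDIT_EVENTS_PER_DAY, AVG_EVENT_BYTES
)
print(f"90-day storage without file auditing: {baseline_gib:.1f} GiB")
print(f"90-day storage with file auditing:    {with_audit_gib:.1f} GiB")

# With a fixed 10 GiB limit, the added volume shortens retention for ALL logs.
days_before = retention_days_for_capacity(10, BASELINE_EVENTS_PER_DAY, AVG_EVENT_BYTES)
days_after = retention_days_for_capacity(
    10, BASELINE_EVENTS_PER_DAY + AUDIT_EVENTS_PER_DAY, AVG_EVENT_BYTES
)
print(f"Retention at 10 GiB: {days_before:.0f} days before, {days_after:.0f} days after")
```

The second function captures the risk described above: when retention is capped by size, enabling file auditing without adding capacity shrinks the retention window of the logs that were already being collected.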

Another proactive approach is the use of security tools such as data leak prevention (DLP) solutions or some endpoint detection and response (EDR) products. These solutions are capable of logging and also alerting or blocking suspicious file-related activity.

Cloud solutions such as Microsoft SharePoint and OneDrive already have very detailed logs. These logs cover actions like view, edit, copy, move and download. But what happens to that visibility once a threat actor downloads the files to a local device to stage them for exfiltration? A cloud solution such as OneDrive does not record this activity, so endpoint-based, object-specific visibility is often still needed.

Based on numerous IBM X-Force Incident Response (IR) investigations, this level of visibility rarely exists in organizations that have not yet experienced large-scale data exfiltration incidents. There are several reasons for this, but the most likely factor is a lack of awareness. Many organizations do not realize that this visibility does not exist by default.

Given the importance of this information, the corresponding logs should be recorded in any operating system. Investigators often end up in situations where they can see everything a threat actor did, recorded within many different evidence sources, yet are unable to determine what files were in a significant batch of uploaded data. Performance overheads may be the main reason this logging is not enabled by default in many operating systems.

Another reason for the visibility gap is the cost associated with storing significantly larger volumes of log data. However, it is important to weigh any perceived cost savings against the risk of not knowing what data was taken from the organization. This can force the difficult, and often costly, assumption that all accessible data was exfiltrated.

The case for a forensic approach

Even without existing file object logs or DLP and EDR solutions, options exist for an experienced digital forensics team such as IBM X-Force. Under the right circumstances, it’s possible to find evidence of exfiltration, even at a very detailed level, by performing memory and disk forensic analysis. In this case, time is a critical factor. Many of the potential artifacts are stored in volatile memory and memory-like artifacts on disk, such as the page file and the hibernation file. Data stored in these files is continuously reused and if it takes a week to obtain an image or the system is rebooted, crucial evidence could be lost.

The process of disk and memory forensics is time consuming, making it hard to scale. IBM X-Force recently worked on a ransomware incident in which the threat actor directly exfiltrated data to several cloud storage locations from nearly two dozen compromised systems. While deep‑dive analysis is justifiable for a few systems, doing so at larger scale becomes costly and time‑consuming unless an automation platform is available.

Regardless of how it is implemented, a forensic approach is often the only remaining option when other forms of visibility are unavailable. Given the importance of the question at hand, “What data was taken?”, the effort is easily justified.

Frank Siemons

Senior Managing Consultant - IBM X-Force
