
They stole… what? Identifying exfiltrated data: Part 1

By Frank Siemons

Published 27 May 2026

When a cyberattack leaves 5,000 files deleted from a company share, the challenge isn’t just detecting the breach, but also recovering the lost data. In this three-part series, we’ll guide you through a realistic data exfiltration and recovery scenario: a cyber incident in which a threat actor has taken these files, compressed them into archive files to stage them for upload and subsequently deleted them.

This series will provide an overview of the disk and memory forensic techniques needed to identify and recover lost data. In Part 1, we look at some of the challenges in getting visibility into this crucial area of response to an incident. Part 2 covers a more technical overview of opportunities using disk forensics. Finally, Part 3 explores opportunities using memory forensics.

Where is the evidence?

When a breach occurs, one of the most critical questions to answer is what data the threat actor accessed after gaining entry to the environment. During an incident response investigation, it often becomes clear how the threat actor achieved initial access, how they moved laterally and even the volume of data exfiltrated. What is usually unclear, however, is the nature of the data itself.

Were sensitive patient records involved? Did the data include company or government secrets, credit card details or merely publicly accessible information?

The answers to these questions directly inform many response decisions, including legal obligations and regulatory considerations, mandatory reporting requirements, risks to users and customers, and the potential impact on ransom negotiations. Given how critical this information is, why is it so difficult to determine, even for experienced forensic professionals?

Why is this so hard?

The challenge is twofold. First, operating systems limit the verbosity of activity logging, mainly because of storage and resource constraints. For instance, a connection from one system to another might generate a few network logs and some authentication logs, but accessing, copying and writing tens of thousands of files could generate tens of thousands of log entries, each of which must be processed and written to disk. This impact on storage and performance is why operating systems such as Microsoft Windows often disable object (file) activity logging by default.

Second, threat actors can delete or tamper with potential evidence on the compromised system. Such deletion may itself generate events or even alerts, but by the time these are noticed, the exfiltration will likely already have taken place.

What can we do?

So what options are there to capture and retain this critical information? The first approach is proactive. Organizations can improve visibility into object-specific access by implementing a Windows audit policy, which is covered in Microsoft’s documentation. This policy enables audit logs and stores them on the local machine, just like other Windows event logs. If a centralized logging solution is in place, which is itself a critical security control, these additional audit logs end up in the same location.
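
To illustrate what this audit data makes possible once it exists, here is a minimal Python sketch. It assumes that File System auditing is enabled and that auditing entries are set on the share, so that Windows records event ID 4663 for object access. It also assumes the events have already been exported and parsed into dictionaries. The sample records, the account names and the `files_touched_by` helper are hypothetical; only the field names (`SubjectUserName`, `ObjectName`, `ProcessName`) follow the documented 4663 layout.

```python
from collections import defaultdict

# Hypothetical, already-parsed Security log records (illustrative values only).
SAMPLE_EVENTS = [
    {"EventID": 4663, "SubjectUserName": "svc_backup",
     "ObjectName": r"D:\Share\HR\payroll.xlsx", "ProcessName": r"C:\Tools\7z.exe"},
    {"EventID": 4663, "SubjectUserName": "svc_backup",
     "ObjectName": r"D:\Share\HR\contracts.pdf", "ProcessName": r"C:\Tools\7z.exe"},
    {"EventID": 4624, "SubjectUserName": "svc_backup"},  # logon event, ignored below
    {"EventID": 4663, "SubjectUserName": "alice",
     "ObjectName": r"D:\Share\Public\menu.docx", "ProcessName": r"C:\Windows\explorer.exe"},
]


def files_touched_by(events, account):
    """Return {process: sorted file paths} from 4663 events for one account."""
    by_process = defaultdict(set)
    for event in events:
        if event.get("EventID") != 4663:
            continue
        if event.get("SubjectUserName", "").lower() != account.lower():
            continue
        path = event.get("ObjectName")
        if path is None:
            continue
        by_process[event.get("ProcessName", "unknown")].add(path)
    return {proc: sorted(paths) for proc, paths in by_process.items()}


if __name__ == "__main__":
    for process, paths in files_touched_by(SAMPLE_EVENTS, "svc_backup").items():
        print(process, "accessed", len(paths), "file(s)")
        for path in paths:
            print("   ", path)
```

Grouping by process is a design choice for this scenario: an archiving tool reading thousands of files under a single account is the kind of staging activity an investigator would want to surface first.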

However, this additional verbosity requires a lot more storage. Given the average time to detect a compromise, plus the time to identify and acquire the required log files, it is important to retain logs for a minimum of 90 days. If retention is bounded by storage capacity rather than by time, that capacity needs to be greatly increased so that object-specific audit policies do not shorten the retention of existing logs.
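
The storage impact can be estimated with simple arithmetic before any policy is rolled out. The following Python sketch is only a back-of-the-envelope model: the event rate, the average event size and the capacity figure are hypothetical placeholders that should be replaced with values measured in the actual environment, and the function names are illustrative.

```python
def audit_log_storage_gib(events_per_day, avg_event_bytes, retention_days):
    """Estimate raw storage for object-access audit events, in GiB."""
    total_bytes = events_per_day * avg_event_bytes * retention_days
    return total_bytes / 1024**3


def retention_days_for(capacity_gib, events_per_day, avg_event_bytes):
    """Estimate how many whole days of events fit into a fixed capacity."""
    bytes_per_day = events_per_day * avg_event_bytes
    return int(capacity_gib * 1024**3 // bytes_per_day)


if __name__ == "__main__":
    # Assumed figures for illustration: 2 million events per day, 1,000 bytes each.
    needed = audit_log_storage_gib(2_000_000, 1_000, retention_days=90)
    print(f"Storage for 90 days: {needed:.1f} GiB")

    # The same volume against a hypothetical fixed 50 GiB log allocation.
    days = retention_days_for(50, 2_000_000, 1_000)
    print(f"Days retained in 50 GiB: {days}")
```

The second function shows the risk described above: when capacity is fixed, enabling verbose auditing shortens the effective retention window unless the allocation grows with it.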

Another proactive approach is the use of security tools such as data leak prevention (DLP) solutions or some endpoint detection and response (EDR) products. These solutions are capable of logging and also alerting or blocking suspicious file-related activity.

Cloud solutions such as Microsoft SharePoint and OneDrive already have very detailed logs. These logs cover actions like view, edit, copy, move and download. But what happens to that visibility once a threat actor downloads the files to a local device to stage them for exfiltration? A cloud solution such as OneDrive does not record what happens next on the device, so endpoint-based, object-specific visibility is often still needed.
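
As a rough illustration of what the cloud-side logs can and cannot answer, the Python sketch below counts download events per user. It assumes the audit records were exported and parsed into dictionaries with `Operation`, `UserId` and `ObjectId` fields, and that downloads are recorded with the operation name `FileDownloaded`. The sample records, the tenant URL and the `downloads_per_user` helper are hypothetical.

```python
from collections import Counter

# Hypothetical exported audit records (illustrative values only).
SAMPLE_RECORDS = [
    {"CreationTime": "2026-05-01T02:14:07", "Operation": "FileDownloaded",
     "UserId": "j.doe@example.com",
     "ObjectId": "https://example.sharepoint.com/sites/hr/Shared Documents/payroll.xlsx"},
    {"CreationTime": "2026-05-01T02:14:09", "Operation": "FileDownloaded",
     "UserId": "j.doe@example.com",
     "ObjectId": "https://example.sharepoint.com/sites/hr/Shared Documents/contracts.pdf"},
    {"CreationTime": "2026-05-01T09:30:00", "Operation": "FileAccessed",
     "UserId": "a.smith@example.com",
     "ObjectId": "https://example.sharepoint.com/sites/hr/Shared Documents/handbook.docx"},
    {"CreationTime": "2026-05-01T10:02:41", "Operation": "FileDownloaded",
     "UserId": "a.smith@example.com",
     "ObjectId": "https://example.sharepoint.com/sites/hr/Shared Documents/handbook.docx"},
]


def downloads_per_user(records):
    """Return (user, download count) pairs, highest count first."""
    counts = Counter(
        record["UserId"]
        for record in records
        if record.get("Operation") == "FileDownloaded"
    )
    return counts.most_common()


if __name__ == "__main__":
    for user, count in downloads_per_user(SAMPLE_RECORDS):
        print(user, count)
```

The trail in these records ends at the download. Whether the local copies were then archived, uploaded elsewhere or deleted is exactly the part that only endpoint visibility can cover.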

Based on numerous IBM X-Force Incident Response (IR) investigations, this level of visibility rarely exists in organizations that have not yet experienced large-scale data exfiltration incidents. There are several reasons for this, but the most likely factor is a lack of awareness. Many organizations do not realize that this visibility does not exist by default.

Given the importance of this information, the corresponding logs should be recorded in any operating system. Investigators often end up in a situation where they can see everything a threat actor did, recorded across many different evidence sources, yet are unable to determine which files were in a significant batch of uploaded data. Performance overhead may be the main reason this logging is not enabled by default in many operating systems.

Another reason for the visibility gap is the cost associated with storing significantly larger volumes of log data. However, it is important to weigh any perceived cost savings against the risk of not knowing what data was taken from the organization. This can force the difficult, and often costly, assumption that all accessible data was exfiltrated.

The case for a forensic approach

Even without existing file object logs or DLP and EDR solutions, options exist for an experienced digital forensics team such as IBM X-Force. Under the right circumstances, it’s possible to find evidence of exfiltration, even at a very detailed level, by performing memory and disk forensic analysis. In this case, time is a critical factor. Many of the potential artifacts reside in volatile memory and in memory-like files on disk, such as the page file and the hibernation file. Data stored in these locations is continuously overwritten, and if it takes a week to obtain an image or the system is rebooted, crucial evidence could be lost.

The process of disk and memory forensics is time-consuming, making it hard to scale. IBM X-Force recently worked on a ransomware incident in which the threat actor exfiltrated data directly to several cloud storage locations from nearly two dozen compromised systems. While deep-dive analysis is justifiable for a few systems, doing so at larger scale becomes costly and slow unless an automation platform is available.

Regardless of how it is implemented, a forensic approach is often the only remaining option when other forms of visibility are unavailable. Given the importance of the question at hand, “What data was taken?”, the effort is easily justified.

Read on for Part 2: Disk forensics to uncover exfiltrated file details


Senior Managing Consultant - IBM X-Force
