Disk forensics to uncover exfiltrated file details
As outlined in Part 1 of this three-part series, one of the most important, and most challenging, questions to answer after confirming data exfiltration is: “What data was exfiltrated?” The answer to this question determines whether customers, authorities or legal representatives need to be notified. It also tells us what risk the leaked data poses to the organization. Let’s look beyond the often-disabled object-specific audit logs and the often-missing EDR coverage to see what can be obtained from a disk image that is collected in a timely manner.
When operating systems such as Microsoft Windows delete a file or folder, they don’t remove the data from the disk. Instead, they mark that area of the disk as available for reuse and stop showing the data to users and applications. As a result, with the right software and skills, the data can potentially still be recovered. This is often carried out by searching for known file headers in the disk areas that are marked as free (or unallocated). This process, known as file carving, is a well-known forensic technique and can lead to the discovery of unexpected files, such as those used for staging, compression or encryption, steps that are usually part of the data exfiltration process.
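The header search at the heart of file carving can be sketched in a few lines of Python. This is a minimal illustration rather than how any particular forensic product works: the signature table is a small sample, the function name is hypothetical and the buffer is assumed to fit in memory. It only reports candidate offsets; a real carver also has to determine where each file ends and cope with fragmentation.

```python
# Minimal file-carving sketch: scan a raw buffer for known file headers.
# The signature table is a small illustrative sample, not a complete set.
SIGNATURES = {
    b"7z\xbc\xaf\x27\x1c": "7-Zip archive",
    b"PK\x03\x04": "ZIP archive (also DOCX/XLSX)",
    b"Rar!\x1a\x07": "RAR archive",
    b"%PDF-": "PDF document",
}


def find_headers(data: bytes) -> list[tuple[int, str]]:
    """Return (offset, description) for every signature hit in data."""
    hits = []
    for magic, label in SIGNATURES.items():
        pos = data.find(magic)
        while pos != -1:
            hits.append((pos, label))
            pos = data.find(magic, pos + 1)
    return sorted(hits)
```

In practice the input would be the unallocated space exported from the disk image, read in chunks rather than loaded in one piece.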
However, it is important to note that the increasing use of solid-state and NVMe disks has made the timely acquisition of evidence more important than ever. Due to the way these disks work, it has become much harder to recover deleted files from free space. The quicker a disk image is created, the greater the likelihood of recovering meaningful data.
There is another artifact that can be of use, called slack space. Operating systems allocate disk space in clusters and sectors of a fixed size. If the file, or the last part of the file, written to disk is smaller than the minimum space allocated, part of that last cluster or sector will not be overwritten. This links back to the free space artifact: some of the old data is still there until it is overwritten. It is unlikely that there will be complete files in that slack space, unless they are very small. However, given the vast number of clusters and sectors on a disk, there are often still remnants of log files or other files in there. We can find this data with certain forensic software.
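The amount of slack left behind by a single file follows directly from its size and the cluster size. The short sketch below works through that arithmetic; the 4,096-byte cluster size is an assumption (a common default on NTFS volumes) and the function name is hypothetical.

```python
def slack_bytes(file_size: int, cluster_size: int = 4096) -> int:
    """Bytes left unused in the last cluster allocated to a file.

    cluster_size defaults to 4096, an assumed value; the real value
    should be read from the volume being examined.
    """
    if file_size < 0 or cluster_size <= 0:
        raise ValueError("file_size must be >= 0 and cluster_size > 0")
    remainder = file_size % cluster_size
    # A file that ends exactly on a cluster boundary leaves no slack.
    return 0 if remainder == 0 else cluster_size - remainder
```

For example, a 10,000-byte file occupies three 4,096-byte clusters and leaves 2,288 bytes in the last cluster that may still hold older data.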
The test scenario described below shows X-Ways Forensics; however, other applications such as OpenText EnCase and Magnet Forensics Axiom can also be used to find this remnant data. It is important to realize that while the tool can do some of the work, experience and skills are required to extract and interpret the data. This is especially true when dealing with potential legal proceedings, as the evidence may need to hold up in court one day.
The example below shows the matches for an IP address (the source where the data was copied from) in the free space of the disk. Although the exact context is lost because the files were deleted and some of the disk areas were reused, it is some kind of progress output of files transferred from a share called “Company” on a host with IP address “192.168.1.98”. Given the data exists on the disk of the investigated host, it is likely (although unproven) that the data ended up on this host at some point. It will be relatively easy to create a script that can extract the filenames from this data. We can also dig deeper into the disk to discover other artifacts.
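Such a script could look like the sketch below. It rests on assumptions that are not confirmed by the remnants themselves: that each transferred file appears as a UNC path starting with \\192.168.1.98\Company\, that each path ends at a line break or other control character, and that the text is single-byte encoded. The function name is hypothetical, and the pattern would need adjusting to the actual format of the recovered output.

```python
import re
from pathlib import PureWindowsPath

# Assumption: remnants contain UNC paths under the "Company" share, one per
# line. The capture stops at control characters or characters that are not
# allowed in Windows file names.
PATH_RE = re.compile(
    rb"\\\\192\.168\.1\.98\\Company\\" + rb'([^\x00-\x1f<>:"/|?*]+)'
)


def extract_filenames(raw: bytes) -> list[str]:
    """Return the unique file names found after the share prefix."""
    names = set()
    for match in PATH_RE.finditer(raw):
        relative = match.group(1).decode("ascii", errors="replace").strip()
        # Keep only the last path component, i.e. the file name itself.
        names.add(PureWindowsPath(relative).name)
    return sorted(names)
```

Keeping the full relative path instead of only the file name is a one-line change and may help when reconstructing the folder structure of the share.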
Using some of the extracted file names, we can start a new search through the image. The search below shows some of the matches for the string “Sales_Strategy_”, which was extracted from the previous search results. This leads to the $MFT file.
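When repeating such a search outside a forensic suite, it helps to look for the string in more than one encoding: NTFS stores file names in the $MFT as UTF-16LE, while console or log output is often single-byte text. The sketch below is a simplified illustration with a hypothetical function name; it assumes the data fits in memory, whereas a real image would be read in overlapping chunks.

```python
def find_keyword(data: bytes, keyword: str) -> list[tuple[int, str]]:
    """Return (offset, encoding) for each hit of keyword as ASCII or UTF-16LE."""
    hits = []
    for encoding in ("ascii", "utf-16-le"):
        needle = keyword.encode(encoding)
        pos = data.find(needle)
        while pos != -1:
            hits.append((pos, encoding))
            pos = data.find(needle, pos + 1)
    return sorted(hits)
```

The offsets returned can then be mapped back to the file or disk area they fall in, which is how a hit ends up pointing at the $MFT.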
What exactly is this $MFT file? The $MFT file is the core database of the New Technology File System (NTFS). While its full implementation is beyond the scope of this article, simply put, it is a single large database with metadata of every file on disk. It links the filename to the creation and modification times on disk, permissions, the physical location of the file on disk and many other pieces of information.
Given the files with the names listed in the search results no longer exist on disk, why do corresponding entries still appear in the $MFT file? Again, the answer lies in how data is deleted. When a file is deleted, the entry in the $MFT database for that file is marked as available for reuse. If the copy of the $MFT, or preferably the entire disk image containing it, is captured in a timely manner, it can still contain orphaned MFT entries for recently deleted files. Along with providing a list of file names, it can include both file creation and deletion times. This is invaluable information to obtain when trying to piece together which data was exfiltrated and when the activity occurred.
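To illustrate how a deleted entry can be told apart from a live one: each MFT record starts with the signature FILE and carries a flags field whose lowest bit indicates whether the record is in use. The sketch below walks an exported $MFT and reports the records that are no longer in use. It assumes the common 1,024-byte record size, uses hypothetical function names, ignores the update sequence fixups and does not parse attributes such as file names or timestamps, which dedicated tools handle.

```python
import struct
from typing import Iterator, Optional

MFT_RECORD_SIZE = 1024   # assumed common default; verify against the volume
FLAG_IN_USE = 0x0001     # cleared when the record is freed for reuse
FLAG_DIRECTORY = 0x0002


def describe_record(record: bytes) -> Optional[dict]:
    """Parse the fixed header of one MFT record; None if it is not a record."""
    if len(record) < 24 or record[:4] != b"FILE":
        return None
    # Offsets 16-23: sequence number, hard link count,
    # offset of the first attribute, flags (all little-endian 16-bit).
    sequence, links, attr_offset, flags = struct.unpack_from("<HHHH", record, 16)
    return {
        "sequence": sequence,
        "hard_links": links,
        "first_attribute_offset": attr_offset,
        "in_use": bool(flags & FLAG_IN_USE),
        "is_directory": bool(flags & FLAG_DIRECTORY),
    }


def deleted_records(mft: bytes) -> Iterator[tuple[int, dict]]:
    """Yield (record_number, header) for records not marked as in use."""
    for offset in range(0, len(mft) - MFT_RECORD_SIZE + 1, MFT_RECORD_SIZE):
        header = describe_record(mft[offset:offset + MFT_RECORD_SIZE])
        if header is not None and not header["in_use"]:
            yield offset // MFT_RECORD_SIZE, header
```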
The screenshots below show the output of $MFT Browser, a free tool developed by Costas K. (Kakos2000). Through many hours of processing, the tool carves out these orphaned MFT entries and presents them in an easily viewable format. Although dependent on the scenario, this technique can yield significant results. For example, out of the 5,000 files in the test setup, about 2,000 filenames and their associated timestamps were still recoverable from the MFT. However, due to the deletion of the original files and their parent directory structures, reconstructing full file paths may be difficult and, in some cases, impossible.
The $MFT file can also contain embedded files. For performance reasons, very small files are embedded directly within the $MFT file, rather than having an entry pointing to a location on disk. This means that if the MFT record of a small, deleted file has not yet been reused, the file's content still resides within the $MFT file. Often small scripts or log files are embedded here, which can be found using string searches with forensic software or YARA rules. The screenshot below shows a 7-Zip command output, providing valuable file names, locations and file sizes related to the compression of data before exfiltration.
Many modern file systems, including the Windows NTFS file system, possess journaling capabilities. At a high level, journaling allows an operating system to track file and folder operations such as writes, copies and renames. Although its purpose is to support data integrity, recovery and system stability, the journal is also a very useful artifact for forensic investigators trying to identify file movements.
The NTFS file system utilizes two different journal files: the $LogFile and the USN Journal ($UsnJrnl). Depending on the application, one or both files will be used, and the amount of detail within these logs can be significant. File timestamps, sizes and operations are all recorded over time.
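As an illustration of the detail available, the sketch below decodes a single version 2 USN Journal record following the USN_RECORD_V2 layout documented by Microsoft: a 60-byte fixed header followed by a UTF-16LE file name. The function names and the short reason table are illustrative choices, only version 2 records are handled, and locating the records inside the journal is left out.

```python
import struct
from datetime import datetime, timedelta, timezone

# A small subset of the documented USN reason flags.
REASONS = {
    0x00000100: "FILE_CREATE",
    0x00000200: "FILE_DELETE",
    0x00001000: "RENAME_OLD_NAME",
    0x00002000: "RENAME_NEW_NAME",
    0x80000000: "CLOSE",
}


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert a Windows FILETIME (100 ns ticks since 1601-01-01 UTC)."""
    epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(microseconds=filetime // 10)


def parse_usn_v2(buf: bytes, offset: int = 0) -> dict:
    """Decode one USN_RECORD_V2 structure starting at offset."""
    (length, major, _minor, file_ref, parent_ref, usn, timestamp,
     reason, _source, _security_id, attributes,
     name_len, name_off) = struct.unpack_from("<IHHQQqqIIIIHH", buf, offset)
    if major != 2:
        raise ValueError(f"unsupported USN record version {major}")
    start = offset + name_off
    name = buf[start:start + name_len].decode("utf-16-le", errors="replace")
    return {
        "record_length": length,
        "file_reference": file_ref,
        "parent_reference": parent_ref,
        "usn": usn,
        "timestamp": filetime_to_datetime(timestamp),
        "reasons": [label for bit, label in REASONS.items() if reason & bit],
        "attributes": attributes,
        "name": name,
    }
```

Sorting the decoded records by timestamp and filtering on the create, rename and delete reasons gives a quick timeline of staging activity.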
The downside of this level of granularity is that it consumes a lot of space, while the journal files have default size limits, so older entries are overwritten. Depending on how active the system is with regard to disk operations (usually very active during staging and exfiltration), the journal entries might only go back a few days. This further emphasizes the importance of timely disk image acquisition. In some cases, it might be possible to use volume shadow copies to reconstruct a longer time window. However, this takes additional effort and requires the shadow copy function to be enabled. The screenshot below shows some string matches for earlier discovered filenames pointing to the $LogFile. Based on this information, the NTFS $LogFile Parser tool, by Joakim Schicht, was used to export information from this journal file into various .csv files.
Once completed, the information is presented in a manageable and readable format (i.e., exported CSV files).
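The exported files can then be filtered for the file names discovered earlier. The sketch below is a generic illustration: because the column layout of the export is not described here, it checks every field of every row instead of relying on column names, and the delimiter is a parameter that should be set to whatever the export actually uses. The function name is hypothetical.

```python
import csv
from pathlib import Path
from typing import Iterator


def rows_mentioning(csv_dir: str, keyword: str,
                    delimiter: str = ",") -> Iterator[tuple[str, list[str]]]:
    """Yield (file name, row) for each CSV row with a field containing keyword."""
    needle = keyword.lower()
    for path in sorted(Path(csv_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8", errors="replace") as handle:
            for row in csv.reader(handle, delimiter=delimiter):
                # Case-insensitive match against every field in the row.
                if any(needle in field.lower() for field in row):
                    yield path.name, row
```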
While disk forensics offers valuable insight into potential data theft, Part 3 will build on this by exploring how memory forensics can provide a deeper look into attacker activity.