
In the protobuf: Web browser artifacts using Google's data interchange format

22 July 2025

Author

Chris Tappin

APAC Lead for X-Force Incident Response

I knew the DB was trouble from the start

Like many people working in Digital Forensics and Incident Response, I’m drawn to a good mystery.

The X-Force IR team was recently called in by a client to investigate why their Endpoint Detection & Response (EDR) tool was alerting on a file within the system files of a user’s Microsoft Edge web browser. The file in question was ‘Network Action Predictor’, which is a database storing web browsing artifacts in Chromium-based browsers such as Chrome, Edge and Brave.

In this article, I’m going to cover how I managed to parse the useful information from the Resource Prefetch Predictor tables in the Network Action Predictor database, and how it could be of use in your Digital Forensics, Incident Response or Threat Hunting workflow.

That is so Resource Prefetch Predictor!

Everyone’s favourite SQLite DB browser, DB Browser for SQLite (DB4S), let us very quickly confirm the domain names that the EDR tool had alerted on.

Googling ‘Network Action Predictor’ led me to Kevin Pagano’s 2021 blog on the topic, which is a great overview, but the table I was most interested in, resource_prefetch_predictor_origin, wasn’t mentioned.

Four of the six tables in the database store protobuf data as a blob in each record:

  • lcp_critical_path_predictor
  • lcp_critical_path_predictor_initiator_origin
  • resource_prefetch_predictor_host_redirect
  • resource_prefetch_predictor_origin

Of the other two:

  • resource_prefetch_predictor_metadata in our test data contains a single key-value pair. Interestingly, this six-table version of the database states ‘version=11’, but the four-table version analyzed by Kevin (from Android 11) also states ‘version=11’.
  • network_action_predictor is the table of interest to most investigations, which is covered in Kevin’s blog above, but also check out Ryan Benson’s blog post, too, if you love a Sankey diagram as much as I do.
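Since the four-table and six-table versions of the database differ, it is worth checking which one you have before going further. A minimal sketch using Python’s built-in sqlite3 module (the function name is mine, not from the article’s script):

```python
import sqlite3

def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in an SQLite database, sorted."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        con.close()

# e.g. list_tables("Network Action Predictor")
```

On a six-table copy you should see all four protobuf-bearing tables listed above, plus network_action_predictor and resource_prefetch_predictor_metadata.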

Loading the blob from the resource_prefetch_predictor_origin table that was causing the EDR alerts into the “Cyber Swiss Army Knife”, CyberChef, as suggested in Kevin’s blog, gave the following:

{
    "1": "www.starwars.com",
    "2": 13246294457060644,
    "3": [
        {
            "1": "https://www.starwars.com/",
            "2": 2,
            "4": 0,
            "5": 4607182418800017400,
            "6": 0,
            "7": 1
        },
        [Some entries removed for readability -CJT]
        {
            "1": "https://connect.facebook.net/",
            "2": 1,
            "3": 1,
            "4": 1,
            "5": 4618441417868444000,
            "6": 0,
            "7": 0
        },
        {
            "1": "https://static-mh.content.disney.io/",
            "2": 2,
            "4": 0,
            "5": 4619004367821865000,
            "6": 0,
            "7": 0
        },
        {
            "1": "https://cdn.registerdisney.go.com/",
            "2": 1,
            "5": 4621256167635550000,
            "6": 0,
            "7": 0
        }
    ]
}

I’m using the starwars.com entry from the same sample Android 11 image as Kevin’s blog as sample data above, but in the real investigation there was a mix of malicious and legitimate URLs featured.
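What CyberChef is doing under the hood here is walking the protobuf wire format without a schema: each field is a varint “tag” encoding a field number and a wire type, followed by a value whose encoding the wire type selects. A minimal sketch of that schema-less pass in pure Python (my own toy decoder, not CyberChef’s implementation; it ignores packed fields and other refinements):

```python
def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def decode_fields(buf: bytes) -> dict:
    """Return {field_number: [raw values]} for one message, schema-free."""
    fields, pos = {}, 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_no, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:            # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:          # 64-bit (doubles arrive as raw ints)
            value = int.from_bytes(buf[pos:pos + 8], "little")
            pos += 8
        elif wire_type == 2:          # length-delimited: strings, sub-messages
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.setdefault(field_no, []).append(value)
    return fields

# field 1 = "www.starwars.com" as a length-delimited string
sample = b"\x0a\x10www.starwars.com"
print(decode_fields(sample))   # {1: [b'www.starwars.com']}
```

This also explains the huge integers in field 5 above: without a schema, the decoder cannot know those eight bytes are a double, so it shows them as a raw 64-bit integer.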

Trying to make Resource Prefetch Predictor happen

There is something there, but what? I played around with CyberChef, trying to build a ‘.proto’ schema file to help decode this, but ended up labelling lots of fields UnknownDataA, UnknownDataB, etc. While doing this, I was skimming Google’s Protocol Buffers documentation and realized that the browser would have needed such a file to read and write the data for its own purposes, and it occurred to me that:

  1. This file might be part of the Chromium source code, and
  2. Given that Chromium is open source, it might be fairly easy to track down.

This turned out to be the case, and once I’d provided the relevant sections of resource_prefetch_predictor.proto to CyberChef, the results were a lot more readable:

{
    "origins": [
        {
            "origin": "https://www.starwars.com/",
            "numberOfHits": 2,
            "numberOfMisses": 0,
            "consecutiveMisses": 0,
            "averagePosition": 1,
            "alwaysAccessNetwork": false,
            "accessedNetwork": true
        },
        [Some entries removed for readability -CJT]
        {
            "origin": "https://connect.facebook.net/",
            "numberOfHits": 1,
            "numberOfMisses": 1,
            "consecutiveMisses": 1,
            "averagePosition": 6,
            "alwaysAccessNetwork": false,
            "accessedNetwork": false
        },
        {
            "origin": "https://static-mh.content.disney.io/",
            "numberOfHits": 2,
            "numberOfMisses": 0,
            "consecutiveMisses": 0,
            "averagePosition": 6.5,
            "alwaysAccessNetwork": false,
            "accessedNetwork": false
        },
        {
            "origin": "https://cdn.registerdisney.go.com/",
            "numberOfHits": 1,
            "numberOfMisses": 0,
            "consecutiveMisses": 0,
            "averagePosition": 9,
            "alwaysAccessNetwork": false,
            "accessedNetwork": false
        }
    ],
    "host": "www.starwars.com",
    "lastVisitTime": 13246294457060644
}
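Comparing the field numbers in the raw dump with the names in the decoded output, the relevant messages look roughly like the sketch below. Treat this as my reconstruction, not the authoritative file; consult resource_prefetch_predictor.proto in the Chromium source tree for the real definitions:

```protobuf
syntax = "proto2";

// Per-origin statistics: one entry per element of "origins" above.
message OriginStat {
  optional string origin = 1;
  optional uint32 number_of_hits = 2;
  optional uint32 number_of_misses = 3;
  optional uint32 consecutive_misses = 4;
  optional double average_position = 5;   // the "huge int" in raw field 5
  optional bool always_access_network = 6;
  optional bool accessed_network = 7;
}

// One row of the resource_prefetch_predictor_origin table.
message OriginData {
  optional string host = 1;
  optional uint64 last_visit_time = 2;    // WebKit-epoch microseconds
  repeated OriginStat origins = 3;
}
```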

Drawing the rest of the owl

The next goal was parsing these URLs with a Python script so that they could be sent to a reputation-checking API or added to a timeline.

Google provides guidance for working with protobufs in Python, but it’s a touch fiddly and probably too time-consuming for the harried forensicator during a case. Luckily, once the module has been generated for a ‘.proto’ file, it can be reused. So here’s one I prepared earlier, adapting the Google tutorial to my purposes. I also made a file for the resource_prefetch_predictor_host_redirect blobs.
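For reference, regenerating a _pb2 module yourself is a single protoc invocation once you have the ‘.proto’ file to hand (this assumes the protoc compiler is installed and the file is in the current directory):

```shell
# Writes resource_prefetch_predictor_pb2.py to the current directory
protoc --python_out=. resource_prefetch_predictor.proto
```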

Note: to use these _pb2 files, you will need to install the protobuf library for Python. The easiest way to do this is with pip:

pip install protobuf

All that’s left is importing the respective file and parsing your Network Action Predictor database of choice:

import sqlite3
import RPPO_pb2

sqlite_db_file = "Network Action Predictor"
table = 'resource_prefetch_predictor_origin'

con = sqlite3.connect(sqlite_db_file)
cur = con.cursor()
res = cur.execute(f"SELECT * FROM {table}")
records = res.fetchall()

RPPO = RPPO_pb2.OriginData()
for record in records:
    RPPO.ParseFromString(record[1])
    print(RPPO)

From there, you can work on accessing only the data you need. You may want to print the value in ‘RPPO.host’ and then loop through all of the origins in ‘RPPO.origins’ and print those URLs too:

for i in RPPO.origins:
    print(i.origin)
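From there it is a short step to a timeline-friendly CSV. Here is a sketch of just the flattening step, using plain dicts in place of the parsed protobuf objects so it stands alone (the function name and the choice of columns are mine; the full script on GitHub reads the values from the RPPO_pb2 messages instead):

```python
import csv
import io

def origins_to_csv(host, last_visit_time, origins, out):
    """Write one CSV row per origin, repeating the host-level fields."""
    writer = csv.writer(out)
    writer.writerow(["host", "last_visit_time", "origin", "number_of_hits"])
    for o in origins:
        writer.writerow([host, last_visit_time, o["origin"], o["number_of_hits"]])

buf = io.StringIO()
origins_to_csv(
    "www.starwars.com",
    13246294457060644,
    [{"origin": "https://www.starwars.com/", "number_of_hits": 2}],
    buf,
)
print(buf.getvalue())
```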

You may also want to decode the timestamp (which uses the same epoch as WebKit) with a function such as this:

import datetime

def parse_webkit_timestamp(timestamp):
    # WebKit epoch: microseconds since 1601-01-01 00:00:00 UTC
    time = datetime.timedelta(microseconds=int(timestamp))
    time = datetime.datetime(1601, 1, 1) + time
    return time

Head over to ChrisTappin/Make-Resource-Prefetch-Predictor-Happen on GitHub if you want an example script that can parse the records from a database, either to read or produce a CSV.
