
In the protobuf: Web browser artifacts using Google's data interchange format

22 July 2025

Author

Chris Tappin

APAC Lead for X-Force Incident Response

I knew the DB was trouble from the start

Like many people working in Digital Forensics and Incident Response, I’m drawn to a good mystery.

The X-Force IR team was recently called in by a client to investigate why their Endpoint Detection & Response (EDR) tool was alerting on a file within the system files of a user’s Microsoft Edge web browser. The file in question was ‘Network Action Predictor’, which is a database storing web browsing artifacts in Chromium-based browsers such as Chrome, Edge and Brave.

In this article, I’m going to cover how I managed to parse the useful information from the Resource Prefetch Predictor tables in the Network Action Predictor database, and how it could be of use in your Digital Forensics, Incident Response or Threat Hunting workflow.

That is so Resource Prefetch Predictor!

Everyone’s favourite SQLite DB browser, DB Browser for SQLite (DB4S), let us very quickly confirm the domain names that the EDR tool had alerted on.

Googling ‘Network Action Predictor’ led me to Kevin Pagano’s 2021 blog on the topic, which is a great overview, but the table I was most interested in, resource_prefetch_predictor_origin, wasn’t mentioned.

Four of the six tables in the database store protobuf data as a blob in each record:

  • lcp_critical_path_predictor
  • lcp_critical_path_predictor_initiator_origin
  • resource_prefetch_predictor_host_redirect
  • resource_prefetch_predictor_origin

Of the other two:

  • resource_prefetch_predictor_metadata in our test data contains a single key-value pair. Interestingly, this six-table version of the database states ‘version=11’, but the four-table version analyzed by Kevin (from Android 11) also states ‘version=11’.
  • network_action_predictor is the table of interest to most investigations, which is covered in Kevin’s blog above, but also check out Ryan Benson’s blog post, too, if you love a Sankey diagram as much as I do.
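Since the four-table and six-table versions of the database differ, it is worth checking which one you have before going further. A minimal sketch using Python’s built-in sqlite3 module (the function name is mine, not from the article’s script):

```python
import sqlite3

def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in an SQLite database, sorted."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        con.close()

# e.g. list_tables("Network Action Predictor")
```

On a six-table copy you should see all four protobuf-bearing tables listed above, plus network_action_predictor and resource_prefetch_predictor_metadata.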

Loading the blob from the resource_prefetch_predictor_origin table that was causing the EDR alerts into the “Cyber Swiss Army Knife”, CyberChef, as suggested in Kevin’s blog, gave the following:

{
    "1": "www.starwars.com",
    "2": 13246294457060644,
    "3": [
        {
            "1": "https://www.starwars.com/",
            "2": 2,
            "4": 0,
            "5": 4607182418800017400,
            "6": 0,
            "7": 1
        },
        [Some entries removed for readability -CJT]
        {
            "1": "https://connect.facebook.net/",
            "2": 1,
            "3": 1,
            "4": 1,
            "5": 4618441417868444000,
            "6": 0,
            "7": 0
        },
        {
            "1": "https://static-mh.content.disney.io/",
            "2": 2,
            "4": 0,
            "5": 4619004367821865000,
            "6": 0,
            "7": 0
        },
        {
            "1": "https://cdn.registerdisney.go.com/",
            "2": 1,
            "5": 4621256167635550000,
            "6": 0,
            "7": 0
        }
    ]
}

I’m using the starwars.com entry from the same sample Android 11 image as Kevin’s blog as sample data above, but in the real investigation there was a mix of malicious and legitimate URLs featured.
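What CyberChef is doing under the hood here is walking the protobuf wire format without a schema: each field is a varint “tag” encoding a field number and a wire type, followed by a value whose encoding the wire type selects. A minimal sketch of that schema-less pass in pure Python (my own toy decoder, not CyberChef’s implementation; it ignores packed fields and other refinements):

```python
def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7

def decode_fields(buf: bytes) -> dict:
    """Return {field_number: [raw values]} for one message, schema-free."""
    fields, pos = {}, 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_no, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:            # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:          # 64-bit (doubles arrive as raw ints)
            value = int.from_bytes(buf[pos:pos + 8], "little")
            pos += 8
        elif wire_type == 2:          # length-delimited: strings, sub-messages
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.setdefault(field_no, []).append(value)
    return fields

# field 1 = "www.starwars.com" as a length-delimited string
sample = b"\x0a\x10www.starwars.com"
print(decode_fields(sample))   # {1: [b'www.starwars.com']}
```

This also explains the huge integers in field 5 above: without a schema, the decoder cannot know those eight bytes are a double, so it shows them as a raw 64-bit integer.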

Trying to make Resource Prefetch Predictor happen

There is something there, but what? I played around with CyberChef, trying to build a ‘.proto’ schema file to help decode this, but ended up labelling lots of fields UnknownDataA, UnknownDataB, etc. While doing this, I was skimming Google’s Protocol Buffers documentation and realized that the browser would have needed such a file to read and write the data for its own purposes, and it occurred to me that:

  1. This file might be part of the Chromium source code, and
  2. Given that Chromium is open source, it might be fairly easy to track down.

This turned out to be the case, and once I’d provided the relevant sections of resource_prefetch_predictor.proto to CyberChef, the results were a lot more readable:

{
    "origins": [
        {
            "origin": "https://www.starwars.com/",
            "numberOfHits": 2,
            "numberOfMisses": 0,
            "consecutiveMisses": 0,
            "averagePosition": 1,
            "alwaysAccessNetwork": false,
            "accessedNetwork": true
        },
        [Some entries removed for readability -CJT]
        {
            "origin": "https://connect.facebook.net/",
            "numberOfHits": 1,
            "numberOfMisses": 1,
            "consecutiveMisses": 1,
            "averagePosition": 6,
            "alwaysAccessNetwork": false,
            "accessedNetwork": false
        },
        {
            "origin": "https://static-mh.content.disney.io/",
            "numberOfHits": 2,
            "numberOfMisses": 0,
            "consecutiveMisses": 0,
            "averagePosition": 6.5,
            "alwaysAccessNetwork": false,
            "accessedNetwork": false
        },
        {
            "origin": "https://cdn.registerdisney.go.com/",
            "numberOfHits": 1,
            "numberOfMisses": 0,
            "consecutiveMisses": 0,
            "averagePosition": 9,
            "alwaysAccessNetwork": false,
            "accessedNetwork": false
        }
    ],
    "host": "www.starwars.com",
    "lastVisitTime": 13246294457060644
}
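Comparing the field numbers in the raw dump with the names in the decoded output, the relevant messages look roughly like the sketch below. Treat this as my reconstruction, not the authoritative file; consult resource_prefetch_predictor.proto in the Chromium source tree for the real definitions:

```protobuf
syntax = "proto2";

// Per-origin statistics: one entry per element of "origins" above.
message OriginStat {
  optional string origin = 1;
  optional uint32 number_of_hits = 2;
  optional uint32 number_of_misses = 3;
  optional uint32 consecutive_misses = 4;
  optional double average_position = 5;   // the "huge int" in raw field 5
  optional bool always_access_network = 6;
  optional bool accessed_network = 7;
}

// One row of the resource_prefetch_predictor_origin table.
message OriginData {
  optional string host = 1;
  optional uint64 last_visit_time = 2;    // WebKit-epoch microseconds
  repeated OriginStat origins = 3;
}
```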

Drawing the rest of the owl

The next goal was parsing these URLs with a Python script so that they could be sent to a reputation-checking API or added to a timeline.

Google provides guidance for working with protobufs in Python, but it’s a touch fiddly and probably too time-consuming for the harried forensicator during a case. Luckily, once the module has been generated for a ‘.proto’ file, it can be reused. So here’s one I prepared earlier, adapting the Google tutorial to my purposes. I also made a file for the resource_prefetch_predictor_host_redirect blobs.
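For reference, regenerating a _pb2 module yourself is a single protoc invocation once you have the ‘.proto’ file to hand (this assumes the protoc compiler is installed and the file is in the current directory):

```shell
# Writes resource_prefetch_predictor_pb2.py to the current directory
protoc --python_out=. resource_prefetch_predictor.proto
```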

Note: to use these _pb2 files, you will need to install the protobuf library for Python. The easiest way to do this is with pip:

pip install protobuf

All that’s left is importing the respective file and parsing your Network Action Predictor database of choice:

import sqlite3
import RPPO_pb2

sqlite_db_file = "Network Action Predictor"
table = 'resource_prefetch_predictor_origin'

con = sqlite3.connect(sqlite_db_file)
cur = con.cursor()
res = cur.execute(f"SELECT * FROM {table}")
records = res.fetchall()

RPPO = RPPO_pb2.OriginData()
for record in records:
    RPPO.ParseFromString(record[1])
    print(RPPO)

From there, you can work on accessing only the data you need. You may want to print the value in ‘RPPO.host’ and then loop through all of the origins in ‘RPPO.origins’ and print those URLs too:

for i in RPPO.origins:
    print(i.origin)
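From there it is a short step to a timeline-friendly CSV. Here is a sketch of just the flattening step, using plain dicts in place of the parsed protobuf objects so it stands alone (the function name and the choice of columns are mine; the full script on GitHub reads the values from the RPPO_pb2 messages instead):

```python
import csv
import io

def origins_to_csv(host, last_visit_time, origins, out):
    """Write one CSV row per origin, repeating the host-level fields."""
    writer = csv.writer(out)
    writer.writerow(["host", "last_visit_time", "origin", "number_of_hits"])
    for o in origins:
        writer.writerow([host, last_visit_time, o["origin"], o["number_of_hits"]])

buf = io.StringIO()
origins_to_csv(
    "www.starwars.com",
    13246294457060644,
    [{"origin": "https://www.starwars.com/", "number_of_hits": 2}],
    buf,
)
print(buf.getvalue())
```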

You may also want to decode the timestamp (which uses the same epoch as WebKit) with a function such as this:

import datetime

def parse_webkit_timestamp(timestamp):
    # WebKit epoch: microseconds since 1601-01-01 00:00:00 UTC
    time = datetime.timedelta(microseconds=int(timestamp))
    time = datetime.datetime(1601, 1, 1) + time
    return time

Head over to ChrisTappin/Make-Resource-Prefetch-Predictor-Happen on GitHub if you want an example script that can parse the records from a database, either to read or produce a CSV.
