Date: 2024-03-27

One approach to estimating Scope 3 emissions is to use financial transaction data (for example, spend) as a proxy for the emissions associated with purchased goods and services. Converting this financial data into a GHG emissions inventory requires information on the GHG emissions impact of each product or service purchased.

The US Environmentally-Extended Input-Output (USEEIO) model is a lifecycle assessment (LCA) framework that traces economic and environmental flows of goods and services within the United States. USEEIO offers a comprehensive dataset and methodology that combines economic input-output (IO) analysis with environmental data to estimate the environmental consequences of economic activities. Within USEEIO, goods and services are grouped into 66 spend categories, referred to as commodity classes, based on their common environmental characteristics. Each commodity class is associated with emission factors, which are used to estimate environmental impacts from expenditure data.

The Eora MRIO (Multi-region input-output) dataset is a globally recognized spend-based emission factor set that documents the inter-sectoral transfers among 15,909 sectors across 190 countries. The Eora factor set has been modified to align with the USEEIO categorization of 66 summary classifications per country. This involves mapping the 15,909 sectors found across the Eora26 categories and the more detailed national sector classifications to the USEEIO 66 spend categories.
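The mapping step can be pictured as a lookup table followed by an aggregation. The following is a minimal sketch only: the sector names, country code, factor values, and the `aggregate_factors` helper are all hypothetical. The choice of an unweighted mean is also an assumption, because the text does not say how factors from several source sectors are combined into one commodity class. An output-weighted average could be just as plausible.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from (country, source sector) to a target commodity class.
# Sector names are illustrative and are not taken from the Eora dataset.
SECTOR_TO_CLASS = {
    ("DEU", "Manufacture of computers"): "Computer and electronic products",
    ("DEU", "Manufacture of electronic components"): "Computer and electronic products",
    ("DEU", "Passenger air transport"): "Air transportation",
}

# Placeholder emission factors in kg CO2e per USD spent (not real data).
SECTOR_FACTORS = {
    ("DEU", "Manufacture of computers"): 0.30,
    ("DEU", "Manufacture of electronic components"): 0.50,
    ("DEU", "Passenger air transport"): 1.20,
}


def aggregate_factors(mapping, factors):
    """Collapse sector-level factors into one factor per (country, commodity class).

    Assumption: an unweighted mean is used to combine sectors.
    """
    grouped = defaultdict(list)
    for (country, sector), commodity in mapping.items():
        grouped[(country, commodity)].append(factors[(country, sector)])
    return {key: mean(values) for key, values in grouped.items()}


class_factors = aggregate_factors(SECTOR_TO_CLASS, SECTOR_FACTORS)
for (country, commodity), factor in sorted(class_factors.items()):
    print(f"{country} | {commodity}: {factor:.2f} kg CO2e per USD")
```

Keying the result by both country and commodity class mirrors the "66 summary classifications per country" structure described above.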

This is where large language models (LLMs) come into play. Recent years have seen rapid progress in large foundation language models for natural language processing (NLP). These models have shown strong performance compared with conventional machine learning (ML) models, particularly when labeled data is in short supply. Combining large pre-trained NLP models with domain adaptation techniques that make efficient use of limited data offers significant potential for tackling the challenge of accounting for Scope 3 environmental impact.

Our approach involves fine-tuning foundation models to classify purchase orders or ledger entries, which are written in natural language, into Environmentally-Extended Input-Output (EEIO) commodity classes. We then calculate the emissions associated with the spend using EEIO emission factors (emissions per dollar spent) sourced from Supply Chain GHG Emission Factors for US Commodities and Industries for US-centric datasets, and from the Eora MRIO (Multi-region input-output) for global datasets. This framework helps streamline and simplify the process for businesses to calculate Scope 3 emissions.
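To make the two steps concrete, here is a minimal sketch of the classify-then-multiply flow. It rests on several assumptions: the `classify` function is a keyword-based stand-in for the fine-tuned model, the `LedgerEntry` type and `estimate_emissions` helper are hypothetical names, and the emission factor values are placeholders rather than real USEEIO or Eora figures. Only the arithmetic, spend multiplied by a per-dollar emission factor for the predicted commodity class, follows from the description above.

```python
from dataclasses import dataclass

# Placeholder emission factors in kg CO2e per USD spent (not real data),
# keyed by commodity class.
EMISSION_FACTORS = {
    "Computer and electronic products": 0.20,
    "Air transportation": 1.00,
}


@dataclass
class LedgerEntry:
    description: str  # free-text purchase order or ledger line
    spend_usd: float  # amount spent in US dollars


def classify(entry: LedgerEntry) -> str:
    """Stand-in for the fine-tuned model: a naive keyword lookup.

    A real system would call the model here; this stub only keeps the
    example runnable.
    """
    text = entry.description.lower()
    if "flight" in text or "airfare" in text:
        return "Air transportation"
    return "Computer and electronic products"


def estimate_emissions(entries):
    """Return total estimated emissions (kg CO2e) per commodity class."""
    totals = {}
    for entry in entries:
        commodity = classify(entry)
        emissions = entry.spend_usd * EMISSION_FACTORS[commodity]
        totals[commodity] = totals.get(commodity, 0.0) + emissions
    return totals


entries = [
    LedgerEntry("Laptop purchase for engineering team", 1500.0),
    LedgerEntry("Flight to customer site", 400.0),
]
result = estimate_emissions(entries)
for commodity, kg in result.items():
    print(f"{commodity}: {kg:.1f} kg CO2e")
```

Keeping classification and emission computation in separate functions means the keyword stub can later be swapped for a model call without touching the arithmetic.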

Figure 1 illustrates the framework for Scope 3 emission estimation employing a large language model. This framework comprises four distinct modules: data preparation, domain adaptation, classification and emission computation.