Designing mappings for OpenLineage events

When you design mappings for OpenLineage events, decide on the conditions and actions of each rule.

OpenLineage event structure

An OpenLineage event might have the following structure:

{
  "eventTime": "2025-08-21T10:03:33.616Z",
  "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
  "eventType": "COMPLETE",
  "job": {
    "namespace": "custom_etl_tool",
    "name": "workspace_name/folder_1/folder_2/job_name_1"
  },
  "inputs": [
    {
      "namespace": "s3://mybigbucket.com",
      "name": "sales/public/orders",
      "facets": {
        "schema": {
          "fields": [
            {
              "name": "my_one_field",
              "type": "integer"
            }
          ]
        }
      }
    }
  ],
  "outputs": [
    {
      "namespace": "mongodb://analytics-db.company.com:27017",
      "name": "customerdb.mycollection.sales_summary",
      "facets": {
        "schema": {
          "fields": [
            {
              "name": "my_one_field",
              "type": "integer"
            }
          ]
        }
      }
    }
  ]
}

The event consists of the following sections:

  • inputs that refer to datasets that are read.
  • job that refers to processes that transform data.
  • outputs that refer to datasets that are written.

Each section contains a namespace that identifies the technology and location, a name that identifies the object path, and optional facets that provide additional metadata like column lineage.
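As a rough illustration of this structure, the following Python sketch lists the namespace and name from each section of an event. The function name list_references is hypothetical and is not part of OpenLineage or of any product API. The sample event is a trimmed copy of the event shown earlier, without facets.

```python
import json


def list_references(event: dict) -> list[tuple[str, str, str]]:
    """Return (section, namespace, name) for the job and for every dataset."""
    refs = []
    job = event.get("job")
    if job:
        refs.append(("job", job["namespace"], job["name"]))
    # inputs and outputs are lists of datasets; either one may be absent
    for section in ("inputs", "outputs"):
        for dataset in event.get(section, []):
            refs.append((section, dataset["namespace"], dataset["name"]))
    return refs


EVENT = json.loads("""
{
  "eventType": "COMPLETE",
  "job": {
    "namespace": "custom_etl_tool",
    "name": "workspace_name/folder_1/folder_2/job_name_1"
  },
  "inputs": [
    {"namespace": "s3://mybigbucket.com", "name": "sales/public/orders"}
  ],
  "outputs": [
    {
      "namespace": "mongodb://analytics-db.company.com:27017",
      "name": "customerdb.mycollection.sales_summary"
    }
  ]
}
""")

for section, namespace, name in list_references(EVENT):
    print(f"{section}: namespace={namespace} name={name}")
```

Listing the namespaces this way can help you decide which mappings an event needs before you create them.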

The namespace is used to set a condition in the mapping. It can be static or dynamic.

  • A static namespace, for example the job namespace custom_etl_tool, represents a logical grouping for jobs that are run by a custom ETL tool. It is used to define the workspace or application to track in lineage.
  • A dynamic namespace contains a prefix that represents a technology type and does not change. It also contains dynamic host and port values that represent specific endpoints. For example, in the namespace mongodb://analytics-db.company.com:27017, the prefix mongodb:// does not change, and the host analytics-db.company.com and port 27017 represent one of many endpoints.
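The difference between the two kinds of namespace can be sketched in Python. This is a minimal illustration that assumes the prefix always ends with ://, and the names NamespaceParts and split_namespace are hypothetical. It does not reflect how the product parses namespaces internally.

```python
from typing import NamedTuple, Optional


class NamespaceParts(NamedTuple):
    prefix: Optional[str]  # for example "mongodb://"; None for a static namespace
    host: Optional[str]
    port: Optional[int]


def split_namespace(namespace: str) -> NamespaceParts:
    """Split a namespace into its fixed prefix and its dynamic host and port."""
    scheme, sep, rest = namespace.partition("://")
    if not sep:
        # No "://" delimiter: treat the value as a static namespace.
        return NamespaceParts(None, None, None)
    authority = rest.split("/", 1)[0]
    host, colon, port = authority.rpartition(":")
    if colon and port.isdigit():
        return NamespaceParts(scheme + sep, host, int(port))
    return NamespaceParts(scheme + sep, authority, None)


print(split_namespace("mongodb://analytics-db.company.com:27017"))
print(split_namespace("s3://mybigbucket.com"))
print(split_namespace("custom_etl_tool"))
```

In this sketch, only the prefix is stable across events for the same technology, which is why it is the natural value for a prefix-based condition.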

Before you create mappings

Before you start creating mappings, review the structure of your event and identify the technologies that are referenced. The events might contain references to technologies that are already supported. Default mappings are created for such technologies. They are listed in the Active mappings tab, and they are identified by the prefix value of a namespace. For example, the s3:// mapping refers to the Amazon S3 technology.

For technologies that are not supported by default, create your own custom mappings.

Tip: If the event contains references to technologies that are supported by default connectors, run lineage metadata import for these data sources first. This ensures that the correct source and target assets are discovered. If this step is skipped, deduced placeholder assets and unknown servers might be included in lineage instead.

When a mapping for a specific event is not created yet, a placeholder system is created, and all assets that are referenced by a specific namespace are located under this placeholder system. You can create a matching mapping later. After you reimport the event, the lineage is updated with the information from the new mapping.

Mapping conditions

In the mapping conditions section, you define the type of the namespace and the namespace matching method.

Rule scope

The mapping rule can be based on either the dataset namespace, or the job namespace. If you want to create a mapping based on both the dataset and job namespaces, create two individual mappings.

For more information about OpenLineage naming conventions, see Naming Conventions.

Namespace matching method and namespace value

The mapping can be based either on a namespace prefix, or on the entire namespace value. The matching method impacts how data source definitions are later assigned to imported assets.

Namespace prefix
The namespace contains a prefix that is common to many events. The prefix specifies a technology type, for example s3:// or mongodb://, and this value does not change. The host and port values are dynamic and distinguish specific endpoints of the same technology type. When you select this method, data source definitions can be assigned automatically.
In the Namespace prefix value field, specify the prefix value, including the delimiter :// or ., for example s3:// or mongodb://.

Namespace exact value
The namespace is static, and it contains the host and port details. It uniquely identifies a specific endpoint. When you select this method, you later select a specific data source definition manually.
In the Namespace exact value field, specify the entire namespace value, for example custom_etl_tool or s3://mybigbucket.com.
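The two matching methods, combined with the rule scope, can be sketched as follows. The class MappingCondition and its field values ("dataset", "job", "prefix", "exact") are hypothetical names chosen for illustration. The sketch also assumes that matching is case-sensitive, which the product might handle differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingCondition:
    scope: str   # "dataset" or "job" (assumed labels for the rule scope)
    method: str  # "prefix" or "exact" (assumed labels for the matching method)
    value: str   # the prefix value or the entire namespace value

    def matches(self, scope: str, namespace: str) -> bool:
        """Return True when the namespace from an event satisfies this condition."""
        if scope != self.scope:
            return False
        if self.method == "prefix":
            return namespace.startswith(self.value)
        return namespace == self.value


s3_rule = MappingCondition(scope="dataset", method="prefix", value="s3://")
etl_rule = MappingCondition(scope="job", method="exact", value="custom_etl_tool")

print(s3_rule.matches("dataset", "s3://mybigbucket.com"))
print(etl_rule.matches("job", "custom_etl_tool"))
```

Because each condition carries exactly one scope, covering both the dataset and the job namespace takes two conditions, which mirrors the need for two individual mappings.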

Mapping actions

Actions define what happens when the conditions of the rule are met. Configure the following actions:

  • Assign the correct technology type.
  • Define how the name element from an event is translated into the asset hierarchy.
  • Associate correct data source definitions so that assets are located under the correct system.

Technology type

You can either assign a default technology from a list or create a custom technology. Each technology contains information about asset types and hierarchy, as well as branches.

Branch
A branch is required because a technology might contain many asset types on the same asset level. For example, a database might contain tables or views. Branch names must be unique. You can create two mappings for the same event, each based on a different namespace type and with a unique branch. See the following example:

Mapping rule 1:

  • Rule scope: Dataset namespace
  • Namespace matching method: Namespace exact match with value Custom_tool
  • Technology type:
    • Technology name: Custom_tool
    • Branch: Table
    • Asset hierarchy: Database > Schema > Table

Mapping rule 2:

  • Rule scope: Job namespace
  • Namespace matching method: Namespace exact match with value Custom_tool
  • Technology type:
    • Technology name: Custom_tool
    • Branch: Procedure
    • Asset hierarchy: Database > Schema > Procedure

Asset hierarchy
When you define the asset hierarchy, specify how many asset levels are in your data source, and provide the names for each level. The first level represents the highest parent type in the hierarchy, for example a database or a folder. Optionally, select the asset type that can occur recursively. For example, in the hierarchy Collection > Folder > Folder > File, the Folder asset type is recursive.
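To show how a hierarchy with a recursive level might line up with a name from an event, here is a minimal Python sketch. It assumes that the name is split on the / delimiter and that the recursive level absorbs any extra segments. Both are assumptions made for illustration, and the function assign_levels is hypothetical.

```python
from typing import Optional


def assign_levels(
    name: str,
    levels: list[str],
    recursive: Optional[str] = None,
    delimiter: str = "/",
) -> list[tuple[str, str]]:
    """Pair each segment of the name with an asset level from the hierarchy."""
    parts = name.split(delimiter)
    extra = len(parts) - len(levels)
    if extra < 0 or (extra > 0 and recursive is None):
        raise ValueError(f"name {name!r} does not fit hierarchy {levels}")
    if recursive is None:
        expanded = list(levels)
    else:
        # Repeat the recursive level once for each extra segment.
        idx = levels.index(recursive)
        expanded = levels[:idx] + [recursive] * (extra + 1) + levels[idx + 1:]
    return list(zip(expanded, parts))


print(assign_levels(
    "workspace_name/folder_1/folder_2/job_name_1",
    ["Collection", "Folder", "File"],
    recursive="Folder",
))
```

In this sketch, a name with more segments than levels is accepted only when a recursive level is defined. Otherwise the hierarchy and the name pattern do not match and the function raises an error.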

Data source definitions

Data source definitions specify the location of the data. In the lineage visualization, they determine the system under which the data is displayed. Each event refers to three systems, one in each section of the event. Therefore, three data source definitions are required. For the smoothest process, create all data source definitions before you start creating mappings.

For more information, see Data protection with data source definitions.

When you create a mapping, you need to decide how data source definitions are applied to the events. You can choose between automatic and manual assignment.

Assign automatically
Data source definitions are assigned automatically, based on the prefix and dynamic host and port values from the namespace. This option is available when the namespace matching method is based on prefix. When one matching data source definition is found, it is assigned. When more than one matching data source definition is found, or no match is found, a deduced system is created and all assets are associated with this deduced system.
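The decision described above can be sketched in Python. The DataSourceDefinition class, its fields, and the function find_definition are hypothetical. The sketch assumes that a definition matches when its prefix, host, and port all equal the values from the namespace, which is a simplification of whatever comparison the product performs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataSourceDefinition:
    name: str
    prefix: str  # technology prefix, for example "mongodb://"
    host: str
    port: Optional[int] = None


def find_definition(
    prefix: str,
    host: str,
    port: Optional[int],
    definitions: list[DataSourceDefinition],
) -> Optional[DataSourceDefinition]:
    """Return the single matching definition, or None when a deduced system is needed."""
    matches = [
        d for d in definitions
        if (d.prefix, d.host, d.port) == (prefix, host, port)
    ]
    # Exactly one match is assigned; zero or several matches are not.
    return matches[0] if len(matches) == 1 else None


definitions = [
    DataSourceDefinition("Analytics MongoDB", "mongodb://", "analytics-db.company.com", 27017),
    DataSourceDefinition("Sales bucket", "s3://", "mybigbucket.com"),
]

found = find_definition("mongodb://", "analytics-db.company.com", 27017, definitions)
print(found.name if found else "deduced system")
```

Returning None for both the no-match and the multiple-match case mirrors the behavior described above, where either outcome leads to a deduced system.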

When deduced systems are created, you might need to create alias assignments. For more information, see Configuring alias assignments.

Select manually
Select a particular data source definition from the list so that all assets are located under this system. It is the only option when the mapping is based on the exact namespace value, but you can also use it when your mapping is based on the namespace prefix.

Common mapping configurations

Refer to the following table to review the most common scenarios and see how mappings are configured in each case.

| Situation | Matching method | Data source definition assignment |
|---|---|---|
| Known technology with many endpoints, for example s3://… | Namespace prefix | Assign automatically (preferred) |
| One specific logical job group, for example custom_etl_tool | Namespace exact value | Select manually |
| Database technology for which a default connector is available | Namespace prefix | Assign automatically. Run metadata import with the default connector before you import the event. |

In some cases, lineage might not be as accurate as expected. See the following examples:

  • When the technology type in the mapping and in the data source definition do not match, a deduced system is created and lineage might be disconnected.
  • When the data source definition is not created for a namespace that is imported with the event, a deduced system is created. As a result, configuring aliases might be necessary.
  • When the asset hierarchy that is defined in the mapping does not match the name pattern from the event, assets are displayed incorrectly in lineage, or they are added to placeholder assets.