Designing mappings for OpenLineage events
When you design mappings for OpenLineage events, decide on the conditions and actions of each rule.
OpenLineage event structure
An OpenLineage event might have the following structure:
{
  "eventTime": "2025-08-21T10:03:33.616Z",
  "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
  "eventType": "COMPLETE",
  "job": {
    "namespace": "custom_etl_tool",
    "name": "workspace_name/folder_1/folder_2/job_name_1"
  },
  "inputs": [
    {
      "namespace": "s3://mybigbucket.com",
      "name": "sales/public/orders",
      "facets": {
        "schema": {
          "fields": [
            {
              "name": "my_one_field",
              "type": "integer"
            }
          ]
        }
      }
    }
  ],
  "outputs": [
    {
      "namespace": "mongodb://analytics-db.company.com:27017",
      "name": "customerdb.mycollection.sales_summary",
      "facets": {
        "schema": {
          "fields": [
            {
              "name": "my_one_field",
              "type": "integer"
            }
          ]
        }
      }
    }
  ]
}
The event consists of the following sections:
- `inputs` that refer to datasets that are read.
- `job` that refers to processes that transform data.
- `outputs` that refer to datasets that are written.
Each section contains a namespace that identifies the technology and location, a name that identifies the object path, and optional facets that provide additional metadata like column lineage.
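As a rough illustration of how the three sections can be inspected before you design mappings, the following Python sketch lists the namespace and name of each section. It uses a trimmed copy of the example event; the helper name `list_namespaces` is hypothetical and is not part of any product or OpenLineage API.

```python
import json

# Trimmed copy of the example event; only the fields used below are kept.
event = json.loads("""
{
  "job": {"namespace": "custom_etl_tool",
          "name": "workspace_name/folder_1/folder_2/job_name_1"},
  "inputs": [{"namespace": "s3://mybigbucket.com",
              "name": "sales/public/orders"}],
  "outputs": [{"namespace": "mongodb://analytics-db.company.com:27017",
               "name": "customerdb.mycollection.sales_summary"}]
}
""")


def list_namespaces(event):
    """Return (section, namespace, name) tuples for the job, inputs, and outputs."""
    rows = [("job", event["job"]["namespace"], event["job"]["name"])]
    for section in ("inputs", "outputs"):
        for dataset in event.get(section, []):
            rows.append((section, dataset["namespace"], dataset["name"]))
    return rows


namespaces = list_namespaces(event)
for section, namespace, name in namespaces:
    print(f"{section:8} {namespace:45} {name}")
```

Each distinct namespace in the result is a candidate for a mapping condition.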
The namespace is used to set a condition in the mapping. It can be static or dynamic.
- A static namespace, for example the job namespace `custom_etl_tool`, represents a logical grouping for jobs that are run by a custom ETL tool. It is used to define the workspace or application to track on lineage.
- A dynamic namespace contains a prefix that represents a technology type and does not change. It also contains dynamic host and port values that represent specific endpoints. For example, in the namespace `mongodb://analytics-db.company.com:27017`, the prefix `mongodb://` does not change, and the host `analytics-db.company.com` and port `27017` represent one of many endpoints.
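The difference between static and dynamic namespaces can be sketched in Python as follows. This is a minimal illustration that assumes a dynamic namespace always uses the `://` delimiter and an optional numeric port; the function name `split_namespace` is hypothetical.

```python
def split_namespace(namespace):
    """Split a namespace into (prefix, host, port).

    A static namespace such as custom_etl_tool has no prefix and no port,
    so it is returned unchanged in the host position.
    """
    if "://" not in namespace:
        return None, namespace, None
    scheme, rest = namespace.split("://", 1)
    host, _, port = rest.partition(":")
    # Assumption: the port, when present, is purely numeric.
    return scheme + "://", host, (int(port) if port.isdigit() else None)


for ns in ("custom_etl_tool",
           "s3://mybigbucket.com",
           "mongodb://analytics-db.company.com:27017"):
    print(ns, "->", split_namespace(ns))
```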
Before you create mappings
Before you start creating mappings, review the structure of your event and identify the technologies that are referenced. The events might contain references to technologies that are already supported. Default mappings are created for such technologies.
They are listed in the Active mappings tab, and they are identified by the prefix value of a namespace. For example, the s3:// mapping refers to the Amazon S3 technology.
For technologies that are not supported by default, create your own custom mappings.
When a mapping for a namespace that is referenced in an event does not exist yet, a placeholder system is created, and all assets that are referenced by that namespace are located under it. You can create a matching mapping later. After you reimport the event, the lineage is updated with the information from the new mapping.
Mapping conditions
In the mapping conditions section, you define the type of the namespace and the namespace matching method.
Rule scope
The mapping rule can be based on either the dataset namespace, or the job namespace. If you want to create a mapping based on both the dataset and job namespaces, create two individual mappings.
For more information about OpenLineage naming conventions, see Naming Conventions.
Namespace matching method and namespace value
The mapping can be based either on a namespace prefix, or on the entire namespace value. The matching method impacts how data source definitions are later assigned to imported assets.
- Namespace prefix
- The namespace contains a prefix that is common for many events. The prefix specifies a technology type, for example `s3://` or `mongodb://`, and this value does not change. The host and port values are dynamic; they distinguish specific endpoints of the same technology type. When you select this method, data source definitions can be assigned automatically. In the Namespace prefix value field, specify the prefix value, including the delimiter `://` or `.`, for example `s3://` or `mongodb://`.
- Namespace exact value
- The namespace is static and it contains the host and port details. It uniquely identifies a specific endpoint. When you select this method, you later select a specific data source definition manually. In the Namespace exact value field, specify the entire namespace value, for example `custom_etl_tool` or `s3://mybigbucket.com`.
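The two matching methods, combined with the rule scope, can be pictured with the following Python sketch. The `MappingCondition` class and its field names are hypothetical and only illustrate the logic that is described above; they do not reflect how the product stores or evaluates mappings.

```python
from dataclasses import dataclass


@dataclass
class MappingCondition:
    scope: str   # "dataset" or "job" (the rule scope)
    method: str  # "prefix" or "exact" (the namespace matching method)
    value: str   # the prefix value or the entire namespace value

    def matches(self, scope, namespace):
        """Return True when a namespace of the given scope meets the condition."""
        if scope != self.scope:
            return False
        if self.method == "prefix":
            return namespace.startswith(self.value)
        return namespace == self.value


mongo_rule = MappingCondition(scope="dataset", method="prefix", value="mongodb://")
etl_rule = MappingCondition(scope="job", method="exact", value="custom_etl_tool")

print(mongo_rule.matches("dataset", "mongodb://analytics-db.company.com:27017"))
print(etl_rule.matches("job", "custom_etl_tool"))
print(etl_rule.matches("dataset", "custom_etl_tool"))
```

The last call shows why a mapping that covers both the dataset and the job namespace needs two separate rules: a rule with the job scope does not match a dataset namespace with the same value.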
Mapping actions
Actions define what happens when the conditions of the rule are met. Configure the following actions:
- Assign the correct technology type.
- Define how the name element from an event is interpreted into the asset hierarchy.
- Associate correct data source definitions so that assets are located under the correct system.
Technology type
You can assign either the default technology from a list, or create a custom technology. Each technology contains information about asset types and hierarchy, as well as branches.
- Branch
- Branch is required because a technology might contain many asset types on the same asset level. For example, a database might contain tables or views. Branch names must be unique. You can create two mappings for the same event, each based on a different namespace type and with a unique branch. See the following example:
Mapping rule 1:
- Rule scope: Dataset namespace
- Namespace matching method: Namespace exact match with value Custom_tool
- Technology type:
- Technology name: Custom_tool
- Branch: Table
- Asset hierarchy: Database > Schema > Table
Mapping rule 2:
- Rule scope: Job namespace
- Namespace matching method: Namespace exact match with value Custom_tool
- Technology type:
- Technology name: Custom_tool
- Branch: Procedure
- Asset hierarchy: Database > Schema > Procedure
- Asset hierarchy
- When you define the asset hierarchy, specify how many asset levels are in your data source, and provide the names for each level. The first level represents the highest parent type in the hierarchy, for example a database or a folder. Optionally, select the asset type that can occur recursively. For example, in the hierarchy `Collection > Folder > Folder > File`, the `Folder` asset type is recursive.
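To illustrate how a name element might be interpreted against an asset hierarchy with one recursive level, consider the following Python sketch. It assumes that the name is split on a single delimiter, that level names are unique, and that the recursive level absorbs any extra segments. The function `interpret_name` and the sample file path are hypothetical.

```python
def interpret_name(name, levels, recursive=None, delimiter="/"):
    """Pair each segment of a name with an asset level.

    levels:    ordered level names, highest parent first.
    recursive: optional level name that can repeat to absorb extra segments.
    """
    parts = name.split(delimiter)
    extra = len(parts) - len(levels)
    if extra < 0 or (extra > 0 and recursive not in levels):
        raise ValueError(f"name {name!r} does not fit hierarchy {levels}")
    segments = iter(parts)
    result = []
    for level in levels:
        # The recursive level takes its own segment plus all extra segments.
        repeat = 1 + extra if level == recursive else 1
        for _ in range(repeat):
            result.append((level, next(segments)))
    return result


hierarchy = interpret_name(
    "reports/2025/q3/summary.csv",
    levels=["Collection", "Folder", "File"],
    recursive="Folder",
)
for level, value in hierarchy:
    print(f"{level}: {value}")
```

In this sketch, a name with fewer segments than levels raises an error. That corresponds to the situation where the hierarchy in the mapping does not match the name pattern from the event.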
Data source definitions
Data source definitions specify the location of the data. In the lineage visualization, they define under which system the data is displayed. Each event refers to three systems, one in each section of the event. Therefore, three data source definitions are required. For the smoothest process, create all data source definitions before you start creating mappings.
For more information, see Data protection with data source definitions.
When you create a mapping, you need to decide how data source definitions are applied to the events. You can choose between automatic and manual assignment.
Assign automatically
Data source definitions are assigned automatically, based on the prefix and dynamic host and port values from the namespace. This option is available when the namespace matching method is based on the namespace prefix. When one matching data source definition is found, it is assigned. When more than one matching data source definition is found, or no match is found, a deduced system is created and all assets are associated with this deduced system.
When deduced systems are created, you might need to create alias assignments. For more information, see Configuring alias assignments.
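The outcome of automatic assignment can be sketched in Python as follows. This is only an illustration of the rule "exactly one match is assigned, otherwise a deduced system is created". The shape of a data source definition (`name`, `prefix`, `endpoint`) and the function `assign_definition` are assumptions made for this example; the actual attributes that the product compares are not described here.

```python
def assign_definition(namespace, definitions):
    """Return ("assigned", definition name) or ("deduced", namespace)."""
    scheme, _, endpoint = namespace.partition("://")
    prefix = scheme + "://"
    matches = [
        d for d in definitions
        if d["prefix"] == prefix and d["endpoint"] == endpoint
    ]
    if len(matches) == 1:
        return "assigned", matches[0]["name"]
    # Zero matches or more than one match: fall back to a deduced system.
    return "deduced", namespace


# Hypothetical data source definitions.
definitions = [
    {"name": "Analytics MongoDB", "prefix": "mongodb://",
     "endpoint": "analytics-db.company.com:27017"},
    {"name": "Sales bucket", "prefix": "s3://", "endpoint": "mybigbucket.com"},
]

print(assign_definition("mongodb://analytics-db.company.com:27017", definitions))
print(assign_definition("mongodb://other-db.company.com:27017", definitions))
```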
Select manually
Select a particular data source definition from the list so that all assets are located under this system. It is the only option when the mapping is based on the exact namespace value, but you can also use it when your mapping is based on the namespace prefix.
Common mapping configurations
Refer to the following table to review the most common scenarios and see how mappings are configured in each case.
| Situation | Matching method | Data source definition assignment |
|---|---|---|
| Known technology with many endpoints, for example s3://… | Namespace prefix | Assign automatically (preferred) |
| One specific logical job group, for example custom_etl_tool | Namespace exact value | Select manually |
| Database technology for which a default connector is available | Namespace prefix | Assign automatically. Run metadata import with the default connector before you import the event. |
In some cases, lineage might not be as accurate as expected. See the following examples:
- When the technology type in the mapping and in the data source definition do not match, a deduced system is created and lineage might be disconnected.
- When the data source definition is not created for a namespace that is imported with the event, a deduced system is created. As a result, configuring aliases might be necessary.
- When the asset hierarchy that is defined in the mapping does not match the name pattern from the event, assets are incorrectly displayed on lineage, or they are added to placeholder assets.