Lineage for unstructured data
Visualize the journey of your unstructured data: see where it came from, understand which processes were applied to transform it, and gain insights into its quality.
View data lineage for unstructured data to achieve the following goals:
- Establish data trustworthiness by tracing the origin and transformation of input data.
- Support regulatory compliance by detecting and tracing personal or sensitive data and verifying that it is handled properly.
- Facilitate data auditing and governance by providing a transparent trail of data movement and processing for internal reviews or external audits.
Asset types
Lineage for unstructured data contains the following types of assets:
- Connection assets
- Document sets
- Document libraries
- Flows with operator details
- Operators
For more information about asset types and data transformation, see Working with unstructured data.
Supported data sources
For information about supported data sources for lineage for unstructured data, see Supported connectors for unstructured data curation.
Adding unstructured data to the lineage repository
To view data lineage for unstructured data, you must import lineage metadata into projects by creating an unstructured data curation flow. For more information, see Setting up curation flows for unstructured data.
Viewing lineage for unstructured data
You can access lineage when you view the details of an asset, such as a document set or a document library. You can also go to Data > Data lineage and view lineage for the assets that you select.
Disabling lineage for unstructured data
Lineage for unstructured data is enabled by default. You can disable it to control when lineage is generated. To disable it, go to Administration > Configuration and settings > Data lineage settings > Set up lineage > Enable lineage and set Lineage for unstructured data to Off. After you disable lineage, you can still access the existing lineage graphs for assets whose lineage was generated earlier.