Working in projects and data assets

To discover and analyze meaningful sections of your data, organize it into projects and data assets.

Required roles
  • To access projects and data assets, you must have the following roles:
    • A role to access Information Governance Catalog (Information Governance Catalog User role or higher)
    • One of the following Information Analyzer roles to access the Quality tab: Data Administrator, Project Administrator, or User
    • A Business Analyst or Data Operator role at the project level to access the project
  • To create a project, you must have the Project Administrator role.
  • To add data assets to a project, or remove them, you must have the Data Administrator role.
  • To mark a data asset as reviewed, you must be assigned to a project as Business Analyst.
  • To create, edit, and remove SQL virtual tables and virtual columns, you must have the Data Administrator role. Additionally, you must be assigned to the project as Business Analyst or Data Operator.
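The role requirements above can be summarized as a small lookup. The following Python sketch is a hypothetical model for illustration only, not an Information Analyzer API: the role names come from the list above, but the `User` class, the `has_catalog_role` flag (standing in for "Information Governance Catalog User role or higher"), the action keys, and the function names are assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical model of the documented role requirements; not an IBM API.
IA_ROLES = {"Data Administrator", "Project Administrator", "User"}
PROJECT_ACCESS_ROLES = {"Business Analyst", "Data Operator"}


@dataclass
class User:
    has_catalog_role: bool  # assumption: IGC User role or higher, as a flag
    ia_roles: set = field(default_factory=set)       # Information Analyzer roles
    project_roles: set = field(default_factory=set)  # roles assigned in one project


def can_access_project(u: User) -> bool:
    # All three access prerequisites must hold at the same time.
    return (u.has_catalog_role
            and bool(u.ia_roles & IA_ROLES)
            and bool(u.project_roles & PROJECT_ACCESS_ROLES))


def can_perform(u: User, action: str) -> bool:
    # Hypothetical action keys mapped to the roles that the topic lists.
    if action == "create_project":
        return "Project Administrator" in u.ia_roles
    if action in ("add_asset", "remove_asset"):
        return "Data Administrator" in u.ia_roles
    if action == "mark_reviewed":
        return "Business Analyst" in u.project_roles
    if action == "edit_virtual_table":
        return ("Data Administrator" in u.ia_roles
                and bool(u.project_roles & PROJECT_ACCESS_ROLES))
    raise ValueError(f"unknown action: {action}")
```

For example, a user with the Information Analyzer User role who is assigned to a project as Business Analyst can open the project and mark data assets as reviewed, but cannot add data assets:

```python
analyst = User(has_catalog_role=True, ia_roles={"User"}, project_roles={"Business Analyst"})
print(can_access_project(analyst))            # True
print(can_perform(analyst, "mark_reviewed"))  # True
print(can_perform(analyst, "add_asset"))      # False
```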

Projects

Use projects to analyze a meaningful set of data assets. If the data assets in a project were analyzed, the project overview shows the overall quality of the data in the project. The metrics include the data quality threshold, the quality score distribution, the number of analyzed data assets, and identified relationships, among others. To make sure that you view the latest data, refresh your web browser.

All projects are available in the Quality tab. The following projects are available by default:
  • UGDefaultWorkspace - This is the default project with the default settings.
  • PIIWorkspace - This project is optimized to search for personally identifiable information (PII). The analysis runs on a sample and skips non-PII data classes.
  • DataLakeWorkspace - This project is optimized for quick analysis and data quality assessment of a large number of assets. The analysis runs on a sample.
  • InDepthAnalysisWorkspace - This project is optimized to run an in-depth analysis of a small number of assets. The analysis runs on all data.
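The differences between the default projects can be captured as data. The following Python sketch is illustrative only: the `WorkspaceProfile` class and the `pick_project` helper are hypothetical, and the 100-asset cutoff that separates "a large number" from "a small number" of assets is an arbitrary assumption, not a product setting. The sampling behavior of UGDefaultWorkspace is not stated in this topic, so it is recorded as unknown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkspaceProfile:
    name: str
    uses_sampling: Optional[bool]  # None: not stated in this topic
    pii_classes_only: bool         # True: non-PII data classes are skipped


DEFAULT_PROJECTS = [
    WorkspaceProfile("UGDefaultWorkspace", None, False),
    WorkspaceProfile("PIIWorkspace", True, True),
    WorkspaceProfile("DataLakeWorkspace", True, False),
    WorkspaceProfile("InDepthAnalysisWorkspace", False, False),
]


def pick_project(asset_count: int, pii_focus: bool, large_threshold: int = 100) -> str:
    """Hypothetical heuristic; large_threshold is an assumed cutoff."""
    if pii_focus:
        return "PIIWorkspace"              # sampled, PII data classes only
    if asset_count >= large_threshold:
        return "DataLakeWorkspace"         # sampled, quick assessment of many assets
    return "InDepthAnalysisWorkspace"      # all data, few assets
```

For example, `pick_project(5000, False)` returns `"DataLakeWorkspace"` and `pick_project(10, True)` returns `"PIIWorkspace"`. The heuristic never selects UGDefaultWorkspace; use that project when none of the specialized profiles fits.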

The project dashboard contains basic information about the project, such as the number of data assets and who created the project and when. It also shows two settings: whether sampling is used and whether the project is governed. A project is governed when you enable automation rules to modify your data. For more information about project settings, see the Project settings topic.

Creating projects
To create a project, complete the following steps:
  1. Open the Quality tab.
  2. Click New project.
  3. Provide a name and, optionally, a description.
Deleting projects
To delete a project, complete the following steps:
  1. Open the Quality tab.
  2. Find the project that you want to delete.
  3. From the menu, select Delete.

Data assets

Data assets are the objects that you analyze. Only data assets that are added to a project can be analyzed. You can run data quality analysis, column analysis, and primary key analysis. The details page of a data asset contains its analysis results. You can publish the analysis results to the catalog from the data asset details page so that other users can see them.

The following data sources are supported:
  • Amazon S3
  • Db2®
  • Greenplum
  • HDFS (correctly formatted delimited files, ORC, Parquet, and Avro formats)
  • Hive (Avro, ORC, and Parquet file formats)
  • IBM® Data Virtualization Manager
  • Local flat files (on the engine tier)
  • MongoDB
  • Netezza®
  • Oracle
  • PostgreSQL
  • Snowflake
  • SQL Server
  • Sybase
  • Teradata
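Before you add data assets, you might want to check a source against this list in a script. The following Python sketch is a hypothetical pre-check, not an IBM API: the set contents mirror the list above, "delimited" stands for correctly formatted delimited files, and the `is_supported` function name is an assumption. Only HDFS and Hive have format restrictions listed in this topic, so all other listed sources are accepted regardless of format.

```python
from typing import Optional

# Hypothetical pre-check mirroring the supported-sources list; not an IBM API.
SUPPORTED_SOURCES = {
    "Amazon S3", "Db2", "Greenplum", "HDFS", "Hive",
    "IBM Data Virtualization Manager", "Local flat files", "MongoDB",
    "Netezza", "Oracle", "PostgreSQL", "Snowflake", "SQL Server",
    "Sybase", "Teradata",
}

# File formats listed for the sources that restrict them.
SUPPORTED_FORMATS = {
    "HDFS": {"delimited", "orc", "parquet", "avro"},
    "Hive": {"avro", "orc", "parquet"},
}


def is_supported(source: str, file_format: Optional[str] = None) -> bool:
    if source not in SUPPORTED_SOURCES:
        return False
    allowed = SUPPORTED_FORMATS.get(source)
    if allowed is None:
        return True  # this topic lists no format restriction for the source
    return file_format is not None and file_format.lower() in allowed
```

For example, `is_supported("Hive", "Parquet")` is `True`, whereas `is_supported("Hive", "delimited")` is `False` because delimited files are listed only for HDFS.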
Adding data assets to projects
You can add data assets to a project in the following ways:
  • When you discover assets. For details, see the Running automated discovery topic.
  • From the Data assets tab in a project. Click Add data asset, find the data assets that you want to add, and from the menu, select Add to projects. To search for assets that contain data assets, such as schemas, you can switch to the hierarchy view. Click such an asset to display all the data assets that it contains.
You can also create SQL virtual tables and use them like any other data asset to organize and analyze your data. For more information, see the Creating SQL virtual tables topic.
Reviewing data assets
Data assets that were analyzed contain useful insights into the quality of the data. After you review the analysis results, you can mark the data asset as reviewed in one of the following ways:
  • Open the project and display all of its data assets. Switch from tile view to list view. Find the data asset that you want to mark as reviewed, and set the switch in the Reviewed column to on.
  • Open the details page of the data asset, and click Edit to enter edit mode. Select the Reviewed check box. The changes are saved automatically. To exit edit mode, click Done.