Asset types and properties

An asset is an item that contains metadata about data, other types of information, or operational code. You work with assets throughout the Cloud Pak for Data platform, including the main workspaces: projects, catalogs, and deployment spaces.

To understand assets, you must know the different types of assets, their properties, and where you can find them.

Workspaces for assets

You can find any asset in any of the workspaces for which you are a collaborator by searching for it from the global search bar. See Searching for assets across the platform.

What you can do with assets depends on the type of asset and the type of workspace.

Projects: Where you collaborate with others to work with data. For example, you can prepare data, analyze data, or create models in projects. You can create all types of assets in projects and you can run operational assets. See Projects.

Catalogs: Where you store assets to share with your organization. You can copy assets from catalogs into projects to work with them, or publish assets from projects into the catalog. You can publish all types of data assets and some types of operational assets into a catalog. You can edit asset metadata in a catalog, but you can't run operational assets. See Catalogs.

Deployment spaces: Where you deploy models or other assets into production. You copy deployable assets from projects into deployment spaces and then create deployments from those assets. See Deployment spaces.

Data virtualization: Where you create virtual tables by combining or segmenting one or more tables. You publish virtual tables as data assets into a catalog. See Virtualizing data.
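As a rough illustration of how the available actions differ by workspace, the following Python sketch encodes only the actions that are named in the descriptions above. The Workspace enum, the action names, and the can helper are hypothetical and are not part of any Cloud Pak for Data API. An action that is missing from the table is simply not mentioned in this topic, except for running operational assets in a catalog, which the text rules out explicitly.

```python
from enum import Enum


class Workspace(Enum):
    PROJECT = "project"
    CATALOG = "catalog"
    DEPLOYMENT_SPACE = "deployment space"
    DATA_VIRTUALIZATION = "data virtualization"


# Hypothetical action names, taken from the workspace descriptions above.
CAPABILITIES = {
    Workspace.PROJECT: {
        "create_asset",
        "run_operational_asset",
        "publish_to_catalog",
    },
    Workspace.CATALOG: {
        "edit_metadata",
        "copy_to_project",
    },
    Workspace.DEPLOYMENT_SPACE: {
        "create_deployment",
    },
    Workspace.DATA_VIRTUALIZATION: {
        "create_virtual_table",
        "publish_to_catalog",
    },
}


def can(workspace: Workspace, action: str) -> bool:
    """Return True if the action is listed for the workspace in this sketch."""
    return action in CAPABILITIES[workspace]


if __name__ == "__main__":
    print(can(Workspace.PROJECT, "run_operational_asset"))
    print(can(Workspace.CATALOG, "run_operational_asset"))
```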

Data assets

Data assets contain metadata about data, including how to access the data.

How you create a data asset depends on where your data is.

Data asset types

Data asset from a file: A data asset from a file points to a file that you uploaded from your local system. The file is stored in your storage in Cloud Pak for Data. The contents of the file can include structured data, unstructured textual data, images, and other types of data. You can create a data asset with a file of any format. However, you can do more actions on CSV files than on other file types.

Connected data asset: A connected data asset points to a table, file, or folder that is accessed through a connection to a remote data source. The connection is defined in the connection asset that is associated with the connected data asset. When you access a connected data asset, the data is dynamically retrieved from the data source.

A folder data asset is a special case of a connected data asset. It points to a folder in IBM Cloud Object Storage. You create a folder data asset by specifying the path to the folder and the IBM Cloud Object Storage connection asset. You can view the files and subfolders that share the path with the folder data asset. The files that you can view within the folder data asset are not themselves data assets. For example, you can create a folder data asset for a path that contains news feeds that are continuously updated.

Connection asset: A connection asset is considered a type of data asset. It contains the information necessary to create a connection to a data source. You can choose to provide shared credentials for all users who have access to the connection asset to use, or you can specify that each user must enter their personal credentials when they use the connection. Projects and catalogs support many connection types to both IBM and third-party data sources.
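The relationship between these data asset types can be pictured with a small data model. The following Python sketch is an illustration under stated assumptions, not the platform's actual schema: the class names, field names, and the credentials_for method are hypothetical. It shows a connected data asset that holds a reference to a connection asset, and a connection asset that uses either shared credentials or personal credentials supplied by each user.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionAsset:
    """Information needed to connect to a data source (hypothetical fields)."""
    name: str
    data_source: str
    # None means that each user must supply personal credentials.
    shared_credentials: Optional[dict] = None

    def credentials_for(self, personal: Optional[dict] = None) -> dict:
        """Pick the credentials to use for one access attempt."""
        if self.shared_credentials is not None:
            return self.shared_credentials
        if personal is None:
            raise PermissionError("personal credentials are required")
        return personal


@dataclass
class ConnectedDataAsset:
    """Points to a table, file, or folder reached through a connection."""
    name: str
    connection: ConnectionAsset
    path: str  # for example a schema and table name, or a folder path


@dataclass
class FileDataAsset:
    """Points to an uploaded file in the workspace's storage."""
    name: str
    storage_path: str


if __name__ == "__main__":
    conn = ConnectionAsset("warehouse", "example-db", {"user": "svc"})
    orders = ConnectedDataAsset("orders", conn, "SALES.ORDERS")
    print(orders.connection.credentials_for())
```

In this sketch, the data itself is never stored on the connected data asset; only the path and the connection reference are, which mirrors the statement that the data is retrieved dynamically from the data source.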

See Adding data to a project, Adding assets to a catalog, and Adding data assets to a deployment space.

Operational assets

You use operational assets to work with data in projects through tools that prepare data, analyze data, or build models. You create operational assets with tools in projects. For example, a Jupyter notebook is an operational asset that you can create with the notebook editor tool to analyze data.

Running operational assets

Every time you run an operational asset, it's considered a job. You can monitor and schedule jobs. See Jobs.

Types of operational assets

Many operational assets are provided by the core services. However, some operational assets require other services.

You can create these types of operational assets without additional services:

Some operational assets require extra services. If your administrator installed the services, you can add these assets:

Configuration assets

Configuration assets are reusable templates in projects that you use to configure other assets or jobs.

With the DataStage service, you can create these types of configuration assets:

If the data quality feature is enabled, you can create data quality definitions from which to build data quality rules.

Descriptive assets

Descriptive assets describe data transformations and the structure of business reports or data models that are managed in external tools.

COBOL copybooks

COBOL copybooks describe the data structure of a COBOL program. They cannot be profiled, enriched through metadata enrichment, or used in Data Refinery.

Business intelligence assets

In business intelligence (BI) reporting, BI tools are used to gather, analyze, and present data. Business intelligence assets are used to organize reports that provide a business view of that data.

If the advanced metadata import feature is installed, you can use metadata import to create these types of business intelligence assets in catalogs:

Business intelligence assets cannot be profiled, enriched through metadata enrichment, or used in Data Refinery or Data Virtualization.

Transformation scripts

Transformation scripts describe data transformations that change the format, structure, or values of data and that usually are part of the ETL (extract, transform, and load) processes in data integration tools.

If the advanced metadata import feature is installed, you can use metadata import to create such assets in the catalog.

The transformation types currently supported are:

Transformation expressions are used in data operations, such as manipulating, converting, and cleansing.

Transformation scripts cannot be previewed or downloaded.

Data model assets

Data models are available in 4.5.2 and later.

A data model visualizes data elements, called entities, and their relationships and describes the attributes that are associated with each entity. Data model assets cannot be profiled, enriched through metadata enrichment, or used in Data Refinery or Data Virtualization.

If the advanced metadata import feature is installed, you can use metadata import to add a copy of a data model that is defined and maintained in a data modeling tool to a catalog.

For a logical data model, the following asset types are created:

Physical data models are available in 4.5.3 and later.

For a physical data model, the following asset types are created:

Depending on the size of the imported data model, a large number of assets might be created in the catalog. To find the root of the model as a starting point, filter the catalog assets on the Logical data model or Physical data model asset type.

Asset properties, metadata, and relationships

All assets have common metadata that is visible everywhere. Other asset properties vary by the type of asset and where the asset is.

Common properties

All assets have common properties that are visible and editable in projects, catalogs, and deployment spaces.

Name: Can contain up to 100 characters. Supports multibyte characters. Cannot be empty, contain Unicode control characters, or contain only blank spaces. Asset names do not need to be unique within a project or deployment space. Whether asset names must be unique in a catalog depends on the duplicate handling method set for the catalog.

Description: Optional. Can contain up to 245 characters, not including blank spaces. Supports multibyte characters and hyperlinks.
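The name and description rules above can be checked before you submit an asset. The following Python sketch is a minimal illustration, not the platform's own validation. It assumes that a "character" is one Unicode code point (so a multibyte character counts once), that "blank spaces" means the ordinary space character, and that control characters are those in the Unicode general category Cc. The function and constant names are hypothetical.

```python
import unicodedata

MAX_NAME_CHARS = 100          # limit stated for asset names
MAX_DESCRIPTION_CHARS = 245   # limit stated for descriptions, blanks excluded


def is_valid_name(name: str) -> bool:
    """Check an asset name against the rules listed for the Name property."""
    # Reject empty names and names that consist only of blank spaces.
    if name.strip(" ") == "":
        return False
    # Count code points, so each multibyte character counts once (assumption).
    if len(name) > MAX_NAME_CHARS:
        return False
    # Category "Cc" covers Unicode control characters such as tab or bell.
    return not any(unicodedata.category(ch) == "Cc" for ch in name)


def is_valid_description(description: str = "") -> bool:
    """Check a description; it is optional, and blank spaces are not counted."""
    non_blank = sum(1 for ch in description if ch != " ")
    return non_blank <= MAX_DESCRIPTION_CHARS


if __name__ == "__main__":
    print(is_valid_name("Customer churn data"))
    print(is_valid_name("   "))
    print(is_valid_description("Monthly export of churn features"))
```

Uniqueness is deliberately left out of the sketch, because it depends on the workspace and, for catalogs, on the duplicate handling method.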

Automatically generated or detected metadata can include other information, depending on the asset type, such as date added, size, created by, last editor, last modified, scheduled, shared, language, model type, and status.

Some asset types can have tags, which are ungoverned metadata that makes searching for the asset easier. Tags can contain only blank spaces, letters, multibyte characters, numbers, underscores, dashes, and the symbols # and @. Project, catalog, or deployment space collaborators with the admin or editor role can create tags and add them to assets.
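As a sketch of the tag rules, the following Python snippet checks a candidate tag against the allowed characters and checks whether a collaborator role can create tags. It is an approximation under stated assumptions: it treats "letters, multibyte characters, numbers" as anything that matches the Unicode-aware \w class in Python's re module, it assumes that an empty tag is not allowed, and the role strings and function names are hypothetical.

```python
import re

# \w matches Unicode letters, digits, and underscore for str patterns.
# The class also allows the space character, '#', '@', and '-'.
TAG_PATTERN = re.compile(r"[\w #@-]+")

# Roles that can create tags and add them to assets, per the text above.
TAGGING_ROLES = {"admin", "editor"}


def is_valid_tag(tag: str) -> bool:
    """Return True if every character in the tag is in the allowed set."""
    return TAG_PATTERN.fullmatch(tag) is not None


def can_create_tags(role: str) -> bool:
    """Return True if the collaborator role can create tags."""
    return role.lower() in TAGGING_ROLES


if __name__ == "__main__":
    print(is_valid_tag("customer_data-2024 #pii @team"))
    print(is_valid_tag("bad!tag"))
    print(can_create_tags("viewer"))
```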

More information for assets

Assets can have more properties, relationships, and metadata.

Asset privacy: Set to public by default. This setting can restrict access to an asset in a catalog when it's set to private. Only the owner and members of the asset can view and use private assets.

Asset owner and asset members: By default, the asset owner is the user who added the asset to the catalog. The asset members can view and use the asset when it's marked private.

Governance artifacts: Can be assigned automatically, by the asset owner, or by data stewards. Governance artifacts can add metadata and relationships to assets, or mask sensitive data within data assets. In general, this information is available in catalogs. For some asset types, this information is also available in projects.

Custom attributes: Optional. Custom attributes are shown in the Details section on the asset's Overview tab in the catalog. You can create custom attributes for assets with APIs. Some asset types also have predefined custom attributes.

Asset preview: A preview of an asset. The content that you see in a preview depends on the type of asset, file, or data.

Reviews and ratings: All catalog collaborators can rate and review assets.

Technical data lineage: If the advanced metadata import feature is enabled, certain types of assets added to the catalog through metadata import can have a link that points to additional lineage information in MANTA Automated Data Lineage.
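The asset privacy rule described in this list can be expressed as a short access check. The following Python sketch is illustrative only: the CatalogAsset class and the can_view function are hypothetical, and the sketch assumes that the user is already a collaborator in the catalog. It does not model governance rules or data masking.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogAsset:
    """A catalog asset with the privacy-related properties (hypothetical)."""
    name: str
    owner: str                       # by default, the user who added the asset
    members: set = field(default_factory=set)
    private: bool = False            # assets are public by default


def can_view(asset: CatalogAsset, user: str) -> bool:
    """Public assets are open; private assets are limited to owner and members."""
    if not asset.private:
        return True
    return user == asset.owner or user in asset.members


if __name__ == "__main__":
    report = CatalogAsset("q3-report", owner="ana", members={"raj"}, private=True)
    print(can_view(report, "raj"))
    print(can_view(report, "lee"))
```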

More information for data assets

Depending on the format of the data in data assets, you can view more information when you open the asset.

The path to the data: The information necessary to access the data. A connected data asset for a table in a database has a reference to the connection asset for the database, the schema or other path information, and the table name. A data asset for an uploaded file has a reference to the file location in the object storage container for the project, catalog, or deployment space.

File format: The MIME type of a file. Automatically detected.

Lineage: If the knowledge graph feature is enabled, business data lineage information is available.

Data profile: A profile of the data, for data from relational data sources as well as for CSV, TSV, Avro, Parquet, and Microsoft Excel sheets (only the first sheet in a workbook).

Activities: The history of activities performed on the asset in projects and catalogs.
