Asset types and properties
An asset is an item that contains metadata about data, other types of information, or operational code. You work with assets throughout the Cloud Pak for Data platform, including the main workspaces: projects, catalogs, and deployment spaces.
To understand assets, you must know the different types of assets, their properties, and where you can find them:
- Workspaces for assets
- Data assets
- Operational assets
- Configuration assets
- Descriptive assets
- Asset properties, metadata, and relationships
Workspaces for assets
You can find any asset in any of the workspaces for which you are a collaborator by searching for it from the global search bar. See Searching for assets across the platform.
What you can do with assets depends on the type of asset and the type of workspace.
Projects Where you collaborate with others to work with data. For example, you can prepare data, analyze data, or create models in projects. You can create all types of assets in projects and you can run operational assets. See Projects.
Catalogs Where you store assets to share with your organization. You can copy assets from catalogs into projects to work with them, or publish assets from projects into a catalog. You can publish all types of data assets and some types of operational assets into a catalog. You can edit asset metadata in a catalog, but you can't run operational assets. See Catalogs.
Deployment spaces Where you deploy models or other assets into production. You copy deployable assets from projects into deployment spaces and then create deployments from those assets. See Deployment spaces.
Data virtualization Where you create virtual tables by combining or segmenting one or more tables. You publish virtual tables as data assets into a catalog. See Virtualizing data.
Data assets
Data assets contain metadata about data, including how to access the data.
How you create a data asset depends on where your data is:
- If your data is in a file, you upload the file from your local system to a project, catalog, or deployment space.
- If your data is in a remote data source, you first create a connection asset that defines the connection to that data source. Then you create a data asset by selecting the connection, the path or other structure, and the table or file that contains the data. This type of data asset is called a connected data asset.
Data asset from a file A data asset from a file points to a file that you uploaded from your local system. The file is stored in the storage for the project, catalog, or deployment space in Cloud Pak for Data. The file can contain structured data, unstructured text, images, and other types of data. You can create a data asset with a file of any format. However, you can perform more actions on CSV files than on other file types.
Connected data asset A connected data asset points to a table, file, or folder that is accessed through a connection to a remote data source. The connection is defined in the connection asset that is associated with the connected data asset. When you access a connected data asset, the data is dynamically retrieved from the data source.
A folder data asset is a special case of a connected data asset. It points to a folder in IBM Cloud Object Storage. You create a folder data asset by specifying the path to the folder and the IBM Cloud Object Storage connection asset. You can view the files and subfolders that share the path with the folder data asset. The files that you can view within the folder data asset are not themselves data assets. For example, you can create a folder data asset for a path that contains news feeds that are continuously updated.
Connection asset A connection asset is considered a type of data asset. It contains the information necessary to create a connection to a data source. You can provide shared credentials that all users who have access to the connection asset can use, or you can require each user to enter their personal credentials when they use the connection. Projects and catalogs support many connection types for both IBM and third-party data sources.
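The choice between shared and personal credentials on a connection asset can be pictured as a lookup that happens when someone uses the connection. The following is a minimal Python sketch and not a Cloud Pak for Data API. The names `ConnectionAsset` and `credentials_for` and the per-user credential store are hypothetical, and the sketch reduces credentials to a username and password pair.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Hypothetical simplification: credentials are a (username, password) pair.
Credentials = Tuple[str, str]


@dataclass
class ConnectionAsset:
    """Hypothetical model of a connection asset, not a platform API."""
    name: str
    # Shared credentials that every user with access uses; None means each
    # user must supply personal credentials.
    shared_credentials: Optional[Credentials] = None


def credentials_for(
    conn: ConnectionAsset,
    user: str,
    personal: Dict[Tuple[str, str], Credentials],
) -> Credentials:
    """Pick the credentials that a user would connect with."""
    if conn.shared_credentials is not None:
        return conn.shared_credentials
    # Personal credentials are looked up per (user, connection name) pair.
    key = (user, conn.name)
    if key not in personal:
        raise PermissionError(f"{user} must enter personal credentials for {conn.name}")
    return personal[key]


shared = ConnectionAsset("sales-db", shared_credentials=("svc_reader", "secret"))
personal_only = ConnectionAsset("hr-db")
store = {("alice", "hr-db"): ("alice", "alice-pw")}

print(credentials_for(shared, "bob", store))           # ('svc_reader', 'secret')
print(credentials_for(personal_only, "alice", store))  # ('alice', 'alice-pw')
```

With shared credentials, every user connects as the same account. With personal credentials, a user who has not entered any credentials cannot use the connection.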
See Adding data to a project, Adding assets to a catalog, and Adding data assets to a deployment space.
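The folder data asset described earlier in this section lists the files and subfolders that share its path. S3-compatible object storage such as IBM Cloud Object Storage has no real directories, only object keys that share a prefix. A folder view can therefore be sketched as a prefix filter. The `folder_contents` function is a hypothetical helper that assumes `/` as the path delimiter. It does not show how the platform implements folder data assets.

```python
from typing import Iterable, List, Tuple


def folder_contents(object_keys: Iterable[str], folder_path: str) -> Tuple[List[str], List[str]]:
    """Return (files, subfolders) directly under folder_path.

    Hypothetical helper: assumes object keys use "/" as the path delimiter,
    as is conventional in S3-compatible object storage.
    """
    prefix = folder_path.rstrip("/") + "/"
    files, subfolders = set(), set()
    for key in object_keys:
        if not key.startswith(prefix):
            continue  # the key does not share the folder's path
        rest = key[len(prefix):]
        if "/" in rest:
            # Keep only the first path segment to report the subfolder name.
            subfolders.add(rest.split("/", 1)[0])
        elif rest:
            files.add(rest)
    return sorted(files), sorted(subfolders)


keys = [
    "news/feed-001.json",
    "news/feed-002.json",
    "news/archive/2023/feed-000.json",
    "reports/q1.csv",
]
print(folder_contents(keys, "news"))
# (['feed-001.json', 'feed-002.json'], ['archive'])
```

The listing is computed from the keys that exist at the time. A newly added feed file under the prefix therefore appears in the view without becoming a data asset of its own.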
Operational assets
Operational assets are how you work with data in projects by using tools that prepare data, analyze data, or build models. You create operational assets with tools in projects. For example, a Jupyter notebook is an operational asset that you can create with the notebook editor tool to analyze data.
Running operational assets
Every time you run an operational asset, it's considered a job. You can monitor and schedule jobs. See Jobs.
Types of operational assets
Many operational assets are provided by the core services. However, some operational assets require other services.
You can create these types of operational assets without additional services:
- Data Refinery flows to refine data with the Data Refinery tool.
- Jupyter notebooks to analyze data or build models. By default, you edit notebooks in the Jupyter notebook editor. However, when you create a project, you can choose to edit notebooks in the JupyterLab IDE instead.
- Python scripts to develop interactive, exploratory analytics scripts with Python in the JupyterLab IDE.
- Metadata imports to import asset metadata into a project or a catalog.
- Metadata enrichments to enrich data assets in a project with results from profiling and data quality analysis and with business terms.
Some operational assets require extra services. If your administrator installed the services, you can add these assets:
- SPSS Modeler flows to automate the flow of data through a model with SPSS algorithms in the SPSS Modeler. Requires the SPSS Modeler service.
- AutoAI experiments to build a model without coding in the AutoAI tool. Requires the Watson Machine Learning service.
- Deep learning experiments to train deep learning models in the Experiment builder. Requires the Watson Machine Learning service and integration with Watson Machine Learning Accelerator.
- Decision Optimization models to solve scenarios in the Decision Optimization model builder. Requires the Decision Optimization and the Watson Machine Learning services.
- R Shiny apps to develop interactive web applications. Requires the RStudio Server with R 3.6 service.
- Dashboards to visualize data without code in the Dashboard editor. Requires the Cognos Dashboards service.
- DataStage flows to transform and integrate data in projects. Requires the DataStage service.
- Data quality rules to perform data quality analysis. The data quality feature must be enabled. Requires the DataStage service.
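The service requirements in the two lists above can be summarized as a lookup table. This Python sketch is built only from those lists. The asset-type labels and the `missing_requirements` helper are hypothetical. Listing the data quality feature and the Watson Machine Learning Accelerator integration as requirement entries is also an assumption; these are not product identifiers.

```python
from typing import Dict, FrozenSet, List, Set

# Requirements beyond the core services, transcribed from the lists above.
# An empty set means the asset type needs no additional service.
REQUIREMENTS: Dict[str, FrozenSet[str]] = {
    "Data Refinery flow": frozenset(),
    "Jupyter notebook": frozenset(),
    "Python script": frozenset(),
    "Metadata import": frozenset(),
    "Metadata enrichment": frozenset(),
    "SPSS Modeler flow": frozenset({"SPSS Modeler"}),
    "AutoAI experiment": frozenset({"Watson Machine Learning"}),
    "Deep learning experiment": frozenset(
        {"Watson Machine Learning", "Watson Machine Learning Accelerator integration"}
    ),
    "Decision Optimization model": frozenset({"Decision Optimization", "Watson Machine Learning"}),
    "R Shiny app": frozenset({"RStudio Server with R 3.6"}),
    "Dashboard": frozenset({"Cognos Dashboards"}),
    "DataStage flow": frozenset({"DataStage"}),
    "Data quality rule": frozenset({"DataStage", "data quality feature"}),
}


def missing_requirements(asset_type: str, available: Set[str]) -> List[str]:
    """List what still has to be installed or enabled to create asset_type."""
    return sorted(REQUIREMENTS[asset_type] - available)


print(missing_requirements("Decision Optimization model", {"Watson Machine Learning"}))
# ['Decision Optimization']
```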
Configuration assets
Configuration assets are reusable templates in projects to configure other assets or jobs.
With the DataStage service, you can create these types of configuration assets:
- DataStage subflows to collect a set of stages and connectors to reuse in DataStage flows.
- Data definitions to specify the column metadata of a data asset to reuse in DataStage flow jobs.
- Parameter sets to collect multiple job parameters with specified values to reuse in jobs.
If the data quality feature is enabled, you can create data quality definitions, which you use to build data quality rules.
Descriptive assets
Descriptive assets describe the structure of business reports, data models, or data transformations that are managed in external tools.
COBOL copybooks
COBOL copybooks describe the data structure of a COBOL program. They cannot be profiled, enriched through metadata enrichment, or used in Data Refinery.
Business intelligence assets
In business intelligence (BI) reporting, BI tools are used to gather, analyze, and present data. Business intelligence assets are used to organize reports that provide a business view of that data.
If the advanced metadata import feature is installed, you can use metadata import to create these types of business intelligence assets in catalogs:
- Report: Represents the definition of a report, for example, a monthly sales report based on the information in a reporting database.
- Report query: A child asset of a report. Queries fetch data from views or tables within a reporting database to render the report.
- Report query item: A child asset of a report query that is defined within the report for intermediate processing of data.
Business intelligence assets cannot be profiled, enriched through metadata enrichment, or used in Data Refinery or Data Virtualization.
Transformation scripts
Transformation scripts describe data transformations that change the format, structure, or values of data and that usually are part of the ETL (extract, transform, and load) processes in data integration tools.
If the advanced metadata import feature is installed, you can use metadata import to create these assets in a catalog.
The transformation types currently supported are:
- Procedure
- Trigger
- Function
- Script
Transformation expressions are used in data operations such as manipulating, converting, and cleansing data.
Transformation scripts cannot be previewed or downloaded.
Data model assets
Data models are available in 4.5.2 and later.
A data model visualizes data elements, called entities, and their relationships and describes the attributes that are associated with each entity. Data model assets cannot be profiled, enriched through metadata enrichment, or used in Data Refinery or Data Virtualization.
If the advanced metadata import feature is installed, you can use metadata import to add a copy of a data model that is defined and maintained in a data modeling tool to a catalog.
For a logical data model, the following asset types are created:
- Logical model: A logical representation of data objects that are related to a business domain. The model consists of a set of logical entities and their attributes and relationships, which can be organized in groups. A logical model can be implemented by a physical data model or a database schema.
- Logical model attribute: A logical model attribute defines the meaning and purpose of a unit of data.
- Logical model entity: Logical model entities are assets that represent the data structure in the logical data model.
Physical data models are available in 4.5.3 and later.
For a physical data model, the following asset types are created:
- Physical model: The physical model defines the physical structures and relationships of data within a subject domain or application.
- Physical model schema: A design schema for data assets that defines the physical structures and relationships of data within a subject domain or application. Each physical model can contain one or more physical model schemas.
- Physical model table: An asset that represents a table structure in the physical model.
- Physical model column: An asset that defines the relevant properties or characteristics of a column in a table in the physical model.
Depending on the size of the imported data model, a large number of assets might be created in the catalog. To find the root of the model as a starting point, filter the catalog assets on the Logical data model or Physical data model asset type.
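The descriptive asset types above each come with stated restrictions. The following Python sketch collects them in one table. The asset-kind and action labels are hypothetical strings chosen for illustration. An action that is missing from the table is only not stated as restricted; that does not mean it is supported.

```python
from typing import Dict, FrozenSet

# Actions that the text above explicitly rules out for each descriptive asset kind.
RESTRICTED_ACTIONS: Dict[str, FrozenSet[str]] = {
    "COBOL copybook": frozenset({"profile", "metadata enrichment", "Data Refinery"}),
    "business intelligence asset": frozenset(
        {"profile", "metadata enrichment", "Data Refinery", "Data Virtualization"}
    ),
    "transformation script": frozenset({"preview", "download"}),
    "data model asset": frozenset(
        {"profile", "metadata enrichment", "Data Refinery", "Data Virtualization"}
    ),
}


def is_documented_restriction(asset_kind: str, action: str) -> bool:
    """True if the text explicitly rules out this action for this asset kind."""
    return action in RESTRICTED_ACTIONS.get(asset_kind, frozenset())


print(is_documented_restriction("data model asset", "Data Virtualization"))  # True
print(is_documented_restriction("COBOL copybook", "Data Virtualization"))    # False
```

The second call returns False only because the COBOL copybook description does not mention Data Virtualization. For that reason, the function name describes what the text states rather than what the product allows.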
Asset properties, metadata, and relationships
All assets have common metadata that is visible everywhere. Other asset properties vary by the type of asset and where the asset is.
Common properties
All assets have common properties that are visible and editable in projects, catalogs, and deployment spaces.
Name Can contain up to 100 characters. Supports multibyte characters. Cannot be empty, contain Unicode control characters, or contain only blank spaces. Asset names do not need to be unique within a project or deployment space. Whether asset names must be unique in a catalog depends on the duplicate handling method set for the catalog.
Description Optional. Can contain up to 245 characters, not including blank spaces. Supports multibyte characters and hyperlinks.
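You can check the name and description rules before you submit them. The following is a minimal Python sketch. The function names are hypothetical, and the sketch rests on these assumptions:

- Length is counted in Unicode code points.
- "Control characters" means Unicode category Cc.
- "Blank spaces" means any whitespace character.

It does not check name uniqueness, because that depends on the project, deployment space, or catalog settings.

```python
import unicodedata
from typing import List, Optional

MAX_NAME_CHARS = 100         # stated limit for asset names
MAX_DESCRIPTION_CHARS = 245  # stated limit for descriptions, blank spaces excluded


def name_problems(name: str) -> List[str]:
    """Return the rule violations for a proposed asset name (empty list if valid).

    Assumptions: length counts Unicode code points, control characters are
    Unicode category Cc, and blank spaces are characters where str.isspace()
    is true.
    """
    if not name:
        return ["empty"]
    problems = []
    if len(name) > MAX_NAME_CHARS:
        problems.append("too long")
    if any(unicodedata.category(ch) == "Cc" for ch in name):
        problems.append("control character")
    if name.isspace():
        problems.append("only blank spaces")
    return problems


def description_ok(description: Optional[str]) -> bool:
    """Descriptions are optional; blank spaces do not count toward the limit."""
    if not description:
        return True
    counted = sum(1 for ch in description if not ch.isspace())
    return counted <= MAX_DESCRIPTION_CHARS


print(name_problems("Ventes trimestrielles 2024"))  # []
print(name_problems("   "))                         # ['only blank spaces']
print(description_ok("word " * 60))                 # True (240 counted characters)
```

The last call passes even though the string is 300 characters long, because its 60 blank spaces are not counted.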
Automatically generated or detected metadata can include other information, depending on the asset type, such as date added, size, created by, last editor, last modified, scheduled, shared, language, model type, and status.
Some asset types can have tags, which are ungoverned metadata that make searching for the asset easier. Tags can contain only letters, numbers, multibyte characters, blank spaces, underscores, dashes, and the symbols # and @. Project, catalog, or deployment space collaborators with the admin or editor role can create tags and add them to assets.
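The tag rules combine a role check with a character check. The following is a minimal Python sketch. It is not a platform API, and the role strings are hypothetical labels. It assumes that "multibyte characters" means any non-ASCII character other than a control character. It does not check tag length or empty tags, because the rules above do not state limits for them.

```python
import unicodedata

SYMBOLS = {" ", "_", "-", "#", "@"}  # blank space, underscore, dash, # and @
TAG_ROLES = {"admin", "editor"}      # roles that can create and add tags


def tag_char_allowed(ch: str) -> bool:
    if ch in SYMBOLS or ch.isalnum():
        return True  # listed symbol, or a letter or number in any script
    # Assumption: "multibyte characters" means any non-ASCII character
    # other than a control character.
    return ord(ch) > 0x7F and unicodedata.category(ch) != "Cc"


def can_tag(role: str, tag: str) -> bool:
    """Hypothetical check combining the role rule and the character rule."""
    return role in TAG_ROLES and all(tag_char_allowed(ch) for ch in tag)


print(can_tag("editor", "#pii customer_data"))  # True
print(can_tag("editor", "sales/2024"))          # False: "/" is not allowed
print(can_tag("viewer", "finance"))             # False: role cannot add tags
```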
More information for assets
Assets can have more properties, relationships, and metadata.
Asset privacy Set to public by default. When this setting is changed to private, access to the asset in the catalog is restricted: only the asset owner and asset members can view and use it.
Asset owner and asset members By default, the asset owner is the user who added the asset to the catalog. The asset members can view and use the asset when it's marked private.
Governance artifacts Can be assigned automatically, by the asset owner, or by data stewards. Governance artifacts can add metadata and relationships to assets, or mask sensitive data within data assets. In general, this information is available in catalogs. For some asset types, this information is also available in projects.
Custom attributes Optional. Custom attributes are shown in the Details section on the asset's Overview tab in the catalog. You can create custom attributes for assets with APIs. Some asset types also have predefined custom attributes.
Asset preview A preview of an asset. The content that you see in a preview depends on the type of asset, file, or data.
Reviews and ratings All catalog collaborators can rate and review assets.
Technical data lineage If the advanced metadata import feature is enabled, certain types of assets added to the catalog through metadata import can have a link that points to additional lineage information in MANTA Automated Data Lineage.
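The asset privacy and asset membership rules above amount to a simple access check. The following Python sketch models only those rules. `CatalogAsset` and `can_view` are hypothetical names, not a Cloud Pak for Data API. The sketch also assumes that a public asset is visible to every catalog collaborator.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class CatalogAsset:
    """Hypothetical model of the privacy fields on a catalog asset."""
    name: str
    owner: str                      # by default, the user who added the asset
    members: Set[str] = field(default_factory=set)
    private: bool = False           # assets are public by default


def can_view(asset: CatalogAsset, user: str, catalog_collaborators: Set[str]) -> bool:
    if asset.private:
        # Only the owner and the asset members can view and use private assets.
        return user == asset.owner or user in asset.members
    # Assumption: a public asset is visible to every catalog collaborator.
    return user in catalog_collaborators


collaborators = {"ana", "ben", "chen"}
asset = CatalogAsset("payroll", owner="ana", members={"ben"}, private=True)
print(can_view(asset, "ben", collaborators))   # True: asset member
print(can_view(asset, "chen", collaborators))  # False: collaborator, not a member
```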
More information for data assets
Depending on the format of the data in a data asset, you can view more information when you open the asset.
The path to the data The information necessary to access the data. A connected data asset for a table in a database has a reference to the connection asset for the database, the schema or other path information, and the table name. A data asset for an uploaded file has a reference to the file location in the object storage container for the project, catalog, or deployment space.
File format The MIME type of a file. Automatically detected.
Lineage If the knowledge graph feature is enabled, business data lineage information is available.
Data profile A profile of the data, for data from relational data sources as well as for CSV, TSV, Avro, Parquet, and Microsoft Excel sheets (only the first sheet in a workbook).
Activities The history of activities performed on the asset in projects and catalogs.
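The data profile description above names the sources that can be profiled and limits Excel workbooks to their first sheet. The following Python sketch turns that rule into a function. It is a simplification: the platform detects the file format as a MIME type, but this sketch uses file name suffixes, and the Excel suffixes and the `profile_scope` name are assumptions.

```python
from pathlib import PurePosixPath
from typing import Optional

# File formats that the data profile description names explicitly.
PROFILED_FILE_SUFFIXES = {".csv", ".tsv", ".avro", ".parquet"}
# Assumption: Excel workbooks are recognized by these file name suffixes.
EXCEL_SUFFIXES = {".xlsx", ".xls"}


def profile_scope(from_relational_source: bool, file_name: Optional[str] = None) -> Optional[str]:
    """Describe what a profile covers, or return None if no profile is described."""
    if from_relational_source:
        return "table"
    if file_name is None:
        return None
    suffix = PurePosixPath(file_name).suffix.lower()
    if suffix in PROFILED_FILE_SUFFIXES:
        return "whole file"
    if suffix in EXCEL_SUFFIXES:
        return "first sheet only"
    return None


print(profile_scope(False, "budget.xlsx"))  # first sheet only
print(profile_scope(False, "notes.pdf"))    # None
```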