Storing and persisting metrics

You can use the metrics repository to store metrics in a database.

With the ds-metrics service, you can set up metrics storage at the project level for all your DataStage® flows. While metrics are enabled by default for your flows, you must manually enable persisting metrics at a project level.

Creating a connection

Under the Manage tab of your Cloud Pak for Data project, go to DataStage > Metrics Repository and enable persisting metrics. Specify a connection type, configure properties and security details, and test the connection to verify that it works.

You must use a clean database, or the one that was previously initialized by ds-metrics. To ensure that a database is clean and to clear any previous ds-metrics data, run the following commands.
drop schema if exists ds_metrics cascade;
drop table if exists public.databasechangelog;
drop table if exists public.databasechangeloglock;
To use a database with ds-metrics, the database user must have both the permission to create schemas in the database, and the permission to create tables in the public schema. To check for permissions to create tables and schemas, log in to the database and run the following query as the same user who requires the permissions.
select has_database_privilege(current_database(), 'create'), has_schema_privilege('public', 'create');
if you have create-schema and create-table permissions, the query returns true, true.

Metrics retention

The Metrics retention field controls how frequently, and if at all, ds-metrics removes old data from the database. If you enable the Max limit option, you can choose between the Days or Runs settings. For example, if you set 30 days, metrics deletes all information that is related to the job and to all its runs when a job is over 30 days old. If you set 100 runs, metrics keeps the 100 most recent runs, and deletes the older runs. The Metrics retention field applies only to metrics data related to the project in which the settings are activated. If the database is shared with another project, the retention settings do not impact the data from that project.

After you save the connection settings, it triggers ds-metrics to clear its connection cache. When you run a job in the project, ds-metrics connects to the database for the first time. After a connection is made, it populates the ds-metrics schema after it is initialized, or migrates it if the schema in the database is an older version.

Database structure

In the metrics database, most tables are in the ds-metrics schema. public.databasechangelog and public.databasechangeloglock are extra internal tables in the public schema that are used to track the database schema versions.

Table schemas

For tables, ds_metrics schema is used as the main schema.

job
Stores information about the job when a job run starts.
job_run
Stores information about the job run while a job is running.
job_run_parameter
Stores parameters for a job run when a job run starts.
job_run_log
Stores non-info log lines from a job run when a job run ends.
job_run_stage_metrics
Stores metrics for stages in a job run while a job is running.
job_run_link_metrics
Stores metrics for links in a job run while a job is running.
version
Stores the version of ds-metrics that last initialized the database, unless the database was set up manually.
The following table schemas are for pod metrics. The pod metrics tables are always present, but remain empty.
instance
Stores information about the px-runtime instances.
pod
Stores information about the px-runtime conductor and compute pods.
pod_disk
Stores information about disks mounted to the pods.
pod_metrics
Stores CPU and memory metrics over time.
disk_metrics
Stores disk size metrics over time.